Internet 101.101.101.101

There's a lot of confusion out there. Many people equate the World Wide Web with the internet. Others think that e-mail is the internet. Still others think that the internet is one big library. It's really none of these things. The internet doesn't care what the data is for. It doesn't care if the data flows over copper wires, optical fibers, or invisibly via radio waves. It has nothing to do with the CONTENT of the material that is transmitted from computer to computer. It doesn't care if you have a Mac, a PC, or a UNIX-based computer. Well, what then is the internet? Simply put, the internet is really just a set of rules that we all use to communicate. We call these rules "protocols."

We're quite used to these kinds of rules. We use them all the time. We follow traffic laws because they make traffic flow smoothly (at least when the roads have enough capacity for the cars...). In official meetings we use "Robert's Rules of Order" to ensure that a meeting progresses in an orderly manner. When we send paper mail, we follow a set of rules: the envelope must have "To" and "From" addresses that include recipient, street name and number, city, state, and zip code. The envelope must have a stamp. Diplomats have a set of protocols they follow which standardize the expectations and behaviors of people from vastly different backgrounds and cultures.

The internet is really a set of protocols that everyone agrees on and follows to facilitate the transfer of information from one point to another, and sometimes to multiple points. Just as it has taken civilization thousands of years to evolve the current sets of laws that govern us, the internet protocols have also taken years to evolve. "Evolve" is the operative word here. Just as civil laws are created, change, evolve, and sometimes become outdated, so too do the internet protocols. Another important aspect of the internet is that all of the protocols are "open." By open, we mean that they are available to all. There is nothing secret or proprietary about how information is sent on the internet. All of the protocols are contained within a series of documents called "Requests for Comments," or RFCs. This may sound informal, but the RFCs are the internet bible. They are submitted through a formal process in which they are heavily discussed, criticized, and modified until agreed upon. No company has control over the RFCs, and they are all freely available to the public. The RFCs tell us how to send information on the internet and, as mentioned above, they are always evolving, with new RFCs updating or replacing older ones.

A Little History

In response to the 1957 launch of Sputnik, the Advanced Research Projects Agency (ARPA) was founded in the U.S. in 1958. In the mid 1960s, ARPA was asked to supply very expensive computers to many scientists all over the country. ARPA thought that it could serve more scientists with fewer of these pricey machines if it could connect them together into a communications network. Much of the theoretical work had been done, but no one had ever hooked computers together in a network. In 1969 the first network was created and by December of that year computers at four institutions (UCLA, UC Santa Barbara, the Stanford Research Institute, and the University of Utah) were talking to each other and thus was the ARPANET born.

Before the internet could become the rage it is today, a robust, standardized, and open set of communication protocols needed to come into existence. The original ARPANET could only accommodate 64 computers. Clearly this would have to change. From the mid-1970s to the early 1980s a set of protocols was hashed out that provided for reliable messaging between millions of computers. This was (and is) the TCP/IP protocol suite.

TCP/IP

Anybody who has hooked their computer up to the internet has encountered this mysterious acronym and its associated set of strange numbers. We've all had to enter sets of digits like 192.168.100.123 into some dialog box to become part of the internet community. Just what do they mean and what is this TCP/IP thing anyway?

First we need to take a simplified look at how data flows across the internet. All data on the internet is broken up into small packages or "packets" of information.

One of the first tasks is to determine how to get a packet from where you are to where you want it to go. Just as when you send a paper mail message, you put your letter in an envelope and then put the destination address on the outside (along with a return address in case of trouble!), all packets of data have added to them a "header" of information about the data they contain. This information includes the type of data, the length of the data, the source and destination of the packets and a lot more. Most internet communication consists of messages that require many packets, so you need to number the packets to make sure they all arrive at the far end and that they get put together in the correct order. You also need some way of checking that the data gets through to the destination without corruption. Data is included in the header for this too. But I'm getting ahead of myself.
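To make the numbering idea concrete, here is a minimal Python sketch of chopping a message into packets and putting it back together. It is only an illustration: the tiny 8-byte packet size, the dictionary field names, and the use of CRC-32 as the corruption check are all assumptions made for this example, not what any real internet protocol does.

```python
import random
import zlib

def make_packets(message: bytes, size: int = 8) -> list:
    """Split a message into numbered packets, each with a small header."""
    packets = []
    for seq, start in enumerate(range(0, len(message), size)):
        payload = message[start:start + size]
        packets.append({
            "seq": seq,                        # position in the series
            "length": len(payload),            # size of the data carried
            "checksum": zlib.crc32(payload),   # lets the receiver detect corruption
            "payload": payload,
        })
    return packets

def reassemble(packets: list) -> bytes:
    """Check every packet, then join the payloads in sequence order."""
    for p in packets:
        if zlib.crc32(p["payload"]) != p["checksum"]:
            raise ValueError(f"packet {p['seq']} is corrupted")
    ordered = sorted(packets, key=lambda p: p["seq"])
    return b"".join(p["payload"] for p in ordered)

message = b"Hello from the Exploratorium!"
packets = make_packets(message)
random.Random(1).shuffle(packets)   # packets may arrive out of order
restored = reassemble(packets)      # same bytes as the original message
```

Because every packet carries its own sequence number, the order of arrival doesn't matter; the receiver can always sort things out.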

To break up the tasks we've outlined above, the folks who invented TCP/IP defined a layered set of tasks, each of which is handled independently of the others. A quick look at the layers will illustrate the wisdom of this approach.

Application Layer

The top layer of tasks is the applications you use every day, or at least the methods they use to send data. These might include email, the web, file transfer applications and others. Each of these applications speaks a standardized and open protocol. The web uses HTTP (HyperText Transfer Protocol - RFC 1945) to communicate. Email uses POP (Post Office Protocol - RFC 1725), IMAP (Internet Message Access Protocol - RFC 1730), and SMTP (Simple Mail Transfer Protocol - RFC 821). You've probably moved files around using FTP (appropriately, File Transfer Protocol - RFC 959). Given that these are all applications, they should not be responsible for dicing up data into packets or deciding how these packets get to their destination. All they need to know is how to give their information to the data-dicer and the data-transporter. You begin to see the beauty of the layered approach. The application doesn't care about the stuff below, and the methods in the lower layers can be changed and updated without affecting the applications.
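Here is a rough Python sketch of what "speaking a protocol" looks like from the application's side. The function names build_request and fetch are made up for this example. The application only composes a few lines of text in the form HTTP/1.0 expects and hands them to the layer below; nothing in it deals with packets or routes.

```python
import socket

def build_request(host: str, path: str = "/") -> bytes:
    """Compose a minimal HTTP/1.0 GET request as plain text."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",   # optional in HTTP/1.0, but widely sent
        "",                # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Hand the request to the transport layer and collect the reply."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)

request = build_request("www.exploratorium.edu")
```

Calling fetch needs a live network connection, so it is only defined here, not run. Notice that it never mentions packets, routers, or wires; that is all somebody else's job.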

Transport Layer (TCP)

The next layer down in the stack is the transport layer. Here is where TCP does its job (yes, the "TCP" part of TCP/IP, defined in RFC 793). TCP stands for Transmission Control Protocol. TCP is the layer that is responsible for slicing the data into packets, and making sure that the packets not only get to the destination, but get there with no errors. To do this, lots of communication goes on between the machine sending the data and the machine receiving the data. The sending machine sends a quick packet to the receiving machine and waits to get an acknowledgment back. It then starts to send the packets. Each packet of information contains not only the user's data, but a header of about 20 bytes that contains information about the application that created the data (the "port number"), a "sequence number" that gives the order of the packets in a series, and other data that helps the receiving end determine if the data was received intact. The receiving machine sends confirmation back to the sender for the packets it receives. A packet that arrives corrupted is thrown away and never confirmed. If the sending machine doesn't hear back from the far end that a packet was received, it assumes that the information was lost along the way and retransmits the packet. As you can see, this is an amazingly robust protocol.

Note that the TCP layer does not know anything about HOW to transmit the data from the sending machine to the receiving machine. All it knows is that it can hand off the data to the next layer, and that this next layer knows all about transmitting and receiving data and will handle the details.
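The acknowledge-or-resend idea can be sketched in a few lines of Python. This is a deliberately simplified model, not real TCP: it sends one packet at a time and waits for each confirmation, and the lossy_channel function is a hypothetical stand-in for an unreliable network that drops the first copy of every odd-numbered packet.

```python
def send_reliably(payloads, channel, max_tries=5):
    """Send each payload, retransmitting until the receiver acknowledges it."""
    received = {}
    transmissions = 0
    for seq, payload in enumerate(payloads):
        for _ in range(max_tries):
            transmissions += 1
            if channel(seq, payload):      # True means an acknowledgment came back
                received[seq] = payload
                break
            # No acknowledgment: assume the packet was lost and try again.
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    return received, transmissions

def lossy_channel():
    """Hypothetical network that drops the first copy of every odd packet."""
    seen = set()
    def channel(seq, payload):
        first_attempt = seq not in seen
        seen.add(seq)
        return not (first_attempt and seq % 2 == 1)
    return channel

data = [b"The ", b"quick ", b"brown ", b"fox"]
received, transmissions = send_reliably(data, lossy_channel())
message = b"".join(received[seq] for seq in sorted(received))
```

Four packets are sent, two of them get lost once, and the message still arrives whole after six transmissions. The application above never hears about any of it.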

Internet Protocol (IP)

This next layer down we can call the "Internet" layer. Here, something called "Internet Protocol" or IP takes over. Yes, this is the "IP" part of TCP/IP (RFC 791). This is the layer that knows about all those long numerical addresses. Some interesting stuff happens here too. Remember that this layer gets a packet of data that has had a TCP header attached to the front of it. It also gets data about the source and destination address. These are the addresses that we have all encountered. They are called "IP addresses." They are always 4 numbers between 0 and 255 separated by periods. An example might be the Exploratorium's email server, which has the numbers 192.174.2.1 as its IP address. Every machine connected to the internet must have a unique IP address, just as you must have a unique phone number.
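Under the hood, those four numbers are just a convenient way of writing one 32-bit number, 8 bits per part. Here is a small Python sketch of the conversion in both directions; the function names quad_to_int and int_to_quad are made up for this example.

```python
import ipaddress

def quad_to_int(address: str) -> int:
    """Turn a dotted quad like 192.174.2.1 into a single 32-bit number."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted quad: {address!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part   # each part fills the next 8 bits
    return value

def int_to_quad(value: int) -> str:
    """Turn a 32-bit number back into dotted-quad form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

number = quad_to_int("192.174.2.1")
# Python's standard library does the same job:
same = int(ipaddress.IPv4Address("192.174.2.1"))
```

In everyday code you would use the standard ipaddress module rather than doing the bit-shifting yourself; the hand-written version is only here to show what the dots mean.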

I somewhat oversimplified things above. You cannot actually use the numbers 0 and 255 as these are reserved for other special tasks, but this still leaves lots of numbers to use. Even so, there are about 4 billion possible numbers we can use for machines. This sounds like plenty, but actually we are running out. Many devices are being connected to the internet besides computers. Soon your refrigerator may be "plugged in" so that it can order groceries that are running low. Your house security system will certainly be able to communicate with you and the police and fire department. If you put your mind to it, I'm sure that you can come up with many devices in your home that will need their own IP addresses in the near future. Think of the possibilities when your TV and CD player are plugged in. A new version of IP (IPv6 - RFC 1883) is being worked on right now that will increase the number of available addresses to about 340,000,000,000,000,000,000,000,000,000,000,000,000 (3.4x10^38). Probably enough to last a little while longer... This would be enough for every square meter on the face of the earth to have about 667 sextillion addresses (6.67x10^23 addresses)!
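The address arithmetic is easy to check with a few lines of Python. The only assumption here is the surface area of the earth, taken as roughly 510 million square kilometers (5.1x10^14 square meters).

```python
EARTH_SURFACE_M2 = 5.1e14    # assumption: about 510 million square kilometers

ipv4_addresses = 2 ** 32     # four 8-bit numbers make a 32-bit address
ipv6_addresses = 2 ** 128    # IPv6 addresses are 128 bits long

per_square_meter = ipv6_addresses / EARTH_SURFACE_M2

print(f"IPv4 addresses: {ipv4_addresses:,}")
print(f"IPv6 addresses: {ipv6_addresses:.2e}")
print(f"IPv6 addresses per square meter: {per_square_meter:.2e}")
```

Going from 32 bits to 128 bits doesn't multiply the address space by four; it multiplies it by 2^96, which is where the absurdly large numbers come from.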

Just like the TCP layer above, the IP layer adds about another 20 bytes of header info onto the packet passed to it. This additional info contains the source IP address, the destination IP address, the size of the total packet, some error-checking information and a few other goodies. Now we have two headers on our packet of data.
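To see the two headers stack up, here is a Python sketch that builds a bare-bones 20-byte TCP header and a 20-byte IPv4 header around five bytes of data. Treat it as a simplified illustration: the port numbers, flags, window size, and time-to-live are arbitrary values picked for the example, and the TCP checksum is left at zero, whereas a real TCP implementation computes one.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """Ones' complement sum of 16-bit words, the check used in the IP header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_header(src_port: int, dst_port: int, seq: int) -> bytes:
    offset_and_flags = (5 << 12) | 0x18   # header is 5 words long; PSH and ACK set
    # ports, sequence number, ack number, offset/flags, window, checksum, urgent
    return struct.pack("!HHIIHHHH", src_port, dst_port, seq, 0,
                       offset_and_flags, 65535, 0, 0)

def ip_header(src: str, dst: str, payload_length: int) -> bytes:
    fields = [
        (4 << 4) | 5,            # version 4, header length of 5 32-bit words
        0,                       # type of service
        20 + payload_length,     # size of the total packet
        0, 0,                    # identification, fragment information
        64,                      # time to live
        socket.IPPROTO_TCP,      # the payload is a TCP segment
        0,                       # checksum placeholder, filled in below
        socket.inet_aton(src),
        socket.inet_aton(dst),
    ]
    draft = struct.pack("!BBHHHBBH4s4s", *fields)
    fields[7] = internet_checksum(draft)
    return struct.pack("!BBHHHBBH4s4s", *fields)

data = b"hello"
segment = tcp_header(49152, 25, seq=1) + data                          # TCP adds 20 bytes
packet = ip_header("192.168.100.123", "192.174.2.1", len(segment)) + segment
```

Five bytes of data end up as a 45-byte packet: 20 bytes of IP header, 20 bytes of TCP header, and then the data itself.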

Now, it's difficult for humans to remember all those dotted-quads of numbers, so the clever inventors of protocols came up with the Domain Name System (DNS - RFC 1035). DNS assigns more memorable names to those numbers. For instance, the Exploratorium's mail server mentioned above, 192.174.2.1, has been assigned the name isaac.exploratorium.edu. Note the three parts of the name. The "edu" is called a "top level domain." Other top level domains (as of June 2000) are com, net, org, mil, and a set of international designations (us for the United States, uk for the United Kingdom, fr for France, and so on). We don't get to choose these; they are determined by international agreement. We get to choose (and sometimes fight over...) the next level. For our museum, we registered "exploratorium" in the edu, com, net, and org domains. The remaining part, isaac, is the name of a specific machine within the exploratorium domain. DNS maps these domain names to the numerical IP addresses. It is, by the way, one of those application-layer protocols that make use of TCP/IP.
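The structure of a name can be picked apart with a short Python sketch. The HOSTS table below is a stand-in for the real, worldwide DNS and holds only the single name-to-address pair given in the text; split_name and resolve are names made up for this example.

```python
# A stand-in for DNS: the one mapping mentioned in the text.
HOSTS = {"isaac.exploratorium.edu": "192.174.2.1"}

def split_name(name: str) -> dict:
    """Break a domain name into the parts described above."""
    labels = name.rstrip(".").lower().split(".")
    return {
        "top_level_domain": labels[-1],
        "domain": ".".join(labels[-2:]),
        "host": labels[0] if len(labels) > 2 else None,
    }

def resolve(name: str) -> str:
    """Look up the IP address for a name in the local table."""
    key = name.rstrip(".").lower()
    try:
        return HOSTS[key]
    except KeyError:
        raise LookupError(f"no address known for {name!r}") from None

parts = split_name("isaac.exploratorium.edu")
address = resolve("isaac.exploratorium.edu")
```

A real program wouldn't keep its own table. It would call something like Python's socket.gethostbyname, which asks the operating system to query DNS on its behalf.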

Having an address doesn't get the message delivered though. That's the responsibility of a set of specialized computers (all with their own IP addresses) that hook all of us together. Here at the Exploratorium, we have 3 internal networks, two of which need to talk to the outside world. For this purpose we have one of those specialized computers, called a "router," whose sole purpose in life is to take data packets destined for places outside our networks and move them along from the source computer toward the destination computer. These routers use another whole set of protocols to talk with each other. They are able to determine the best way to get your packet from one place to another, passing the data from one router to another until the destination is reached. For me this is real magic. Somehow, the routers figure out the most efficient way to send your data and then do it without any worry on our part. As a matter of fact, some of your data may go by one route and the rest of your data may go by a completely different route. Remember that it's TCP's job to make sure that the individual packets all get there in one piece and to re-assemble them for the receiving application. If that's not a miracle, I don't know what is! Using some clever utilities, you can see the routes the data takes. To go from my home computer to the Exploratorium's web server, here's what we get:

Hop  Name                                      IP Address
1    192.168.1.1                               192.168.1.1
2    adsl-63-192-210-254.dsl.snfc21.ppbi.net.  63.192.210.254
3    core3-g2-0.snfc21.pbi.net.                206.171.134.130
4    edge1-ge2-0.snfc21.pbi.net.               209.232.130.71
5    edge2-g1-0.snfc21.pbi.net.                209.232.130.76
6    pbnap2.above.net.                         198.32.128.64
7    main-core1-3.sjc.above.net.               209.249.0.206
8    www.exploratorium.edu.                    207.126.113.30

My machine (192.168.1.1) sends data to a router at Pac Bell's DSL facility, which passes it along to another 3 routers within the Pac Bell network. The data finally gets to the interface router between Pac Bell and AboveNet (the ISP for our web server) and goes through one more router at AboveNet before getting to our web server machine. A total of 8 "hops." Like I said, a miracle (or rather some VERY smart people!).
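The hop-by-hop idea can be mimicked with a toy model in Python. Everything in it is hypothetical: the four router names and their hand-written tables are invented for this sketch, whereas real routers build their tables by talking to each other. Each router knows only which neighbor is one step closer to a given range of addresses, and it prefers the most specific range that matches.

```python
import ipaddress

# Hypothetical routers. Each maps address ranges to the next neighbor.
ROUTERS = {
    "home":     {"0.0.0.0/0": "isp"},
    "isp":      {"192.168.1.0/24": "home", "0.0.0.0/0": "exchange"},
    "exchange": {"207.126.113.0/24": "hosting", "0.0.0.0/0": "isp"},
    "hosting":  {"207.126.113.0/24": "deliver"},   # destination is directly attached
}

def next_hop(router: str, destination: str) -> str:
    """Pick the neighbor for the most specific matching address range."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, neighbor in ROUTERS[router].items():
        network = ipaddress.ip_network(prefix)
        if addr in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, neighbor)
    if best is None:
        raise LookupError(f"{router} has no route to {destination}")
    return best[1]

def trace(start: str, destination: str, max_hops: int = 30) -> list:
    """Follow the packet from router to router until it is delivered."""
    path = [start]
    while len(path) <= max_hops:
        neighbor = next_hop(path[-1], destination)
        if neighbor == "deliver":
            return path
        path.append(neighbor)
    raise RuntimeError("too many hops")

path = trace("home", "207.126.113.30")
```

No single router knows the whole route. Each one makes a local decision and passes the packet along, and the path emerges from those decisions.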

Network Access Layer

The last and lowest layer, the network access layer, is relatively simple. It consists mainly of software that speaks directly to the hardware that places the signal on the physical network. This software knows how to communicate with the different types of interface cards and network systems such as Ethernet, Token Ring, ATM, PPP, SLIP, X.25, and others. Each of these systems has a different method of putting the signal on a wire, an optical cable, or a radio wave. Again, it's important to isolate this layer from the internet layer above. That way IP does not need to know (or care) what type of physical network the data will be flowing over. As a matter of fact, the data may flow over several different systems on its way! Even within the walls of the Exploratorium, the data from our computers starts out on Ethernet, is converted to ATM over optical cable, then goes back to Ethernet to reach our router and then... I think you get the point.

Fortunately, you do not have to know anything about the real inner workings of TCP/IP for it to work for you. But knowing a little bit about what's under the hood sure makes one appreciate what it's doing for you!

Links:

Hobbes' Internet Timeline http://www.isoc.org/guest/zakon/Internet/History/HIT.html
History of the Internet Sites http://www.isoc.org/internet/history/
The Internet Engineering Task Force (IETF) http://www.ietf.org/
Internet Assigned Numbers Authority (IANA) http://www.iana.org/
Internet Architecture Board (IAB) http://www.iab.org/iab/
Internet RFC Archive http://www.faqs.org/rfcs/index.html
Acronym Finder http://www.acronymfinder.com