This is a research paper I did for a beginning coding class, so the information is a bit dated, but still mostly accurate.
It’s a cold, rainy Saturday morning. And after a long, sleepless night with a teething baby, you feel perfectly entitled to a plate full of white, refined sugar. But you know that’s wrong, so you do a very respectable online search for dessert recipes when somehow Google interprets “chocolate chip cookies” as “a nasty image I don’t want my baby’s six-year-old-sister-who-of-course-is-peering-over-my-shoulder to see.”
But big sister is still asleep, so you soldier on until you find a promising recipe on what appears to be a family-friendly site: CookieRecipes2Die4.com. You click on the Google link, and your computer sends a request to the cookie website through your Internet Service Provider (ISP) for a delicious recipe page written in Hypertext Markup Language, or HTML. This language, custom-made for applications called “web browsers,” is necessary because computer applications don’t speak any English at all.
The request for your cookie-recipe HTML page goes out in what is called an “IP packet,” a little bundle of data, somewhat like a Christmas card to your mother-in-law. IP, or “Internet Protocol,” is a set of rules that govern transmissions through the Internet. Just as the US Postal Service maintains rules for addressing, sending, and paying for Christmas cards, the Internet also uses a set of standards for its transmissions.
The IP packet contains a header, which is a bit like your mother-in-law’s address. Routers, special computers that connect to other computers via cable, wireless links, and satellite, inspect the packet to determine where it should go, just as a post office would.
But unlike postal employees, routers can’t understand “CookieRecipes2Die4” because, as we’ve discussed, it’s in English, sort of. So yet another computer is needed to translate the bad English into the actual IP address of the server holding the cookie recipe. This IP address is a very computer-friendly series of numbers and periods, and it works just like a street address.
But let’s say your baby’s six-year-old sister made an adorable Christmas card for your mother-in-law, and you’re in a hurry because it’s December 23rd. You rush through your address and instead of writing “Grandma Jones, 5342 Jack Rabbit Road, Ann Arbor, MI 48105,” you write “Grandma Jones, Jack Rabbit Road, Ann Arbor, MI 48105.” Hopefully, the spirit of Christmas will right all wrongs, but probably, your card will come back to you with a “return to sender - address unknown” stamp. A port number attached to an IP address is just as vital as a street number. One server may host several services, like both email and a website. To get a cookie recipe, you don’t want the domain’s email service; you want its web service. The port number tells the server which service you are requesting - just as, in our example, the street number tells us which house on Jack Rabbit Road contains your mother-in-law.
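If you like seeing things spelled out, here is a tiny Python sketch of how a web address breaks apart into those pieces. The page path is invented for illustration, and port 80 is simply the conventional port for a website’s HTTP service:

```python
from urllib.parse import urlsplit

# A hypothetical address for the recipe page (the path is made up).
url = "http://cookierecipes2die4.com:80/chocolate-chip-cookies.html"
parts = urlsplit(url)

print(parts.hostname)  # 'cookierecipes2die4.com' - the "street name" (the domain)
print(parts.port)      # 80 - the "street number" (which service: the web server)
print(parts.path)      # '/chocolate-chip-cookies.html' - which page we want
```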
Most of us can handle holiday greetings without any help, but we need our ISPs to translate domain names into IP addresses, which are unreasonable to memorize. ISPs do this with a special computer called a Domain Name Server, or DNS server. Of course, no single computer can store all of the trillions of web addresses in existence, so requested domain names are sorted through a naming hierarchy. “Top-Level Domains,” or TLDs, are indicated by suffixes tagged onto the end of the name, like “.com” and “.edu.” The DNS server that sorts through these suffixes is called a Root Server. It’s a rough sorting - like putting something addressed to Great Britain in the “air mail” slot.
The lookup is then handled by “recursive resolvers,” servers that keep querying and narrowing the search until a match is found on what’s called an “authoritative name server.” This is the server responsible for a specific domain name. It’s like the system theme parks use for lost children. Weeping children searching for their parents are directed to a “lost kids” building, where employees take steps to locate each child’s “matching” supervisor.
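In Python, that whole resolver chase collapses into a single call, because your operating system and your ISP’s resolver do the legwork. A minimal sketch, using “example.com” as a stand-in since the recipe site is fictional:

```python
import socket

# Ask the resolver chain (recursive resolver -> root -> TLD -> authoritative
# server) for the IP address behind a domain name.
ip_address = socket.gethostbyname("example.com")
print(ip_address)  # e.g. '93.184.216.34' - the "street address" routers understand
```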
Once the proper IP address is located, a connection is made to the Web server hosting the domain, and a request is issued for the chocolate-chip-cookie-recipe HTML page you found on CookieRecipes2Die4.com. Faster than Tinker Bell can turn broccoli into marshmallows, the HTML page is sent to your kitchen laptop.
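Under the hood, that last step is an HTTP request to the web server’s port 80. A rough sketch, again using “example.com” and an invented page path as stand-ins:

```python
import http.client

# Connect to the web server and request the recipe page over HTTP.
conn = http.client.HTTPConnection("example.com", 80, timeout=10)
conn.request("GET", "/chocolate-chip-cookies.html")  # hypothetical path
response = conn.getresponse()
print(response.status)  # 200 if the page exists, 404 if it does not
html = response.read().decode("utf-8", errors="replace")
conn.close()
```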
But let’s say that after all that effort, CookieRecipes2Die4 turns out not to be quite the family-friendly recipe site you thought it was. And let’s say that not-a-recipe-at-all HTML page appears just as your baby’s six-year-old sister is, of course, peering over your shoulder. Then you have no choice but to panic, slam shut your laptop (a bad thing to do), and quickly change the subject. You may even decide to skip the cookies.
But not everybody wants their laptops slammed shut, so to speak. The relative newness of the internet has posed a variety of legal conundrums. One of the most well-known challenges occurred in 1998, when a federal district court in Virginia ruled that mandatory internet filtering at a public library violated the First Amendment. As a society, we have overtly decided that our freedom of speech should be guarded at all costs. But as individuals, we often patently reject that freedom in the context of pornography and other social ills. The burden of internet filtering falls on whoever wants it.
So what is an internet filter? How does it work? Let’s take a look at the various ways to avoid the cookie-crumbling disaster described above.
Technological
Blacklisting
By Header
Routers
The request for a cookie recipe that isn’t actually a cookie recipe can be stopped as early in the process as the routers that receive your initial request. Routers can be configured to simply drop any packets whose IP headers designate an IP address on a specified blacklist. This is a broad stroke for a filter, blocking not only the “cookie recipe,” but also any other websites or email services hosted at a blacklisted address.
A more refined approach is to block packets requesting blacklisted domains that are headed for a specific server port. That ensures that other services under a particular domain are still available.
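A toy sketch of the decision a header-based filter makes, assuming the destination address and port have already been pulled out of the packet headers (real routers do this in dedicated hardware, and the blacklisted address here is invented):

```python
BLACKLISTED_IPS = {"203.0.113.55"}   # hypothetical blacklisted address
WEB_PORTS = {80, 443}                # the standard HTTP and HTTPS ports

def should_drop(dest_ip: str, dest_port: int) -> bool:
    # Refined approach: drop packets headed for a blacklisted address,
    # but only if they are bound for the web ports, so other services
    # (like email) at the same address still get through.
    return dest_ip in BLACKLISTED_IPS and dest_port in WEB_PORTS

print(should_drop("203.0.113.55", 80))   # True  - web request, dropped
print(should_drop("203.0.113.55", 25))   # False - email still works
```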
DNS
Blacklisting can occur at the DNS phase as well. The ISP’s DNS server is configured to refuse to resolve any domain name on a blacklist, and the user simply gets an error message.
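A minimal sketch of the idea, with the blocked-domain list invented for illustration; a real DNS server implements this in its own configuration rather than in a Python function:

```python
import socket

BLOCKED_DOMAINS = {"cookierecipes2die4.com"}  # hypothetical blacklist

def resolve(domain: str) -> str:
    if domain.lower() in BLOCKED_DOMAINS:
        # The user just sees a lookup failure.
        raise LookupError(f"{domain} is blocked by policy")
    return socket.gethostbyname(domain)  # otherwise, a normal lookup

print(resolve("example.com"))        # resolves normally
# resolve("cookierecipes2die4.com")  # would raise LookupError
```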
By Content
Censoring headers, however, is a bit of a placebo. Nefarious websites multiply faster than enteroviruses, and keeping a complete, up-to-date blacklist for their domains is impossible. Happily, filters based on content are available at just about every juncture in the transmission path.
Routers
In content filtering, the entire IP packet is inspected, this time for keywords. This is not in a router’s job description, however, so extra equipment must be used, increasing the cost of the filter. Moreover, IP packets have a maximum size; anything bigger gets split into smaller packets by surly network administrators (and their equipment), who must minimize SAR (Segmentation and Reassembly) times to keep web traffic flowing properly. So if the packet for your cookie recipe just rambles on and on, it will be split into smaller packets, forcing the filter to read only one portion at a time. The first portion may be keyword-safe while the second is not, and by the time the filter figures this out, it’s too late - the entire transmission has gone through. Or a single keyword may be split in two, and it could be the very word that would have been censored.
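Here is a toy illustration of why splitting defeats a naive keyword filter. The banned word and the packet contents are invented, and real filters work on raw bytes rather than tidy strings:

```python
BANNED = "gingersnap"  # hypothetical banned keyword

def packet_is_clean(payload: str) -> bool:
    return BANNED not in payload.lower()

whole = "a shocking gingersnap recipe"
print(packet_is_clean(whole))      # False - caught when inspected in one piece

# The same text split into two smaller packets, each inspected on its own:
first, second = whole[:14], whole[14:]   # the split lands inside "gingersnap"
print(packet_is_clean(first))      # True - "a shocking gin" looks harmless
print(packet_is_clean(second))     # True - so does "gersnap recipe"
```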
Content-filtered ISPs
These are ISPs that only offer limited access to the Internet based on content parameters. In this case, the ISP decides what it will or will not transmit to and from your computer.
Proxy Servers
Like ghost writers for politicians who lack the time and/or skills to write a pre-election memoir, proxy servers quietly step in between a web user and a web content provider to do the communicating for one or both of them. This is often used in the context of an internal network, like at a company. A school, for example, may wish to use a proxy server to mediate students’ web transmissions. The server can stop and search all requests for any blacklisted keywords or headers. Complex rules prevent any transmission to the client until the packet is thoroughly inspected. Likewise, a proxy server can read transmissions from the outside and block any blacklisted content.
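A rough sketch of what a filtering proxy does on a client’s behalf: fetch the page itself, inspect the whole thing, and only then decide whether to pass it along. The keyword list and the blocked-page text are invented for illustration:

```python
import http.client

BANNED_WORDS = {"gingersnap"}  # hypothetical blacklist

def fetch_through_proxy(host: str, path: str) -> str:
    # The proxy makes the request on the client's behalf.
    conn = http.client.HTTPConnection(host, 80, timeout=10)
    conn.request("GET", path)
    body = conn.getresponse().read().decode("utf-8", errors="replace")
    conn.close()
    # Nothing reaches the client until the whole response has been inspected.
    if any(word in body.lower() for word in BANNED_WORDS):
        return "<html><body>Blocked by proxy policy.</body></html>"
    return body
```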
Client-Side Filters
This is software anyone can purchase, or download for free, and install on his or her own computer. These applications are perfect for parents. It should be noted, however, that the private companies that create these applications can filter sites in any way they choose - according to their political and/or religious leanings, for example. A variety of popular client-side filters have been shown to block sites like the National Organization for Women, Quaker sites, Amnesty International, and The Heritage Foundation. There is no way for the government to check this power, as it is held in the private sector.
Browser Extensions
A browser extension is a program that enhances a browser’s abilities in some way. Extensions are written with ordinary web coding and, in the case of filters, are designed to block content that is blacklisted. AdBlock is an example of a browser-extension filter.
E-Mail Filters
These applications search for banned content in an email’s body, header, sender, subject, and attachment.
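A toy sketch of that search, using Python’s standard email parser. The message and the banned term are invented for illustration:

```python
from email import message_from_string

BANNED = {"gingersnap"}  # hypothetical banned term

raw = """From: baker@example.com
Subject: secret gingersnap recipe

Do not tell the six-year-old.
"""

msg = message_from_string(raw)
# Gather the parts a filter would inspect: sender, subject, and body.
searchable = " ".join(
    [msg.get("From", ""), msg.get("Subject", ""), msg.get_payload()]
)
print(any(term in searchable.lower() for term in BANNED))  # True - blocked
```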
Search-Engine Filters
These filters are available on search engines, like Google, but they must be activated, and they only filter search results. A site’s URL can still be typed in directly if the user already knows it.
Content Labeling
This is the “fox guarding the henhouse” method. In the late 1990s, the Internet Content Rating Association (ICRA) developed an online questionnaire for webmasters to describe the nature of a website’s content. This information was then packaged into labels embedded in the site’s pages, to be read by content-filtering software.
There are other tagging systems, but as long as they are voluntary, it’s hard to see how they could be terribly helpful. The problem isn’t the good guys - it’s the bad guys.
Denial of Service
In this not-so-civilized approach, a party who lacks the authority to filter can simply render Web sites inaccessible by overloading either the server or the network connection with an insane number of requests. A very fast computer, or a team of fast computers, and a very fast connection are needed to accomplish this “filter,” better known as a Denial-of-Service (DoS) attack. Think of this as the “death-by-nagging” approach so often favored by 4th graders angling to watch Terminator movies.
Domain Deregistration
If you happen to be a ruthless dictator, this is the filter for you. Your country’s TLD, that dot-suffix tacked onto your web address, is most likely operated by you, your royal majesty! So anything subversive and/or disloyal can be deregistered right at the root servers, no problem.
Psychological
Surveillance
Monitoring an individual’s web surfing, and threatening said individual with legal action should he/she attempt to access prohibited content, is a psychological method of filtering. There might be a few scenarios where this method could be of some use, although the individual could only be held accountable on devices where surveillance has already been set up.
Public View
Positioning computers in a library or workplace so that anyone can see what’s on anyone else’s screen is a great way to keep people off Facebook.
Physical
Server Takedown
This is where the rubber hits the road: just unplug the server holding the undesired content. If that doesn’t work... well, that’s kind of creepy.
The delicate balance of freedom of speech and civilized communication is fraught with high emotion and anxiety on both sides. As with so many of mankind’s inventions, the power of the internet forces us to sort out our priorities. What are we willing to sacrifice for safety? Or for freedom? In the end, what and with whom we communicate has never been as fully in our control as we would like to believe. It is the speed and volume of web communication that forces us to confront these questions.
Web filters cannot soothe teething babies, bake cookies, or stop your daughter from occasionally seeing something she shouldn’t. But filters are certainly worth understanding, and in many cases, using. Total control may be impossible, but knowledge is always power.