google.com <ENTER>... And then?
Photo by Leon Seibert on Unsplash


Ever wondered what happens when you type any address in your web browser and then press Enter? Well, there's the fun 23-minute-or-so version, and the long version, which I'll try to explain as concisely and clearly as possible. And it's fun. Not TV show fun, but fun nonetheless.

I'm pretty sure you already watched the 23-minute version when you were a kid: it aired on Nickelodeon and starred a 10-year-old kid with a pink cap and 2 magical godparents. No? Well, below is an image from the episode as a refresher.

Aaaah, the good ol' 2000s

Maybe this episode is a little bit too goofy in its representation of the Internet, but it has its similarities with the real thing. So let us begin the quest of a request. Get it?

0. How do computers communicate with each other?

But first, a really brief introduction to how the Internet is layered, because the Internet is just an abstraction of several things going on at almost the same time.

Maybe you know, maybe you don't, but the Internet is made up of a lot of interconnected computers from all around the world, used by lots of different people: from government officials to private companies, academics, and you, dear reader. The cloud, as we call it, is just an extension of this concept, in the sense that lots of computers are interconnected to offer you a service. But that's beside the point.

Some of those computers are used exclusively to serve content, while others, like yours, are used to consume content. Now you know why the computers serving content (from your regular website to a fully fledged web application) are called servers: their job is to serve content. By the way, this is not an either-or situation. Computers can serve and consume content at the same time; just a tip for your next geek conversation with your friends about these matters.

How are these computers connected, you may ask? Enter protocols, because almost everything in computer networking is based on them. And there are lots of them. But today I'll write about three: TCP/IP, DNS, and HTTP (with its secure variant).

TCP/IP

Although all protocols are fundamental to the Internet experience as we know it, TCP/IP is the most important of them all, because without it we wouldn't have the Internet.

TCP/IP is actually two protocols working in tandem: TCP (Transmission Control Protocol) and IP (Internet Protocol).

TCP ensures three things: data reaches the destination, arrives in the right order, and arrives without duplication. For TCP to work, its two ends (a sending end and a receiving end) need to establish a connection before any data is transferred. Once the connection is established, TCP breaks the data down into small chunks (called segments) and hands them to IP, which carries them as packets.
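To make the idea of "two ends and a connection" a bit more concrete, here's a minimal sketch using Python's standard socket module. Both ends run on your own machine (127.0.0.1), and the function name echo_once is just something I made up for illustration; a real server would of course live on another computer and handle many connections.

```python
import socket
import threading


def echo_once(message: bytes) -> bytes:
    """Open a TCP connection to a local server and return what it echoes back."""
    # Receiving end: listen on a free port chosen by the OS (port 0).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _addr = server.accept()      # returns once a connection exists
        with conn:
            conn.sendall(conn.recv(1024))  # send back whatever was received

    worker = threading.Thread(target=serve)
    worker.start()

    # Sending end: connect first, then send data and read the reply.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(echo_once(b"hello"))
```

Notice that nothing is sent until the connection exists: that's the "before data transfer" part in action.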

IP moves the data across the network (cables and routers) based on the destination address. Yeah, an address like your home one, but here it's made up of 4 numbers (IPv4) ranging from 0 to 255 and separated by dots, like this one: 127.0.0.1 (this is the address your computer uses to reach itself, also known as localhost), or of eight groups of 4 hexadecimal digits separated by colons (IPv6).
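If you want to play with both address formats, Python's standard ipaddress module can parse them. The sketch below is only an illustration (describe is a helper name I invented); it uses 127.0.0.1 and its IPv6 counterpart ::1, and writes each address out in its full form so you can see all the groups.

```python
import ipaddress


def describe(address: str) -> str:
    """Say which IP version an address uses and show it in its full form."""
    ip = ipaddress.ip_address(address)  # raises ValueError on invalid input
    kind = "loopback" if ip.is_loopback else "regular"
    return f"IPv{ip.version} {kind} address: {ip.exploded}"


print(describe("127.0.0.1"))  # the IPv4 localhost address
print(describe("::1"))        # the IPv6 localhost address, in short form
```

The short form ::1 is just a compressed way of writing an address whose groups are almost all zeros.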

DNS

As you know, computers don't understand human language, and we, in turn, have difficulty remembering websites by their IP address. After all, it's more meaningful to us to remember google.com than 8.8.8.8, don't you think? (Strictly speaking, 8.8.8.8 belongs to Google's public DNS service rather than to google.com itself, but it's an easy address to remember, so I'll use it as a stand-in throughout this article.) Here DNS comes into play. It stands for Domain Name System, and it's a protocol that translates the host names we type in our address bar, like google.com, to their respective IP addresses, like 8.8.8.8.

More about the details of DNS later, because the voyage when a request is made (typing google.com and then Enter) begins with how DNS works.
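From a program's point of view, all that translation work hides behind a single call to the operating system. Here's a small sketch with Python's standard socket module; resolve is a name I chose for the example, and I use localhost so it works even without an Internet connection.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the OS reports for a host name."""
    results = socket.getaddrinfo(hostname, None)
    # Each result is (family, type, proto, canonname, sockaddr),
    # and the first item of sockaddr is the IP address as text.
    return sorted({sockaddr[0] for *_rest, sockaddr in results})


print(resolve("localhost"))
```

Swap in any host name you like; the addresses you get back can differ depending on where you are and when you ask.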

HTTP/HTTPS and SSL

So how does your browser communicate with the server, and how does the server know what you want? Easy: via HTTP (HyperText Transfer Protocol). HTTP is synonymous with the Internet as we know it. And it's just that: text. Nothing strange or encrypted. Just plain text describing the type of request made, the status of the response, and a bunch of other information. The protocol, together with HTML (HyperText Markup Language), was invented by Tim Berners-Lee and his team at CERN, starting in 1989.
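To show how plain that text really is, here's a small sketch that builds an HTTP/1.1 request by hand and reads the first line of a response. The helper names and the canned SAMPLE_RESPONSE are made up for this example; nothing is sent over the network.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the text a client sends to ask for a page."""
    lines = [
        f"GET {path} HTTP/1.1",  # the type of request and what we want
        f"Host: {host}",         # which site we're talking to
        "Connection: close",
        "",                      # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Split the first line of a response into version, code and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


# A canned response, standing in for what a server might send back.
SAMPLE_RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Hi</h1>"

print(build_get_request("google.com").decode("ascii"))
print(parse_status_line(SAMPLE_RESPONSE))
```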

The secure version, named HTTPS (HyperText Transfer Protocol Secure), uses TLS (Transport Layer Security), formerly SSL (Secure Sockets Layer), to encrypt the communication. The motivation for creating a secure version is that HTTP, being a plain text protocol, can be read by a malicious hacker who intercepts it, exposing valuable information such as login credentials or bank account passwords. Obviously, nobody wants that.

This secure version is so widely used nowadays that it's really difficult to hit a non-HTTPS site on search engines, which favor secure sites with valid certificates in their rankings. Even your browser alerts you if you ever stumble across a non-HTTPS website. So be alert. Don't send any valuable information to a non-secure site: you could be the target of a man-in-the-middle attack.
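If you're curious how a program asks for that protection, here's a minimal sketch with Python's standard ssl module. The function names are mine, and fetch_tls_version needs a working Internet connection, so treat it as an illustration rather than a recipe.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create TLS settings that verify the server's certificate and name."""
    # The default context already requires a valid certificate
    # and checks that it matches the host name we asked for.
    return ssl.create_default_context()


def fetch_tls_version(host: str, port: int = 443):
    """Connect to a host over TLS and report the protocol version in use."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as raw:
        # Wrapping the plain TCP socket is what turns the channel secure.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Calling fetch_tls_version("google.com") would return a string naming the negotiated protocol version; which one you get depends on your Python build and on the server.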

The voyage

Now we're ready to begin the journey of a request. This journey will be separated into two parts: the DNS lookup, and the actual request and response to and from the web server. Here's a graphic of the two parts:

[Image: the two parts of the journey, the DNS lookup and the request and response]

1. DNS Lookup

The following is a zoomed-in view of the above picture showing the whole process of a DNS lookup:

[Image: zoomed-in view of the DNS lookup process]

When you press Enter in your browser after typing a website's name, the browser first checks whether it has the site's IP address cached (saved) from a previous visit. If not, the OS checks its own cache. If neither has an answer stored, the OS calls the resolver, and thus the journey begins.

So your OS asks the resolver whether it has the IP address of the site name you entered in its records. If it doesn't, the resolver contacts a root server and passes your request on to it. The resolver is a server dedicated to this kind of request, and it's usually run by your ISP (Internet Service Provider).

So now the resolver reaches its nearest root server and passes along your request. The root server doesn't keep the addresses of individual sites; instead, it points the resolver to the right TLD (Top-Level Domain) server. There are 13 root name servers, each named with a letter from a to m and operated by 12 independent organizations. They are located all over the world, with a total of 1394 instances as of the date of this writing (September 5). The number of instances is likely to change, mostly increasing as time passes. On the following link you can view more information about the location of these servers.

Continuing the journey, then. With the root server's referral in hand, the resolver asks the respective TLD server for the location of the server you requested. For example, the TLD of google.com is .com, so the resolver asks the .com TLD server for the location of Google's server. The TLD server doesn't hold the IP address either; instead, it gives the resolver the authoritative name servers, which are the servers that actually have the IP address we're looking for.

The TLDs are coordinated by ICANN (Internet Corporation for Assigned Names and Numbers), an international nonprofit corporation that regulates these kinds of Internet matters. The .com TLD was one of the first TLDs created, in 1985. There are different types of TLDs, such as those created from country codes (2-letter ISO codes), internationalized country codes (written in alphabets other than the English one), generic ones (.net, .org, .edu), and many other types. The TLD servers always have the latest information about the authoritative name servers thanks to the domain registrars, whose job is to save the info of the servers associated with a purchased domain name and to communicate these servers to the TLD registries.

The following is ICANN's website, if you want to explore more:

Now, after the TLD server passes the authoritative name servers to the resolver, the resolver asks again, for the last time, for the IP address of google.com. The answer is the IP address of google.com (8.8.8.8 in our example), or a list in no particular order if there's more than one server attached to the domain.

With this information, the resolver finally responds to your OS's request with the IP address you asked for, your OS in turn passes the IP on to your browser, and now your browser will make another request, this time to the IP address 8.8.8.8.
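The whole lookup can be sketched as a toy model in a few lines of Python. Everything here is invented for illustration: the dictionaries stand in for the root, TLD and authoritative servers, the server labels are made up, and 8.8.8.8 is the same example address used in this article. Real DNS speaks a network protocol, which this sketch skips entirely.

```python
# Toy stand-ins for the real servers (all names and data are made up).
ROOT = {"com": "tld-com"}                            # root: TLD -> TLD server
TLD_SERVERS = {"tld-com": {"google.com": "ns-google"}}  # TLD: name -> authoritative server
AUTHORITATIVE = {"ns-google": {"google.com": "8.8.8.8"}}  # authoritative: name -> IP


def lookup(name: str, cache: dict) -> tuple:
    """Resolve a name step by step, recording which servers were asked."""
    steps = []
    if name in cache:                     # browser/OS/resolver cache hit
        steps.append("cache")
        return cache[name], steps

    tld = name.rsplit(".", 1)[-1]         # "google.com" -> "com"
    steps.append("root")
    tld_server = ROOT[tld]                # root points us to the TLD server

    steps.append("tld")
    auth_server = TLD_SERVERS[tld_server][name]  # TLD points to the authoritative server

    steps.append("authoritative")
    ip = AUTHORITATIVE[auth_server][name]        # finally, the actual address

    cache[name] = ip                      # remember it for next time
    return ip, steps


cache = {}
print(lookup("google.com", cache))  # first time: the full journey
print(lookup("google.com", cache))  # second time: answered from the cache
```

The second call is the reason caches exist: once somebody has made the trip, nobody needs to repeat it for a while.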

2. Request and response

At last, you can connect to the destination web server. But depending on how the website is built, you may at first be connecting only to the "front door" of its infrastructure.

Enter load balancers: either physical devices or software installed on a server, whose job is to balance the incoming traffic (hence the name) by distributing it among the various servers connected to them. This is great because if, for any reason, one web server goes down, the website stays available: the other servers keep doing their work just fine.

But web developers have to be conscious of the user's security, so it's very likely you'll be connecting via the HTTPS protocol (remember, the S stands for secure). This mitigates sniffing attacks by malicious hackers.

So, your doorman (load balancer), depending on how he's instructed, lets you pass to a web server.
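How the doorman is "instructed" varies a lot. One simple and common instruction is round robin: hand each new visitor to the next server in line, skipping any server that's down. The sketch below assumes that strategy; the class name and the server names web-1, web-2 and web-3 are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in turn, skipping the ones marked as down."""

    def __init__(self, servers):
        self.healthy = set(servers)
        self._order = cycle(servers)   # endless loop over the server list
        self._count = len(servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def pick(self):
        # Look at each server at most once per call.
        for _ in range(self._count):
            candidate = next(self._order)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
print([balancer.pick() for _ in range(4)])
balancer.mark_down("web-2")             # pretend one server failed
print([balancer.pick() for _ in range(2)])
```

Real load balancers can also weigh servers by capacity or current load, but the principle of spreading requests around stays the same.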

Speaking of servers, they can be categorized by their job. For example, servers that provide regular web content, like Wikipedia's, are called web servers, whereas servers that run a specific application, like your banking web app, are called application servers.

And again, depending on how the website is built, if you need to, say, sign in, the website needs some sort of persistent storage where it keeps a record of your credentials so you can access its services. Think of your Google account. That storage is called a database, and most web applications need at least one. It can live on any server, but it's better kept separate from the main application server (in case there's a failure of some sort).

So then, depending on the type of request you made (for example, clicking a link or filling in a form), the website sends the corresponding information from its servers back to you, and that's how the Internet works in a general sense.
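To close the loop, here's a tiny sketch of that last step: a server that answers a request and a client that makes it, both running on your own machine with Python's standard library. HelloHandler and fetch_from_local_server are names I invented, and a real website does far more than echo the path back.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a short text message."""

    def do_GET(self):
        body = f"You asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_from_local_server(path: str) -> tuple:
    """Start a local server, request a path from it, and return the answer."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", path)
        response = connection.getresponse()
        result = (response.status, response.read().decode("utf-8"))
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


print(fetch_from_local_server("/search"))
```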

I hope this information is of value to you. If you liked it, consider sharing this article with someone who might benefit from it. Thanks for reading!!

Sources:

  • DNSimple. "How DNS works". Accessed August 22, 2021.


