What happens when you type www.google.com and press Enter
The Web infrastructure

The display for www.google.com

This is a fascinating question for anyone who uses the web, which is nearly everybody, every day. But have you ever stopped to ask what happens in the background when you request a page? The picture above shows Google's landing page, which is what you get when you type www.google.com and press Enter. A great deal of work is carried out in the background to bring that display to you.

This article outlines the different things that happen in the background before you get the output above.

Before I begin, let me explain some important terms that will be relevant to this article:

Server: A computer or computer program that provides functionality for other devices; the term can refer to either hardware or software. In the context of this article, the term “server(s)” will refer to the computer system(s) hosting https://www.google.com. There are many types of servers, but let us cover the ones within the scope of this article.

  1. Load Balancer - Acts as an intermediary between clients and a group of servers, accepting incoming traffic from the clients and distributing it among the servers. Reasons for doing so include improving traffic performance, keeping any single server from being overloaded, content control and filtering, or simply routing the traffic over a large and complex network.
  2. Web Server - Hosts web pages. A web server is what makes the World Wide Web possible. Each website has one or more web servers. Also, each server can host multiple websites.
  3. Application Server - Hosts web apps (computer programs that run inside a web browser) allowing users in the network to run and use them, without having to install a copy on their own computers. Unlike what the name might imply, these servers do not need to be part of the World Wide Web; any local network would do.
  4. Database Server - Maintains and shares any form of database (organized collections of data with predefined properties that may be displayed in a table) over a network.
  5. DNS Server - The domain name system (i.e., “DNS”) is responsible for translating domain names into a specific IP address so that the initiating client can load the requested Internet resources. The domain name system works much like a phone book where users can search for a requested person and retrieve their phone number.

Client: Also a computer or computer program, but one that can access services and functionalities hosted on a server. Most familiarly, clients are the personal devices — laptops, smartphones, etc. — that we use to access services through the internet, among other things. In the context of this article, usage of the term “client” will refer to the web browser.

Clients-Internet-Server Interface

Protocol: Or, more specifically, communication protocol: a general term for a system of rules, or methods, for transmitting data between two devices. The Open Systems Interconnection (OSI) model, the conceptual model used to describe telecommunications between computers, is organized into seven layers, each of which is served by a myriad of protocols.

Although there are many protocols, I want to explain one pair of them: TCP/IP (which stands for Transmission Control Protocol / Internet Protocol).

TCP/IP is a set of standardized rules that allow computers to communicate on a network such as the Internet.

Firewall: A network security device that monitors and filters incoming and outgoing network traffic based on an organization’s previously established security policies. At its most basic, a firewall is essentially the barrier that sits between a private internal network and the public Internet. A firewall’s main purpose is to allow non-threatening traffic in and to keep dangerous traffic out.

HTTPS/SSL: Hypertext Transfer Protocol Secure (HTTPS) is an extension of the Hypertext Transfer Protocol (HTTP). It is used for secure communication over a computer network and is widely used on the Internet. In HTTPS, the communication is encrypted using Transport Layer Security (TLS) or, formerly, the Secure Sockets Layer (SSL).

Now, let us dive into the main objective of this article. First, what does www.google.com stand for?

Step 1: URL Parsing

protocol://hostname:port/path_of_filename === https://www.google.com/

From these terms:

  • Port - the port (think of it like the server’s mailbox) where our request will be sent. Empty in our example URL, but implied by the web browser based on the protocol: HTTPS uses port 443.
  • Path_of_filename - the name of the file requested and its location in the server’s directory. Also left empty in our example URL, thus implying that we are querying the server at the root /.

In our case:

  • the protocol is HTTPS (HTTP Secure)
  • the hostname is www.google.com
  • the port is 443 because the protocol is HTTPS. If the protocol were HTTP (without the ‘s’), the port number would be 80. This means the browser contacts www.google.com on server port 443 to request Google's homepage.
  • path_of_filename is empty in our case, so the root / is requested.
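
To make the parsing step concrete, here is a minimal sketch using Python's standard urllib.parse module. It is not how a real browser is implemented; the helper name parse_url and the DEFAULT_PORTS table are my own, introduced only for illustration.

```python
from urllib.parse import urlsplit

# Ports implied by the protocol when the URL leaves the port empty.
DEFAULT_PORTS = {"http": 80, "https": 443}


def parse_url(url):
    """Split a URL into protocol, hostname, port and path (illustrative helper)."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not spell out a port.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    # An empty path means we are asking for the root of the server.
    path = parts.path or "/"
    return parts.scheme, parts.hostname, port, path


print(parse_url("https://www.google.com/"))
# → ('https', 'www.google.com', 443, '/')
```

The port never appears in the URL itself; it is filled in from the protocol, just as described above.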

Step 2: DNS Lookup

The browser sends www.google.com to the DNS server to request its IP address. Using the dig command on Linux, the IP address of www.google.com comes back as 142.250.179.164 (the exact address varies by location and over time). With this IP address in hand, the browser knows which machine to contact for https://www.google.com.

In other words, https://www.google.com and https://142.250.179.164 point to the same servers, although the browser keeps using the hostname in the request itself.
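
The lookup can be sketched in Python by asking the operating system's resolver, which is roughly what a browser falls back on when it has no cached answer. The function name resolve is hypothetical, and the addresses it returns depend on where and when it is run.

```python
import socket


def resolve(hostname, port=443):
    """Return the distinct IP addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in infos:
        ip = sockaddr[0]  # the first field of sockaddr is the IP for both IPv4 and IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

On a connected machine, resolve("www.google.com") returns one or more addresses, usually not the same ones every time or everywhere.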

Step 3 – TCP/IP

Now our web browser is ready to go. Having resolved the IP address associated with www.google.com, the browser proceeds to begin communication with the corresponding server through port number 443. The communication between the browser and server occurs over what is referred to as Transmission Control Protocol/Internet Protocol (TCP/IP). This protocol is not strictly mandatory, but it is the standard when it comes to web infrastructure. An alternative transport-layer protocol, User Datagram Protocol (UDP), is faster but less reliable because packet delivery is not confirmed. UDP is typical of streaming services where instant content takes priority; TCP is used almost everywhere else.
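
As a rough illustration, opening a TCP connection in Python takes a single call, and the connection handshake happens inside it. The sketch below connects to a throwaway listener on the local machine so it runs without Internet access; both function names are my own.

```python
import socket


def open_tcp_connection(host, port, timeout=5.0):
    """Open a TCP connection; the TCP handshake completes inside this call."""
    return socket.create_connection((host, port), timeout=timeout)


def demo_on_loopback():
    """Connect to a temporary listener on this machine and send it five bytes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick any free port
    listener.listen(1)
    port = listener.getsockname()[1]

    client = open_tcp_connection("127.0.0.1", port)
    server_side, _client_address = listener.accept()

    client.sendall(b"hello")
    data = server_side.recv(5)

    for sock in (client, server_side, listener):
        sock.close()
    return data
```

For our example, the browser's equivalent would be a connection to the resolved IP address on port 443.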


Step 3 (continued) – SSL/TLS

The first thing the web browser sends to the resolved IP address of www.google.com, once the TCP connection is open, is a message containing its Transport Layer Security (TLS) version along with a list of supported cipher algorithms and compression methods. TLS is a cryptographic protocol, combining asymmetric and symmetric encryption, that is used to keep communicated data private, authenticated, and reliable.

Upon receiving this initial communication, the server chooses its preferred TLS version and cipher and responds with its certificate, which includes the server’s public key. Back at the client side, the browser verifies the certificate and, in the classic RSA key exchange described here, uses this public key to encrypt a pre-master key that is sent back to the server.

If the public key sent to our browser was authentic, then the server is able to decrypt the pre-master key with its TLS private key. Upon proof of successful decryption, the browser and server have effectively established a trusted connection and a shared symmetric key for sending messages back and forth.
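
In practice a client rarely implements this exchange by hand. Here is a minimal sketch with Python's standard ssl module, which negotiates the version, cipher and keys on its own (a modern handshake may differ in detail from the RSA exchange described above). The function names are hypothetical.

```python
import socket
import ssl


def make_client_context():
    """Build a client-side TLS configuration that verifies the server certificate."""
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def tls_connect(hostname, port=443, timeout=5.0):
    """Open a TCP connection, then run the TLS handshake on top of it."""
    context = make_client_context()
    raw_socket = socket.create_connection((hostname, port), timeout=timeout)
    try:
        # wrap_socket performs the handshake and checks the certificate
        # against the hostname we asked for.
        return context.wrap_socket(raw_socket, server_hostname=hostname)
    except Exception:
        raw_socket.close()
        raise
```

If the certificate cannot be verified, wrap_socket raises an error instead of returning a connection, which is the "authentic public key" check in code form.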

Step 4 – Load Balancer

A site as busy as Google receives far more requests than one machine can handle, so incoming traffic has to be split across many servers. A load balancer is the intermediary responsible for handling this traffic-splitting work. It is software that can be configured either on the same server as that hosting web content or on a server all its own. One common and free load balancer is HAProxy. HTTP request traffic is split up by a program such as HAProxy according to a load-balancing algorithm. There are various types of load-balancing algorithms, each with its own advantages and disadvantages.

Backtracking in our example, the resolved IP address of www.google.com was actually the IP address of the load balancer server. The web browser completed the TLS handshake with this load balancer server, thus making it the TLS termination proxy. Almost like a post office, this server, which we’ll imagine is configured with a round-robin algorithm on HAProxy, was the receiver of our HTTP GET request. HAProxy took the request, pulled up the IP address of the next web server in its queue, and sent it off that way.
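
The round-robin idea itself fits in a few lines. This is a toy sketch of the algorithm, not HAProxy's actual implementation, and the backend addresses are made up.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backend servers one after another, wrapping around at the end."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one backend server is required")
        self._pool = itertools.cycle(list(servers))

    def next_server(self):
        """Return the server that should receive the next request."""
        return next(self._pool)


# Hypothetical private addresses of three web servers behind the balancer.
balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([balancer.next_server() for _ in range(4)])
# → ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```

Round-robin ignores how busy each server is, which is one of the trade-offs between load-balancing algorithms mentioned earlier.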

Step 5 – Firewall

Through the TLS handshake, our browser came to an agreement with the load balancer server as to how to encrypt messages as they are passed back and forth. TLS achieves three crucial security purposes (privacy, integrity, and identification), yet it fails to account for a fourth: honesty. An encrypted connection says nothing about whether the traffic it carries is harmless, and filtering out harmful traffic is the firewall’s job. Contextualizing firewalls in our example, at this point our GET request has already passed one firewall, installed on the load balancer. It will next pass another, installed on whichever host server it is distributed to.
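
A firewall's security policy can be pictured as an ordered list of rules, where the first matching rule decides. The sketch below is a deliberately simplified model with a made-up policy; real firewalls inspect far more than a source address and a port.

```python
import ipaddress

# Hypothetical policy: (action, source network, destination port).
# A port of None matches any port. Rules are checked from top to bottom.
RULES = [
    ("deny", ipaddress.ip_network("203.0.113.0/24"), None),  # a blocked range
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 443),       # HTTPS from anywhere
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 80),        # HTTP from anywhere
]


def is_allowed(source_ip, dest_port, rules=RULES):
    """Return True if the first matching rule allows the traffic."""
    source = ipaddress.ip_address(source_ip)
    for action, network, port in rules:
        if source in network and (port is None or port == dest_port):
            return action == "allow"
    return False  # nothing matched: deny by default
```

With this policy, our GET request to port 443 passes, while a connection attempt to any other port is dropped.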

Step 6 – Host Server

The host server is a web stack consisting of multiple parts that is traditionally set up along the lines of what is termed the LAMP (Linux, Apache, MySQL, PHP/Python) model.

Delivery of a web page works as follows:

  • A GET request is received by the web server. The web server pulls up the file configured at the given location (in our example, the HTML file configured at the root (/) of the machine).
  • If the file contains dynamic content, the application server is run (i.e., the corresponding Python scripts are run). The result of these scripts is inserted into the web page.
  • If the dynamic content involves stored data, the Python scripts query from the database server (probably through Python libraries such as MySQLdb or SQLAlchemy).
  • The web server delivers the web page.
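
The delivery steps above can be sketched with Python's standard library alone. This is an assumption-laden toy: an in-memory SQLite table stands in for the MySQL database server, http.server stands in for Apache, and the table and function names are invented for the example.

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# Database tier: an in-memory SQLite table stands in for the MySQL server.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE greetings (text TEXT)")
db.execute("INSERT INTO greetings VALUES ('Hello from the database')")


def render_home_page():
    """Application tier: build the dynamic part of the page from stored data."""
    (text,) = db.execute("SELECT text FROM greetings").fetchone()
    return f"<html><body><h1>{text}</h1></body></html>"


class Handler(BaseHTTPRequestHandler):
    """Web server tier: answer GET requests for the root path."""

    def do_GET(self):
        if self.path != "/":
            self.send_error(404)
            return
        body = render_home_page().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the toy web server on this machine until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling serve() and visiting http://127.0.0.1:8000/ in a browser would show the page built from the database row.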

Step 7 – Page Rendering

It’s been a long journey, but our web browser has finally received the web page we requested. After pulling up the HTML file configured at the root of www.google.com, the host server sent it back to the web browser in an HTTP response message.

The initial status line of this response message includes a status code indicating the success of the handled request. Upon successful retrieval and delivery of the web page, the host server signals 200. Other common status codes include 301 (page moved permanently) and 404 (page not found).

In the response header, the host server states information about the delivered page such as its type (HTML, in our case) and size.
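
To see how the status line and headers sit in a response, here is a small sketch that picks apart a hand-written sample message. The sample is invented for illustration and is not Google's real response; parse_response is a hypothetical helper that handles only this simple layout.

```python
def parse_response(raw):
    """Split a raw HTTP/1.1 response into status code, headers and body."""
    # A blank line separates the status line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # The status line looks like: HTTP/1.1 200 OK
    status_code = int(lines[0].split(" ", 2)[1])

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)

status, headers, body = parse_response(sample)
print(status, headers["content-type"], len(body))
# → 200 text/html; charset=UTF-8 13
```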

Finally, in the response message body, the host server delivers the actual, entire HTML code itself. This is what the browser has been looking for since the start! Now it shows off, utilizing its HTML and CSS engines to parse the code, break it down into its Document Object Model, and render the page. Any JavaScript included in the file is run. When it's all said and done, the browser displays a beauty, a joy, a realization of our dream: the Google home page.
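
A real browser engine is far more involved, but the idea of breaking HTML down into a tree can be hinted at with Python's built-in html.parser. The sketch below only records each opening tag with its nesting depth, a crude stand-in for a Document Object Model; the class and function names are my own.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class OutlineBuilder(HTMLParser):
    """Record every opening tag together with how deeply it is nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1


def outline(html):
    """Return a list of (depth, tag) pairs for the given HTML text."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.outline


print(outline("<html><body><h1>Hi</h1><p>text</p></body></html>"))
# → [(0, 'html'), (1, 'body'), (2, 'h1'), (2, 'p')]
```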

A quick rundown of the process just described:

  • The browser receives the URL https://www.google.com and parses it into its protocol (HTTPS), hostname (www.google.com), port (implicitly, 443), and location (implicitly, root /).
  • The browser checks if the hostname has already been resolved in its own or the OS’s cache. If so, the corresponding IP is retrieved right there and then.
  • Otherwise, the hostname is resolved through the Domain Name System.
  • The browser completes a TLS handshake with the load balancer specified at the resolved IP. This communication occurs over TCP/IP.
  • Having established an encrypted connection method, the browser sends the load balancer a GET request for the file located at the root of www.google.com.
  • The GET request is passed through a firewall on the load balancer.
  • The load balancer distributes the GET request to the next available host server, as determined by its configured load balancing algorithm.
  • The GET request is passed through a firewall on the host server.
  • The host server retrieves the file located at its root directory and returns its content, served dynamically by the application and database servers.
  • The browser receives the HTTP response message containing the file content and renders the HTML page to the user.
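
Seen from the client side, the rundown can be condensed into one short Python sketch. It is a simplification under several assumptions: it handles only https URLs, ignores caching and redirects, and everything behind the resolved IP (load balancer, firewalls, host servers) stays invisible to it. The function names are hypothetical.

```python
import socket
import ssl
from urllib.parse import urlsplit


def build_get_request(hostname, path="/"):
    """Compose the text of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch(url, timeout=5.0):
    """Parse the URL, connect over TCP, secure it with TLS, and send a GET."""
    parts = urlsplit(url)
    hostname = parts.hostname
    port = parts.port or 443   # implied by HTTPS when the URL has no port
    path = parts.path or "/"   # an empty path means the root

    context = ssl.create_default_context()
    # create_connection resolves the hostname (DNS) and opens the TCP connection.
    with socket.create_connection((hostname, port), timeout=timeout) as tcp:
        # wrap_socket runs the TLS handshake.
        with context.wrap_socket(tcp, server_hostname=hostname) as tls:
            tls.sendall(build_get_request(hostname, path))
            chunks = []
            while True:
                chunk = tls.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
    # Raw response: status line, headers, blank line, then the body.
    return b"".join(chunks)
```

On a connected machine, fetch("https://www.google.com") returns the raw response bytes that a browser would go on to parse and render.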

Now you are able to fathom what actually happens when you type www.google.com and press Enter. It's all web infrastructure. Kindly share for others to learn too.
