What happens when you type google.com in your browser and press Enter
Ferdinand Charles
Full Stack Software Engineer | AI/ML Engineer | Prompt Engineering Specialist| Large Language Model Whisperer | Digital Transformation Consultant | Cloud Solutions Architect | DevOps Practitioner | TensorFlow | AWS
Have you ever wondered what goes on behind the scenes whenever you open your browser and visit your favorite website, say www.google.com? Have you ever been curious about the processes that run before the page is displayed? Even though the website loads within a fraction of a second over a healthy internet connection, there are processes, communications, and interactions that make sure the results are fast and reliable.
Truth be told, most internet users don't care about what goes on behind the scenes; all we want are fast and reliable connections. However, if you are reading this article, it is for one of two reasons: you care to know, or you are a little bit curious. Maybe both. Whichever it is, I've got you.
IMPORTANT NOTES: -
Now, let's jump right into it:
When you enter www.google.com in your browser (referred to from here on as the client), the client makes a series of requests (to what we all know and call the internet) and receives a series of responses in return. Some of these requests and responses go back and forth between the client and the internet even before the actual website is displayed to us as end users. They happen so fast that we barely notice them. Even when one of them returns an error, we rarely take notice unless we have a keen interest or a real need to know.
Having done a little bit of introduction, let's dive a little deeper into these requests and responses, break them down stage by stage, and examine how they move back and forth till the website is fully displayed to us as end users.
In this article, we will briefly explain the concepts involved (DNS, TCP/IP, HTTP and HTTPS, SSL, firewalls, web servers, application servers, databases, and load balancers) and the role each plays from when we enter www.google.com to when the full website is displayed.
(You can do further research to learn more about these concepts, as explaining them in full detail is beyond the scope of this article.)
When we enter www.google.com in our browser, the browser first sends a request to the DNS asking for the IP address of www.google.com, and the DNS sends back a response containing that IP address. Hold on: at this stage nothing has been displayed yet. What the browser has done so far is like picking your own car key out of a pile of car keys; you are not driving the car yet.
OK, but what does this even mean, and what is DNS (Domain Name System)?
To answer this question, let me paint a familiar picture of phone numbers and telecommunications network providers. The network providers identify each person by their phone number, but we as end users can't possibly remember the phone number of everyone we know or meet. Hence the need to store each number under a unique name in the Phonebook app on our mobile phone. The phonebook helps us identify who is calling, or find exactly who we want to call.
DNS, the Domain Name System, is the phonebook of the Internet. It maps domain names to IP addresses so that we don't have to remember each website by its IP address. I mean, you have to agree with me that it is easier to remember a website name like www.google.com than an IP address like 216.58.217.206 (one of the addresses www.google.com has resolved to; large sites use many addresses, and they vary by location and over time). The whole process of the client sending a request to the DNS and the DNS successfully responding with an IP address is known as DNS resolution.
DNS resolution involves four kinds of servers:
1. DNS Recursor: - This serves as a middleman between the client (browser) and the other DNS servers. It follows a chain of referrals from one server to the next until it locates the requested host's IP address. The Recursor also caches information so it can respond faster to subsequent client requests.
2. Root Name-server: - Once the DNS Recursor receives a request from the client, it sends a request to a Root Name-server. In response, the Root Name-server refers it to the appropriate Top-Level Domain (TLD) server, based on the queried host's domain extension (for example, .com).
3. Top-Level Domain (TLD) server: - Once the DNS Recursor receives the response from the Root Name-server, it sends another request to the TLD server, which maintains information for all domain names with the same domain extension. In response, the TLD server refers the DNS Recursor to the appropriate Authoritative Name-server.
4. Authoritative Name-server: - After receiving the response from the TLD server, the DNS Recursor sends one more request, this time to the Authoritative Name-server, which stores the DNS records that map domain names to IP addresses. The Authoritative Name-server answers the DNS Recursor's final request with the queried hostname's IP address. If no record exists for that hostname, the Name-server returns an error response instead.
As a final step in the DNS resolution process, the DNS Recursor sends the IP address back to the client, allowing it to connect to and load the appropriate website or application (not so fast…).
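To make the chain of referrals concrete, here is a minimal sketch of it in Python. It is only a toy model under stated assumptions: the server names (`tld-com`, `ns-google`), the lookup tables, and the address `192.0.2.10` (taken from a range reserved for documentation) are all made up for illustration, and a real recursor talks to real servers over the network.

```python
# Toy model of DNS resolution. Every server name and address below is a
# made-up placeholder; nothing here contacts the real DNS.

ROOT = {"com": "tld-com"}                                       # extension -> TLD server
TLD = {"tld-com": {"google.com": "ns-google"}}                  # domain -> authoritative server
AUTHORITATIVE = {"ns-google": {"www.google.com": "192.0.2.10"}}  # hostname -> IP address


class Recursor:
    """Middleman that follows referrals and caches the final answer."""

    def __init__(self):
        self.cache = {}
        self.lookups = 0  # number of times the full chain had to be walked

    def resolve(self, hostname):
        if hostname in self.cache:            # answered straight from the cache
            return self.cache[hostname]
        self.lookups += 1
        labels = hostname.split(".")
        extension = labels[-1]                # e.g. "com"
        domain = ".".join(labels[-2:])        # e.g. "google.com"
        tld_server = ROOT[extension]          # 1. root refers us to the TLD server
        auth_server = TLD[tld_server][domain]  # 2. TLD refers us to the authoritative server
        records = AUTHORITATIVE[auth_server]  # 3. authoritative server holds the records
        if hostname not in records:
            raise LookupError(f"no record for {hostname}")
        self.cache[hostname] = records[hostname]
        return self.cache[hostname]


recursor = Recursor()
first = recursor.resolve("www.google.com")
second = recursor.resolve("www.google.com")   # second call is served from the cache
print(first, second, recursor.lookups)
```

Notice that the second call never touches the root, TLD, or authoritative tables; that is the caching the Recursor does to answer repeat requests faster.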
Having given a basic explanation of DNS, let's retrace our steps so we don't create room for confusion. When we enter www.google.com, the browser sends a request to the DNS and the DNS responds with an IP address. The browser then sends another request, this time to the web server at that address, and the web server responds with the actual website.
The browser and the web server use a special set of rules, or protocols, called TCP/IP to communicate with each other (that is, to send a request and receive a response). The client sends its request over TCP/IP, and the web server sends its response over TCP/IP.
Understood but, what is TCP/IP and what are the processes involved?
Transmission Control Protocol (TCP) is one of the most commonly used protocols in digital network communications. It enables application programs and computing devices to exchange messages over a network by sending packets of data (more on this later) across the internet and ensuring successful end-to-end delivery.
(NOTE: - The IP here should not be mistaken for the IP address we discussed earlier. IP stands for Internet Protocol, the set of rules for addressing and routing data; an IP address is the identifier that this protocol uses to find a device. In simple words, while TCP ensures that packets (data) are sent and received successfully, IP ensures that the data is sent to the right destination.)
Just like DNS resolution, TCP/IP communication happens in stages. The TCP/IP model is divided into layers: the Application layer, the Transport layer, the Network layer, and the Physical layer.
Let's discuss these layers briefly, with some sort of focus on the Application layer.
The application layer is responsible for end-to-end communication between the client and web server and one of the ways it establishes this communication is through a protocol known as HTTP or HTTPS.
Alright, very simple, but what is HTTP (Hypertext Transfer Protocol), and what does it have to do with this?
Hypertext Transfer Protocol (HTTP) is the primary protocol for encoding and transmitting information across the Internet between a client and a web server. It follows a request-response paradigm in which the client makes a request and the web server issues a response that includes not only the requested content but also relevant status information about the request. This exchange of information between clients and web servers is done in the form of hypertext documents, from which HTTP gets its name.
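To see what the request-response paradigm looks like on the wire, here is a small sketch that builds the text of an HTTP/1.1 request and parses a response. The helper names `build_request` and `parse_response` and the sample response are my own illustrations, not part of any library, and no network connection is made.

```python
def build_request(host, path="/"):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, protocol version
        f"Host: {host}",          # which website on the server we want
        "Connection: close",
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw HTTP response into status code, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


request = build_request("www.google.com")

# A hand-written sample response, standing in for what a server might send.
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)
status, headers, body = parse_response(sample_response)
print(request.splitlines()[0])
print(status, headers["content-type"], body)
```

The status code (200 here) is the "status information" mentioned above, and the body is the requested content.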
What? Wait Hypertext…? What is that?
Hypertext is structured text that uses logical links, or hyperlinks, between nodes containing text. Hypertext documents are typically written in the Hypertext Markup Language (HTML). Using HTTP and HTML, clients can request different kinds of content (such as text, images, video, and application data) from the web and application servers (more on this later) that host the content.
Very Interesting, so what is HTTPS then?
HTTPS is the secure form of HTTP. The S in HTTPS stands for secure, which indicates that the connection is secured by the Secure Sockets Layer (SSL) or, on today's web, by its successor, Transport Layer Security (TLS).
SSL? What is that? Is it like firewalls?
No, the Secure Sockets Layer (SSL) is not a firewall. They are two entirely different things, and we will discuss them both before long.
The Secure Sockets Layer (SSL) is the standard technology for keeping a connection secure and safeguarding any sensitive data that is being sent between two systems. It does this by using encryption algorithms to scramble data in transit, preventing attackers from reading or modifying information as it is sent over a connection. This information may include sensitive or personal data such as names, addresses, credit card numbers, and other financial information.
The two systems can be a web server and a client or server-to-server (for example, an application with personally identifiable information or with payroll information).
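As a small illustration of the client side, here is how Python's standard `ssl` module sets up the rules for a secure connection. This is only a sketch: it builds the settings and inspects them, without opening a connection. Modern libraries, including this one, actually negotiate TLS, the successor to SSL, even though the older name has stuck.

```python
import ssl

# Build the default client-side settings for a secure connection.
context = ssl.create_default_context()

# The server must present a certificate that can be verified...
certificate_required = context.verify_mode == ssl.CERT_REQUIRED
# ...and the certificate must match the hostname we asked for.
hostname_checked = context.check_hostname

print(certificate_required, hostname_checked)
```

These two checks are what stop someone in the middle from pretending to be the website you asked for; the encryption itself is negotiated when the connection is opened.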
I get it, HTTPS or SSL should not be confused with a firewall. But what are firewalls?
A firewall is a network security device that monitors incoming and outgoing network traffic and permits or blocks data packets based on a set of security rules. At its most basic, a firewall is essentially the barrier that sits between a private internal network and the public Internet.
A firewall’s main purpose is to allow non-threatening traffic in and to keep dangerous traffic out. It is a necessary part of any security architecture and takes the guesswork out of host-level protections and entrusts them to your network security device. It focuses on blocking malware and application-layer attacks, along with an integrated intrusion prevention system (IPS).
Types of Firewalls
There are different types of firewalls, and we will take a quick look at some of them.
● Packet filtering: - A small amount of data is analyzed and distributed according to the filter’s standards.
● Proxy service: - A network security system that protects while filtering messages at the application layer.
● Stateful inspection: - Dynamic packet filtering monitors active connections to determine which network packets to allow through the Firewall.
● Next-Generation Firewall (NGFW): - Deep packet inspection Firewall with application-level inspection.
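The simplest of these, packet filtering, can be sketched in a few lines. This is a toy model, assuming a made-up rule set (allow web traffic on ports 80 and 443, block everything else); the `Packet` and `Rule` classes and the sample addresses are hypothetical and do not mirror any real firewall product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    source_ip: str
    dest_port: int
    protocol: str          # "tcp" or "udp"


@dataclass(frozen=True)
class Rule:
    action: str            # "allow" or "block"
    protocol: str          # "tcp", "udp", or "*" for any protocol
    dest_port: Optional[int]  # None means any port

    def matches(self, packet):
        return (self.protocol in ("*", packet.protocol)
                and self.dest_port in (None, packet.dest_port))


# Hypothetical security rules, checked from top to bottom.
RULES = [
    Rule("allow", "tcp", 443),   # HTTPS
    Rule("allow", "tcp", 80),    # HTTP
    Rule("block", "*", None),    # default: block everything else
]


def filter_packet(packet, rules=RULES):
    """Return the action of the first rule that matches the packet."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return "block"               # nothing matched: stay on the safe side


https_decision = filter_packet(Packet("203.0.113.5", 443, "tcp"))
telnet_decision = filter_packet(Packet("203.0.113.5", 23, "tcp"))
print(https_decision, telnet_decision)
```

The order of the rules matters: the first match wins, which is why the catch-all block rule sits at the bottom.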
I know right now it feels like we are a little bit off course, but let me reassure you that we are still very much on course.
In a nutshell, this is how the TCP/IP layers all work together:
First, the Application layer hands its data (encrypted, in the case of HTTPS) to the Transport layer. The Transport layer divides the data into smaller pieces called packets, adds a destination address, and passes the packets along to the next layer, the Network layer. The Network layer encloses each packet in an Internet Protocol (IP) datagram, adds the datagram header and trailer, and decides where to send the datagram. The datagrams are then transmitted as frames over specific network hardware (the Physical layer), such as Ethernet or Token-Ring networks, until they reach the web server. In return, the web server sends its response back to the client through the same pipeline, and on the receiving side the data travels from the Physical layer back up to the Application layer.
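The wrapping and unwrapping described above can be sketched as nested dictionaries. This is a simplification under stated assumptions: the function names, the field names, the 8-byte piece size, and the two addresses (from ranges reserved for documentation) are all illustrative, and real headers carry far more information.

```python
def to_segments(data, size):
    """Transport layer: cut the data into numbered pieces."""
    return [
        {"seq": number, "payload": data[start:start + size]}
        for number, start in enumerate(range(0, len(data), size))
    ]


def to_datagram(segment, src_ip, dst_ip):
    """Network layer: wrap a piece with source and destination addresses."""
    return {"src": src_ip, "dst": dst_ip, "segment": segment}


def to_frame(datagram):
    """Physical/link layer: wrap the datagram for transmission."""
    return {"header": "FRAME-START", "datagram": datagram, "trailer": "FRAME-END"}


def send(data, src_ip, dst_ip, size=8):
    """Push data down through the layers and return the frames to transmit."""
    return [to_frame(to_datagram(segment, src_ip, dst_ip))
            for segment in to_segments(data, size)]


def receive(frames):
    """Unwrap the frames, restore the original order, and rebuild the data."""
    segments = [frame["datagram"]["segment"] for frame in frames]
    segments.sort(key=lambda segment: segment["seq"])
    return b"".join(segment["payload"] for segment in segments)


message = b"GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n"
frames = send(message, "198.51.100.7", "192.0.2.10")
rebuilt = receive(frames[::-1])   # frames arriving in reverse order still work
print(len(frames), rebuilt == message)
```

The sequence numbers are what let the receiver put the pieces back in the right order even when they arrive shuffled.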
Now, a quick recap of our www.google.com search
On entering www.google.com, the client sends a request to DNS, and DNS responds with the IP address of google.com. Armed with the IP address, the client makes an HTTPS request that is secured by SSL and monitored by firewalls for malicious content. This HTTPS request passes through all the layers of TCP/IP, where it is broken into pieces called packets and delivered to the web server. In return, the web server sends its response to the client via the same channel.
Now you might be wondering what happens if the receiver does not receive all of these pieces, called packets. Well, it is simple: TCP resends the missing packets until the delivery is complete, which is why data sent over TCP arrives whole and in order rather than with pieces missing.
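Here is a toy simulation of that resend-until-complete behaviour. It is a sketch, not how TCP is actually implemented: the `deliver` function and the `drops` set (which says which attempts the pretend network loses) are invented for illustration, and real TCP uses acknowledgements, timers, and sliding windows.

```python
def deliver(packets, drops):
    """Send packets over a lossy channel until every one is acknowledged.

    `drops` is a set of (sequence number, attempt number) pairs that the
    pretend network loses. Returns the rebuilt data and the attempts used.
    """
    received = {}
    attempts = {seq: 0 for seq in range(len(packets))}
    pending = list(range(len(packets)))
    while pending:
        still_pending = []
        for seq in pending:
            attempts[seq] += 1
            if (seq, attempts[seq]) in drops:
                still_pending.append(seq)       # no acknowledgement: resend next round
            else:
                received[seq] = packets[seq]    # acknowledged by the receiver
        pending = still_pending
    data = "".join(received[seq] for seq in sorted(received))
    return data, attempts


packets = ["Hel", "lo ", "wor", "ld"]
# Packet 1 is lost twice and packet 3 is lost once before getting through.
data, attempts = deliver(packets, drops={(1, 1), (1, 2), (3, 1)})
print(data, attempts)
```

Even with three losses along the way, the receiver ends up with the complete message; the only cost is the extra attempts.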
Moving ahead, what is a web server?
A web server stores and delivers the content for a website (such as text, images, video, and application data) to clients that request it. A web server should not be confused with an application server or a database.
Ooh!! Really?? What is an Application server?
An application server provides web pages with application content that is dynamic and allows complex user interactions. Dynamic content is more interactive and involved than static content. Application servers work as an intermediary between databases, which store application data, and web clients. They also communicate with web servers, which deliver content to the web client.
Now that you have mentioned it, what is a Database?
A database is an organized collection of structured information, or data, typically stored electronically in a computer system. It is usually controlled by a database management system (DBMS). Together, the data and the DBMS, along with the applications that are associated with them, are referred to as a database system, often shortened to just a database. Data within the most common types of databases in operation today is typically modeled in rows and columns in a series of tables to make processing and data querying efficient. The data can then be easily accessed, managed, modified, updated, controlled, and organized. Many databases use Structured Query Language (SQL) for writing and querying data, while others (often called NoSQL databases) do not.
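For a feel of rows, columns, and SQL, here is a short sketch using SQLite, a small database engine that ships with Python. The `pages` table and its contents are made up for illustration.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table is a set of rows, each with the same named columns.
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO pages (title, views) VALUES (?, ?)",
    [("Home", 120), ("About", 45), ("Contact", 30)],
)

# Modify existing data, then query it back.
conn.execute("UPDATE pages SET views = views + 1 WHERE title = ?", ("Home",))
rows = conn.execute(
    "SELECT title, views FROM pages WHERE views > ? ORDER BY views DESC",
    (40,),
).fetchall()
print(rows)
conn.close()
```

The `?` placeholders keep the data separate from the SQL text, which is the usual way to pass values into a query safely.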
It is important to note that when a web server receives a request, which usually arrives as a Hypertext Transfer Protocol (HTTP) request from a web browser, it responds first with static web content (e.g., HTML pages, files, images, video). Meanwhile, it asks the application server for the dynamic content that allows for complex user interaction, among other things. The application server, while preparing that content, in turn queries the database for the data stored in it.
This is basic knowledge of how our connection works, but there are other problems. Take, for example, the fact that millions of people around the world are trying to access www.google.com at any given point in time. With such high traffic, one server will be insufficient for everyone trying to access the website. Of course, there are many servers scattered around the world, but how do we make sure one server is not overloaded while another is idle? How do we balance this high traffic and make sure everyone gets served the same high-quality images, videos, and data in a fast and reliable manner?
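The division of labour between the three can be sketched as three small pieces of code. Everything here is a stand-in under stated assumptions: the dictionaries play the role of the database and the static files, and the `/users/` path convention is invented for the example.

```python
# Stand-ins for real storage; the contents are made up for illustration.
DATABASE = {"user:1": {"name": "Ada"}}
STATIC_FILES = {
    "/index.html": "<h1>Welcome</h1>",
    "/logo.png": "<binary image data>",
}


def application_server(path):
    """Build dynamic content by looking the user up in the database."""
    user_id = path.rsplit("/", 1)[-1]
    record = DATABASE.get(f"user:{user_id}")
    if record is None:
        return 404, "No such user"
    return 200, f"<p>Hello, {record['name']}!</p>"


def web_server(path):
    """Serve static files directly; hand dynamic paths to the application server."""
    if path in STATIC_FILES:
        return 200, STATIC_FILES[path]
    if path.startswith("/users/"):
        return application_server(path)
    return 404, "Not found"


static_reply = web_server("/index.html")
dynamic_reply = web_server("/users/1")
missing_reply = web_server("/users/9")
print(static_reply, dynamic_reply, missing_reply)
```

The web server never touches the database itself; only the application server does, which is the intermediary role described earlier.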
Well ... this is one of the reasons the load balancer was introduced in the first place.
Load Balancer? What Is that, please?
First of all, let’s understand the concept of Load balancing. Load balancing refers to efficiently distributing incoming network traffic across a group of backend servers, also known as a server farm or server pool.
A load balancer acts as the “traffic cop” sitting in front of your servers and routing client requests across all servers capable of fulfilling those requests in a manner that maximizes speed and capacity utilization and ensures that no one server is overworked, which could degrade performance. If a single server goes down, the load balancer redirects traffic to the remaining online servers. When a new server is added to the server group, the load balancer automatically starts to send requests to it. In this manner, a load balancer performs the following functions:
● Distributes client requests or network load efficiently across multiple servers
● Ensures high availability and reliability by sending requests only to servers that are online
● Provides the flexibility to add or subtract servers as demand dictates
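Those three functions can be sketched in one small class. This is a toy "traffic cop", not a production design: the `LoadBalancer` class, its method names, and the server names are hypothetical, and it simply takes turns among the servers that are online.

```python
class LoadBalancer:
    """Hands each request to the next online server in turn."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.online = set(self.servers)
        self._next = 0

    def add_server(self, name):
        """Add capacity: the new server starts receiving requests."""
        if name not in self.servers:
            self.servers.append(name)
        self.online.add(name)

    def mark_down(self, name):
        """Record that a server is offline so it receives no traffic."""
        self.online.discard(name)

    def route(self):
        """Pick the server for the next request, skipping offline ones."""
        if not self.online:
            raise RuntimeError("no servers online")
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.online:
                return server
        raise RuntimeError("no servers online")


lb = LoadBalancer(["web-1", "web-2", "web-3"])
before = [lb.route() for _ in range(3)]
lb.mark_down("web-2")                      # a server goes down
during = [lb.route() for _ in range(4)]
lb.add_server("web-4")                     # a new server joins the pool
after = [lb.route() for _ in range(8)]
print(before, during, after)
```

While `web-2` is down it never appears in the results, and `web-4` starts receiving requests as soon as it is added.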
Load-balancing algorithms? What? Do load balancers have those too?
Yes, they do, and there are several of them. Different load-balancing algorithms provide different benefits, which means the choice of method depends on one's needs.
Let’s take a glance at some of the common methods.
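Three methods that are widely described are round robin (take turns), least connections (pick the least busy server), and IP hash (always send the same client to the same server). The sketch below shows one simple way each could be written; the function names, server names, and connection counts are illustrative assumptions, and real load balancers offer more methods and more refined versions of these.

```python
import hashlib
from itertools import cycle

SERVERS = ["web-1", "web-2", "web-3"]   # hypothetical server pool


def round_robin(servers):
    """Take turns: returns an endless iterator over the servers in order."""
    return cycle(servers)


def least_connections(active_connections):
    """Pick the server that currently has the fewest open connections."""
    return min(active_connections, key=active_connections.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server so the same client lands on the same server."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


turns = round_robin(SERVERS)
rr_choices = [next(turns) for _ in range(4)]
lc_choice = least_connections({"web-1": 12, "web-2": 3, "web-3": 7})
hash_choice = ip_hash("203.0.113.5", SERVERS)
print(rr_choices, lc_choice, hash_choice)
```

Round robin is the simplest and works well when servers are alike, least connections helps when some requests take much longer than others, and IP hash is handy when a client's session lives on one particular server.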
In conclusion
There are several things left unsaid and untouched, not because they are not important but because they are beyond the scope of this article. Maybe my subsequent articles will shine some light on those areas, but for now this is as far as this article can go, and even so it barely scratches the surface of the topic.
But before we go, a quick recap:
On entering www.google.com, the client sends a request to DNS, and DNS responds with the IP address of google.com. Armed with the IP address, the client makes an HTTPS request that is secured by SSL and monitored by firewalls for malicious content. This HTTPS request passes through all the layers of TCP/IP, where it is broken into pieces called packets, before reaching a load balancer, which decides which web server gets the request. In return, the web server responds first with static content while asking the application server for dynamic content, and the application server queries the database for the saved data it needs. The response then travels back to the browser via the same pipeline the request came through.
I believe this is as simple as it can get.
Thank you for your time, and thank you for reading. Bye for now.