What happens when you type google.com in your browser and press Enter?
Kudzanai Gomera
Cybersecurity Analyst @ TransUnion || Defensive Security Engineer || SWE
The internet is so present in our daily lives, and we use it so often, that we rarely notice how much happens behind something as common as visiting a website.
It seems simple: you type the address of the site you want to visit, press <enter>, and voilà, the content is in front of your eyes. But it is not that simple.
In this blog, we will lay bare what happens behind the browser when we visit a website.
Have you ever wondered what happens in the browser when you enter a URL such as "google.com"?
After you type the address, in this case "google.com", and press <enter>, the browser needs to establish a connection with the website's server to request the content of the page. But how do they communicate? The browser and the server communicate using IP (Internet Protocol) addresses, and you typed "google.com", not an IP address (such as 128.1.87.69). So the name first has to be translated into an address.
DNS: The great internet guide.
The DNS (Domain Name System), which in a few words is the Internet's telephone book, was created so that humans do not have to remember numbers: IP addresses are written as decimal numbers in IPv4 (192.168.3.4) or as groups of hexadecimal digits in IPv6 (2440:cb00:2048:1::c629:d7a2). Imagine having to remember the latter; it's nearly impossible!
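To see why the IPv6 form is so hard to remember, Python's standard `ipaddress` module can parse both example addresses from this paragraph. This is only an illustration of the two formats; the helper name `describe` is made up for the example.

```python
import ipaddress


def describe(address: str) -> dict:
    """Parse an IP address string and report its version, size in bits and full form."""
    ip = ipaddress.ip_address(address)
    return {
        "version": ip.version,        # 4 or 6
        "bits": ip.max_prefixlen,     # 32 bits for IPv4, 128 bits for IPv6
        "exploded": ip.exploded,      # every IPv6 group written out in full
    }


for addr in ("192.168.3.4", "2440:cb00:2048:1::c629:d7a2"):
    print(addr, "->", describe(addr))
```

The `::` in the IPv6 example is shorthand for a run of zero groups; `exploded` writes every group out, giving `2440:cb00:2048:0001:0000:0000:c629:d7a2`, a 128-bit address compared with IPv4's 32 bits.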
If this is the first time this lookup is performed in the browser, the path is a little longer, and that is the case we will follow.
First, the browser checks its own cache: "Have you ever visited google.com?" If you had, the IP address would be stored there. If not, the browser asks the Operating System (OS), which checks its own cache, traditionally through the gethostbyname function (modern systems use its successor, getaddrinfo). If the answer is still negative, the DNS resolver is called, and its first stop is the router we are connected to.
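The lookup order described here (browser cache, then OS cache, then the resolver) can be sketched as a chain of caches. This is a simplified model, not how any real browser or OS implements it: `TTLCache`, `lookup`, and `fake_resolver` are hypothetical names, and the answer uses 192.0.2.10, an address reserved for documentation. The expiry logic assumes the standard DNS idea of a TTL (time to live), after which a cached answer must be looked up again.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """A tiny name -> IP cache whose entries expire after a TTL (in seconds)."""

    def __init__(self, name: str, clock: Callable[[], float] = time.monotonic):
        self.name = name
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, host: str) -> Optional[str]:
        entry = self._entries.get(host)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:  # stale entry: forget it and report a miss
            del self._entries[host]
            return None
        return ip

    def put(self, host: str, ip: str, ttl: float) -> None:
        self._entries[host] = (ip, self._clock() + ttl)


def lookup(host: str, caches: List[TTLCache],
           resolver: Callable[[str], Tuple[str, float]]) -> Tuple[str, str]:
    """Check each cache in order (browser first, then OS); fall back to the resolver."""
    for cache in caches:
        ip = cache.get(host)
        if ip is not None:
            return ip, cache.name
    ip, ttl = resolver(host)   # slow path: ask the DNS resolver
    for cache in caches:       # remember the answer for next time
        cache.put(host, ip, ttl)
    return ip, "resolver"


# Hypothetical resolver: always answers with a documentation address and a 300 s TTL.
# A real resolver would query the router, then the ISP's DNS servers.
def fake_resolver(host: str) -> Tuple[str, float]:
    return "192.0.2.10", 300.0


browser, os_cache = TTLCache("browser"), TTLCache("os")
print(lookup("google.com", [browser, os_cache], fake_resolver))  # ('192.0.2.10', 'resolver')
print(lookup("google.com", [browser, os_cache], fake_resolver))  # ('192.0.2.10', 'browser')
```

The first call goes all the way to the resolver; the second is answered by the browser cache, which is why repeat visits are faster.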
If the router does not have the answer either, the query continues up the network to the ISP (Internet Service Provider), whose DNS resolver also keeps a cache. If the IP of the "google.com" domain is not known there, the resolver asks a root DNS server, which knows where to find the TLD (Top Level Domain) servers, in this case those for ".com". The ".com" TLD server then returns the address of the domain's authoritative nameserver (ns1.google.com), which is where our DNS resolver will ask for the IP it is looking for.
So the DNS resolver queries "ns1.google.com" for the IP address of "google.com". This works because, when a domain (google.com) is bought, the domain registrar registers the name and its nameservers with the registry for that TLD, so the ".com" TLD servers know where to send queries. The nameserver finally answers the initial question: the IP address of "google.com". That address is then cached by the browser and the OS (for as long as the record's TTL, or time to live, allows) so that you do not have to repeat the entire search each time you visit the website.
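The chain of referrals, from the root server to the ".com" TLD server to ns1.google.com, can be modelled as a small loop that keeps following referrals until it gets an answer. This is a hedged sketch with made-up data: the server labels and the 192.0.2.10 answer are illustrative, and real DNS responses carry NS and A records (plus caching, retries, and several servers per level) that are left out here.

```python
from typing import Dict, List, Tuple

# Hypothetical, heavily simplified DNS data. Each server maps a domain suffix either to
# a "referral" (ask this other server next) or to an "answer" (the IP address).
SERVERS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root server": {"com.": ("referral", ".com TLD server")},
    ".com TLD server": {"google.com.": ("referral", "ns1.google.com")},
    "ns1.google.com": {"google.com.": ("answer", "192.0.2.10")},
}


def ask(server: str, name: str) -> Tuple[str, str]:
    """Return the most specific record `server` holds for `name` (longest suffix first)."""
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:]) + "."
        if suffix in SERVERS[server]:
            return SERVERS[server][suffix]
    raise LookupError(f"{server} knows nothing about {name}")


def resolve(name: str, start: str = "root server") -> Tuple[str, List[str]]:
    """Follow referrals from the root down until some server returns an answer."""
    server, path = start, [start]
    while True:
        kind, value = ask(server, name)
        if kind == "answer":
            return value, path
        server = value  # referral: query the next server in the chain
        path.append(server)


ip, path = resolve("google.com")
print(ip)                 # 192.0.2.10
print(" -> ".join(path))  # root server -> .com TLD server -> ns1.google.com
```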
It should be noted that this entire process takes place in a matter of milliseconds.
And now, what's next?
Now that the browser knows which IP address to contact, it uses TCP/IP (Transmission Control Protocol/Internet Protocol) to establish a reliable connection with the server.
So what is TCP/IP and how does it work? To begin with, it is a standard for the bidirectional transmission of information in the form of small packets. To establish a connection, both endpoints (the client and the server) must be identified. It does not matter which side acts as the client and which as the server; all that is required is that each endpoint has an assigned IP address and port.
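Establishing the connection is a three-step process, known as the three-way handshake: the client sends a SYN (synchronize) message, the server replies with a SYN-ACK, and the client answers with an ACK (acknowledge). Once this exchange is complete, data can flow in both directions.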
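The handshake can be simulated in a few lines to show how the SYN and ACK values relate: each side acknowledges the other's initial sequence number plus one. This is a model of the messages only, not real networking; the `Segment` class and `three_way_handshake` function are hypothetical, and real TCP segments carry many more fields.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    sender: str
    flags: str           # "SYN", "SYN-ACK" or "ACK"
    seq: int             # sender's sequence number
    ack: Optional[int]   # next sequence number expected from the peer (None on the first SYN)


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments exchanged to open a TCP connection."""
    syn = Segment("client", "SYN", client_isn, None)
    # The server acknowledges the client's SYN, which consumes one sequence number.
    syn_ack = Segment("server", "SYN-ACK", server_isn, (client_isn + 1) % SEQ_SPACE)
    # The client acknowledges the server's SYN the same way.
    ack = Segment("client", "ACK", (client_isn + 1) % SEQ_SPACE, (server_isn + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


# Real TCP stacks choose hard-to-guess initial sequence numbers (ISNs); random ones stand in here.
for segment in three_way_handshake(random.getrandbits(32), random.getrandbits(32)):
    print(segment)
```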
Port numbers are 16-bit values, so they range from 0 to 65,535. Ports 0 to 1023 are the "well-known" ports assigned by IANA (Internet Assigned Numbers Authority) to very commonly used services, for example port 22 for SSH, port 80 for HTTP, and port 443 for HTTPS.
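The port ranges can be expressed as a small lookup. The ranges below follow IANA's split (RFC 6335) into well-known (0 to 1023), registered (1024 to 49151), and dynamic/private (49152 to 65535) ports; `classify_port` and the `SERVICES` table are illustrative names, and the table lists only the three services mentioned in the text.

```python
# Port ranges as defined by IANA (RFC 6335).
WELL_KNOWN = range(0, 1024)       # 0-1023: well-known (system) ports
REGISTERED = range(1024, 49152)   # 1024-49151: registered (user) ports
# 49152-65535: dynamic / private (ephemeral) ports

# The well-known assignments mentioned in the text.
SERVICES = {22: "SSH", 80: "HTTP", 443: "HTTPS"}


def classify_port(port: int) -> str:
    """Say which IANA range a TCP/UDP port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError(f"{port} is not a valid 16-bit port number")
    if port in WELL_KNOWN:
        return "well-known"
    if port in REGISTERED:
        return "registered"
    return "dynamic/private"


for port in (22, 80, 443, 8080, 51000):
    print(port, SERVICES.get(port, "-"), classify_port(port))
```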
Finally, the browser can send a request using the GET method of HTTP (Hypertext Transfer Protocol). The server responds with the page's HTML, and the browser then requests the CSS, JavaScript, and other files that the page references. Every response comes with a status code, and the codes fall into five classes:
● 1xx indicates an informational message only.
● 2xx indicates success.
● 3xx redirects the client to another URL.
● 4xx indicates an error on the client’s part.
● 5xx indicates an error on the server’s part.
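For a sense of what actually travels over the connection, here is a sketch of a minimal HTTP/1.1 GET request and of how a status line maps to the five status classes. The helper names are hypothetical, nothing is sent over the network, and a real browser adds many more headers.

```python
from typing import Tuple

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def build_get_request(host: str, path: str = "/") -> str:
    """Build a minimal HTTP/1.1 GET request as it would be written to the TCP connection."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"       # the Host header is required in HTTP/1.1
        "Connection: close\r\n"   # ask the server to close the connection after replying
        "\r\n"                    # an empty line ends the headers
    )


def parse_status_line(line: str) -> Tuple[int, str]:
    """Turn a status line such as 'HTTP/1.1 404 Not Found' into (404, 'client error')."""
    parts = line.split(" ", 2)    # ['HTTP/1.1', '404', 'Not Found']
    status = int(parts[1])
    return status, STATUS_CLASSES.get(status // 100, "unknown")


print(repr(build_get_request("google.com")))
for line in ("HTTP/1.1 200 OK", "HTTP/1.1 301 Moved Permanently", "HTTP/1.1 503 Service Unavailable"):
    print(line, "->", parse_status_line(line))
```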
But, who receives the request made by the browser?
In this case, the request is received by the web server, although there are other types of servers, such as application servers and database servers. This article focuses on the web server, which can run on various software; the most common are Nginx and Apache. Now suppose our site gets a lot of traffic: a single server would not be enough, so we add more servers to the service. But how does the browser know which server is available to receive its request?
This is where the load balancer comes in. It is a server configured with a distribution algorithm, such as Round Robin or Least Connections, that sits “in front” of the servers holding the HTML, CSS, and JavaScript files. It distributes the incoming requests for the website across those servers so that none of them is overloaded and traffic keeps flowing smoothly.
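Both algorithms named here fit in a few lines. This is a simplified sketch of the selection logic only, assuming hypothetical server names (`web-01` to `web-03`); real load balancers such as Nginx or HAProxy also handle health checks, weights, and the actual forwarding of traffic.

```python
import itertools
from typing import Dict, Iterable


class RoundRobin:
    """Hand out servers in a fixed rotation: 1, 2, 3, 1, 2, 3, ..."""

    def __init__(self, servers: Iterable[str]):
        self._cycle = itertools.cycle(list(servers))

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Send each new request to the server with the fewest active connections."""

    def __init__(self, servers: Iterable[str]):
        self.active: Dict[str, int] = {s: 0 for s in servers}

    def pick(self) -> str:
        # min() returns the first server with the lowest count, so ties go to the earliest one.
        server = min(self.active, key=self.active.__getitem__)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a request finishes so the server's count goes back down."""
        self.active[server] -= 1


servers = ["web-01", "web-02", "web-03"]  # hypothetical server names
rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])  # ['web-01', 'web-02', 'web-03', 'web-01']

lc = LeastConnections(servers)
first, second = lc.pick(), lc.pick()  # web-01, then web-02
lc.release(first)                     # web-01's request finished
print(lc.pick())                      # web-01 again: it now has 0 active connections
```

Round Robin ignores how busy each server is, while Least Connections adapts when some requests take much longer than others.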
Normally, each of these servers should hold the same versions of the HTML, CSS, and JavaScript files, so that users get the same service no matter which server their request is sent to.
The load balancer also matters when a server goes down, whether because of an attack or a malfunction: by sending requests only to the servers that are still working, it keeps the web service available without the user noticing that anything went wrong.
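Failover can be sketched by letting the balancer skip servers that are marked as down. In practice a load balancer usually learns this from periodic health checks; here a server is marked down by hand, and the class name and server names are hypothetical.

```python
from typing import Iterable, List, Set


class HealthAwareRoundRobin:
    """Round Robin that skips servers marked as down (e.g. after a failed health check)."""

    def __init__(self, servers: Iterable[str]):
        self.servers: List[str] = list(servers)
        self.healthy: Set[str] = set(self.servers)
        self._next = 0

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        self.healthy.add(server)

    def pick(self) -> str:
        # Try each server at most once per call, starting where the last call stopped.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = HealthAwareRoundRobin(["web-01", "web-02", "web-03"])  # hypothetical names
lb.mark_down("web-02")                 # e.g. crashed or taken down by an attack
print([lb.pick() for _ in range(4)])   # ['web-01', 'web-03', 'web-01', 'web-03']
```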
Adding more security
A firewall, which can be hardware, software, or a combination of both, can sit on the load balancer and also on each of the servers behind it, adding more layers of security. The firewall restricts requests to certain ports and protects the servers by blocking connections it considers dangerous.
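A firewall rule set of the kind described here (allow only certain ports, refuse dangerous sources) can be sketched as a default-deny check. The policy below is an assumption for illustration: it allows only TCP ports 80 and 443 and blocks the 203.0.113.0/24 documentation network. Real firewalls express rules in their own configuration formats, and `is_allowed` is a hypothetical name.

```python
import ipaddress

# Hypothetical policy: only web traffic is allowed in, and one network is blocked.
ALLOWED_PORTS = {("tcp", 80), ("tcp", 443)}
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range, illustrative


def is_allowed(protocol: str, port: int, source_ip: str,
               allowed_ports=ALLOWED_PORTS, blocked=BLOCKED_NETWORKS) -> bool:
    """Default deny: the source must not be blocked and the (protocol, port) must be allowed."""
    src = ipaddress.ip_address(source_ip)
    if any(src in net for net in blocked):  # an IPv6 source is never "in" an IPv4 network
        return False
    return (protocol.lower(), port) in allowed_ports


print(is_allowed("tcp", 443, "198.51.100.7"))  # True: HTTPS from an unblocked source
print(is_allowed("tcp", 22, "198.51.100.7"))   # False: SSH is not in the allow list
print(is_allowed("tcp", 443, "203.0.113.9"))   # False: source network is blocked
```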
Another way to make the connection even more secure is to use HTTPS, which is HTTP protected by TLS (Transport Layer Security), the successor to SSL (Secure Sockets Layer). It encrypts the data exchanged between the client and the server, so sensitive information (such as payment and credit card data) is unreadable to third parties.
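Python's standard `ssl` module shows what a client typically requires of an HTTPS connection: a verified certificate, a matching hostname, and a modern protocol version. This is a hedged sketch; the function names are made up, and requiring at least TLS 1.2 is an assumed policy rather than something the text prescribes.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client-side TLS settings: verify the certificate and hostname, require TLS 1.2 or newer."""
    context = ssl.create_default_context()            # loads the system's trusted CAs
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older SSL/TLS versions
    return context


def tls_version_of(host: str, port: int = 443) -> str:
    """Open a TCP connection, perform the TLS handshake, and report the negotiated version."""
    context = make_tls_context()
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # server_hostname enables SNI and is checked against the server's certificate.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()


context = make_tls_context()
print(context.check_hostname, context.verify_mode == ssl.CERT_REQUIRED)
```

Calling `tls_version_of("google.com")` needs network access and would return the negotiated version string, such as `TLSv1.3`.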
A simple diagram of what has been described:
CONCLUSION
A "simple" connection between the browser and a web server involves many important verification and security steps, so that data and packets can be exchanged securely.
A brief summary of the main steps in the connection: