What happens when you type https://www.google.com in your browser and press Enter.



Web Infrastructure

  • Published on October 4, 2022


Today, using the internet has become one of the most frequent and automatic activities in people's daily lives.


For this reason, we tend to forget to ask ourselves what is happening behind all these processes that, although they make life easier for us, involve a certain degree of complexity that we should not ignore.

In this blog post I want to explain what happens when we use our browser to open a specific page, in my case https://www.google.com. The same process applies when we visit Facebook, Netflix, or just a random website.

To understand the web infrastructure process there are a couple of concepts we must keep in mind. One of them is the "browser", the common tool we open when we double-click on, for example, Firefox. A web browser is a software application for accessing information on the World Wide Web. When a user requests a web page from a particular website, the web browser retrieves the necessary content from a web server and then displays the page on the user's device.

It sounds easy, yes, but between the request in our browser and the response that lets us see and access the information, lots of processes run in reply to our "Enter".

DNS request

Here is the first thing that happens when we type something in our browser. Computers and other devices use IP addresses to identify each other on the internet but, as humans, we find words easier to remember than numbers, so we use names instead. This is where DNS comes in, a term most of us have heard but rarely look into.

The Domain Name System, which is what DNS stands for, maps the human-readable name to an IP address and gets us to the destination we are looking for. So, when I type https://www.holbertonschool.com, the browser first looks for that IP address in its own cache.

The IP address may not be in the browser's cache, which sets off a chain of lookups. If the browser cache has no entry, the next step is the operating system (OS) cache. If it is not there either, the OS asks a resolver, which is usually run by our ISP (Internet Service Provider).

Every resolver knows how to reach the root servers, which sit at the top of the DNS hierarchy. Why the root? The root server does not know the IP address itself, but it knows where to find the .com Top-Level Domain (TLD) server and redirects the search there. The TLD server in turn points to the authoritative name server for the domain (holbertonschool.com in this blog's case). The name servers for a domain are registered with the TLD through the domain registrar when the domain is purchased. The resolver then asks the authoritative name server, which holds the domain's records, and that is how it ultimately gets the IP address.

The IP address is delivered to the OS, which saves it in its cache for future lookups and passes it to the browser.
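To make the lookup chain easier to follow, here is a small Python sketch of it. It is a toy model, not a real DNS client: the server names, the dictionaries, and the IP address 203.0.113.10 (taken from a range reserved for documentation) are all assumptions made for illustration, and the step where the browser also caches the answer is my own addition.

```python
# Toy model of the DNS lookup chain. Every name and the IP address
# (203.0.113.10, a documentation-only address) is hypothetical.

ROOT = {"com": "tld-com"}  # the root knows where the TLD servers are
TLD = {"tld-com": {"holbertonschool.com": "ns-holberton"}}  # TLD knows the authoritative server
AUTHORITATIVE = {"ns-holberton": {"www.holbertonschool.com": "203.0.113.10"}}


def resolve_recursively(hostname):
    """What the resolver does on a cache miss: root -> TLD -> authoritative."""
    labels = hostname.split(".")
    tld_server = ROOT[labels[-1]]           # ask the root about ".com"
    domain = ".".join(labels[-2:])          # "holbertonschool.com"
    auth_server = TLD[tld_server][domain]   # ask the TLD about the domain
    return AUTHORITATIVE[auth_server][hostname]


def lookup(hostname, browser_cache, os_cache):
    """Return (ip, source), checking each cache before asking the resolver."""
    if hostname in browser_cache:
        return browser_cache[hostname], "browser cache"
    if hostname in os_cache:
        browser_cache[hostname] = os_cache[hostname]
        return os_cache[hostname], "OS cache"
    ip = resolve_recursively(hostname)
    os_cache[hostname] = ip        # the OS saves the answer for next time
    browser_cache[hostname] = ip   # assumption: the browser caches it too
    return ip, "resolver"


browser_cache, os_cache = {}, {}
first = lookup("www.holbertonschool.com", browser_cache, os_cache)
second = lookup("www.holbertonschool.com", browser_cache, os_cache)
print(first)
print(second)
```

The first call has to walk the whole hierarchy, while the second one is answered straight from the browser cache, which is why repeat visits resolve faster.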

Once the IP address is found, the browser initiates a TCP connection to the server at that address. After the server acknowledges its side of the TCP connection, the browser sends an HTTP request to the server to retrieve the content. Let's check out these concepts.


TCP/IP

The Internet protocol suite is the conceptual model and set of communications protocols used in the Internet and similar computer networks. It is commonly known as TCP/IP because the foundational protocols in the suite are the Transmission Control Protocol (TCP) and the Internet Protocol (IP). TCP delivers data across IP networks by establishing a virtual connection between two devices through a series of request-and-reply messages sent across the physical network. TCP trades some speed for reliability: the data arrives complete and in order.
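As a rough illustration of a TCP connection, the Python sketch below opens one between a tiny server and a client on the same machine. The loopback address 127.0.0.1, the "ACK: " reply, and the function name serve_once are assumptions of mine, not part of any real website; the operating system performs the actual handshake when the client connects.

```python
import socket
import threading


def serve_once(listener):
    """Accept one TCP connection, read a message, and send a reply."""
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"ACK: " + data)


# Listen on the loopback interface; port 0 lets the OS pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener,))
server.start()

# create_connection() returns only after the TCP connection is established.
with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"hello")
    chunks = []
    while True:
        chunk = client.recv(1024)
        if not chunk:  # empty bytes means the server closed the connection
            break
        chunks.append(chunk)
    reply = b"".join(chunks)

server.join()
listener.close()
print(reply)
```

A browser does the same thing at a larger scale: it connects to the server's IP address and port first, and only then sends its HTTP request over that connection.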



Web Server

The primary function of a web server is to store, process, and deliver web pages to clients. The communication between client and server takes place using Hypertext Transfer Protocol (HTTP). The term web server can refer to hardware or software, or both working together.

On the hardware side, a web server is a computer that stores web server software and a website's component files, such as HTML documents, CSS stylesheets, and JavaScript files. A web server connects to the internet and supports physical data interchange with other devices connected to the web. On the software side, web servers include several parts that control how web users access hosted files. At a minimum, this is an HTTP server, which understands URLs and HTTP. An HTTP server can be accessed through the domain names of the websites it stores, and it delivers the content of these hosted websites to the end-user devices.

There are two types of web servers, static and dynamic. A static web server is a computer with an HTTP server (software). Here, the server sends its hosted files to the browser as they are. A dynamic web server is a static web server plus extra software (an application server and a database). It is "dynamic" because the application server updates the hosted files before sending content to your browser via the HTTP server.

So, whenever a browser needs a file that is hosted on a web server, the browser requests the file via HTTP. When the request reaches the correct (hardware) web server, the (software) HTTP server accepts the request, finds the requested document, and sends it back to the browser, also through HTTP. (If the server doesn't find the requested document, it returns a 404 response instead.) This is how, once the browser has the IP address for holbertonschool.com and a TCP connection to the server at that address, the website and my profile are loaded and shown to me through the HTTP requests and responses between the two. These carry the HTML files, the images, CSS stylesheets, and fonts they use, the JavaScript files for dynamic interactions, and the profile information stored in the site's database for when someone logs in.
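The request and response cycle of a static HTTP server can be sketched in a few lines of Python. This is a simplified illustration, not how a production server is written: the function handle_request, the HOSTED_FILES dictionary, and the file contents are all made up for the example.

```python
# Hypothetical hosted files; paths and contents are invented for illustration.
HOSTED_FILES = {
    "/index.html": "<h1>Welcome</h1>",
    "/style.css": "h1 { color: navy; }",
}


def handle_request(request_line, files=HOSTED_FILES):
    """Take a request line such as 'GET /index.html HTTP/1.1' and
    return a (status, body) pair the way a static HTTP server would."""
    method, path, _version = request_line.split()
    if method != "GET":
        return "405 Method Not Allowed", ""
    if path in files:
        return "200 OK", files[path]   # found: send the hosted file
    return "404 Not Found", ""         # not found: the 404 response


found = handle_request("GET /index.html HTTP/1.1")
missing = handle_request("GET /nope.html HTTP/1.1")
print(found)
print(missing)
```

A page usually needs several of these round trips: one for the HTML document and then one for each stylesheet, script, image, or font it references.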

Application server

An application server provides facilities to create web applications and a server environment to run them. It consists of web server connectors, computer programming languages, runtime libraries, and database connectors. An application server runs behind a web server and in front of an SQL database. Web applications are computer code that runs on top of an application server. They are written in the language(s) the application server supports, and they call the runtime libraries and components it offers.
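Here is a minimal Python sketch of what the application layer does between the web server and the database: it reads data and builds the page on the fly. The table layout, the user "ana", and the function names are assumptions for illustration only, and an in-memory SQLite database stands in for a real SQL server.

```python
import sqlite3
from html import escape


def build_database():
    """Create an in-memory SQL database with one hypothetical profile."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE profiles (username TEXT PRIMARY KEY, city TEXT)")
    conn.execute("INSERT INTO profiles VALUES (?, ?)", ("ana", "Cali"))
    conn.commit()
    return conn


def render_profile(conn, username):
    """Application code: query the database, then generate the HTML."""
    row = conn.execute(
        "SELECT username, city FROM profiles WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return "404 Not Found", "<p>No such profile</p>"
    name, city = row
    # escape() keeps stored text from being interpreted as HTML.
    return "200 OK", f"<h1>{escape(name)}</h1><p>{escape(city)}</p>"


conn = build_database()
status, page = render_profile(conn, "ana")
print(status, page)
```

The page does not exist as a file anywhere; it is produced for each request, which is the difference between a dynamic and a static web server.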

Database


The database applications are used to search, sort, filter, and present information based upon web requests from users. Also, databases can contain code to perform mathematical and statistical calculations on the data to support queries submitted from web browsers. Databases grant and limit access to data based upon criteria such as username, password, region, or account number. They also enforce data integrity by ensuring that details are presented and collected using a consistent format. The database, in the case of a dynamic web server, automatically updates web pages, eliminating the requirement to manually update

HTTPS/SSL

The Hypertext Transfer Protocol (HTTP) is a set of rules for communication: it specifies how linked web documents are transferred between two computers and how incoming requests are processed and answered. On the other hand, we can also find the Hypertext Transfer Protocol Secure (HTTPS), which is the secure version of HTTP. How does this come to be?

At this point, we will all have noticed that, on most of the pages we open, there is a "padlock" on the left side of the address bar, just before the domain we want to access. In modern web browsers such as Chrome, websites that do not use HTTPS are marked differently from those that do. That is exactly what the padlock is telling us: in this case (holbertonschool.com), the traffic is served over HTTPS. Why? Because it is more secure. HTTPS uses port 443 by default and encrypts the information exchanged with SSL/TLS, so it is much more difficult for people to spy on this site's traffic. What happens with regular HTTP? This protocol uses port 80 by default and sends information as plain text. Plain text can easily be read or altered in transit, so using it risks exposing the site's and the visitors' information.

HTTPS relies on an SSL/TLS certificate installed on the site to create an encrypted link with its visitors. This helps keep information from being stolen in transit between the browser and the server. For this reason, it is worth checking, when we open a page, whether it uses HTTPS. A page showing the "padlock" is not necessarily safe, but at least we can believe that its managers are taking the time to try to protect their users' information. You can click here and check what I am trying to say.
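The difference between the two schemes can be sketched in Python without contacting any server. The helper connection_plan and its dictionary layout are hypothetical names of my own; ssl.create_default_context() is the standard-library call that prepares the certificate checks a client performs before trusting a site.

```python
import ssl
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_plan(url):
    """Work out host, port, and whether traffic will be encrypted."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]  # explicit port wins
    return {
        "host": parts.hostname,
        "port": port,
        "encrypted": parts.scheme == "https",
    }


# Default client settings: the server must present a valid certificate
# and the certificate must match the hostname we asked for.
context = ssl.create_default_context()

plan = connection_plan("https://www.holbertonschool.com")
print(plan)
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Note that the port number alone does not encrypt anything. The encryption comes from the TLS handshake that a client would run with this context after connecting to port 443.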

This review of what happens when we type www.holbertonschool.com in our browser covers a large part of the web infrastructure behind the pages we visit and the applications we use. We can add just a couple more concepts: firewalls and load balancers. Both are directly related to protecting the servers on which all the information, the host's and ours as users, is stored.

Load-balancer

Load balancing refers to the process of distributing a set of tasks over a set of resources (computing units) to make their overall processing more efficient. Load balancing techniques can optimize the response time for each task and avoid the situation where some compute nodes are overloaded while others are left idle.

A load balancer can be managed through two main approaches: static algorithms and dynamic algorithms. Static algorithms do not take the state of the system into account when distributing tasks. They aim to assign a known set of tasks to the available processors so as to minimize a certain performance function. The advantage of this type of algorithm is that it is easy to set up and extremely efficient for fairly regular tasks (such as processing HTTP requests for a website). Dynamic algorithms look for the least-loaded server in the network and send the next load to it, so the workload is distributed among the processors at runtime. The algorithms in this category are considered more complex but have better fault tolerance and overall performance.
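The two approaches can be compared with one small example of each in Python: round robin as a static algorithm and least connections as a dynamic one. I chose these two as common representatives; the server names and connection counts are hypothetical.

```python
from itertools import cycle

SERVERS = ["web-01", "web-02", "web-03"]  # hypothetical server names


def round_robin(servers):
    """Static: hand out servers in a fixed rotation, ignoring their state."""
    return cycle(servers)


def least_connections(active_connections):
    """Dynamic: pick the server with the fewest active connections right now."""
    return min(active_connections, key=active_connections.get)


rotation = round_robin(SERVERS)
static_choices = [next(rotation) for _ in range(4)]

# Hypothetical snapshot of how busy each server is at this moment.
active = {"web-01": 12, "web-02": 3, "web-03": 7}
dynamic_choice = least_connections(active)

print(static_choices)
print(dynamic_choice)
```

Round robin needs no information about the servers, which is why it is simple to set up. Least connections needs a live view of every server's load, which is the extra complexity that buys better behavior when requests are uneven.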

Firewall

A firewall acts as a gatekeeper. It is a security device that helps protect the network by filtering traffic and blocking outsiders from gaining unauthorized access to private data on computers. It also helps block malicious software from infecting the system and monitors attempts to gain access to the operating system.
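The gatekeeper idea can be shown with a very small packet filter in Python. Real firewalls are far more elaborate; the rule set below (only ports 80 and 443 open, one blocked network taken from a documentation-only range) and the function name allow are assumptions for the sake of the example.

```python
import ipaddress

ALLOWED_PORTS = {80, 443}  # assumption: only web traffic is let through
BLOCKED_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]  # hypothetical blocklist


def allow(source_ip, dest_port):
    """Return True if traffic from source_ip to dest_port may pass."""
    address = ipaddress.ip_address(source_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return False                     # blocked source, whatever the port
    return dest_port in ALLOWED_PORTS    # otherwise only open ports pass


print(allow("203.0.113.5", 443))   # HTTPS from an unblocked address
print(allow("203.0.113.5", 22))    # SSH is not in the allowed ports
print(allow("198.51.100.7", 443))  # source is inside the blocked network
```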

In conclusion, behind a simple search for a web page through our browser lies an entire infrastructure and a complex process. It consists of request and response protocols, servers and lookup systems, databases and data manipulation, algorithms and load balancers that keep the whole network running efficiently, and different protection and security methods. At a glance, the infrastructure of a web application would include, for example, a load balancer, an SSL certificate with HTTPS routing, firewalls, and three servers, each with its own web server, web application, and database configuration, for easy application updates and quick responses.



References

How DNS works

An Introduction to HAProxy and Load Balancing Concepts | DigitalOcean

What is a web server? - Learn web development | MDN (mozilla.org)

HTTPS & SSL Does Not Mean You Have a Secure Website (semrush.com)

What Is an IP Address? | HowStuffWorks

TCP/IP: What is TCP/IP and How Does it Work? (techtarget.com)

Port Numbers Used for Computer Networks (lifewire.com)
