This article simplifies what happens when a user enters https://google.com in their browser and covers related basic concepts:
- DNS request
- TCP/IP
- Firewall
- HTTPS/SSL
- Load-balancer
- Web server
- Application server
- Database
The text is divided into two parts based on familiarity with the concepts introduced. The first part explains what happens and assumes that the reader already has a basic understanding of the concepts.
The second part is more detailed and provides a deeper understanding of each concept while covering the basics. It is recommended for curious readers who want to explore the concepts more deeply.
With this in mind, let's get started!
SECTION ONE: What happens when the Enter key is pressed?
Have you ever considered the complex process set in motion by typing a simple query such as "google.com" into your web browser and pressing the Enter key? From the moment you start typing to the moment the Google homepage appears on your screen, a fascinating journey occurs within the depths of the internet. Let's explore the intricacies of this process and understand the steps involved in bringing you the website of the search engine giant.
- DNS Resolution: The journey begins when you enter "google.com" in your browser's address bar and press Enter. Your browser then sends a request to a DNS (Domain Name System) server to translate the human-readable domain name "google.com" into an IP (Internet Protocol) address, which identifies the server hosting Google's website on the Internet.
- Request Routing: After the DNS server resolves the domain name "google.com" to its corresponding IP address, your web browser sends an HTTP (Hypertext Transfer Protocol) request to the server identified by that IP address. This request contains the necessary information for the server to understand that you are requesting the homepage of Google's website.
- Load Balancing: Google has a vast infrastructure with data centers worldwide to guarantee quick and reliable service access. When you request access to a service, your request may go through a load balancer, a network device used to distribute incoming traffic across multiple servers. This helps improve performance and prevents servers from overloading.
- SSL Handshake: To ensure that your data remains safe and private while being transmitted, your browser and the Google server perform an SSL (Secure Sockets Layer) handshake; modern connections use SSL's successor, TLS (Transport Layer Security). The handshake authenticates the server, agrees on encryption keys, and establishes the secure connection used by HTTPS (Hypertext Transfer Protocol Secure). For an https:// address, this handshake completes before the HTTP request itself is sent.
- Server-Side Processing: Once the secure connection is established, Google's server processes your request and retrieves the necessary resources to construct the homepage dynamically.
- Content Delivery: After your browser sends a request to the server for a webpage, the server assembles the webpage by including various elements such as HTML code, CSS stylesheets, JavaScript code, and other assets. Once the webpage is fully assembled, the server sends the response back to your browser over an encrypted connection. This response contains all the elements your browser needs to render the Google homepage.
- Client-Side Rendering: After receiving a response from the server, your browser parses the HTML code, applies the CSS styles, executes the JavaScript code, and renders the visual elements of the Google homepage on your screen.
- Interactivity: The Google homepage is not a simple static webpage. Instead, it's a highly interactive interface that lets you perform diverse searches, access other Google services, and explore various content. Your browser enables this interactivity by managing user input, events, and interactions with the webpage's elements.
- Caching and Optimization: Various optimization techniques are utilized throughout the process to improve performance and minimize latency. These techniques include caching frequently accessed resources, compressing data to speed up transmission, and using content delivery networks (CDNs) to deliver content from servers closer to your area.
- Continuous Monitoring and Improvement: Google monitors its services for performance, reliability, and security using sophisticated monitoring systems, automated alerts, and rapid response mechanisms to ensure a seamless user experience.
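As a rough illustration of the first steps above, the sketch below shows how a URL could be split into the pieces a browser needs (host, port, path) and what a minimal HTTP request for the homepage might look like. The function name `describe_request` and the exact headers are assumptions made for this example; real browsers send many more headers.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed in this article.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_request(url: str) -> dict:
    """Split a URL into the pieces needed to connect and build a request."""
    parts = urlsplit(url)
    scheme = parts.scheme
    host = parts.hostname                       # the name DNS has to resolve
    port = parts.port or DEFAULT_PORTS[scheme]  # 443 for https if not given
    path = parts.path or "/"                    # an empty path means the homepage
    # Minimal HTTP/1.1 request text; for https it travels inside the
    # encrypted connection.
    request_text = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return {"scheme": scheme, "host": host, "port": port,
            "path": path, "request": request_text}


info = describe_request("https://google.com")
print(info["host"], info["port"], info["path"])  # google.com 443 /
```

The host is what gets resolved through DNS, the port decides whether a secure handshake is needed, and the request text is what the server finally receives.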
When we type "google.com" in our browser and hit Enter, it may seem straightforward, but in reality, it involves several intricate steps. From Domain Name System (DNS) resolution to Secure Sockets Layer (SSL) handshake, server-side processing to client-side rendering, and more, complex actions bring us the Google homepage within milliseconds. Therefore, the next time we search or visit our favorite website, let's pause and appreciate the remarkable journey behind the scenes.
SECTION TWO: Basic ideas behind each concept.
- DNS request
A DNS request is a communication from a client to a DNS server seeking to resolve a domain name into an IP address.
- Initiation: When you enter a website's name into your web browser's address bar or click on a link, your computer sends a request to a DNS server to convert that name into the corresponding IP address necessary for establishing a connection.
- Query: The DNS request is typically sent as a UDP packet to a DNS server (port 53) and contains the domain name that needs to be resolved.
- DNS Server: The DNS server is a specialized computer that stores databases of domain names and their associated IP addresses. When it receives a DNS request, it checks its records to find the matching IP address for the requested domain name.
- Resolution: When a client requests a domain name from a DNS server, if the server has the IP address for that domain name in its cache (temporary storage of recent DNS lookups), it will immediately return the IP address to the client. This type of response is called a cached response.
- Recursive Resolution: If the DNS server does not have the IP address in its cache, it performs a recursive resolution, querying other DNS servers on the internet until the address is found. The search typically follows a fixed pattern: the resolver asks a root name server, which refers it to the name server for the top-level domain (such as .com), which in turn refers it to the authoritative name server for the domain, and that server returns the IP address.
- Response: After receiving the DNS request, the server obtains the IP address for the requested domain name and responds to the client computer with the IP address.
- Connection Establishment: With the IP address obtained from the DNS server, the client can connect with the desired server that hosts the website or service linked with the domain name.
- Caching: To improve internet performance and reduce the need for repetitive DNS lookups, the client computer and intermediate DNS servers along the route may cache the resolved IP address for a certain period. This cache speeds up future requests for the same domain name.
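The lookup-then-cache behaviour described in this list can be sketched with a toy resolver. Everything here is an assumption made for illustration: the name-server tables are hard-coded dictionaries, the server labels are invented, and the address 192.0.2.10 is a placeholder from a documentation-only range, not Google's real address. No network traffic is involved.

```python
import time

# Hypothetical name-server data (root -> TLD -> authoritative).
ROOT = {"com": "tld-com"}
TLD = {"tld-com": {"google.com": "ns-google"}}
AUTHORITATIVE = {"ns-google": {"google.com": "192.0.2.10"}}  # placeholder IP


class CachingResolver:
    """Toy resolver: answers from its cache, otherwise walks the hierarchy."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}            # name -> (ip, expiry time)
        self.upstream_queries = 0  # how many other servers were asked

    def resolve(self, name):
        entry = self.cache.get(name)
        if entry and entry[1] > self.clock():
            return entry[0]                      # cached response
        ip = self._recursive_lookup(name)
        self.cache[name] = (ip, self.clock() + self.ttl)
        return ip

    def _recursive_lookup(self, name):
        tld_label = name.rsplit(".", 1)[-1]
        self.upstream_queries += 1               # ask a root server
        tld_server = ROOT[tld_label]
        self.upstream_queries += 1               # ask the TLD server
        auth_server = TLD[tld_server][name]
        self.upstream_queries += 1               # ask the authoritative server
        return AUTHORITATIVE[auth_server][name]


resolver = CachingResolver()
first = resolver.resolve("google.com")    # walks the hierarchy
second = resolver.resolve("google.com")   # served from the cache
print(first, resolver.upstream_queries)   # 192.0.2.10 3
```

The second call costs no upstream queries, which is the saving that caching is meant to provide; once the time-to-live expires, the hierarchy is walked again.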
- TCP/IP
TCP/IP, which stands for Transmission Control Protocol/Internet Protocol, is a set of communication protocols that connects network devices on the Internet and other computer networks. This suite of protocols provides a standardized set of rules for transmitting, routing, and receiving data across networks. It ensures reliable device communication regardless of the underlying hardware and software used.
Transmission Control Protocol (TCP):
- TCP is a protocol that establishes a connection between two devices. Its primary responsibility is to ensure reliable data transmission between these devices.
- It uses packetization and sequencing to send data in the proper order.
- To ensure data integrity, TCP handles error checking, correction, and retransmitting of lost or corrupted packets.
- Flow control and congestion avoidance regulate data transmission rates on networks.
Internet Protocol (IP):
- IP is responsible for how data packets are packaged, addressed, and routed across networks, including the Internet.
- It operates at the network layer of the OSI model, where its main functions are addressing and routing packets between devices.
- Each device connected to the network is assigned an IP address that identifies it and its location on the network.
- Two versions of IP are currently used: IPv4 and IPv6. IPv4 uses 32-bit addresses, while IPv6 uses 128-bit addresses to accommodate the growing number of internet-connected devices.
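To make the ideas of packetization, sequencing, error checking, and retransmission more concrete, here is a small simulation. It is only a sketch under stated assumptions: segments are plain tuples, the sample message is made up, and `zlib.crc32` stands in for TCP's real 16-bit checksum. The helper names `packetize` and `receive` are invented for this example.

```python
import random
import zlib


def packetize(data: bytes, size: int):
    """Split data into (sequence number, payload, checksum) segments."""
    return [
        (seq, data[seq:seq + size], zlib.crc32(data[seq:seq + size]))
        for seq in range(0, len(data), size)
    ]


def receive(segments, total_length: int, size: int):
    """Reassemble by sequence number; report offsets that must be resent."""
    good = {
        seq: payload
        for seq, payload, checksum in segments
        if zlib.crc32(payload) == checksum      # discard corrupted segments
    }
    missing = [seq for seq in range(0, total_length, size) if seq not in good]
    data = b"".join(good[seq] for seq in sorted(good))
    return data, missing


message = b"GET / HTTP/1.1 over a reliable byte stream"
SIZE = 8
sent = packetize(message, SIZE)

arrived = sent[:]
random.Random(0).shuffle(arrived)               # segments arrive out of order
arrived.pop()                                   # one segment is lost in transit
seq, payload, checksum = arrived[0]
arrived[0] = (seq, b"X" + payload[1:], checksum)  # one segment is corrupted

partial, missing = receive(arrived, len(message), SIZE)
retransmitted = [seg for seg in sent if seg[0] in missing]  # sender resends
data, still_missing = receive(arrived + retransmitted, len(message), SIZE)
print(len(missing), data == message, still_missing)  # 2 True []
```

The receiver never depends on arrival order: sequence numbers restore the order, and the checksum decides which segments have to be requested again.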
- Firewall
A firewall is a security tool that monitors traffic coming into and leaving a network. It operates based on predetermined security rules to protect against unauthorized access, malware, data breaches, and other cyber threats. Its main purpose is to establish a barrier between a trusted internal network and untrusted external networks, such as the Internet.
- Packet Filtering: Firewalls inspect individual data packets passing through a network, examining key attributes such as source and destination IP addresses, port numbers, and protocol types.
- Rule-Based Filtering: Firewall administrators set rules to control incoming and outgoing traffic based on criteria such as IP addresses, port numbers, and application protocols. These rules determine which traffic is allowed, blocked, or restricted.
- Stateful Inspection: Modern firewalls use stateful inspection: by tracking the state of active network connections, they can make more informed decisions about allowing or blocking traffic.
- Application Layer Filtering: Certain firewalls can inspect the contents of data packets at the application layer, the seventh layer of the OSI model. This advanced feature enables these firewalls to detect and prevent specific applications or protocols, like peer-to-peer file sharing or certain types of malware, from passing through the network.
- Virtual Private Network (VPN) Support: Many firewalls include VPN functionality, which establishes encrypted tunnels so that remote users can communicate with the internal network while sensitive data stays protected as it travels over untrusted networks.
- Logging and Reporting: Firewalls maintain logs of network traffic and security events for audit and analysis, which can provide insights into potential threats, policy violations, and network activity.
- Intrusion Detection and Prevention: Some advanced firewalls have intrusion detection and prevention features that block malicious network activity in real time.
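Packet filtering and rule-based filtering can be sketched as an ordered rule list in which the first matching rule wins and anything unmatched is denied. The `Packet` and `Rule` classes, the rule set, and the addresses (taken from documentation-only ranges) are assumptions made for this example and do not reflect any particular firewall product.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst_port: int
    protocol: str


@dataclass(frozen=True)
class Rule:
    action: str                       # "allow" or "deny"
    src_net: str = "0.0.0.0/0"        # default: any IPv4 source
    dst_port: Optional[int] = None    # None means any port
    protocol: Optional[str] = None    # None means any protocol

    def matches(self, packet: Packet) -> bool:
        return (
            ip_address(packet.src) in ip_network(self.src_net)
            and self.dst_port in (None, packet.dst_port)
            and self.protocol in (None, packet.protocol)
        )


def filter_packet(packet: Packet, rules, default: str = "deny") -> str:
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return default


RULES = [
    Rule("deny", src_net="203.0.113.0/24"),          # block a hostile range
    Rule("allow", dst_port=443, protocol="tcp"),     # HTTPS
    Rule("allow", dst_port=80, protocol="tcp"),      # HTTP
]

print(filter_packet(Packet("198.51.100.7", 443, "tcp"), RULES))  # allow
print(filter_packet(Packet("203.0.113.5", 443, "tcp"), RULES))   # deny
print(filter_packet(Packet("198.51.100.7", 22, "tcp"), RULES))   # deny
```

Rule order matters: the blocked range is listed first so that it is rejected even on ports that are otherwise open.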
- HTTPS/SSL
When you visit an HTTPS website, the website sends its SSL certificate to your browser. This SSL certificate includes the public key required to begin a secure session. Based on this initial exchange, your browser and the website initiate an "SSL handshake," generating shared secrets to establish a unique and secure connection between you and the website.
Users will see a padlock icon in their browser address bar when a trusted SSL Digital Certificate is used during an HTTPS connection. Older browsers also turned the address bar green when a website used an Extended Validation Certificate, but most current browsers no longer show this indicator.
SSL certificates contain the computer owner's public key, which is shared with anyone who needs it. Other users require the public key to encrypt messages to the owner, so the owner sends them the SSL certificate containing the public key. The owner doesn't share the private key with anyone.
The protocols that maintain this security during the transfer are Secure Sockets Layer (SSL) and its successor, Transport Layer Security (TLS).
The framework of certificates and certificate authorities that lets public keys be exchanged and trusted, and so enables HTTPS, is known as Public Key Infrastructure (PKI).
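From the client's side, the handshake and certificate check might look like the sketch below, which uses Python's standard `ssl` module. The helper names `make_client_context` and `negotiated_details` are assumptions for this example, and the second function needs network access, so it is defined but not called here.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client settings that verify the server certificate and host name."""
    context = ssl.create_default_context()            # loads trusted CA certs
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older versions
    return context


def negotiated_details(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Perform a handshake with host and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # wrap_socket runs the handshake: the server presents its certificate,
        # it is checked against the trusted authorities, and keys are agreed.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            cert = tls_sock.getpeercert()
            return {
                "protocol": tls_sock.version(),
                "cipher": tls_sock.cipher()[0],
                "not_after": cert.get("notAfter"),
            }


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)  # True True
```

Calling `negotiated_details("google.com")` on a machine with internet access would return the negotiated protocol version, the cipher name, and the certificate's expiry date.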
- Load-balancer
A load balancer is a device or software application that distributes incoming network traffic among several servers or resources. This helps optimize the system's utilization, reliability, and performance. Load balancers are typically used in large-scale web applications, databases, and server-based systems to improve their availability and scalability.
- Traffic Distribution: When a client wants to access a website or other online service, their request is sent to a load balancer. The load balancer analyzes the incoming traffic and distributes it among the available servers or resources based on predefined algorithms and configurations.
- Health Checking: Load balancers are responsible for constantly monitoring the health and availability of servers and resources in a pool. They perform periodic health checks, like ping or HTTP requests, to ensure that each server is responsive and capable of handling requests. When a server fails or becomes overloaded, the load balancer detects the issue and automatically removes it from the pool, redirecting traffic to healthy servers.
- Session Persistence: In some cases, it's essential to maintain session affinity or persistence, where all requests from a particular client are consistently directed to the same server. Load balancers can use techniques like source IP hashing or cookies to achieve session persistence while distributing traffic evenly across servers.
- Scalability: Load balancers enable horizontal scalability by automatically adding new servers to the pool. As the service’s demand grows, more servers can be provisioned and included in the load balancer configuration to manage the increased traffic load. Similarly, servers can be removed when the demand decreases, optimizing resource utilization and cost efficiency.
- SSL Termination: Load balancers can simplify server configuration and improve performance by offloading SSL/TLS encryption and decryption tasks from backend servers. The load balancer can terminate SSL connections and establish secure connections with backend servers over the internal network.
- Traffic Management and Routing: Load balancers provide advanced traffic management, enabling custom routing policies based on geographic location, content type, or server capacity. This flexibility optimizes application performance and resource allocation.
- High Availability: Load balancers are often deployed in redundant configurations to ensure high availability and fault tolerance. Clustering and failover mechanisms ensure that if one load balancer fails, another can seamlessly take over to prevent service disruptions.
- Logging and Monitoring: Load balancers offer logging and monitoring capabilities to track traffic patterns, server performance, and system health. These insights help administrators make informed decisions about resource allocation, capacity planning, and troubleshooting.
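Traffic distribution and health checking can be illustrated with a round-robin balancer that skips servers marked as unhealthy. This is a minimal sketch: round robin is only one of several possible algorithms, the class and server names are invented, and a real load balancer would run the health checks itself instead of being told the result.

```python
class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that failed a health check."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        """Record a failed health check: stop sending traffic to server."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Record a passed health check: put server back into rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server in round-robin order."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']

balancer.mark_down("app-2")                 # health check failed
print([balancer.pick() for _ in range(3)])  # ['app-3', 'app-1', 'app-3']
```

Once `app-2` is marked down, its share of the traffic is spread over the remaining servers without any change on the client side.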
- Web server
A web server is a software application or hardware device that responds to client requests by delivering web content over the internet or an intranet. It hosts websites, web applications, and other internet-based services, making them easily accessible to users via web browsers.
- Client Request: When a user enters a website's URL into their web browser or clicks on a hyperlink, the browser sends a request to the web server hosting the content that the user wishes to access. This request usually contains the URL along with any extra parameters or headers necessary for the server to process the request.
- Processing: The web server receives the client request and processes it based on the requested resource (e.g., HTML page, image, CSS file, JavaScript code). Depending on the configuration and the nature of the request, it locates the corresponding file or generates the content dynamically.
- Content Delivery: Once the requested content is prepared, the web server sends it back to the client's web browser over the internet or intranet. The content is typically transmitted using HTTP (Hypertext Transfer Protocol) or its secure variant, HTTPS (HTTP Secure).
- Response: The client's web browser receives the response from the web server and renders the content to the user. This may involve parsing HTML, rendering CSS styles, executing JavaScript code, and displaying multimedia elements such as images and videos.
- State Management: Web servers may also manage state for web applications using mechanisms such as cookies or sessions. These mechanisms allow the server to maintain information about the client's session or preferences across multiple requests and responses.
- Security: Web servers implement various security measures to protect against common threats, such as unauthorized access, data breaches, and denial-of-service attacks. This may include authentication, access control, encryption (e.g., SSL/TLS), and security patches to mitigate vulnerabilities.
- Logging and Monitoring: Web servers typically log information about client requests, server responses, and system performance for monitoring, analysis, and troubleshooting purposes. These logs help administrators identify issues, track usage patterns, and optimize server performance.
- Configuration and Customization: Administrators can configure and customize web servers to meet the specific requirements of their websites or applications. This may involve adjusting server settings, enabling/disabling features, and installing plugins or extensions to extend functionality.
Some popular web server software includes Apache HTTP Server, Nginx, Microsoft Internet Information Services (IIS), and LiteSpeed Web Server. These web servers offer various features, performance characteristics, and deployment options to suit different use cases and environments.
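The request, processing, and content delivery steps can be sketched with Python's built-in `http.server` module. The in-memory `PAGES` table and the names `build_response`, `Handler`, and `serve` are assumptions for this example; production servers such as those listed above read files from disk and do far more.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "document root": path -> (content type, body).
PAGES = {
    "/": ("text/html; charset=utf-8", b"<h1>Home</h1>"),
    "/style.css": ("text/css; charset=utf-8", b"h1 { color: navy; }"),
}


def build_response(path: str):
    """Processing step: map a requested path to (status, type, body)."""
    if path in PAGES:
        content_type, body = PAGES[path]
        return 200, content_type, body
    return 404, "text/plain; charset=utf-8", b"Not Found"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Content delivery step: write the status line, headers, and body.
        status, content_type, body = build_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()


print(build_response("/")[0], build_response("/missing")[0])  # 200 404
```

Calling `serve()` and opening http://127.0.0.1:8000/ in a browser would show the home page; the port number is an arbitrary choice.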
- Application server
An application server is a software framework or platform that provides the runtime environment and services necessary for running and managing applications. It typically resides between the backend databases or systems and the frontend web servers or clients, serving as an intermediary to facilitate communication and execute business logic.
- Runtime Environment: Application servers provide a runtime environment where applications can run. This environment includes libraries, frameworks, and runtime engines for executing application code in programming languages such as Java, C#, Python, or PHP.
- Middleware Services: Application servers offer middleware services that enable communication between different components of an application or between the application and external systems. These services may include messaging, transaction management, security, caching, and data access.
- Business Logic Execution: Application servers execute the business logic or application logic defined in the application code. This logic encompasses the application’s core functionality, such as processing user requests, performing calculations, accessing databases, and generating responses.
- Integration: Application servers facilitate integration with backend systems, databases, and external services by providing APIs (Application Programming Interfaces) and connectors. They enable applications to retrieve and update data from databases, invoke remote services, and interact with other systems using standardized protocols and interfaces.
- Scalability and Load Balancing: Application servers support scalability by allowing applications to be deployed across multiple instances or nodes. They often include load balancing and clustering features to distribute incoming traffic evenly across multiple instances and ensure high availability and performance.
- Session Management: In web applications, application servers may manage sessions to maintain stateful client interactions. They store session data, such as user authentication tokens, shopping cart contents, and user preferences, across multiple HTTP requests.
- Security: Application servers implement security mechanisms to protect applications and data from unauthorized access, tampering, and other threats. This may include authentication, authorization, encryption, input validation, and auditing features.
- Deployment and Management: Application servers provide tools and utilities for deploying, configuring, and managing applications throughout their lifecycle. This includes deployment automation, monitoring, logging, debugging, and performance tuning capabilities.
Some popular application server platforms include:
- Java EE (Enterprise Edition) application servers like WildFly (formerly JBoss), IBM WebSphere, and Oracle WebLogic, as well as servlet containers like Apache Tomcat.
- Microsoft .NET application servers like Internet Information Services (IIS) and Microsoft Azure App Service.
- Open-source platforms like Node.js for JavaScript-based applications, Flask and Django for Python, and Ruby on Rails for Ruby-based applications.
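In the Python world, the boundary between a server and the application code it runs is commonly the WSGI interface: the server calls a function for every request, and that function holds the business logic. The sketch below assumes a made-up `/cart/total` endpoint and hard-coded cart contents; `call_app` is a helper invented here to invoke the application without starting a real server.

```python
import json
from wsgiref.util import setup_testing_defaults


def cart_total(items):
    """Business logic: total in cents for (unit price in cents, quantity) pairs."""
    return sum(price * quantity for price, quantity in items)


def application(environ, start_response):
    """WSGI entry point that the server calls once per request."""
    if environ["PATH_INFO"] == "/cart/total":
        # Hard-coded cart; a real application would load it from a database.
        payload = {"total_cents": cart_total([(1999, 2), (500, 1)])}
        status = "200 OK"
    else:
        payload = {"error": "not found"}
        status = "404 Not Found"
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call_app(path):
    """Invoke the application the way a server would, without a network."""
    environ = {}
    setup_testing_defaults(environ)   # fills in a minimal valid environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], json.loads(body)


print(call_app("/cart/total"))  # ('200 OK', {'total_cents': 4498})
```

Because the application is just a callable, the same code can run behind different servers, which is the separation of concerns this section describes.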
- Database
A database is a structured collection of data organized and stored to allow efficient access, retrieval, manipulation, and management. Databases are the foundation for storing and managing various structured and unstructured data, enabling applications, websites, and systems to efficiently store, retrieve, and process information.
- Data Model: A data model defines the structure of the data stored in the database and the relationships between different data elements. Standard data models include relational, hierarchical, network, and document-oriented models.
- Tables and Records: In a relational database, data is organized into tables, each consisting of rows (also known as records) and columns. Each row represents a single record or entity, while each column represents a specific attribute or field of that record.
- Schema: The schema defines the structure and constraints of the database, including the tables, columns, data types, relationships, and integrity constraints. It serves as a blueprint for organizing and managing the data.
- Query Language: Databases use query languages to interact with and manipulate the data stored within them. SQL (Structured Query Language) is the most widely used query language for relational databases, allowing users to perform tasks such as retrieving data, inserting new records, updating existing records, and deleting records.
- Indexes: Indexes are data structures that optimize retrieval by providing fast access to specific data values within a database table. They improve query performance by reducing the time required to locate and retrieve relevant records.
- Transactions: A transaction is a unit of work performed within a database that must be executed as a single, indivisible operation. Transactions ensure data consistency and integrity by enforcing ACID properties (Atomicity, Consistency, Isolation, Durability) and allowing changes to be either fully committed or fully rolled back in case of failure.
- Concurrency Control: Concurrency control mechanisms manage simultaneous access to the database by multiple users or applications to prevent conflicts and ensure data consistency. Techniques such as locking, multi-version concurrency control (MVCC), and optimistic concurrency control are commonly used to handle concurrent transactions.
- Security: Database security measures protect the confidentiality, integrity, and availability of the data stored in the database. This includes access control mechanisms, encryption, authentication, auditing, and data masking techniques to safeguard sensitive information from unauthorized access, modification, or disclosure.
- Backup and Recovery: Backup and recovery processes are essential for protecting against data loss and ensuring data availability in case of disasters or system failures. Database administrators regularly perform backups of the database and transaction logs and implement strategies for restoring data to a consistent state.
- Scalability and Performance: Databases must be scalable to handle growing volumes of data and increasing numbers of users or transactions. Techniques such as sharding, replication, partitioning, and caching improve scalability and performance.
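Several of these ideas (schema, constraints, indexes, SQL queries, and transactions that are either fully committed or fully rolled back) can be tried out with Python's built-in `sqlite3` module. The `accounts` table, its columns, and the `transfer` helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database

# Schema: one table with data types and integrity constraints.
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
# Index: speeds up lookups by owner.
conn.execute("CREATE INDEX idx_accounts_owner ON accounts(owner)")
conn.executemany(
    "INSERT INTO accounts (owner, balance) VALUES (?, ?)",
    [("alice", 100), ("bob", 50)],
)
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE owner = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:   # CHECK (balance >= 0) was violated
        return False


ok = transfer(conn, "alice", "bob", 30)        # succeeds
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT owner, balance FROM accounts ORDER BY owner"))
print(ok, rejected, balances)  # True False {'alice': 70, 'bob': 80}
```

In the rejected transfer, the credit to bob has already been applied when the debit fails, and the rollback undoes it, which is atomicity in practice.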
Common types of databases include:
- Relational Databases: Store data in tables with predefined schemas and support SQL for querying and manipulating data. Examples include MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server, and SQLite.
- NoSQL Databases: Designed to handle large volumes of unstructured or semi-structured data and offer flexible schemas. Examples include MongoDB, Cassandra, Redis, Couchbase, and Amazon DynamoDB.
- NewSQL Databases: Combine features of traditional relational databases with scalability and performance characteristics of NoSQL databases. Examples include Google Spanner, CockroachDB, and NuoDB.