How a browser works!

"How a browser works!" This is an eternal question that comes to every web developer's mind. We all know about it superficially, but getting a deep understanding takes extensive research. In this article, I've tried to lay out my research in detail to demystify a few things and explore the underlying details of each execution step that a browser performs.

Entering a URL: what happens before the browser takes over the rendering.

Let's understand what happens when a user enters a URL in the browser and hits Enter. The following steps get executed behind the curtain.

1. The browser checks the cache for DNS records to find the IP address corresponding to the URL's domain name.

This "checking the cache" is an intriguing process. First, the browser checks its own cache. If the record is not there, it looks in the OS cache, since the operating system also maintains a cache of DNS records. If it is still not found, the query goes to the router, which often keeps its own DNS cache. If all of these miss, the request is forwarded to the ISP's (Internet Service Provider's) DNS server, which resolves the name (querying other DNS servers if necessary) and returns the registered IP address. You might wonder why so many levels of cache are maintained. They speed up browsing and reduce network traffic.
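
To make the lookup order concrete, here is a minimal JavaScript sketch of a chain of DNS caches. It is a simplified model, not how any real browser or OS resolver is implemented. `CacheLayer`, `resolve`, and the stubbed ISP resolver are hypothetical. Each cached record simply expires after its TTL (time to live), as DNS records do.

```javascript
// Hypothetical cache layer (browser, OS, or router) mapping
// hostname -> { ip, expiresAt }. Records expire after their TTL.
class CacheLayer {
  constructor(name) {
    this.name = name;
    this.entries = new Map();
  }

  get(host, now) {
    const entry = this.entries.get(host);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(host); // stale record: treat as a miss
      return undefined;
    }
    return entry.ip;
  }

  set(host, ip, ttlMs, now) {
    this.entries.set(host, { ip, expiresAt: now + ttlMs });
  }
}

// Consult the layers in order (fastest first). On a complete miss, ask the
// ISP's resolver (stubbed here) and cache the answer in every layer.
function resolve(host, layers, askIspResolver, now = Date.now()) {
  for (const layer of layers) {
    const ip = layer.get(host, now);
    if (ip !== undefined) return { ip, source: layer.name };
  }
  const { ip, ttlMs } = askIspResolver(host);
  for (const layer of layers) layer.set(host, ip, ttlMs, now);
  return { ip, source: "isp" };
}

// Usage with a stub resolver; 203.0.113.10 is a documentation-only address.
const layers = [new CacheLayer("browser"), new CacheLayer("os"), new CacheLayer("router")];
const stubIsp = () => ({ ip: "203.0.113.10", ttlMs: 300000 });
console.log(resolve("example.com", layers, stubIsp, 0)); // { ip: '203.0.113.10', source: 'isp' }
console.log(resolve("example.com", layers, stubIsp, 1000)); // { ip: '203.0.113.10', source: 'browser' }
```

A slower layer is consulted only when every faster layer before it misses. That is where the time and traffic savings come from.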

2. Once the browser has the IP address for the entered URL (from one of the caches or from the ISP's DNS server), it initiates a TCP (Transmission Control Protocol) connection with the server.

The connection is established with a three-way handshake:

  1. The client machine sends a SYN (synchronize) packet to the server, asking to open a new connection.
  2. If the server has an open port that can accept new connections, it responds with a SYN/ACK packet. This packet acknowledges (ACK) the client's SYN and carries the server's own SYN.
  3. The client receives the SYN/ACK packet and acknowledges it by sending an ACK packet to the server. The connection is now established, and data transmission can begin.
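
The handshake can be modelled as a small exchange of segments. This JavaScript sketch is a toy with simplifying assumptions. A segment is just `{ flags, seq, ack }`, there is no packet loss or retransmission, and the initial sequence numbers are passed in rather than chosen randomly. It still shows the key rule: each side acknowledges the other's sequence number plus one, because a SYN consumes one sequence number.

```javascript
// Toy model of the TCP three-way handshake. A segment is { flags, seq, ack };
// state names follow the TCP specification (RFC 793).
function threeWayHandshake(clientIsn, serverIsn) {
  const log = [];
  let clientState = "CLOSED";
  let serverState = "LISTEN"; // the server has an open port waiting for connections

  // 1. Client -> server: SYN carrying the client's initial sequence number (ISN).
  const syn = { flags: ["SYN"], seq: clientIsn };
  clientState = "SYN-SENT";
  log.push(syn);

  // 2. Server -> client: SYN/ACK acknowledges the client's ISN + 1 and
  //    carries the server's own ISN.
  const synAck = { flags: ["SYN", "ACK"], seq: serverIsn, ack: syn.seq + 1 };
  serverState = "SYN-RECEIVED";
  log.push(synAck);

  // 3. Client -> server: ACK acknowledges the server's ISN + 1.
  const ack = { flags: ["ACK"], seq: synAck.ack, ack: synAck.seq + 1 };
  clientState = "ESTABLISHED";
  log.push(ack);

  // The server considers the connection open once the final ACK checks out.
  if (ack.ack === serverIsn + 1) serverState = "ESTABLISHED";

  return { log, clientState, serverState };
}

const result = threeWayHandshake(1000, 5000);
for (const seg of result.log) {
  const ackPart = seg.ack === undefined ? "" : ` ack=${seg.ack}`;
  console.log(`${seg.flags.join("/")} seq=${seg.seq}${ackPart}`);
}
// SYN seq=1000
// SYN/ACK seq=5000 ack=1001
// ACK seq=1001 ack=5001
console.log(result.clientState, result.serverState); // ESTABLISHED ESTABLISHED
```
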
3. The browser sends an HTTP(S) GET request to the server (for HTTPS, a TLS handshake first secures the connection).

The request has a header section containing browser information (e.g. the User-Agent), the type of request, the origin of the request, etc., as depicted below. The connection is typically kept alive so that it can be reused for additional requests.

[Figure: request header]
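
In HTTP/1.1 the request itself is plain text. Below is a minimal JavaScript sketch that assembles the raw text of a GET request. `buildGetRequest` and the `ExampleBrowser/1.0` User-Agent value are hypothetical. A real browser sends more headers, and over HTTP/2 or HTTP/3 it uses binary framing instead of this text format.

```javascript
// Hypothetical helper: builds the raw text of an HTTP/1.1 GET request.
// Each line ends with CRLF, and an empty line ends the header section.
function buildGetRequest(host, path, extraHeaders = {}) {
  const headers = {
    Host: host, // required in HTTP/1.1
    "User-Agent": "ExampleBrowser/1.0", // made-up browser identifier
    Accept: "text/html",
    Connection: "keep-alive", // explicit here; HTTP/1.1 connections are persistent by default
    ...extraHeaders, // later entries override the defaults above
  };
  const lines = [`GET ${path} HTTP/1.1`];
  for (const [name, value] of Object.entries(headers)) lines.push(`${name}: ${value}`);
  return lines.join("\r\n") + "\r\n\r\n";
}

console.log(buildGetRequest("example.com", "/index.html"));
```
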
4. The server sends the response.

The response has two parts: the header and the body. The header contains the response metadata, for example:

- the type of content (Content-Type: JSON, text, HTML, etc.)
- how long the client may cache the data (Cache-Control)
- cookie information (Set-Cookie)

The body contains the actual content that the browser renders for the user. If it is an HTML response, the browser's rendering process takes over from here.

[Figure: response header]
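
On the receiving side, an empty line separates the header from the body. This JavaScript sketch splits a raw HTTP/1.1 response into its status line, headers, and body. `parseResponse` is a hypothetical helper that assumes a complete, non-chunked response held in a string. Repeated headers, such as multiple Set-Cookie lines, are not merged.

```javascript
// Split a raw HTTP/1.1 response into status line, headers, and body.
// Assumes a complete, non-chunked response held in a string.
function parseResponse(raw) {
  const split = raw.indexOf("\r\n\r\n"); // an empty line ends the header section
  if (split === -1) throw new Error("Incomplete response: no end of headers");
  const [statusLine, ...headerLines] = raw.slice(0, split).split("\r\n");
  const [version, code, ...reason] = statusLine.split(" ");
  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(":");
    // Header names are case-insensitive, so store them in lower case.
    // A repeated header (e.g. several Set-Cookie lines) overwrites the previous one here.
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { version, status: Number(code), reason: reason.join(" "), headers, body: raw.slice(split + 4) };
}

const raw =
  "HTTP/1.1 200 OK\r\n" +
  "Content-Type: text/html; charset=utf-8\r\n" + // type of response
  "Cache-Control: max-age=3600\r\n" + // client may cache it for one hour
  "Set-Cookie: session=abc123\r\n" + // cookie information
  "\r\n" +
  "<html><body><p>Hello World</p></body></html>";

const res = parseResponse(raw);
console.log(res.status, res.headers["content-type"]); // 200 text/html; charset=utf-8
if (res.headers["content-type"].startsWith("text/html")) {
  console.log("HTML body goes to the rendering engine:", res.body);
}
```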

Functional components of a Browser

Now the browser takes control to render the response for the user. The following seven components play a significant role in this rendering process. Let's have a look at these components and their purposes.

[Figure: the main components of a browser]
  1. The user interface: the address bar, back/forward buttons, bookmarks menu, etc. This is every part of the browser display except the main window where the requested page is shown.
  2. The browser engine: marshals actions between the UI and the rendering engine.
  3. The rendering engine: responsible for displaying the requested content. For example, if the requested content is HTML, it parses the HTML and CSS and displays the parsed content on the screen. Note that Chrome runs multiple instances of the rendering engine, roughly one for each tab, and each tab runs in a separate process.
  4. Networking: handles network calls such as HTTP/HTTPS requests through a platform-independent interface.
  5. UI backend: draws basic widgets such as select boxes and alert boxes. It exposes a generic, platform-independent interface and uses the operating system's user interface methods underneath.
  6. JavaScript interpreter: parses and executes JavaScript code.
  7. Data storage: the persistence layer. The browser needs to save all sorts of data on the hard disk, for example, cookies. HTML5 introduced the 'web database' (Web SQL Database, since deprecated in favor of IndexedDB), a complete (although light) database in the browser.

How the Rendering Engine bootstraps the initial work.

Different browsers use different rendering engines. For example, Firefox uses Gecko and Safari uses WebKit. Chrome used WebKit until 2013 and now uses Blink, a fork of WebKit. The rendering engine is essentially single-threaded, and it shares its thread with the browser's JS engine. This all-important thread is known as the browser's main thread. Networking occurs on a separate thread. The main thread uses an event loop to handle asynchrony. This is an infinite loop that pulls tasks such as layout, painting, and JS execution from a message queue and processes them.
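
The event loop idea can be sketched in a few lines of JavaScript. This is a toy model, not the HTML event-loop algorithm. `MainThread` and the task names are hypothetical, and there is a single FIFO queue, whereas real browsers have several task sources plus a microtask queue. The loop also stops when the queue is empty instead of waiting forever. It does show that tasks run one at a time, each to completion, and that work queued by a running task waits behind whatever is already queued.

```javascript
// Toy "main thread": one FIFO message queue, processed one task at a time.
class MainThread {
  constructor() {
    this.queue = [];
    this.trace = []; // names of tasks in the order they ran
  }

  post(name, fn = () => {}) {
    this.queue.push({ name, fn });
  }

  run() {
    // A browser's loop never ends; this one stops once the queue is empty.
    while (this.queue.length > 0) {
      const task = this.queue.shift();
      this.trace.push(task.name);
      task.fn(this); // runs to completion; it may post more tasks
    }
    return this.trace;
  }
}

const main = new MainThread();
main.post("parse HTML");
main.post("run script", (thread) => thread.post("timer callback")); // e.g. scheduled with setTimeout
main.post("layout");
main.post("paint");
console.log(main.run().join(" -> "));
// parse HTML -> run script -> layout -> paint -> timer callback
```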

The basic flow of such engines is depicted as follows.

[Figure: basic flow of the rendering engine]

There are, however, noticeable differences between the WebKit and Gecko rendering processes.

[Figure: Gecko rendering flowchart]

[Figure: WebKit rendering flowchart]

Gecko calls the tree of visually formatted elements a Frame tree, and each element is a frame. WebKit uses the term Render Tree, which consists of Render Objects. WebKit uses the term layout for the placing of elements, while Gecko calls it Reflow. Attachment is WebKit's term for connecting DOM nodes and visual information to create the render tree. A minor non-semantic difference is that Gecko has an extra layer between the HTML and the DOM tree. It is called the Content Sink and is a factory for making DOM elements.

Let's dive into the details of each step a browser executes to show a web page to the user.

1. Initiates the general parsing

General parsing means performing lexical and syntactic analysis based on pre-defined grammars.

A grammar is a precise description of a formal language. That is, it describes which sequences of symbols constitute valid words or sentences in that language, but not their semantics. The Chomsky hierarchy defines four types of grammars: regular, context-free, context-sensitive, and unrestricted. In general parsing, the lexer typically uses regular grammars (written as regular expressions) to recognize tokens, while the parser uses a context-free grammar to check the syntax.

Lexical analysis, also known as tokenization, is the process of breaking raw input into atomic units called tokens. The component that performs lexical analysis is known as a Lexer.

Syntactic analysis validates whether a sequence of tokens forms a valid expression according to the grammar's rules. The component that performs syntactic analysis is known as a Parser.
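
To illustrate the Lexer/Parser split, here is a minimal JavaScript sketch for a tiny arithmetic language rather than HTML. The HTML grammar is not context-free, so browsers do not parse it this way. The Lexer recognizes tokens with a regular expression, i.e. a regular grammar. The Parser is a recursive-descent parser for the context-free grammar shown in the comments. The function names and the `Add`/`Mul`/`Num` node types are illustrative only.

```javascript
// Lexer: a regular expression (regular grammar) recognizes each token.
function tokenize(input) {
  const tokens = [];
  const re = /\s*(\d+|[+*()])/y; // sticky: each match must start where the previous one ended
  let pos = 0;
  while (pos < input.length) {
    if (input.slice(pos).trim() === "") break; // only trailing whitespace is left
    re.lastIndex = pos;
    const m = re.exec(input);
    if (m === null) throw new SyntaxError(`Unexpected character at position ${pos}`);
    tokens.push(/^\d/.test(m[1]) ? { type: "num", value: Number(m[1]) } : { type: m[1] });
    pos = re.lastIndex;
  }
  return tokens;
}

// Parser: recursive descent, one function per grammar rule.
//   expr   := term ("+" term)*
//   term   := factor ("*" factor)*
//   factor := number | "(" expr ")"
function parse(tokens) {
  let i = 0;
  const peekType = () => (i < tokens.length ? tokens[i].type : null);
  const expect = (type) => {
    if (peekType() !== type) throw new SyntaxError(`Expected "${type}" at token ${i}`);
    return tokens[i++];
  };

  function expr() {
    let node = term();
    while (peekType() === "+") {
      i++;
      node = { type: "Add", left: node, right: term() };
    }
    return node;
  }
  function term() {
    let node = factor();
    while (peekType() === "*") {
      i++;
      node = { type: "Mul", left: node, right: factor() };
    }
    return node;
  }
  function factor() {
    if (peekType() === "num") return { type: "Num", value: expect("num").value };
    expect("(");
    const node = expr();
    expect(")");
    return node;
  }

  const result = expr();
  if (i !== tokens.length) throw new SyntaxError(`Unexpected token at ${i}`);
  return result;
}

const tree = parse(tokenize("2 + 3 * 4"));
console.log(JSON.stringify(tree));
// {"type":"Add","left":{"type":"Num","value":2},"right":{"type":"Mul","left":{"type":"Num","value":3},"right":{"type":"Num","value":4}}}
```

Because `term` is parsed inside `expr`, `*` binds tighter than `+` without any extra precedence table. Input such as `2 + * 3` is rejected with a `SyntaxError`.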

2. Starts HTML parsing to create DOM tree

The HTML parser is very forgiving. However, this flexibility comes at the cost of increased grammatical complexity and tougher syntactic analysis. The HTML5 specification defines standard error-handling algorithms for HTML parsers to implement. This is good because all compliant browsers now handle malformed HTML in exactly the same way. The HTML parser produces a parse tree of DOM (Document Object Model) nodes.

For example, this markup:

<html>
  <body>
    <p>Hello World</p>
    <div> 
      <img src="example.png"/>
    </div> 
  </body>
</html> 

would be translated to the following DOM tree:

[Figure: DOM tree for the markup above]

The Parse Tree creation algorithm consists of two stages: tokenization and tree construction. Tokenization is the lexical analysis, which parses the input into tokens. Among HTML tokens are start tags, end tags, attribute names, and attribute values.

The tokenizer recognizes the token, gives it to the tree constructor, and consumes the next character for recognizing the next token, and so on until the end of the input. In the common case, the data handled by the tokenization stage comes from the network, but it can also come from the script running in the user agent, e.g. using the document.write() API.

There is only one set of states for the tokenizer stage and the tree construction stage, but the tree construction stage is re-entrant, meaning that while the tree construction stage is handling one token, the tokenizer might be resumed, causing further tokens to be emitted and processed before the first token's processing is complete.
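
A heavily simplified JavaScript sketch of the two stages follows. It is an illustration under stated assumptions, not the HTML5 algorithm. The real tokenizer is a character-by-character state machine. This one uses a regular expression and understands only start tags (with double-quoted attributes), end tags, and text. It does not handle doctypes or comments. The tree builder consumes tokens one at a time from a generator and keeps a stack of open elements. To mimic the parser's forgiving behavior, it ignores end tags with no matching open element and implicitly closes whatever is still open at the end. All names here are hypothetical.

```javascript
// Hypothetical, heavily simplified HTML tokenizer + tree builder.
const VOID_ELEMENTS = new Set(["img", "br", "hr", "input", "meta", "link"]);

// Stage 1: tokenization. A generator yields one token at a time.
function* tokenize(html) {
  // Alternatives: end tag | start tag (name, attributes, optional "/") | text.
  const re = /<\/\s*([a-zA-Z][\w-]*)\s*>|<([a-zA-Z][\w-]*)((?:\s+[^\s=\/>]+(?:\s*=\s*"[^"]*")?)*)\s*(\/?)>|([^<]+)/g;
  let m;
  while ((m = re.exec(html)) !== null) {
    if (m[1] !== undefined) {
      yield { type: "EndTag", name: m[1].toLowerCase() };
    } else if (m[2] !== undefined) {
      const attrs = {};
      for (const a of m[3].matchAll(/([^\s=\/>]+)(?:\s*=\s*"([^"]*)")?/g)) {
        attrs[a[1]] = a[2] === undefined ? "" : a[2];
      }
      yield { type: "StartTag", name: m[2].toLowerCase(), attrs, selfClosing: m[4] === "/" };
    } else if (m[5].trim() !== "") {
      yield { type: "Character", data: m[5].trim() }; // whitespace-only text is dropped
    }
  }
}

// Stage 2: tree construction with a stack of open elements.
function buildTree(html) {
  const root = { name: "#document", children: [] };
  const openElements = [root];
  for (const token of tokenize(html)) {
    const current = openElements[openElements.length - 1];
    if (token.type === "Character") {
      current.children.push({ name: "#text", data: token.data });
    } else if (token.type === "StartTag") {
      const el = { name: token.name, attrs: token.attrs, children: [] };
      current.children.push(el);
      if (!token.selfClosing && !VOID_ELEMENTS.has(token.name)) openElements.push(el);
    } else {
      // End tag: pop back to the matching open element; ignore it if none matches.
      const idx = openElements.map((n) => n.name).lastIndexOf(token.name);
      if (idx > 0) openElements.length = idx;
    }
  }
  return root; // elements still open at end of input are implicitly closed
}

// Prints the tree as an indented outline.
function outline(node, depth = 0) {
  const label = node.name === "#text" ? `"${node.data}"` : node.name;
  const lines = ["  ".repeat(depth) + label];
  for (const child of node.children || []) lines.push(...outline(child, depth + 1));
  return lines;
}

const markup = `<html>
  <body>
    <p>Hello World</p>
    <div>
      <img src="example.png"/>
    </div>
  </body>
</html>`;
console.log(outline(buildTree(markup)).join("\n"));
// #document
//   html
//     body
//       p
//         "Hello World"
//       div
//         img
```

Feeding it `<p>unclosed <b>bold</p>` still produces a tree, with `<b>` nested inside `<p>`, rather than an error.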

For more details, you could visit this link.

3. Starts CSS parsing to build the CSSOM tree

Unlike HTML, CSS is relatively straightforward to parse. It is a context-free language whose lexical and syntax rules are well defined in the W3C specifications. Parsing it builds the CSSOM (CSS Object Model) tree, a kind of CSS parse tree, and validates the syntax of the CSS code along the way.

For example, this CSS:

p, img {
  background-color: peachpuff;
  color: gray;
}

div {
  font-size: 16px;
  background-color: beige;
}   

would be translated to the following CSSOM tree:

[Figure: CSSOM tree for the stylesheet above]

Each CSSStyleDeclaration Node is a dictionary that has entries for every possible CSS property. For the configured properties, the value is the given value.
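
The CSS side can be sketched in the same way. This minimal JavaScript parser turns rule blocks into CSSOM-like objects, one per rule, each holding its selector list and a dictionary of declared properties. `parseCss` is a hypothetical helper. It assumes no comments, no at-rules, and no braces or semicolons inside values. A real CSS parser follows the CSS Syntax specification and handles all of these cases.

```javascript
// Minimal CSSOM-like parser. Assumes no comments, at-rules, nesting,
// or braces/semicolons inside values.
function parseCss(source) {
  const rules = [];
  const ruleRe = /([^{}]+)\{([^{}]*)\}/g; // "selectors { declarations }"
  let m;
  while ((m = ruleRe.exec(source)) !== null) {
    const selectors = m[1].split(",").map((s) => s.trim()).filter(Boolean);
    const declarations = {}; // like a CSSStyleDeclaration, but only declared properties
    for (const decl of m[2].split(";")) {
      const colon = decl.indexOf(":");
      if (colon === -1) continue; // empty or malformed declaration: skipped
      declarations[decl.slice(0, colon).trim().toLowerCase()] = decl.slice(colon + 1).trim();
    }
    rules.push({ selectors, declarations });
  }
  return rules;
}

const sheet = parseCss(`
p, img {
  background-color: peachpuff;
  color: gray;
}

div {
  font-size: 16px;
  background-color: beige;
}`);
console.log(sheet.length, sheet[0].selectors, sheet[1].declarations["font-size"]);
// 2 [ 'p', 'img' ] 16px
```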

4. Building the Render Tree by combining the DOM and CSSOM trees

The browser combines the DOM and CSSOM trees into a third tree, the Render Tree. This tree holds the visual nodes, i.e. the things that will actually appear on the page. These nodes are called Render Objects (Renderers) in WebKit (Safari; Chrome and Opera use Blink, a WebKit fork) and Frames in Gecko (Firefox).

Some render objects correspond to a DOM node but sit in a different place in the tree. Floats and absolutely positioned elements are out of the normal flow. They are placed in a different part of the tree and mapped to the real frame, as depicted below.

[Figure: render tree and the corresponding DOM tree]

RenderObject is the parent class of all renderable objects. RenderBox, one of its main subclasses, represents the CSS box of a DOM node that obeys the CSS Box Model (not everything does, e.g. inline SVG). It computes dimensional information from the box model data, including height, width, margin, padding, border, scrollHeight, scrollWidth, offsetLeft, offsetTop, etc.
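
The box-model arithmetic behind these dimensions is simple enough to show directly. The JavaScript sketch below uses hypothetical helper names and pixel values. It computes the border box, which is roughly what offsetWidth/offsetHeight report. For the default `box-sizing: content-box`, the border box is the content size plus padding and border. With `box-sizing: border-box`, the declared width and height already include them. Margins lie outside the border box.

```javascript
// Border-box size (roughly what offsetWidth/offsetHeight report) from
// CSS box-model values in px. Hypothetical helper, not a browser API.
function borderBoxSize(box) {
  const { width, height, padding, border, boxSizing = "content-box" } = box;
  if (boxSizing === "border-box") {
    return { width, height }; // declared size already includes padding and border
  }
  return {
    width: width + padding.left + padding.right + border.left + border.right,
    height: height + padding.top + padding.bottom + border.top + border.bottom,
  };
}

// The margin box adds the margins around the border box. Margins affect where
// neighbours are placed (and thus their offsetLeft/offsetTop), not offsetWidth.
function marginBoxSize(box) {
  const size = borderBoxSize(box);
  return {
    width: size.width + box.margin.left + box.margin.right,
    height: size.height + box.margin.top + box.margin.bottom,
  };
}

const edges = (t, r, b, l) => ({ top: t, right: r, bottom: b, left: l });
const box = { width: 200, height: 100, padding: edges(10, 10, 10, 10), border: edges(2, 2, 2, 2), margin: edges(8, 8, 8, 8) };
console.log(borderBoxSize(box)); // { width: 224, height: 124 }
console.log(marginBoxSize(box)); // { width: 240, height: 140 }
```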

There are various renderer classes, such as RenderInline, RenderBlock, and RenderListItem. The class used for a given DOM node is chosen based on a few factors, primarily its display value. A node with display: inline gets a RenderInline, a node with display: block gets a RenderBlock, and so on.
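
Putting the pieces together, here is a JavaScript sketch of render-tree construction under simplifying assumptions:

- The DOM is a plain object tree.
- The display value comes from a small hypothetical default table plus an inline `style` attribute, instead of a real cascade.
- Elements with `display: none` (and non-visual elements such as `head`) get no renderer.
- Floats and positioned elements are not modelled.

Renderer class names mirror the WebKit names mentioned above. `RenderText` for text nodes is also a WebKit class.

```javascript
// Hypothetical render-tree builder. Styles come from a simplified lookup
// (tag name -> display value) instead of a real cascade.
const DEFAULT_DISPLAY = {
  html: "block", body: "block", div: "block", p: "block", li: "list-item",
  span: "inline", a: "inline", img: "inline",
  head: "none", title: "none", script: "none", style: "none", meta: "none",
};
const RENDERER_FOR_DISPLAY = { block: "RenderBlock", inline: "RenderInline", "list-item": "RenderListItem" };

function displayOf(node) {
  // An inline style="display: ..." wins over the defaults in this sketch;
  // unknown tags fall back to "inline", the initial value of display.
  const m = /display:\s*([\w-]+)/.exec((node.attrs && node.attrs.style) || "");
  return m ? m[1] : DEFAULT_DISPLAY[node.name] || "inline";
}

function buildRenderTree(node, getDisplay) {
  if (node.name === "#text") return { renderer: "RenderText", text: node.data };
  const display = getDisplay(node);
  if (display === "none") return null; // no renderer for the node or its subtree
  const renderer = RENDERER_FOR_DISPLAY[display] || "RenderBlock"; // crude fallback for other values
  const children = (node.children || []).map((c) => buildRenderTree(c, getDisplay)).filter(Boolean);
  return { renderer, name: node.name, children };
}

const dom = {
  name: "html",
  children: [
    { name: "head", children: [{ name: "title", children: [{ name: "#text", data: "Demo" }] }] },
    {
      name: "body",
      children: [
        { name: "p", children: [{ name: "#text", data: "Hello World" }] },
        { name: "div", attrs: { style: "display: none" }, children: [{ name: "img", children: [] }] },
      ],
    },
  ],
};

const outlineRender = (r, depth = 0) =>
  ["  ".repeat(depth) + (r.text !== undefined ? `RenderText "${r.text}"` : `${r.renderer} <${r.name}>`)]
    .concat(...(r.children || []).map((c) => outlineRender(c, depth + 1)));
console.log(outlineRender(buildRenderTree(dom, displayOf)).join("\n"));
// RenderBlock <html>
//   RenderBlock <body>
//     RenderBlock <p>
//       RenderText "Hello World"
```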

Gecko and WebKit handle Render Tree construction slightly differently.

Gecko listens for DOM updates. When the DOM updates, the relevant DOM node is passed to a specialized object, the FrameConstructor. The FrameConstructor computes the style information for the node and creates the appropriate Render Tree node(s) for it. In other words, Gecko delegates style computation and Render Tree node construction to a dedicated object.

In contrast, WebKit takes a “self-service” approach: each DOM node is responsible for computing its own style information and constructing its own Render Tree node(s). In WebKit, this process of style computation and Render Tree node construction is called attachment. Every DOM node has an attach() method that starts the process. Attachment is synchronous: each DOM node calls its own attach() method when it is inserted into the DOM tree.
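
The difference is about who does the work, and a short JavaScript sketch makes that visible. The classes below are hypothetical stand-ins, not Gecko or WebKit code. In the Gecko-like version, the document notifies a separate `FrameConstructor` listener, which computes style and creates the frame. In the WebKit-like version, each node's own `attach()` is called synchronously on insertion. Both reuse the same trivial style function, so they produce the same renderers.

```javascript
// Hypothetical stand-ins for the two strategies; only *who* builds the renderer differs.
const styleFor = (node) => (node.tag === "span" ? "inline" : "block"); // trivial "style computation"

// Gecko-like: the document notifies a listener, and a dedicated
// FrameConstructor computes style and creates the frame.
class FrameConstructor {
  constructor() {
    this.frames = [];
  }
  onNodeInserted(node) {
    this.frames.push({ tag: node.tag, display: styleFor(node) });
  }
}

class GeckoLikeDocument {
  constructor(listener) {
    this.listener = listener;
    this.nodes = [];
  }
  insert(node) {
    this.nodes.push(node);
    this.listener.onNodeInserted(node); // delegate to the specialized object
  }
}

// WebKit-like: every node can attach() itself, and insertion calls it synchronously.
class WebKitLikeNode {
  constructor(tag) {
    this.tag = tag;
    this.renderer = null;
  }
  attach() {
    this.renderer = { tag: this.tag, display: styleFor(this) }; // "self-service"
  }
}

class WebKitLikeDocument {
  constructor() {
    this.nodes = [];
  }
  insert(node) {
    this.nodes.push(node);
    node.attach();
  }
}

const frameConstructor = new FrameConstructor();
const gecko = new GeckoLikeDocument(frameConstructor);
const webkit = new WebKitLikeDocument();
for (const tag of ["div", "span"]) {
  gecko.insert({ tag });
  webkit.insert(new WebKitLikeNode(tag));
}
console.log(frameConstructor.frames); // frames are held by the constructor
console.log(webkit.nodes.map((n) => n.renderer)); // renderers hang off the nodes
```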

5. Starts parsing stylesheets and scripts

Next, the stylesheets and scripts are parsed (assuming they are referenced in the <head> section).

Fetching and executing a classic script blocks HTML parsing unless the script tag has the async (or defer) attribute, in which case it is processed asynchronously. Stylesheets do not block the HTML parser, but they block rendering and can delay the execution of subsequent scripts.

Parsing a stylesheet is all about reading the file and building the CSSOM as described above.

Parsing a script is a more intriguing process, so let's dive into it.

At a high level, JavaScript code is executed in two major steps:

  1. Creating an Abstract Syntax Tree (AST), which is used for syntactic and semantic checks and for determining the execution steps.
  2. Maintaining the stack, heap, and other data structures needed to execute the code.

The first step parses the source code, which is plain text, into a data structure called an Abstract Syntax Tree (AST). ASTs not only represent the source code in a structured way, they also play a critical role in semantic analysis, where the compiler validates the correctness and proper usage of the program and the language elements. Let's see how an AST gets built, using a simple JavaScript function as an example:

function foo(x) {
  if (x > 10) {
    var a = 2;
    return a * x;
  }

  return x + 10;
}

The parser will produce the following AST.

[Figure: simplified AST for foo()]

Note that for visualization purposes, this is a simplified version of what the parser would produce. The actual AST is much more complex. The idea here, however, is to get a feel for what would be the first thing that would happen to the source code before it gets executed.
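
To get a feel for what the parser hands to the next stage, here is the same AST written out by hand in JavaScript. It uses the node type names of the ESTree specification, as used by parsers such as Acorn and Esprima; V8 uses its own internal AST. Location data is omitted. The small tree-walking `evaluate` function is a hypothetical illustration that supports only the node types appearing in `foo`. It shows that the tree alone carries everything needed to run the function.

```javascript
// Hand-written, simplified ESTree-style AST for foo() (no location data).
const id = (name) => ({ type: "Identifier", name });
const lit = (value) => ({ type: "Literal", value });
const bin = (operator, left, right) => ({ type: "BinaryExpression", operator, left, right });

const fooAst = {
  type: "FunctionDeclaration",
  id: id("foo"),
  params: [id("x")],
  body: {
    type: "BlockStatement",
    body: [
      {
        type: "IfStatement",
        test: bin(">", id("x"), lit(10)),
        consequent: {
          type: "BlockStatement",
          body: [
            { type: "VariableDeclaration", kind: "var",
              declarations: [{ type: "VariableDeclarator", id: id("a"), init: lit(2) }] },
            { type: "ReturnStatement", argument: bin("*", id("a"), id("x")) },
          ],
        },
        alternate: null,
      },
      { type: "ReturnStatement", argument: bin("+", id("x"), lit(10)) },
    ],
  },
};

// Minimal tree-walking interpreter for just these node types (illustrative only).
const RETURN = Symbol("return");

function evaluate(node, scope) {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "Identifier":
      return scope[node.name];
    case "BinaryExpression": {
      const l = evaluate(node.left, scope);
      const r = evaluate(node.right, scope);
      switch (node.operator) {
        case "+": return l + r;
        case "*": return l * r;
        case ">": return l > r;
        default: throw new Error(`Unsupported operator ${node.operator}`);
      }
    }
    case "VariableDeclaration":
      for (const d of node.declarations) scope[d.id.name] = evaluate(d.init, scope);
      return undefined;
    case "ReturnStatement":
      return { [RETURN]: evaluate(node.argument, scope) };
    case "BlockStatement":
      for (const stmt of node.body) {
        const result = evaluate(stmt, scope);
        if (result && RETURN in result) return result; // propagate an early return
      }
      return undefined;
    case "IfStatement":
      if (evaluate(node.test, scope)) return evaluate(node.consequent, scope);
      return node.alternate ? evaluate(node.alternate, scope) : undefined;
    default:
      throw new Error(`Unsupported node ${node.type}`);
  }
}

function callFunction(fnAst, args) {
  const scope = {}; // one function-level scope, which matches var semantics here
  fnAst.params.forEach((p, i) => { scope[p.name] = args[i]; });
  const result = evaluate(fnAst.body, scope);
  return result ? result[RETURN] : undefined;
}

console.log(callFunction(fooAst, [5])); // 15
console.log(callFunction(fooAst, [20])); // 40
```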

Please check out this nice link to explore the generated AST for your JavaScript code.

Modern JavaScript parsers use heuristics to determine whether a piece of code is going to be executed immediately or later. Based on these heuristics, the parser does either eager or lazy parsing. Eager parsing runs through the functions that need to be compiled immediately. It does three main things: it builds the AST, builds the scope hierarchy, and finds all syntax errors. Lazy parsing, on the other hand, is used only for functions that don't need to be compiled yet. It doesn't build an AST and doesn't find all syntax errors. It only builds the scope hierarchy, which takes about half the time of eager parsing. Let's see an example of how this works. Say we have the following JavaScript snippet:

function foo() {
  function bar(x) {
    return x + 10;
  }

  function baz(x, y) {
    return x + y;
  }

  console.log(baz(100, 200));
}

Just like in the previous example, the code is fed into the parser which does syntactic analysis and outputs an AST.

Here, the bar() function is never called, so there is no need to parse it fully yet. The actual parsing takes place only when necessary, just before the function is executed. Lazy parsing still needs to find the whole body of the function and make a declaration for it, but that's it. It doesn't need the syntax tree because the function is not going to be processed yet. It also doesn't allocate heap memory for that tree, which usually takes up a fair amount of system resources. In short, skipping these steps brings a big performance improvement. So the parser would actually do something like the following.

[Figure: AST for foo() with bar() only pre-parsed]

It’s a fairly simple concept but in reality, its implementation is far from being simple. Here we showed one example which is definitely not the only case. The entire method applies to functions, loops, conditionals, objects, etc. Basically, everything that needs to be parsed.
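
Real engines do this internally, but the idea of deferring the full parse until the first call can be sketched in JavaScript. In this illustration, the inner functions of `foo()` are kept as unparsed source strings, standing in for what the preparser records. `new Function` stands in for the full parser and compiler. `createLazyScope` is a hypothetical name. The sketch ignores closures over outer variables, because functions created with `new Function` only see the global scope.

```javascript
// Inner functions of foo(), kept as unparsed source text: a stand-in for what
// a preparser records (name, parameters, where the body is).
const innerFunctions = {
  bar: { params: ["x"], body: "return x + 10;" },
  baz: { params: ["x", "y"], body: "return x + y;" },
};

function createLazyScope(declarations) {
  const compiled = new Map(); // name -> compiled function, filled on first call
  const parseLog = []; // which functions were fully parsed, in order

  function call(name, ...args) {
    if (!Object.prototype.hasOwnProperty.call(declarations, name)) {
      throw new ReferenceError(`${name} is not defined`);
    }
    if (!compiled.has(name)) {
      // Full parse + compile happens only now, right before the first execution.
      // new Function() stands in for the engine's full parser/compiler.
      const { params, body } = declarations[name];
      compiled.set(name, new Function(...params, body));
      parseLog.push(name);
    }
    return compiled.get(name)(...args);
  }

  return { call, parseLog };
}

// The body of foo(): only baz() is ever called.
const scope = createLazyScope(innerFunctions);
console.log(scope.call("baz", 100, 200)); // 300
console.log(scope.call("baz", 1, 2)); // 3, reusing the already-compiled baz
console.log(scope.parseLog); // [ 'baz' ]  (bar was never fully parsed)
```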

Once both the stylesheets and the scripts have been parsed, the Render Tree is updated accordingly.

6. Starts painting the Render Tree

Now the browser has everything it needs to paint the final Render Tree. Painting can be either global or local. A global paint is caused by changes that affect the whole page, such as changing the font-family or font-size of everything, or resizing the window. A local paint, on the other hand, covers only one or a few DOM nodes and their children. Regions of the content are flagged as dirty to enable local paints. The rendering engine then invokes the browser's UI Backend component to actually repaint the dirty regions. Recall that the UI Backend relies on the host OS's API to paint content onto the screen.
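
Dirty-region tracking can be sketched as follows in JavaScript. The `Painter` class, the `{ x, y, w, h }` rectangles, and the bounding-box merge are assumptions for illustration. Real engines track invalidation at a finer grain, for example per layer or tile. A local change marks only a small rectangle, so only the render objects that intersect it are repainted. A global change marks the whole viewport.

```javascript
// Hypothetical dirty-region tracking for local repaints.
// Rectangles are { x, y, w, h } in page pixels.
const intersects = (a, b) =>
  a.x < b.x + b.w && b.x < a.x + a.w && a.y < b.y + b.h && b.y < a.y + a.h;

// Merge two dirty rectangles into their bounding box (may over-paint a little).
const union = (a, b) => {
  const x = Math.min(a.x, b.x);
  const y = Math.min(a.y, b.y);
  return { x, y, w: Math.max(a.x + a.w, b.x + b.w) - x, h: Math.max(a.y + a.h, b.y + b.h) - y };
};

class Painter {
  constructor(objects) {
    this.objects = objects; // render objects with a name and a rect
    this.dirty = null; // bounding box of everything that needs repainting
  }
  markDirty(rect) {
    this.dirty = this.dirty ? union(this.dirty, rect) : rect;
  }
  markAllDirty(viewport) {
    this.dirty = viewport; // e.g. a global font change or a window resize
  }
  repaint() {
    if (!this.dirty) return [];
    const painted = this.objects.filter((o) => intersects(o.rect, this.dirty)).map((o) => o.name);
    this.dirty = null; // the region is clean again
    return painted; // a browser would hand these regions to the UI backend
  }
}

const painter = new Painter([
  { name: "header", rect: { x: 0, y: 0, w: 800, h: 100 } },
  { name: "button", rect: { x: 10, y: 120, w: 80, h: 30 } },
  { name: "footer", rect: { x: 0, y: 500, w: 800, h: 100 } },
]);
painter.markDirty({ x: 10, y: 120, w: 80, h: 30 }); // e.g. the button's color changed
console.log(painter.repaint()); // [ 'button' ]
painter.markAllDirty({ x: 0, y: 0, w: 800, h: 600 });
console.log(painter.repaint()); // [ 'header', 'button', 'footer' ]
```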

Render objects have many layers on the z-axis. Their painting order (from back to front) is:

  1. Background Color
  2. Background Image
  3. Border
  4. Children Render Objects
  5. Outline

Note that local painting is performed asynchronously by the single-threaded rendering engine, whereas global painting is almost always synchronous.

7. The user finally sees the web page for the requested URL

In the end, the user sees the web page that the server returned for the given URL.

These articles are primarily for my future reference, but perhaps someone out there might gain something from these notes, which is why I publish them.

I hope you enjoy it!



More articles by Amit Pal
