How a Browser Works: A Beginner-Friendly Guide to Browser Internals
What Actually Happens Between Typing a URL and Seeing a Webpage
The Question
You type google.com into your browser and press Enter. A second later, a fully styled, interactive page appears on your screen.
What just happened?
That one second hides an incredible amount of work. Your browser fetched files from a server thousands of miles away, parsed cryptic code into a tree structure, calculated where every element should appear, and painted millions of pixels on your screen.
Let's trace this journey from URL to pixels.
What is a Browser, Really?
A browser is not just "something that opens websites." It's a sophisticated piece of software that:
- Fetches files from the internet (HTML, CSS, JavaScript, images)
- Parses those files into data structures it can work with
- Calculates where everything should appear on screen
- Paints the final result as pixels
- Responds to your clicks, scrolls, and typing
Think of a browser as a translation machine. It takes code that humans write and turns it into visuals that humans see.
The Main Parts of a Browser
A browser isn't one monolithic program. It's a collection of specialized components, each handling a specific job.
| Component | What It Does |
| --- | --- |
| User Interface | Everything you see and interact with: address bar, tabs, back button |
| Browser Engine | The coordinator that connects the UI to the rendering engine |
| Rendering Engine | Parses HTML and CSS, calculates layout, paints the page |
| Networking | Handles HTTP requests, fetches files from the internet |
| JavaScript Engine | Executes JavaScript code (V8 in Chrome, SpiderMonkey in Firefox) |
| Data Storage | Stores cookies, cached files, and local data |
Different browsers use different rendering engines:
- Chrome and Edge use Blink
- Firefox uses Gecko
- Safari uses WebKit
But the overall architecture is similar across all of them.
The Journey: URL to Pixels
Let's follow what happens when you visit a webpage. We'll go step by step.
Step 1: Networking — Fetching the Files
When you press Enter, the browser’s networking component takes over.
- DNS lookup: Translates google.com into an IP address
- TCP connection: Establishes a reliable connection with the server (for HTTPS pages, a TLS handshake follows to set up encryption)
- HTTP request: Asks the server: “Give me the HTML for this page”
- Response: The server sends back HTML (and later, CSS, JS, images)
The browser receives the HTML file first. This is the starting point — the skeleton of the page.
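The steps above can be sketched in a few lines of Python. This is a minimal illustration, not how a real browser's network stack is built: the function names (`build_request`, `split_response`, `fetch`) are made up for this example, it speaks plain HTTP on port 80 with no TLS, and it ignores redirects, caching, and chunked encoding. The demo at the bottom uses a canned response so it runs without a network connection.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close after one response
        "",
        "",                   # blank line marks the end of the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw HTTP response into status line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """DNS lookup, TCP connection, HTTP request, response (plain HTTP only)."""
    ip = socket.gethostbyname(host)                                # DNS lookup
    with socket.create_connection((ip, port), timeout=5) as conn:  # TCP connection
        conn.sendall(build_request(host, path))                    # HTTP request
        chunks = []
        while chunk := conn.recv(4096):                            # read until the server closes
            chunks.append(chunk)
    return b"".join(chunks)


# Offline demo with a hand-written response instead of a real server.
canned = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Hello</h1>"
status, headers, body = split_response(canned)
print(status)                   # HTTP/1.1 200 OK
print(headers["content-type"])  # text/html
print(body)                     # b'<h1>Hello</h1>'
```

Calling `fetch("example.com")` would run the real DNS and TCP steps, but the result depends on the network and the server, so it is left out of the demo.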
    GET / HTTP/1.1
    Host: google.com

    ← Server responds with HTML
Step 2: HTML Parsing — Building the DOM
The HTML file is just text. The browser needs to convert it into something it can work with.
Parsing is the process of reading text and converting it into a structured format. The browser reads HTML and builds a DOM (Document Object Model) — a tree structure representing every element on the page.
Example HTML:
    <!DOCTYPE html>
    <html>
      <head>
        <title>My Page</title>
      </head>
      <body>
        <h1>Hello</h1>
        <p>Welcome to my site.</p>
      </body>
    </html>
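Resulting DOM Tree:
    html
    ├── head
    │   └── title
    │       └── "My Page"
    └── body
        ├── h1
        │   └── "Hello"
        └── p
            └── "Welcome to my site."
The DOM is a tree because HTML is nested. The <title> is inside <head>. The <p> is inside <body>. This parent-child relationship becomes a tree structure.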
Every tag becomes a node. Every piece of text becomes a text node. The browser can now navigate and manipulate this tree.
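To make the tag-to-node idea concrete, here is a small sketch using Python's standard `html.parser` module. The `Node`, `DomBuilder`, and `dump` names are invented for this example, and the builder assumes every opened tag is properly closed. A real browser parser also recovers from malformed markup and keeps whitespace text nodes, which this sketch skips.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, name, text=None):
        self.name = name      # tag name, or "#text" for text nodes
        self.text = text
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a simplified tree; assumes every opened tag is closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # elements that are currently open

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)  # child of the innermost open element
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text between tags
            self.stack[-1].children.append(Node("#text", data.strip()))


def dump(node, depth=0):
    """Return the tree as a list of indented lines."""
    label = repr(node.text) if node.name == "#text" else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


SOURCE = """<!DOCTYPE html>
<html>
  <head><title>My Page</title></head>
  <body><h1>Hello</h1><p>Welcome to my site.</p></body>
</html>"""

builder = DomBuilder()
builder.feed(SOURCE)
builder.close()
print("\n".join(dump(builder.root)))
```

The stack is what turns nesting into parent-child links: whichever element is on top when a new tag or piece of text arrives becomes its parent.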
Step 3: CSS Parsing — Building the CSSOM
While parsing HTML, the browser encounters a <link> tag pointing to a CSS file. The networking component fetches it.
CSS is also just text. The browser parses it into a CSSOM (CSS Object Model) — another tree structure, this time representing styles.
Example CSS:
    body {
      font-family: Arial;
    }
    h1 {
      color: blue;
      font-size: 24px;
    }
    p {
      color: gray;
    }
Resulting CSSOM Tree:
    body  (font-family: Arial)
    ├── h1  (color: blue; font-size: 24px)
    └── p   (color: gray)
The CSSOM captures which styles apply to which elements. It also handles inheritance — the h1 inherits font-family: Arial from body.
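The inheritance behavior can be sketched with a toy parser. Everything here is a simplification chosen for illustration: `parse_css` and `computed_style` are hypothetical helpers, only plain tag selectors are supported, and the `INHERITED` set is a small hand-picked subset. Real engines also handle specificity, the cascade, and many selector types.

```python
import re

CSS = "body { font-family: Arial; } h1 { color: blue; font-size: 24px; } p { color: gray; }"

# Properties that flow from parent to child (illustrative subset only).
INHERITED = {"font-family", "color"}


def parse_css(text):
    """Parse 'selector { prop: value; }' rules into {selector: {prop: value}}."""
    rules = {}
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", text):
        declarations = {}
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                declarations[prop.strip()] = value.strip()
        rules.setdefault(selector.strip(), {}).update(declarations)
    return rules


def computed_style(path, rules):
    """Resolve styles for the last tag in `path` (root first), e.g. ["body", "h1"]."""
    style = {}
    for tag in path:
        # Carry over only the inheritable properties from the ancestors...
        style = {p: v for p, v in style.items() if p in INHERITED}
        # ...then apply the rules that target this tag directly.
        style.update(rules.get(tag, {}))
    return style


rules = parse_css(CSS)
print(computed_style(["body", "h1"], rules))
# {'font-family': 'Arial', 'color': 'blue', 'font-size': '24px'}
```

The `h1` never declares a font, yet its computed style includes `font-family: Arial` because that property was carried down from `body`.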
Step 4: Combining DOM and CSSOM — The Render Tree
Now the browser has two trees:
- DOM — What elements exist
- CSSOM — How elements should look
The browser combines them into a Render Tree. This tree contains only the elements that will actually be displayed, along with their computed styles.
What gets excluded from the render tree:
- Elements with display: none
- The <head> element and its contents
- <script> tags
The render tree answers: "What should appear on screen, and what should it look like?"
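A rough sketch of that filtering step follows. The DOM and the styles are hand-built, hypothetical data (tuples and dicts rather than real browser objects), styles are matched by tag name only, and the `NEVER_RENDERED` set is an assumption made for this example.

```python
# Hypothetical, hand-built DOM: (tag, children) tuples; plain strings are text nodes.
DOM = ("html", [
    ("head", [("title", ["My Page"])]),
    ("body", [
        ("h1", ["Hello"]),
        ("p", ["Welcome to my site."]),
        ("div", ["Hidden banner"]),
        ("script", ["console.log('hi')"]),
    ]),
])

# Hypothetical styles, keyed by tag name only.
STYLES = {
    "h1": {"color": "blue"},
    "p": {"color": "gray"},
    "div": {"display": "none"},
}

NEVER_RENDERED = {"head", "script"}  # assumed for this sketch


def build_render_tree(node, styles):
    """Return a render node dict, or None if the node is not displayed."""
    if isinstance(node, str):
        return {"text": node}
    tag, children = node
    style = styles.get(tag, {})
    if tag in NEVER_RENDERED or style.get("display") == "none":
        return None  # the whole subtree is dropped
    rendered = [build_render_tree(child, styles) for child in children]
    return {"tag": tag, "style": style,
            "children": [r for r in rendered if r is not None]}


def tags(render_node):
    """Flatten the render tree into a list of tag names, in document order."""
    if "text" in render_node:
        return []
    found = [render_node["tag"]]
    for child in render_node["children"]:
        found.extend(tags(child))
    return found


tree = build_render_tree(DOM, STYLES)
print(tags(tree))  # ['html', 'body', 'h1', 'p']
```

The `head`, the hidden `div`, and the `script` are all still in the DOM. They are simply absent from the render tree.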
Step 5: Layout — Calculating Positions
The render tree knows what to display and how it should look. But it doesn't know where things go.
The layout phase (also called reflow) calculates the exact position and size of every element. The browser walks through the render tree and answers:
- How wide is this element?
- How tall is it?
- Where does it sit relative to other elements?
This depends on many factors:
- Screen size and viewport
- CSS properties like width, margin, padding
- Content inside the element
- Position of sibling and parent elements
Layout is expensive. If something changes — window resize, content added — the browser may need to recalculate positions for many elements.
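Here is a deliberately tiny model of why a resize forces recalculation. It is a sketch under strong assumptions: every element is a full-width block, every character is a fixed `char_width` pixels wide, and `layout_blocks` and `Box` are names invented for this example. Real layout engines measure actual fonts and support many layout modes.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    name: str
    x: int
    y: int
    width: int
    height: int


def layout_blocks(blocks, viewport_width, margin=8, char_width=8, line_height=20):
    """Stack blocks vertically; each height depends on how its text wraps.

    `blocks` is a list of (name, text) pairs. All sizes are in pixels.
    """
    boxes = []
    y = margin
    width = viewport_width - 2 * margin
    chars_per_line = max(1, width // char_width)
    for name, text in blocks:
        lines = max(1, math.ceil(len(text) / chars_per_line))
        height = lines * line_height
        boxes.append(Box(name, margin, y, width, height))
        y += height + margin  # the next sibling starts below this one
    return boxes


blocks = [("h1", "Hello"), ("p", "Welcome to my site.")]
wide = layout_blocks(blocks, viewport_width=400)
narrow = layout_blocks(blocks, viewport_width=96)
print(wide[1])    # Box(name='p', x=8, y=36, width=384, height=20)
print(narrow[1])  # Box(name='p', x=8, y=36, width=80, height=40)
```

Shrinking the viewport makes the paragraph wrap onto a second line, so its height doubles. Any element placed after it would shift down as well, which is why one change can ripple through many boxes.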
Step 6: Painting — Drawing Pixels
Now the browser knows what to display, how it should look, and where it goes. Time to actually draw it.
The paint phase converts the layout into actual pixels. The browser draws:
- Backgrounds and borders
- Text
- Images
- Shadows and effects
Painting happens in layers. Elements that overlap or have special effects (like transparency) might be painted on separate layers and then combined.
The final image is sent to your screen. You see the page.
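Painting order can be illustrated with a character grid standing in for the screen. This is only a toy: the `paint` function and the `display_list` format are assumptions for this example, each "pixel" is one character, and it models drawing boxes in order rather than real layer compositing on a GPU.

```python
def paint(boxes, width, height, background="."):
    """Fill a width x height grid; later boxes paint over earlier ones."""
    canvas = [[background] * width for _ in range(height)]
    for x, y, w, h, fill in boxes:
        # Clip each box to the canvas so out-of-range boxes are harmless.
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                canvas[row][col] = fill
    return ["".join(row) for row in canvas]


# (x, y, width, height, fill character); coordinates are in grid cells.
display_list = [
    (1, 1, 6, 3, "#"),  # a background box
    (4, 2, 5, 3, "o"),  # an overlapping box painted afterwards
]
for line in paint(display_list, width=10, height=6):
    print(line)
# ..........
# .######...
# .###ooooo.
# .###ooooo.
# ....ooooo.
# ..........
```

Where the two boxes overlap, the one painted later wins, which is the basic reason paint order matters for overlapping elements.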
The Complete Flow
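Here's the full journey from URL to pixels:
    URL → Networking → HTML parsing (DOM) + CSS parsing (CSSOM) → Render Tree → Layout → Paint → Pixels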
All of this happens in roughly a second. Usually less.
A Quick Look at Parsing
We’ve mentioned “parsing” several times. Let’s demystify it with a simple example.
Parsing means taking text and converting it into a structure the computer can work with.
Consider this math expression:
    3 + 5 * 2
A human understands this immediately. A computer needs to parse it into a tree that respects operator precedence (multiplication before addition):
        +
       / \
      3   *
         / \
        5   2
The computer evaluates from the leaves up:
- 5 * 2 = 10
- 3 + 10 = 13
HTML parsing works the same way. The browser reads text and builds a tree structure it can navigate and manipulate. The tags, their nesting, and their content all become nodes in the tree.
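The expression example can be turned into working code with a small recursive-descent parser. This is one common way to encode precedence, shown as a sketch: it handles only non-negative integers with `+` and `*`, does no error checking, and the function names are made up for this example.

```python
import re


def tokenize(text):
    """Split '3 + 5 * 2' into ['3', '+', '5', '*', '2']."""
    return re.findall(r"\d+|[+*]", text)


def parse_expression(tokens):
    """expression := term ('+' term)*"""
    node = parse_term(tokens)
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        node = ("+", node, parse_term(tokens))
    return node


def parse_term(tokens):
    """term := number ('*' number)*   (binds tighter than '+')"""
    node = int(tokens.pop(0))
    while tokens and tokens[0] == "*":
        tokens.pop(0)
        node = ("*", node, int(tokens.pop(0)))
    return node


def evaluate(node):
    """Evaluate the tree from the leaves up."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


tree = parse_expression(tokenize("3 + 5 * 2"))
print(tree)            # ('+', 3, ('*', 5, 2))
print(evaluate(tree))  # 13
```

Precedence falls out of the structure: `parse_expression` only ever sees whole terms, so `5 * 2` is grouped into one subtree before the addition is built.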
Browser Engine vs Rendering Engine
You might hear these terms used interchangeably, but they're different:
Browser Engine: The coordinator. It sits between the UI (what you click) and the rendering engine (what displays pages). It passes commands back and forth.
Rendering Engine: The workhorse. It parses HTML and CSS, builds the DOM and CSSOM, calculates layout, and paints pixels. This is where most of the magic happens.
When people say "Chrome uses Blink," they mean Chrome's rendering engine is Blink.
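Recap: The Key Steps
| Step | What Happens | Result |
| --- | --- | --- |
| Networking | Fetches HTML, CSS, JavaScript, and images from the server | Raw files |
| HTML parsing | Converts HTML text into nodes | DOM |
| CSS parsing | Converts CSS text into style rules | CSSOM |
| Render tree | Combines the DOM and CSSOM, dropping elements that aren't displayed | Render tree |
| Layout | Calculates the size and position of every element | Geometry |
| Paint | Draws backgrounds, text, images, and effects | Pixels on screen |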
Wrapping Up
A browser is a translation pipeline. Code goes in, pixels come out.
The journey from URL to displayed page involves fetching files across the internet, parsing text into tree structures, combining those trees, calculating geometry, and painting millions of pixels. All in under a second.
You don't need to memorize every step. What matters is understanding the flow: fetch → parse → combine → layout → paint. Once you see the pipeline, debugging rendering issues and writing performant code becomes much more intuitive.