What happens when you type a URL in your browser and press Enter
In this article I want to go through what really happens under the hood when you type holbertonschool.com and you hit enter in your browser.
This article is inspired by Alex’s GitHub page, referenced below, which has a great, detailed description of what really happens when you do that. I did, however, add more detail on the networking aspects.
Alright, so if you’re interested in knowing what really happens when you type holbertonschool.com and hit Enter, stay tuned.
I’m going to break this article up into eight components, and we’ll go through each one in turn.
I’m assuming that I’m visiting holbertonschool.com on a brand new machine: nobody has ever opened the browser, I’m using Google Chrome, and this will be the first page I ever visit.
That’s our first step…
Initial Typing
You start typing holbertonschool.com. As soon as you type the letter H, the browser can do one of several things. It may look through your recently visited history for pages that start with H and show you an autocomplete list. Some browsers search a locally cached index instead, and some may send what you have typed to the default search engine baked into the browser. I’m not going to cover all of these; I’ll stick with the first case, where the browser lists pages you have already visited.
So let’s assume that you are getting a list of visited pages.
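Here’s a minimal sketch in Python of that history lookup, assuming the simplest possible approach: a plain prefix match over visited hosts. The function name `suggest` and the sample history are made up for illustration; real browsers rank suggestions in much more elaborate ways.

```python
def suggest(history, typed, limit=5):
    """Return visited hosts that start with what the user has typed so far."""
    prefix = typed.lower()
    matches = [host for host in history if host.lower().startswith(prefix)]
    return matches[:limit]


# Hypothetical history, purely for illustration.
history = ["holbertonschool.com", "hackernews.example", "github.com"]
print(suggest(history, "h"))
print(suggest(history, "ho"))
```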
URL Parsing
You have finished typing holbertonschool.com and you’re about to hit Enter. You didn’t add https:// or anything else; you just typed holbertonschool.com.
Now the browser has holbertonschool.com as a string and starts parsing it. The question is: is this a URL or is it a search term?
If it’s a search term, the browser runs a search; if it’s a URL, it visits that page.
In our case it’s a URL, so the browser starts the process of visiting holbertonschool.com. That means establishing a connection with the website and sending it a GET request. So step two is done: we know it’s a URL, we know it’s a page, so let’s go ahead and visit it.
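To make the URL-or-search decision a bit more concrete, here’s a rough sketch in Python. The rules in `classify` (spaces mean search, a scheme or a dotted host name means URL) are my own simplified assumption, not the actual logic of any browser.

```python
import re

# Hypothetical rule: a host name is dot-separated labels ending in a TLD-like label.
HOST_RE = re.compile(r"^([a-z0-9-]+\.)+[a-z]{2,}$", re.IGNORECASE)
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def classify(text):
    """Guess whether the address bar input is a URL or a search term."""
    text = text.strip()
    if " " in text:
        return "search"
    if SCHEME_RE.match(text):
        return "url"
    # Drop any path and port before checking the host part.
    host = text.split("/", 1)[0].split(":", 1)[0]
    return "url" if HOST_RE.match(host) else "search"


print(classify("holbertonschool.com"))
print(classify("what is dns"))
```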
Find Protocol
The third step is determining which protocol and which port to connect to.
Why do we need to know which protocol?
Well, we know it’s a web page, so it’s either HTTP or HTTPS. The trick is working out which: HTTP, unencrypted on port 80, or HTTPS, encrypted on port 443? The user didn’t tell us. He or she typed only holbertonschool.com, not http://holbertonschool.com or https://holbertonschool.com, either of which would have made things easier for the browser. So the browser has to figure out the protocol itself. To do that it uses the HSTS list. HSTS stands for HTTP Strict Transport Security, and the list is essentially something the browser keeps cached in a local database. It contains well-known websites that force clients to communicate only over HTTPS.
So the client looks through this list and asks: is holbertonschool.com an HTTPS site or just a normal HTTP one? If it finds the domain in the HSTS list, it uses HTTPS, which means port 443. If the domain is not in the list, it uses plain HTTP, which is unencrypted, on port 80. Next we need to figure out the IP address so we can establish the connection.
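The lookup itself can be sketched in a few lines of Python. The set `HSTS_LIST` below is a stand-in I made up for the browser’s local database, and putting holbertonschool.com in it is an assumption for the sake of the example, not a claim about any real list.

```python
# Hypothetical stand-in for the browser's local HSTS database.
HSTS_LIST = {"holbertonschool.com"}


def pick_scheme_and_port(host, hsts_list=HSTS_LIST):
    """Choose the protocol and port for a host typed without a scheme."""
    if host.lower() in hsts_list:
        return ("https", 443)  # listed: encrypted only
    return ("http", 80)        # not listed: plain HTTP


print(pick_scheme_and_port("holbertonschool.com"))
print(pick_scheme_and_port("unlisted.example"))
```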
DNS Lookup
To find out the IP address we do a DNS lookup. First we ask the operating system, because the domain could be cached; we find that it’s not. The OS then looks through the hosts file to see if there is a hardcoded entry; there isn’t…
Next the browser checks whether DoH (DNS over HTTPS) is enabled. If it is, the browser communicates with the DoH provider (e.g. Cloudflare) and asks it to resolve the name; that is another TLS connection. Assume we are not using DoH. Then we send an unencrypted UDP request to port 53 on the default DNS server configured on our router. That is a network exchange in itself, so we need to send the packet.
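To show what that UDP packet roughly looks like, here’s a small Python sketch that builds a classic DNS query for an A record by hand. The function name `build_dns_query` and the fixed query ID are my own choices for illustration; in practice the OS resolver does all of this for you.

```python
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build the bytes of a DNS query asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question,
    # 0 answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


packet = build_dns_query("holbertonschool.com")
print(packet.hex())
```

To actually send it you would open a `socket.SOCK_DGRAM` socket and call `sendto(packet, (resolver_ip, 53))`, where `resolver_ip` is whatever DNS server your machine is configured to use.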
TCP Connection
We know the IP and we know the port! We can now establish a connection. We also know that we should do TLS, since it’s HTTPS, and our client is smart enough to do TLS 1.3. So we first do the three-way handshake and establish a TCP connection between 10.0.0.2 on a random port, say 1234, and 4.1.2.3 on port 443.
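From a program’s point of view the three-way handshake is hidden inside a single connect call. Here’s a minimal Python sketch; the helper names `open_tcp` and `describe` are mine, and the address in the usage note is just the example one from this article.

```python
import socket


def open_tcp(host, port, timeout=5.0):
    """Open a TCP connection; returns once the handshake has completed."""
    # The kernel sends SYN, waits for SYN-ACK and replies with ACK
    # before create_connection() returns the connected socket.
    return socket.create_connection((host, port), timeout=timeout)


def describe(sock):
    """Show both ends of the connection as local -> remote."""
    local_ip, local_port = sock.getsockname()[:2]
    remote_ip, remote_port = sock.getpeername()[:2]
    return f"{local_ip}:{local_port} -> {remote_ip}:{remote_port}"
```

Calling `describe(open_tcp("4.1.2.3", 443))` would print something shaped like `10.0.0.2:1234 -> 4.1.2.3:443`, with the local port picked by the operating system.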
TLS, ALPN, SNI
I’m assuming I’m using the latest browser, so it supports TLS 1.3, and my server also supports TLS 1.3. Next is the Client Hello. The client generates a private key and, using Diffie-Hellman (DH), combines it with public parameters. It sends only the resulting public value (its key share) in the Client Hello, and the private key cannot feasibly be recovered from it. It also sends the supported cipher suites (the symmetric key algorithms it supports). If TLS extensions such as ALPN and SNI are enabled, the client also sends, in the same Client Hello, the host name holbertonschool.com (SNI) along with the fact that it supports HTTP/2 (ALPN). This might be different in Chrome, which may use HTTP/3 over QUIC, which runs on UDP.
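Here’s a hedged sketch of how a client could ask for the same things (TLS 1.3, SNI and ALPN) using Python’s standard `ssl` module. The function names are mine, and `tls_handshake` expects an already connected TCP socket; the key exchange itself happens inside the library, so you won’t see the DH math here.

```python
import ssl


def make_tls_context():
    """Client-side TLS settings: TLS 1.3 only, offering HTTP/2 via ALPN."""
    ctx = ssl.create_default_context()           # verifies the server certificate
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    ctx.set_alpn_protocols(["h2", "http/1.1"])   # ALPN: prefer HTTP/2
    return ctx


def tls_handshake(tcp_sock, hostname):
    """Run the TLS handshake over an existing TCP socket."""
    ctx = make_tls_context()
    # server_hostname is sent as SNI in the Client Hello.
    tls_sock = ctx.wrap_socket(tcp_sock, server_hostname=hostname)
    return tls_sock, tls_sock.version(), tls_sock.selected_alpn_protocol()
```

If the server agrees, `tls_sock.version()` should report TLS 1.3 and the selected ALPN protocol should be `h2`; if the server only speaks HTTP/1.1, ALPN would return `http/1.1` instead.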
First Request GET /
The client is now ready to send actual HTTP data. It builds the request headers, starting with GET / since that is what we want to request, puts the hostname in the headers along with other fields, and checks whether there are cookies for the site and adds them. With HTTP/2 the headers are compressed and sent in a binary format. The data is then encrypted with the TLS symmetric key and sent…
The GET request is then sent as a stream over the HTTP/2 connection, which runs on top of TCP, to the server.
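To give a feel for what “compressed and binary” means, here’s a simplified Python sketch that builds an HTTP/2 HEADERS frame for GET / by hand, using HPACK static table entries for the common fields. It is only an illustration under my own assumptions: it skips the connection preface and SETTINGS exchange a real client sends first, leaves out cookies and other headers, and doesn’t use Huffman coding. The function names are mine.

```python
import struct


def hpack_request_headers(authority):
    """Encode :method GET, :scheme https, :path / and :authority with HPACK."""
    # Indexed fields from the HPACK static table:
    # 0x82 = :method GET, 0x87 = :scheme https, 0x84 = :path /
    block = bytes([0x82, 0x87, 0x84])
    value = authority.encode("ascii")
    if len(value) >= 127:
        raise ValueError("sketch only handles short host names")
    # 0x01 = literal field without indexing, name = static index 1 (:authority),
    # followed by the value length (no Huffman) and the raw value.
    block += bytes([0x01, len(value)]) + value
    return block


def headers_frame(block, stream_id=1):
    """Wrap a header block in an HTTP/2 HEADERS frame."""
    flags = 0x01 | 0x04  # END_STREAM | END_HEADERS
    length = len(block)
    # 9-byte frame header: 24-bit length, type (0x01 = HEADERS), flags, stream id.
    header = struct.pack(
        "!BHBBI", length >> 16, length & 0xFFFF, 0x01, flags, stream_id & 0x7FFFFFFF
    )
    return header + block


frame = headers_frame(hpack_request_headers("holbertonschool.com"))
print(frame.hex())
```

These bytes are what would be handed to the TLS layer for encryption before going out on the TCP connection.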
HTML Parsing
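The browser parses the HTML it gets back. Does it make multiple further requests, for CSS? For JS? With HTTP/2, do these go out as multiple streams? If it’s HTTP/1.1, are they pipelined?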
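As a small illustration of where those extra requests come from, here’s a Python sketch that scans an HTML document for stylesheets and scripts using the standard `html.parser` module. The class name `ResourceFinder` and the sample markup are made up; a real browser’s parser does far more than this.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collect the CSS and JS files an HTML page asks the browser to fetch."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(("css", attrs["href"]))
        elif tag == "script" and attrs.get("src"):
            self.resources.append(("js", attrs["src"]))


def find_resources(html):
    finder = ResourceFinder()
    finder.feed(html)
    return finder.resources


# Hypothetical page, purely for illustration.
page = (
    '<html><head><link rel="stylesheet" href="/main.css">'
    '<script src="/app.js"></script></head><body>Hello</body></html>'
)
print(find_resources(page))
```

Each entry in the result is one more request the browser has to make after the initial GET /.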