Lecture 1
The Web from First Principles
WWW 25/26 HTTP · Sockets · Protocols
Course Overview
13 labs building a full web stack from scratch:
| Labs 1–3 | HTML, CSS, raw HTTP |
|---|---|
| Labs 4–8 | Django (Python framework) |
| Labs 9 | REST APIs |
| Labs 10–13 | TypeScript + full-stack project |
Philosophy: Build it by hand first. Then you’ll know what frameworks are saving you from.
What Happens When You Type a URL?
https://en.wikipedia.org/wiki/HTTP
- Browser asks DNS: what is the IP of
en.wikipedia.org? - DNS replies:
185.15.58.224 - Browser opens a TCP connection to that IP, port 443
- Browser sends an HTTP request (text)
- Server sends back an HTTP response (text + bytes)
- Browser parses the response and renders the page
Every single step is a protocol. Today we focus on steps 3–5.
The Internet vs the Web
The web is to the internet as email is to the postal system.
Clients and Servers
CLIENT SERVER
┌──────────┐ HTTP request ┌──────────────┐
│ Browser │ ──────────────────▶│ Web Server │
│ curl │ │ (your code) │
│ mobile │ ◀──────────────────│ │
└──────────┘ HTTP response └──────────────┘
- A client initiates connections and makes requests.
- A server listens, accepts connections, and sends responses.
- The same machine can be both (e.g. your laptop in development).
- Roles are per-connection, not per-machine.
IP Addresses
Every device on the internet has a numeric address.
| Version | Format | Example |
|---|---|---|
| IPv4 | 4 bytes, dotted decimal | 93.184.216.34 |
| IPv6 | 16 bytes, hex | 2606:2800:220:1:248:1893:25c8:1946 |
Special addresses:
127.0.0.1/localhost— your own machine (loopback)0.0.0.0— “all interfaces” (listen on everything)
Ports
IP addresses identify machines. Ports identify services on a machine.
93.184.216.34 : 443
▲ ▲
machine service
| Port | Service |
|---|---|
| 80 | HTTP |
| 443 | HTTPS |
| 22 | SSH |
| 5432 | PostgreSQL |
| 8000 | Django dev server (convention) |
Ports 0–1023 are “well-known”; using them requires root. Use 1024+ in dev.
TCP — Transmission Control Protocol
TCP is the transport layer under HTTP.
Guarantees:
- Data arrives in order
- Data arrives complete (retransmits on loss)
- Both sides know if the connection breaks
Cost: Connection setup takes a round trip (the 3-way handshake).
Client Server
│── SYN ──────────▶│
│◀──── SYN-ACK ────│
│── ACK ──────────▶│
│ │
│ (data flows) │
What is a Protocol?
A protocol is an agreement between two parties on:
- The format of messages
- The order of messages
- What to do when things go wrong
Protocols are just text documents (RFCs — Request for Comments).
HTTP is defined in:
- RFC 7230–7235 (HTTP/1.1, 2014)
- RFC 9110–9114 (HTTP semantics, 2022)
You can read them at rfc-editor.org — they are public and free.
HTTP — HyperText Transfer Protocol
HTTP is an application-layer protocol on top of TCP.
Key design decisions:
- Text-based — human-readable (intentionally)
- Stateless — each request is independent; the server forgets you
- Request/Response — one request, one response
- Extensible — headers let you add metadata without changing the format
HTTP/1.1 (1997) is what you’ll implement in Lab 1. HTTP/2 and HTTP/3 add performance, but same semantics.
HTTP is Just Text
Open a TCP connection and type:
GET / HTTP/1.1
Host: example.com
(blank line ends the headers)
The server replies:
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1256
<!doctype html>
<html>...
That’s it. Everything else is built on top of this.
The HTTP Request
GET /search?q=http HTTP/1.1 ← Request Line
Host: www.google.com ─┐
User-Agent: Mozilla/5.0 │ Headers
Accept: text/html ─┘
← Blank line (end of headers)
← Body (empty for GET)
The Request Line has three parts:
- Method — what action to perform
- Path — which resource
- Version — HTTP/1.1
HTTP Request Methods
| Method | Meaning | Has Body? |
|---|---|---|
GET |
Fetch a resource | No |
POST |
Submit data / create | Yes |
PUT |
Replace a resource | Yes |
PATCH |
Partially update | Yes |
DELETE |
Remove a resource | No |
HEAD |
Like GET but no body | No |
OPTIONS |
Ask what’s allowed | No |
Rule of thumb: GET should never change server state. It must be safe and idempotent.
The HTTP Response
HTTP/1.1 200 OK ← Status Line
Content-Type: text/html; charset=utf-8 ─┐
Content-Length: 2048 │ Headers
Date: Thu, 01 Jan 2026 12:00:00 GMT ─┘
← Blank line
<!doctype html> ─┐
<html> │ Body
<body>Hello!</body> │
</html> ─┘
The Status Line has three parts:
- Version — HTTP/1.1
- Status Code — three-digit number
- Reason Phrase — human-readable description
HTTP Status Codes
Status codes are grouped by their first digit:
| Range | Category | Meaning |
|---|---|---|
| 1xx | Informational | Request received, continuing |
| 2xx | Success | Request fulfilled |
| 3xx | Redirection | Do something else |
| 4xx | Client Error | You made a mistake |
| 5xx | Server Error | Server made a mistake |
Common Status Codes
| Code | Name | When |
|---|---|---|
| 200 | OK | Normal success |
| 201 | Created | POST that created a resource |
| 204 | No Content | Success, no body (DELETE) |
| 301 | Moved Permanently | URL changed forever |
| 302 | Found | Temporary redirect |
| 400 | Bad Request | Malformed request |
| 401 | Unauthorized | Not logged in |
| 403 | Forbidden | Logged in but not allowed |
| 404 | Not Found | Resource doesn’t exist |
| 500 | Internal Server Error | Bug in your code |
HTTP Headers
Headers are key: value metadata attached to requests and responses.
Common Request Headers:
Host: example.com (required in HTTP/1.1)
Accept: text/html, */*
Accept-Language: en-US
Cookie: sessionid=abc123
Authorization: Bearer <token>
Content-Type: application/json
Content-Length: 42
Common Response Headers:
Content-Type: text/html; charset=utf-8
Content-Length: 1024
Set-Cookie: sessionid=abc123; HttpOnly
Location: /new-url (used with 3xx)
Cache-Control: max-age=3600
The URL Structure
https://example.com:8080/path/to/page?key=val&x=1#section
│ │ │ │ │ │
scheme host port path query fragment
- Scheme — which protocol (
http,https,ftp) - Host — domain name or IP address
- Port — optional (defaults: 80 for http, 443 for https)
- Path — which resource on the server
- Query String — optional parameters (after
?) - Fragment — client-side anchor (never sent to server)
DNS — Domain Name System
Humans remember names. Computers use numbers.
www.example.com ──DNS query──▶ 93.184.216.34
DNS is a distributed database replicated across thousands of servers.
Resolution order:
- Browser cache
- OS cache (
/etc/hosts) - Local resolver (your router / ISP)
- Recursive resolvers → Root → TLD → Authoritative nameserver
$ nslookup example.com
$ dig example.com
HTTPS — HTTP Secure
HTTPS = HTTP + TLS (Transport Layer Security)
Client Server
│──── TCP handshake ────────│
│──── TLS handshake ────────│ (certificate, key exchange)
│ │
│ HTTP request (encrypted) │
│──────────────────────────▶│
│ HTTP response (encrypted)│
│◀──────────────────────────│
Everything inside TLS is encrypted — URLs, headers, body, cookies.
In Lab 1 we use plain HTTP on localhost. In production, always use HTTPS.
HTTP is Stateless
Each HTTP request is completely independent. The server has no memory of previous requests.
Request 1: GET /page1 → Server: "Who are you? Don't care, here's page1"
Request 2: GET /page2 → Server: "Who are you? Don't care, here's page2"
Problem: How does a site know you’re logged in?
Solution: The client sends proof of identity with every request.
- Cookies — browser stores a token, sends it automatically
- Bearer tokens — JS code sends a token in the
Authorizationheader
Python Sockets
A socket is an OS abstraction for a network connection.
import socket
# AF_INET = IPv4
# SOCK_STREAM = TCP (stream of bytes, reliable)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
Every network library — requests, urllib, Django, Flask — uses sockets under the hood. You’re cutting out the middlemen.
# SO_REUSEADDR: allow reuse of port immediately after crash
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
Binding and Listening
# Bind: claim a port on this machine
s.bind(('localhost', 8000))
# Listen: enter a state of waiting for connections
# The argument (5) is the backlog — how many connections
# to queue before refusing new ones
s.listen(5)
print("Server running on http://localhost:8000")
After listen(), the socket is a passive socket — it only accepts connections, it doesn’t send or receive data directly.
Accepting Connections
while True:
# accept() blocks until a client connects
# Returns a NEW socket for this specific connection
conn, addr = s.accept()
print(f"Connection from {addr}")
# conn is a fresh socket just for this client
# s is still listening for more connections
data = conn.recv(4096) # receive up to 4096 bytes
print(data.decode('utf-8'))
conn.close()
s.accept() → waits
conn.recv(n) → waits for up to n bytes from this client
Receiving and Sending Data
# Receiving
raw_bytes = conn.recv(4096)
text = raw_bytes.decode('utf-8')
# Sending
response = "HTTP/1.1 200 OK\r\n\r\nHello!"
conn.sendall(response.encode('utf-8'))
# IMPORTANT: close the connection when done
conn.close()
recv() vs recvall():
There is no recvall() — recv() may return less than you asked for. For production you’d loop. In Lab 1, one recv(4096) is enough for typical requests.
Parsing an HTTP Request
def parse_request(raw):
if not raw:
return "GET", "/", {}
lines = raw.split("\r\n")
# First line: "GET /path HTTP/1.1"
method, path, _ = lines[0].split(" ", 2)
headers = {}
for line in lines[1:]:
if ": " in line:
key, val = line.split(": ", 1)
headers[key] = val
else:
break # blank line = end of headers
return method, path, headers
The body follows after the blank line (\r\n\r\n).
Building an HTTP Response
def make_response(body, status="200 OK",
content_type="text/html; charset=utf-8"):
body_bytes = body.encode('utf-8')
response = f"HTTP/1.1 {status}\r\n"
response += f"Content-Type: {content_type}\r\n"
response += f"Content-Length: {len(body_bytes)}\r\n"
response += "\r\n" # mandatory blank line
return response.encode('utf-8') + body_bytes
Content-Length matters. Without it, the browser doesn’t know when the response ends and may hang waiting for more data — or truncate early.
A Minimal Working Server
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(('localhost', 8000))
s.listen(5)
while True:
conn, _ = s.accept()
data = conn.recv(4096).decode('utf-8')
_, path, _ = data.split("\r\n")[0].split(" ")
if path == "/":
body = "<h1>Hello from raw Python!</h1>"
else:
body = "<h1>404 Not Found</h1>"
status = "200 OK" if path == "/" else "404 Not Found"
conn.sendall(make_response(body, status))
conn.close()
Testing with Netcat
Netcat (nc) is a raw TCP client — no HTTP magic, no retries.
$ nc localhost 8000
GET / HTTP/1.1
Host: localhost
(type the blank line and press Enter)
You see exactly what the server sends back — raw bytes.
Why not a browser? Browsers automatically add headers, retry, cache, and follow redirects. Netcat is transparent. Use it to debug when browsers hide the problem.
The Visitor Counter Problem
VISITORS = 0
# In the request loop:
path = parse_path(request_data)
if path == "/":
VISITORS += 1
body = f"<p>Visited {VISITORS} times</p>"
Bug: Browsers automatically request /favicon.ico on every page load.
First visit → GET / → count = 1
GET /favicon.ico → count = 2 ← wrong!
Fix: Only increment when path == "/".
The Single-Threaded Problem
while True:
conn, _ = s.accept()
# --- everything below blocks the loop ---
data = conn.recv(4096)
time.sleep(5) # simulating slow work
conn.sendall(make_response("Done"))
conn.close()
# --- only now can the next client connect ---
Open two browser tabs simultaneously. The second tab waits for the first to finish.
This is fine for a single developer. In production, a server handles thousands of simultaneous clients.
Threads to the Rescue
import threading
def handle_client(conn):
data = conn.recv(4096).decode('utf-8')
time.sleep(5)
conn.sendall(make_response("Done"))
conn.close()
while True:
conn, addr = s.accept()
t = threading.Thread(target=handle_client, args=(conn,))
t.start()
# accept() is called again immediately
Each client gets its own thread. The main loop just dispatches.
Race Conditions
VISITORS = 0 # shared mutable state
def handle_client(conn):
global VISITORS
# Thread A reads VISITORS = 5
# Thread B reads VISITORS = 5
# Thread A writes VISITORS = 6
# Thread B writes VISITORS = 6 ← lost an increment!
VISITORS += 1
Two threads incrementing the same variable can lose updates.
Fix: Use a lock.
lock = threading.Lock()
with lock:
VISITORS += 1
Locks
import threading
VISITORS = 0
lock = threading.Lock()
def handle_client(conn):
global VISITORS
with lock:
VISITORS += 1
count = VISITORS
conn.sendall(make_response(f"<p>Visitor #{count}</p>"))
conn.close()
with lock: — only one thread can be inside this block at a time. Others wait.
This is the foundation of concurrent programming. Frameworks handle most of this for you automatically.
What Frameworks Actually Do
You have now experienced:
| Problem | Framework solution |
|---|---|
| Parse HTTP request | Done for you |
| Route paths to functions | URL dispatcher |
| Build HTTP responses | HttpResponse / render() |
| Serve static files | Static file middleware |
| Handle threading | WSGI / ASGI worker pool |
| Prevent race conditions | ORM + database transactions |
Frameworks don’t hide complexity. They relocate it to battle-tested code.
HTTP/2 and HTTP/3 (Brief Overview)
HTTP/1.1 — one request per TCP connection (at a time)
HTTP/2 (2015):
- Binary framing instead of plain text
- Multiplexing — multiple requests over one TCP connection
- Header compression (HPACK)
- Server Push
HTTP/3 (2022):
- Runs over QUIC (UDP-based) instead of TCP
- Faster connection setup
- No head-of-line blocking at transport level
All three use the same HTTP semantics (methods, status codes, headers).
Tools You’ll Use
| Tool | Purpose |
|---|---|
nc / ncat |
Raw TCP client for manual testing |
curl |
HTTP client for scripting/testing |
| Browser DevTools → Network | See all real HTTP traffic |
python manage.py runserver |
Django dev server |
| Postman / Insomnia | GUI HTTP client |
# curl examples
curl http://localhost:8000/
curl -v http://localhost:8000/ # verbose: see headers
curl -X POST -d '{"x":1}' \
-H "Content-Type: application/json" \
http://localhost:8000/api/
Browser DevTools — Network Tab
Open any website → F12 → Network tab → Reload
You can see every request:
- Status code
- Method and URL
- Request headers (what your browser sent)
- Response headers (what the server returned)
- Response body
- Timing breakdown (DNS, TCP, TLS, TTFB, download)
Try it: How many requests does a single page load trigger? What’s the biggest file?
Summary
You type a URL
↓
DNS resolves the hostname to an IP
↓
TCP connection opened (3-way handshake)
↓
Browser sends HTTP request (text)
↓
Server parses request → runs your code
↓
Server sends HTTP response (headers + body)
↓
Browser renders the response
HTTP is just formatted text over a TCP socket. You can implement a compliant server in ~50 lines of Python.
Lab 1 Preview
What you’ll build:
- A raw socket server in Python
- Parses incoming HTTP requests
- Serves different content based on the URL path
- Counts visitors correctly (ignoring
/favicon.ico) - Handles multiple simultaneous clients with threads
Key milestones:
- TCP connection + see raw request
- Parse path from request line
- Build valid HTTP response
- Visitor counter (no double-counting)
- Thread-safe concurrent handling
Run with: uv run server.py
Questions?
Next lecture: HTML — giving structure to the content your server will serve.
Lab 1 is available now. Try to get through at least Phases 1–3 before the next session.
curl -LsSf https://astral.sh/uv/install.sh | sh # install uv