Understand the architecture of the PostgreSQL database.

Published on February 2, 2026 · 8 min read

Most developers can write SQL. Most know what a table is.

Yet, many still treat databases as magical black boxes.

“I send a query, data comes back. Done.”

But the line between an average developer and a strong backend engineer is drawn here — at truly understanding how the database works.

This isn’t about memorizing commands. It’s about building a mental model of what happens inside.

Let’s walk through PostgreSQL not as a list of features, but as a story your data lives through — from your app to the disk and back.


First: What Problem Is PostgreSQL Trying to Solve?

Before terms like pages, MVCC, or WAL, let’s answer the core question:

What must a database actually do?

  • Accept requests from many users at once

  • Store data safely on disk

  • Let people read and write simultaneously without breaking things

  • Never lose data, even if the system crashes

Everything in PostgreSQL exists only to solve these four problems.


Step 1: How Your App Talks to PostgreSQL (The Client–Server Model)

Think of PostgreSQL as a restaurant kitchen.

  • Your app is the customer

  • PostgreSQL is the kitchen

You don’t walk into the kitchen. You send an order — in database terms, a SQL query.

When PostgreSQL starts:

  • A main controller process (the postmaster) starts up

  • It prepares the system to accept connections

Each time a client connects:

  • PostgreSQL creates a separate worker process just for that client

This means your query doesn’t mix with others. Every connection is isolated and safe.

This is the foundation of concurrency and reliability.
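The process-per-connection idea can be played out as a toy Python sketch. This is not PostgreSQL's actual code: the names `postmaster` and `backend` are borrowed for illustration, and a `multiprocessing.Pool` stands in for the real fork-per-connection logic.

```python
import multiprocessing as mp
import os

def backend(query):
    # In PostgreSQL, each connected client is served by its own OS
    # process, so one session's state cannot corrupt another's.
    return os.getpid(), f"handled {query!r}"

def postmaster(queries):
    # Toy "postmaster": fan incoming connections out to worker processes.
    with mp.Pool(processes=2) as pool:
        return pool.map(backend, queries)

results = postmaster(["SELECT 1", "SELECT 2", "SELECT 3"])
```

Each result carries the worker's process ID, underlining that the work happened outside the main process.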


Step 2: Where Does PostgreSQL Keep Data?

All PostgreSQL data lives inside one main folder on disk called the data directory, which holds the entire database cluster.

Inside are:

  • Databases

  • Tables

  • Indexes

  • System info

  • Recovery logs

Imagine it like a warehouse:

  • Rooms = databases

  • Shelves = tables

  • Labels and maps = indexes

Important idea:

  • The query engine decides what to do

  • The storage engine decides how to save it

This separation keeps PostgreSQL stable and crash-resistant.


Step 3: How Tables Are Stored on Disk (Pages First, Not Rows)

Beginners think: “A table stores rows.”

PostgreSQL thinks differently.

Disk reality: Disks don’t understand rows. They read and write in fixed-size blocks.

So PostgreSQL divides disk space into pages — usually 8 KB each.

A table is not one big file. It’s a collection of pages.
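You can get a feel for this with a rough back-of-the-envelope sketch in Python. The header sizes below match PostgreSQL's defaults, but this ignores alignment padding and fill factor, so treat it as an estimate, not a formula from the source code.

```python
PAGE_SIZE = 8 * 1024   # PostgreSQL's default block size: 8 KB
PAGE_HEADER = 24       # fixed header at the start of every page
ITEM_POINTER = 4       # per-row "line pointer" inside the page
TUPLE_HEADER = 23      # per-tuple header (holds the hidden system fields)

def rows_per_page(avg_row_bytes):
    # Usable space divided by the per-row cost (data + headers).
    usable = PAGE_SIZE - PAGE_HEADER
    return usable // (avg_row_bytes + TUPLE_HEADER + ITEM_POINTER)

def pages_needed(row_count, avg_row_bytes):
    per_page = rows_per_page(avg_row_bytes)
    return -(-row_count // per_page)   # ceiling division

print(rows_per_page(100))      # ~64 rows of 100 bytes fit in one page
print(pages_needed(1000, 100)) # so 1000 such rows need ~16 pages
```

The takeaway: a "table" is really page-count arithmetic, and PostgreSQL always reads and writes whole pages, never individual rows.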


Step 4: From Pages to Rows (Tuples)

Inside each page:

  • Multiple rows are stored

PostgreSQL calls a row a tuple.

A tuple contains:

  • Your actual data (columns)

  • Hidden system data

This hidden data includes:

  • Which transaction created the row

  • Whether it was updated or deleted

  • Visibility information

Why store this?

Because PostgreSQL needs to answer:

“Should this user see this row right now?”

This becomes crucial for handling many users at once.


Step 4.1 — A Table Is NOT a List of Rows

Beginner assumption:

“A table is a list of rows.”

Reality:

A table is a collection of pages.

Think like this:

TABLE: users

├── Page 1 (8 KB)

├── Page 2 (8 KB)

├── Page 3 (8 KB)

PostgreSQL calls this collection a heap file.

Why “heap”?

  • rows are added wherever there is space

  • no fixed order

  • no sorting by default


Step 4.2 — What Is a Tuple? (Finally the Row)

Inside one page, PostgreSQL stores multiple rows.

PostgreSQL does not call them rows.

It calls them tuples.

Page (8 KB)

├── Tuple 1

├── Tuple 2

├── Tuple 3

A tuple = one row

Same thing. Different name.


Step 4.3 — A Tuple Is NOT Just Your Data

Here is the critical part.

You think a row looks like this:

(id=1, name="Ram", age=20)

PostgreSQL stores more than that.

Every tuple contains:

1️⃣ Your actual data

  • id

  • name

  • age

2️⃣ Hidden system information

  • Created by transaction: 120

  • Deleted by transaction: NULL

  • Visibility information


You never see this extra info — but PostgreSQL uses it all the time.
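A toy sketch of what a stored tuple carries, in Python. The field names `xmin` and `xmax` are PostgreSQL's real hidden system columns (creator and deleter transaction IDs); the `HeapTuple` class itself is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HeapTuple:
    data: dict                 # the columns you actually asked to store
    xmin: int                  # ID of the transaction that created this version
    xmax: Optional[int] = None # ID of the transaction that deleted/replaced it

tup = HeapTuple(data={"id": 1, "name": "Ram", "age": 20}, xmin=120)
# xmax stays empty until some later transaction deletes or updates the row.
```

In a real session you can peek at these hidden columns yourself with `SELECT xmin, xmax, ctid FROM users;`.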


Step 4.4 — WHY Does PostgreSQL Store This Extra Info?

Because many users work at the same time.

Example:

  • You are reading data

  • Someone else updates the same row

PostgreSQL must answer:

“Should YOU see this row or not?”

That decision is made using:

  • creation transaction ID

  • deletion transaction ID

This is the foundation of MVCC, which comes later.


Step 5: What If a Row Is Too Big? (TOAST)

PostgreSQL has a rule: One row (tuple) must fit inside one page.

But real apps store:

  • Large JSON

  • Long text

  • Files

So PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique).

Simply put:

If a row is too big:

  • PostgreSQL tries to compress it

  • If it’s still too large, big parts are moved elsewhere

The main table keeps a small pointer.

The big data lives in a separate hidden table.

Result:

  • Normal queries stay fast

  • Large data is loaded only when needed

That’s why PostgreSQL handles JSON so well.


Step 6: The Biggest Problem — Many Users at Once

Now the core challenge:

Imagine:

  • User A is reading data

  • User B updates the same data

  • User C inserts new data

All at the same time.

Old databases used locks:

  • Readers block writers

  • Writers block readers

This slows everything down.

PostgreSQL solves this with MVCC.


Step 7: MVCC Explained Like a Story

MVCC = Multi-Version Concurrency Control

Golden rule: PostgreSQL never overwrites a row.

Instead:

  • Updates create a new version

  • Old versions stay temporarily

Simple example:

  • Transaction A reads a row

  • Transaction B updates the same row

  • PostgreSQL creates a new row version

  • Transaction A still sees the old version

  • Transaction B sees the new one

No waiting. No blocking. No fighting.

Each transaction sees a snapshot of the database at a specific point in time.
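The story above can be acted out as a toy Python sketch. For simplicity it assumes every transaction commits instantly (real PostgreSQL also checks commit status), and the names `insert`, `update`, and `read` are invented for illustration.

```python
# Toy version store: one logical row, a chain of versions.
versions = []   # each entry: {"data": ..., "xmin": ..., "xmax": ...}

def insert(data, xid):
    versions.append({"data": data, "xmin": xid, "xmax": None})

def update(new_data, xid):
    # Never overwrite: close the old version, append a new one.
    current = next(v for v in versions if v["xmax"] is None)
    current["xmax"] = xid
    versions.append({"data": new_data, "xmin": xid, "xmax": None})

def read(snapshot_xid):
    # See the version that existed when your transaction started:
    # created before your snapshot, not yet deleted at that time.
    for v in versions:
        if v["xmin"] <= snapshot_xid and (v["xmax"] is None
                                          or v["xmax"] > snapshot_xid):
            return v["data"]

insert({"balance": 100}, xid=10)   # committed long ago
update({"balance": 50}, xid=30)    # transaction B's update
print(read(20))   # {'balance': 100} — an older snapshot still sees the old row
print(read(40))   # {'balance': 50}  — a newer snapshot sees the update
```

Neither reader waited for the writer, and the writer waited for no one: both versions simply coexist.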


Step 8: How PostgreSQL Knows What You Can See

Every transaction gets a snapshot time.

Rules:

  • You see only data committed before your transaction started

  • You never see half-finished work

PostgreSQL checks those hidden tuple fields to decide:

  • Who created the row

  • Whether it was replaced

  • Whether it’s visible to you

That’s why that extra hidden data exists.
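The visibility decision can be written down as a single predicate. This is a simplified rule set: real PostgreSQL also tracks in-progress transactions, subtransactions, and hint bits, and the function name `is_visible` is made up for illustration.

```python
def is_visible(tup, snapshot_xid, committed):
    """Can a transaction whose snapshot was taken at snapshot_xid
    see this tuple version?  `committed` is the set of transaction
    IDs known to have committed."""
    # The creator must have committed before our snapshot was taken...
    created_ok = tup["xmin"] in committed and tup["xmin"] < snapshot_xid
    # ...and any deletion must not count against us: either there is
    # none, or the deleter hadn't committed yet when we started.
    not_deleted = (tup["xmax"] is None
                   or tup["xmax"] not in committed
                   or tup["xmax"] >= snapshot_xid)
    return created_ok and not_deleted

committed = {10, 30}
row = {"xmin": 10, "xmax": 30}   # created by xid 10, deleted by xid 30
print(is_visible(row, 20, committed))  # True  — deletion came after us
print(is_visible(row, 40, committed))  # False — deletion committed before us
```

This is why "half-finished work" is invisible: an uncommitted creator fails the first check automatically.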


Step 9: Why VACUUM Exists (Cleaning Old Versions)

Because PostgreSQL keeps old versions:

  • Disk fills with dead rows

So PostgreSQL runs autovacuum:

  • Removes dead row versions

  • Frees disk space

  • Keeps performance healthy

MVCC gives speed and safety.

VACUUM keeps the system clean.
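A toy sketch of the cleanup rule in Python. The real autovacuum is far more involved (freezing, free-space maps, cost-based throttling); here a version is simply "dead" once its deleter committed and no still-running transaction could need it. The name `vacuum` and the page layout are illustrative.

```python
def vacuum(page, oldest_active_xid, committed):
    # Keep a version unless it was deleted by a committed transaction
    # older than every transaction that is still running.
    return [v for v in page
            if not (v["xmax"] is not None
                    and v["xmax"] in committed
                    and v["xmax"] < oldest_active_xid)]

page = [
    {"data": "old balance", "xmin": 10, "xmax": 30},   # replaced at xid 30
    {"data": "new balance", "xmin": 30, "xmax": None},
]
# While a transaction older than xid 30 is still running, both stay:
print(len(vacuum(page, oldest_active_xid=25, committed={10, 30})))  # 2
# Once everyone has moved past xid 30, the dead version is reclaimed:
print(len(vacuum(page, oldest_active_xid=40, committed={10, 30})))  # 1
```

This is also why long-running transactions bloat tables: they hold `oldest_active_xid` back, so dead versions cannot be reclaimed.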


Step 10: What Happens If PostgreSQL Crashes? (Write-Ahead Logging)

Now we reach durability — the promise that once PostgreSQL says a transaction is committed, the data will never be lost, even if something goes wrong.

The Problem PostgreSQL Faces

  • Writing full data files directly to disk is slow

  • Changes are first made in memory (RAM) for speed

  • RAM is fast but disappears if PostgreSQL crashes or the server reboots

So the challenge is:

How can PostgreSQL make sure data is safe before it actually hits the disk?


The Solution: Write-Ahead Logging (WAL)

During a transaction, PostgreSQL writes each change to the Write-Ahead Log (WAL); at commit time, the log is flushed to disk.

  • WAL is sequential and small, making it fast to write even though it’s on disk

  • Flushing ensures the WAL is safely on physical disk, not just in memory

Only after this does PostgreSQL consider the transaction committed.

This guarantees that even if PostgreSQL crashes immediately, the data is safe and can be fully restored.


Why WAL Survives a Crash

If PostgreSQL crashes:

  • RAM is lost — any in-memory data pages disappear

  • Data files may be inconsistent — pages on disk might not yet have the new changes

  • WAL is intact on disk — all committed changes are safely stored


How Recovery Works

When PostgreSQL starts again after a crash:

  • It finds the most recent checkpoint (a known consistent state of data pages)

  • It looks at all WAL entries written after the checkpoint

  • It replays those changes to restore data pages

Result: the database is back to a consistent state, exactly as it was before the crash. Nothing is guessed or lost.
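The whole crash-and-recover cycle fits in a toy Python sketch. The data is one key-value pair instead of real pages, and the names `commit` and `recover` are invented, but the write-ahead rule and the replay loop are the genuine idea.

```python
# Toy durability model: a checkpointed on-disk state plus a WAL.
checkpoint = {"balance": 100}   # last known-consistent data pages on disk
wal = []                        # sequential log, flushed before COMMIT

def commit(change):
    # Write-ahead rule: the log entry reaches disk BEFORE we report success.
    wal.append(change)

commit(("balance", 50))
commit(("balance", 75))

# --- crash: RAM is gone; only `checkpoint` and `wal` survive on disk ---

def recover(checkpoint, wal):
    state = dict(checkpoint)
    for key, value in wal:      # replay every entry after the checkpoint
        state[key] = value
    return state

print(recover(checkpoint, wal))   # {'balance': 75}
```

Nothing is guessed: recovery is just "start from the checkpoint, replay the log in order".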


The Guarantee

PostgreSQL will never report a transaction as committed until:

  • the WAL entry is safely written to disk

In short:

If PostgreSQL says COMMIT, the data is safe — even if the database crashes immediately after.

Remember: WAL records only describe the changes that happened. They are small, sequential writes, nothing more. Eventually the actual data pages still have to be written to the table files on disk. That's where the concept of lazy writing comes in.


Lazy Writing and Random Disk Writes

When PostgreSQL eventually writes data from memory to table files, the writes are random, not sequential.

This is because:

  • rows can be inserted or updated on any data page

  • each page is stored at a different location on disk

  • flushing these pages requires random disk access

Random writes are slow and expensive.

By delaying these writes and performing them in batches:

  • PostgreSQL reduces disk pressure

  • improves overall performance

  • relies on WAL to keep changes safe in the meantime
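A toy sketch of the batching idea in Python. Real PostgreSQL uses a shared buffer pool and a background writer; here a dict plays the buffer pool, sorting the page numbers stands in for turning scattered writes into an orderly sweep, and the names `modify_page` and `flush_batch` are made up.

```python
# Toy buffer manager: dirty pages live in memory, then flush in batches.
buffer_pool = {}   # page_no -> page contents (RAM)
dirty = set()      # pages changed since the last flush

def modify_page(page_no, contents):
    buffer_pool[page_no] = contents
    dirty.add(page_no)          # remember it must reach disk eventually

def flush_batch(disk):
    # One ordered batch of writes instead of many scattered ones.
    # This is safe to postpone because WAL already holds the changes.
    for page_no in sorted(dirty):
        disk[page_no] = buffer_pool[page_no]
    dirty.clear()

disk = {}
modify_page(7, "page seven v2")
modify_page(2, "page two v5")
flush_batch(disk)               # both pages hit disk in one sweep
```

Between `modify_page` and `flush_batch` the on-disk table files are stale, and that is fine: if a crash hits, WAL replay rebuilds exactly these pages.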


Final Big Picture (For Beginners)

Now everything connects:

  • Clients send SQL → PostgreSQL processes queries

  • Data is stored in pages → Rows are versioned, not overwritten

  • MVCC enables concurrency → Many users work simultaneously

  • WAL guarantees durability → Data survives crashes

  • VACUUM cleans old data → System stays efficient

  • Checkpoints speed up recovery → Balance between safety and performance

PostgreSQL is not just a database. It is:

  • A concurrency engine

  • A transaction system

  • A recovery machine

  • A modern data platform

If you understand this flow, you don’t just use PostgreSQL — you understand it.

And that’s what separates a developer from an engineer.

Jeevan KC


Software developer and blogger passionate about web technologies, databases, and system design. Sharing knowledge to help others grow in their tech journey.