Transaction Management
6.1 ACID Properties
A transaction is a logical unit of work that must satisfy:
| Property | Meaning |
|---|---|
| Atomicity | All or nothing: either all operations commit or none |
| Consistency | Transforms the database from one consistent state to another |
| Isolation | Concurrent transactions do not interfere with each other |
| Durability | Once committed, the effects survive failures |
6.2 Isolation Levels
SQL defines four isolation levels with increasing strictness:
| Level | Dirty Read | Non-repeatable Read | Phantom Read |
|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible |
| Read Committed | Prevented | Possible | Possible |
| Repeatable Read | Prevented | Prevented | Possible |
| Serializable | Prevented | Prevented | Prevented |
- Dirty read: Reading uncommitted data from another transaction.
- Non-repeatable read: Same query returns different results within the same transaction due to updates by another transaction.
- Phantom read: Same range query returns new rows due to inserts by another transaction.
Default levels: PostgreSQL: Read Committed. MySQL InnoDB: Repeatable Read. SQL Server: Read Committed. Oracle: Read Committed.
6.3 Serialisability
A schedule is serialisable if its effect is equivalent to some serial schedule (where Transactions execute one after another).
Conflict serialisability. Two operations conflict if they belong to different transactions, Access the same data item, and at least one is a write. A schedule is conflict-serialisable if its precedence graph (also called serialisation graph) is acyclic.
- Precedence graph: Nodes are transactions. Draw a directed edge if an operation of conflicts with and precedes an operation of .
Theorem 6.1. A schedule is conflict-serialisable if and only if its precedence graph is acyclic.
Proof sketch. If the precedence graph has a cycle, no serial ordering can respect all the Precedence constraints, so the schedule is not conflict-serialisable. Conversely, a topological sort Of an acyclic graph gives a serial order equivalent to the schedule.
View serialisability. A schedule is view-serialisable if it is view-equivalent to a Serial schedule . View equivalence requires:
- Initial read: If reads the initial value of in It does so in .
- Updated read: If reads the value of written by in It does so in .
- Final write: If performs the final write of in It does so in .
Every conflict-serialisable schedule is view-serialisable, but the converse does not hold. Testing For view serialisability is NP-complete.
Worked Example 6.1: Testing Conflict Serialisability
Schedule: .
Build precedence graph:
- before : edge .
- before : No — is after .
- before… Nothing comes after on .
- before : edge .
- before… Nothing comes after on .
Precedence graph: (acyclic).
Equivalent serial schedule: (topological order).
Now consider: .
- before : .
- before : .
Precedence graph has cycle: . Not conflict-serialisable.
6.4 Concurrency Control
Two-Phase Locking (2PL).
Protocol:
- Growing phase: Acquire locks; do not release any.
- Shrinking phase: Release locks; do not acquire any.
Theorem 6.2. 2PL guarantees conflict serialisability.
Proof. By the protocol, transaction acquires all its locks before releasing any. If Requests a lock held by , waits. This creates a total ordering on conflicting Operations, which corresponds to a serial schedule.
Variants:
- Strict 2PL: All locks held until commit/abort. Prevents cascading aborts.
- Rigorous 2PL: All locks held until commit/abort (same as strict 2PL for exclusive locks).
Problem with 2PL: Deadlocks. Detection (wait-for graph) or prevention (timestamp ordering).
Timestamp Ordering (TO). Each transaction receives a timestamp . For conflicting Operations:
- reads : if was last written by with Abort .
- writes : if was last read by with Abort .
No deadlocks (no waiting), but may abort transactions unnecessarily.
Optimistic Concurrency Control (OCC). Assumes conflicts are rare. Three phases:
- Read phase: Transaction reads data and writes to private workspace. No locks acquired.
- Validation phase: Before commit, check that no data read by this transaction has been modified by a committed transaction since the read phase began.
- Write phase: If validation passes, apply private writes to the database. Otherwise, abort and restart.
Forward validation: Check against committed transactions. For each data item read by Verify that the transaction that last wrote committed before ‘s read phase began. Backward validation: Check against active transactions.
OCC performs well when conflicts are rare but degrades under high contention (many restarts).
Worked Example 6.2: Optimistic Concurrency Control
Scenario: Two transactions and both read and update account balances.
- : Read Read Write Write (transfer 50 from to ).
- : Read Write (withdraw 25).
Execution:
- read phase: Reads , into private workspace.
- read phase: Reads into private workspace.
- validation phase: Checks if was modified by any committed transaction since started. No committed transactions modified . Validation passes.
- write phase: Writes . commits.
- validation phase: Checks if or was modified by any committed transaction since started. committed and modified . Validation fails.
- is aborted and restarted. On restart, reads , And correctly computes , .
Multiversion Concurrency Control (MVCC). Maintain multiple versions of each data item.
- Read operations read a consistent snapshot (a version visible at the start of the transaction).
- Write operations create a new version; the old version remains for concurrent readers.
- Conflicts between readers and writers are avoided (no read-write contention).
MVCC pseudocode (simplified):
def read(txn, item): version = find_version(item, txn.start_ts) return version.value
def write(txn, item, value): if has_conflicting_write(item, txn.start_ts, txn.id): abort(txn) create_version(item, value, txn.id, txn.start_ts)MVCC implementations:
- PostgreSQL: Snapshot Isolation. Each transaction sees a snapshot of committed data at transaction start. Uses
xmin/xmaxon each tuple. - MySQL InnoDB: MVCC with undo logs. Each row has hidden columns for transaction ID and rollback pointer.
- Oracle: Snapshot Isolation at the statement level by default.
MVCC anomaly: Write Skew. Two concurrent transactions read overlapping data and update Non-overlapping parts based on what they read. Serializable Snapshot Isolation (SSI) in PostgreSQL 9.1+ Detects and prevents this.
6.5 Log-Based Recovery
Write-Ahead Logging (WAL). Before writing a modified page to disk, write the log record describing The modification to stable storage.
WAL protocol:
- Before a data page is flushed to disk, all log records pertaining to that page must be on stable storage.
- Before a transaction commits, all its log records must be on stable storage.
Log records: [LSN, TxnID, PageID, BeforeImage, AfterImage, Commit/Abort].
ARIES (Algorithm for Recovery and Isolation Exploiting Semantics). The standard recovery Algorithm used in IBM DB2, PostgreSQL, SQL Server, and others:
- Analysis phase: Scan the log from the last checkpoint. Identify dirty pages and active transactions. Reconstruct the transaction table (which transactions were active at crash time) and the dirty page table (which pages may not have been written to disk).
- Redo phase: Scan forward from the earliest LSN in the dirty page table. Redo all logged updates (from both committed and uncommitted transactions) to restore the database to the exact state at the time of the crash. A record is redone only if the page’s on-disk LSN is less than the record’s LSN (ensuring idempotency).
- Undo phase: Scan backward from the end of the log. Undo all updates from transactions that were active at crash time (did not commit). Write compensation log records (CLRs) so that undo operations themselves are logged.
Key ARIES properties:
- Idempotent: Recovery can be restarted safely (re-processing log records has no additional effect).
- Steal/no-force: Uncommitted pages can be flushed to disk (steal), and committed pages need not be flushed (no-force). This improves performance at the cost of more recovery work.
- Fine-grained logging: Log records track individual page modifications, enabling precise recovery.
Checkpoints. Periodically flush dirty pages to disk and write a checkpoint record. Reduces the Amount of log to scan during recovery.
// Simplified checkpoint pseudocodevoid checkpoint() { flush_all_dirty_pages(); write_log_record(CHECKPOINT, dirty_page_list, active_txns); flush_log();}6.6 Buffer Management
The buffer pool caches frequently accessed disk pages in memory. Replacement policies include LRU, Clock, and variants. PostgreSQL uses a Clock sweep algorithm; MySQL InnoDB uses an LRU variant with a Midpoint insertion strategy to avoid scan pollution.