Skip to content

Operating Systems

1. Introduction to Operating Systems

1.1 Role of the Operating System

An operating system (OS) is system software that manages hardware resources and provides services To application programs. It serves as an intermediary between users and the underlying hardware.

Core responsibilities:

  • Process management: Create, schedule, and terminate processes.
  • Memory management: Allocate and protect memory; implement virtual memory.
  • File system management: Organise, store, retrieve, and protect data.
  • I/O management: Control and mediate access to I/O devices.
  • Protection and security: Enforce access controls and resource isolation.

1.2 Kernel Architectures

Monolithic Kernel. All OS services (file system, device drivers, network stack, scheduler) run In a single address space in kernel mode. Examples: Linux, FreeBSD.

  • Advantage: Low overhead from function-call latency.
  • Disadvantage: A bug in any component can crash the entire system.

Microkernel. Only essential services (IPC, scheduling, basic memory management) run in kernel Mode. Other services run in user mode as separate processes. Examples: MINIX 3, seL4, QNX.

  • Advantage: Fault isolation; easier to verify formally.
  • Disadvantage: Higher overhead from message passing between user-mode servers.

Hybrid Kernel. A pragmatic compromise combining monolithic and microkernel ideas. Some Non-essential services run in kernel mode for performance, but the architecture is more modular Than a pure monolith. Examples: Windows NT, macOS XNU.

1.3 System Calls

System calls provide the interface between user-mode applications and kernel-mode OS services. They Are invoked via software interrupts (e.g., syscall on x86-64, svc on ARM).

Categories:

CategoryExamples
Process controlfork``exec``wait``exit
File managementopen``read``write``close
Device I/Oioctl``read``write
Communicationpipe``shmget``mmap``socket
Informationgetpid``stat``sysconf
Protectionchmod``chown``setuid

System call overhead. A transition from user mode to kernel mode involves saving user Registers, switching to the kernel stack, validating arguments, executing kernel code, and Returning. Typical overhead: 1001000100\mathrm{--1000} ns on modern hardware.

2. Process Management

2.1 Process Concept

A process is an instance of a program in execution. The OS maintains a process control block (PCB) for each process:

FieldDescription
PIDUnique process identifier
Process stateRunning, ready, blocked, etc.
Program counterAddress of next instruction
RegistersCPU register contents
Memory limitsBase/limit registers or page table pointer
Open filesList of open file descriptors
I/O statusI/O devices allocated and their status
Scheduling infoPriority, scheduling queue, CPU usage

Process states. Processes transition through states:

NewReadyRunningTerminated\mathrm{New} \to \mathrm{Ready} \to \mathrm{Running} \to \mathrm{Terminated}

RunningBlockedReady\mathrm{Running} \to \mathrm{Blocked} \to \mathrm{Ready}

The scheduler dispatches processes from ready to running. A running process may be preempted back to ready, or may block on I/O or a synchronisation event.

2.2 Threads

A thread is the fundamental unit of CPU execution. Multiple threads within a process share the Same address space, open files, and other resources, but each has its own program counter, Registers, and stack.

User-level threads. Managed entirely by a user-space library (e.g., POSIX pthread). The OS is Unaware of them.

  • Advantage: Fast creation and switching (no kernel involvement).
  • Disadvantage: One blocking system call blocks all threads in the process.

Kernel-level threads. Managed by the OS kernel (e.g., Linux clone()Windows threads).

  • Advantage: True parallelism on multiprocessor systems; blocking one thread does not block others.
  • Disadvantage: Slower creation and context switch (kernel trap required).

Thread models:

ModelUser:KernelCharacteristics
Many-to-onem:1m:1All user threads map to one kernel thread
One-to-one1:11:1Each user thread maps to a kernel thread
Many-to-manym:nm:nUser threads multiplexed onto kernel threads

Linux uses a 1:11:1 model by default.

2.3 CPU Scheduling

The CPU scheduler decides which process runs next from the set of ready processes.

Scheduling criteria:

  • CPU utilisation: Keep the CPU busy (utilisation=1p\mathrm{utilisation} = 1 - p where pp is the I/O wait probability).
  • Throughput: Number of processes completed per time unit.
  • Turnaround time: Tturnaround=TcompletionTarrivalT_{\mathrm{turnaround} = T_{\mathrm{completion} - T_{\mathrm{arrival}}}}.
  • Waiting time: Time spent in the ready queue.
  • Response time: Time from submission to first response.

2.4 Scheduling Algorithms

First-Come, First-Served (FCFS). Non-preemptive. Processes scheduled in arrival order. Suffers From the convoy effect: short processes waiting behind a long process.

Shortest Job First (SJF). Non-preemptive. Schedule the process with the shortest next CPU burst. Provably optimal for average waiting time. Requires burst length estimation via exponential Averaging: τn+1=αtn+(1α)τn\tau_{n+1} = \alpha t_n + (1 - \alpha) \tau_n.

Shortest Remaining Time First (SRTF). Preemptive SJF. If a new process arrives with a shorter Remaining burst, preempt the current process.

Round Robin (RR). Each process gets a time quantum qq. If not finished within qqPreempted and Placed at the back of the ready queue. If qq is large, RR degenerates to FCFS. Typical qq: 1010010\mathrm{--100} ms.

Priority Scheduling. Highest-priority ready process runs. Can be preemptive or non-preemptive. Risk of starvation; solved by aging (gradually increase priority of waiting processes).

Multilevel Feedback Queue (MLFQ). Partition the ready queue into multiple priority levels. Processes that use too much CPU are demoted; processes that wait are promoted. The most general and Widely used approach. Linux CFS uses a variant with red-black trees keyed on virtual runtime.

Theorem 2.1. SJF gives the minimum average waiting time for a given set of processes.

Proof. Consider any schedule that is not SJF. There exist consecutive processes AA and BB where AA has a longer burst than BB but is scheduled first. Swapping AA and BB reduces the waiting Time of BB by the burst time of AA and increases the waiting time of AA by the burst time of BB. Since BB“s burst is shorter, the net change reduces the average. \blacksquare

Worked Example 2.1 — FCFS vs SJF Comparison

Consider three processes, all arriving at time t=0t = 0:

ProcessBurst Time
P1P_124
P2P_23
P3P_33

FCFS Gantt chart:

| P1 (24) | P2 (3) | P3 (3) |
0 24 27 30
ProcessWaiting TimeTurnaround Time
P1P_1024
P2P_22427
P3P_32730
Avg1727

SJF Gantt chart:

| P2 (3) | P3 (3) | P1 (24) |
0 3 6 30
ProcessWaiting TimeTurnaround Time
P1P_1630
P2P_203
P3P_336
Avg313

SJF reduces average waiting time from 17 to 3, illustrating the convoy effect in FCFS.

Solution — Round Robin with $q = 4$

Using the same three processes with quantum q=4q = 4:

| P1(4) | P2(3) | P3(3) | P1(4) | P1(4) | P1(4) | P1(4) | P1(1) |
0 4 7 10 14 18 22 26 27
ProcessCompletionTurnaroundWaiting
P1P_127273
P2P_2774
P3P_310107
Avg14.674.67

Round Robin eliminates the convoy effect but gives a higher average waiting time than SJF due to Preemption overhead. The turnaround time for P1P_1 is unchanged (the work must be done), but P2P_2 And P3P_3 receive faster first response.

2.5 Scheduling Algorithm Pseudocode

Round Robin Scheduling:

#define MAX_PROCESSES 128
typedef struct {
int pid;
int burst_remaining;
int arrival_time;
int priority;
} Process;
void round_robin(Process processes[], int n, int quantum) {
Queue ready_queue;
int current_time = 0;
int completed = 0;
while (completed < n) {
for (int i = 0; i < n; i++) {
if (processes[i].arrival_time <= current_time
&& processes[i].burst_remaining > 0
&& !is_in_queue(&ready_queue, processes[i].pid)) {
enqueue(&ready_queue, &processes[i]);
}
}
if (is_empty(&ready_queue)) {
current_time++;
continue;
}
Process *p = dequeue(&ready_queue);
int exec_time = min(p->burst_remaining, quantum);
p->burst_remaining -= exec_time;
current_time += exec_time;
if (p->burst_remaining > 0) {
enqueue(&ready_queue, p);
} else {
completed++;
record_turnaround(p->pid, current_time - p->arrival_time);
}
}
}

Priority Scheduling (preemptive):

void priority_schedule(Process processes[], int n) {
int current_time = 0;
int completed = 0;
int shortest_priority = INT_MAX;
int idx = -1;
while (completed < n) {
shortest_priority = INT_MAX;
idx = -1;
for (int i = 0; i < n; i++) {
if (processes[i].arrival_time <= current_time
&& processes[i].burst_remaining > 0
&& processes[i].priority < shortest_priority) {
shortest_priority = processes[i].priority;
idx = i;
}
}
if (idx == -1) {
current_time++;
continue;
}
processes[idx].burst_remaining--;
current_time++;
if (processes[idx].burst_remaining == 0) {
completed++;
record_turnaround(processes[idx].pid,
current_time - processes[idx].arrival_time);
}
}
}

Multilevel Feedback Queue:

#define NUM_LEVELS 3
#define QUANTUMS {8, 16, 32}
void mlfq_schedule(Process processes[], int n, int quantums[]) {
Queue queues[NUM_LEVELS];
int current_time = 0;
int completed = 0;
while (completed < n) {
for (int i = 0; i < n; i++) {
if (processes[i].arrival_time <= current_time
&& processes[i].burst_remaining > 0
&& !is_any_queue(&queues, processes[i].pid)) {
enqueue(&queues[0], &processes[i]);
}
}
int level;
for (level = 0; level < NUM_LEVELS; level++) {
if (!is_empty(&queues[level])) break;
}
if (level == NUM_LEVELS) {
current_time++;
continue;
}
Process *p = dequeue(&queues[level]);
int exec_time = min(p->burst_remaining, quantums[level]);
p->burst_remaining -= exec_time;
current_time += exec_time;
if (p->burst_remaining > 0 && level + 1 < NUM_LEVELS) {
enqueue(&queues[level + 1], p);
} else if (p->burst_remaining > 0) {
enqueue(&queues[level], p);
} else {
completed++;
record_turnaround(p->pid, current_time - p->arrival_time);
}
}
}

2.6 Inter-Process Communication

Processes may need to communicate and synchronise. IPC mechanisms:

Shared memory. A region of memory mapped into multiple address spaces. Fast, but requires Explicit synchronisation.

int shmid = shmget(SHM_KEY, SHM_SIZE, IPC_CREAT | 0666);
char *shm_ptr = (char *)shmat(shmid, NULL, 0);
shm_ptr[offset] = data;
shmdt(shm_ptr);

Message passing. Processes exchange messages through the OS. Operations: send(dest, msg) and receive(src, msg). Can be direct (named processes) or indirect (via mailboxes/ports).

PropertyShared MemoryMessage Passing
SpeedFastSlower (kernel mediation)
SynchronisationRequiredBuilt-in
Kernel useSetup onlyEvery message
ScalabilitySingle machineCan be distributed

Pipes. Unidirectional data channel. pipe() creates a pipe; fork() inherits file descriptors. Named pipes (mkfifo) allow unrelated processes to communicate.

Signals. Asynchronous notifications (integer payload only). Common: SIGTERM``SIGKILL SIGSEGV``SIGCHLD.

3. Synchronisation

3.1 The Critical Section Problem

Consider nn processes sharing a resource. The critical section is the code segment accessing the Resource. A correct solution must satisfy:

  1. Mutual exclusion: At most one process executes in its critical section at any time.
  2. Progress: If no process is in its critical section and some wish to enter, only those not in their remainder section participate, and the decision cannot be postponed indefinitely.
  3. Bounded waiting: A bound exists on the number of times other processes enter their critical sections after a process has requested entry.

3.2 Hardware Support

Test-and-Set. Atomic read-and-set instruction:

boolean test_and_set(boolean *target) {
boolean rv = *target;
*target = true;
return rv;
}

Compare-and-Swap (CAS). Atomically compare and conditionally write:

boolean compare_and_swap(int *value, int expected, int new_value) {
int temp = *value;
if (*value == expected) {
*value = new_value;
}
return temp == expected;
}

These instructions are the foundation for lock-free and wait-free data structures.

3.3 Mutexes

A mutex provides exclusive access to a shared resource.

pthread_mutex_t lock;
pthread_mutex_init(&lock, NULL);
pthread_mutex_lock(&lock);
// critical section
pthread_mutex_unlock(&lock);
pthread_mutex_destroy(&lock);

Spinlocks. A mutex that busy-waits. Useful when the expected wait time is shorter than the Context-switch overhead.

void spinlock_acquire(volatile int *lock) {
while (__atomic_test_and_set(lock, __ATOMIC_ACQUIRE)) {
// busy wait
}
}
void spinlock_release(volatile int *lock) {
__atomic_clear(lock, __ATOMIC_RELEASE);
}

3.4 Semaphores

A semaphore is an integer variable SS accessed only through two atomic operations:

  • Wait (P): P(S)P(S): while S0S \leq 0 do no-op; SS1S \leftarrow S - 1.
  • Signal (V): V(S)V(S): SS+1S \leftarrow S + 1.

Counting semaphore: SS takes any non-negative integer. Controls access to a resource with Multiple instances.

Binary semaphore: S{0,1}S \in \{0, 1\}. Functionally similar to a mutex, but can be signalled by any Process (not just the owner).

sem_t mutex;
sem_init(&mutex, 0, 1);
sem_wait(&mutex);
// critical section
sem_post(&mutex);

Theorem 3.1. Semaphores can implement mutual exclusion correctly if initialised to 1 and used With wait on entry and signal on exit.

Proof. If the semaphore is 1, the first wait decrements it to 0 and enters. Any concurrent wait finds S=0S = 0 and blocks. When the first process signals, SS becomes 1, waking exactly one Blocked process. \blacksquare

3.5 Monitors and Condition Variables

A monitor is a high-level synchronisation construct encapsulating shared data and operations, Allowing only one process to execute within it at any time.

A condition variable allows a process to wait for a condition inside a monitor:

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int buffer_count = 0;
// Producer
pthread_mutex_lock(&mutex);
while (buffer_count == BUFFER_SIZE) {
pthread_cond_wait(&cond, &mutex);
}
buffer_count++;
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);
// Consumer
pthread_mutex_lock(&mutex);
while (buffer_count == 0) {
pthread_cond_wait(&cond, &mutex);
}
buffer_count--;
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);

pthread_cond_wait atomically releases the mutex and blocks. On wakeup, it re-acquires the mutex.

3.6 Classic Synchronisation Problems

Producer-Consumer (Bounded Buffer). A producer writes items to a shared buffer of size NN; a Consumer reads them.

sem_t empty, full, mutex;
sem_init(&empty, 0, N);
sem_init(&full, 0, 0);
sem_init(&mutex, 0, 1);
void producer() {
while (true) {
item = produce_item();
sem_wait(&empty);
sem_wait(&mutex);
insert_item(item);
sem_post(&mutex);
sem_post(&full);
}
}
void consumer() {
while (true) {
sem_wait(&full);
sem_wait(&mutex);
item = remove_item();
sem_post(&mutex);
sem_post(&empty);
consume_item(item);
}
}

Readers-Writers. Multiple readers access concurrently; writers require exclusive access.

sem_t rw_mutex, mutex;
int read_count = 0;
sem_init(&rw_mutex, 0, 1);
sem_init(&mutex, 0, 1);
void writer() {
sem_wait(&rw_mutex);
// write to shared data
sem_post(&rw_mutex);
}
void reader() {
sem_wait(&mutex);
read_count++;
if (read_count == 1) sem_wait(&rw_mutex);
sem_post(&mutex);
// read shared data
sem_wait(&mutex);
read_count--;
if (read_count == 0) sem_post(&rw_mutex);
sem_post(&mutex);
}

Dining Philosophers. Five philosophers, five forks between them. Each needs two forks to eat. Naive solution (one semaphore per fork) deadlocks when all pick up their left fork.

Solutions:

  1. Resource hierarchy: Pick up lower-numbered fork first.
  2. Arbitrator: A single semaphore limits concurrent eaters to four.
  3. Chandy-Misra: Forks are “owned” by a philosopher; others request via messages.
sem_t forks[5];
sem_t room;
for (int i = 0; i < 5; i++) sem_init(&forks[i], 0, 1);
sem_init(&room, 0, 4);
void philosopher(int i) {
while (true) {
think();
sem_wait(&room);
sem_wait(&forks[i]);
sem_wait(&forks[(i + 1) % 5]);
eat();
sem_post(&forks[(i + 1) % 5]);
sem_post(&forks[i]);
sem_post(&room);
}
}

:::caution Common Pitfall Always use a while loop (not if) when checking conditions with condition variables. Spurious Wakeups can cause pthread_cond_wait() to return without the condition being signalled. The loop Re-checks the condition after every wakeup. :::

4. Deadlocks

4.1 Definition and Necessary Conditions

A deadlock is a situation where a set of processes are all blocked, each waiting for a resource Held by another process in the set.

Coffman conditions (all four must hold simultaneously):

  1. Mutual exclusion: At least one resource is non-sharable.
  2. Hold and wait: A process holds at least one resource and is waiting for others.
  3. No preemption: Resources cannot be forcibly taken.
  4. Circular wait: A circular chain of processes exists, each waiting for a resource held by the next.

4.2 Deadlock Prevention

Eliminate one of the four Coffman conditions:

ConditionPrevention strategy
Mutual exclusionGenerally not feasible for non-sharable resources
Hold and waitRequire all resources upfront before execution
No preemptionRelease held resources if a new request cannot be granted
Circular waitImpose a total ordering on resources; request in increasing order

4.3 Deadlock Avoidance: Banker’s Algorithm

The Banker’s algorithm avoids deadlock by checking whether granting a request leads to a safe State.

Definitions:

  • Available: Vector A=(A1,,Am)A = (A_1, \ldots, A_m) of available instances of each resource type.
  • Maximum: Matrix Max\mathrm{Max} where Max[i][j]\mathrm{Max}[i][j] is the maximum demand of process ii for resource jj.
  • Allocation: Matrix Alloc\mathrm{Alloc} where Alloc[i][j]\mathrm{Alloc}[i][j] is the current allocation.
  • Need: Need[i][j]=Max[i][j]Alloc[i][j]\mathrm{Need}[i][j] = \mathrm{Max}[i][j] - \mathrm{Alloc}[i][j].

Safety algorithm:

1. Work = Available; Finish[i] = false for all i
2. Find i such that Finish[i] == false AND Need[i] <= Work
3. If no such i exists, go to step 5
4. Work = Work + Alloc[i]; Finish[i] = true; go to step 2
5. If Finish[i] == true for all i, system is SAFE

Resource request algorithm. When process ii requests resources:

1. If Request[i] > Need[i], error (exceeded maximum claim)
2. If Request[i] > Available, process must wait
3. Pretend to allocate:
Available = Available - Request[i]
Alloc[i] = Alloc[i] + Request[i]
Need[i] = Need[i] - Request[i]
4. Run safety algorithm.
If safe, grant the request.
If unsafe, roll back and make the process wait.

Example. Five processes, three resource types, A=(3,3,2)A = (3, 3, 2):

ProcessAllocationMaxNeed
P0P_0(0,1,0)(7,5,3)(7,4,3)
P1P_1(2,0,0)(3,2,2)(1,2,2)
P2P_2(3,0,2)(9,0,2)(6,0,0)
P3P_3(2,1,1)(2,2,2)(0,1,1)
P4P_4(0,0,2)(4,3,3)(4,3,1)

Safety check: P1P_1 has Need=(1,2,2)(3,3,2)=A\mathrm{Need} = (1,2,2) \leq (3,3,2) = A. Execute P1P_1Release (2,0,0)(2,0,0)New A=(5,3,2)A = (5,3,2). Then P3P_3: Need=(0,1,1)(5,3,2)\mathrm{Need} = (0,1,1) \leq (5,3,2). Continuing, All processes can complete: system is safe.

Worked Example 4.1 — Banker's Algorithm Step-by-Step

Given the state above, suppose P1P_1 requests (1,0,2)(1,0,2).

Step 1: Verify Request1=(1,0,2)Need1=(1,2,2)\mathrm{Request_1} = (1,0,2) \leq \mathrm{Need_1} = (1,2,2). OK.

Step 2: Verify Request1=(1,0,2)A=(3,3,2)\mathrm{Request_1} = (1,0,2) \leq A = (3,3,2). OK.

Step 3: Pretend to allocate:

ProcessAllocationMaxNeed
P0P_0(0,1,0)(7,5,3)(7,4,3)
P1P_1(3,0,2)(3,2,2)(0,2,0)
P2P_2(3,0,2)(9,0,2)(6,0,0)
P3P_3(2,1,1)(2,2,2)(0,1,1)
P4P_4(0,0,2)(4,3,3)(4,3,1)

A=(3,3,2)(1,0,2)=(2,3,0)A = (3,3,2) - (1,0,2) = (2,3,0).

Step 4: Run safety algorithm.

  1. P1P_1: Need=(0,2,0)(2,3,0)\mathrm{Need} = (0,2,0) \leq (2,3,0). Execute, A=(2,3,0)+(3,0,2)=(5,3,2)A = (2,3,0) + (3,0,2) = (5,3,2).
  2. P3P_3: Need=(0,1,1)(5,3,2)\mathrm{Need} = (0,1,1) \leq (5,3,2). Execute, A=(5,3,2)+(2,1,1)=(7,4,3)A = (5,3,2) + (2,1,1) = (7,4,3).
  3. P4P_4: Need=(4,3,1)(7,4,3)\mathrm{Need} = (4,3,1) \leq (7,4,3). Execute, A=(7,4,3)+(0,0,2)=(7,4,5)A = (7,4,3) + (0,0,2) = (7,4,5).
  4. P0P_0: Need=(7,4,3)(7,4,5)\mathrm{Need} = (7,4,3) \leq (7,4,5). Execute, A=(7,4,5)+(0,1,0)=(7,5,5)A = (7,4,5) + (0,1,0) = (7,5,5).
  5. P2P_2: Need=(6,0,0)(7,5,5)\mathrm{Need} = (6,0,0) \leq (7,5,5). Execute, A=(7,5,5)+(3,0,2)=(10,5,7)A = (7,5,5) + (3,0,2) = (10,5,7).

All processes finish. The request is granted. Safe sequence: P1,P3,P4,P0,P2\langle P_1, P_3, P_4, P_0, P_2 \rangle.

Worked Example 4.2 — Unsafe State Detection

Suppose instead P0P_0 requests (0,2,0)(0,2,0) in the original state.

Step 1: Request0=(0,2,0)Need0=(7,4,3)\mathrm{Request_0} = (0,2,0) \leq \mathrm{Need_0} = (7,4,3). OK.

Step 2: Request0=(0,2,0)A=(3,3,2)\mathrm{Request_0} = (0,2,0) \leq A = (3,3,2). OK.

Step 3: Pretend to allocate. New A=(3,1,2)A = (3,1,2), Need0=(7,2,3)\mathrm{Need_0} = (7,2,3).

Step 4: Safety check. No process can execute: P1P_1 needs (1,2,2)(1,2,2) but only (3,1,2)(3,1,2) available (second component insufficient). P3P_3 needs (0,1,1)(3,1,2)(0,1,1) \leq (3,1,2) — OK, execute P3P_3: A=(5,2,3)A = (5,2,3). Then P1P_1: (1,2,2)(5,2,3)(1,2,2) \leq (5,2,3) — OK, execute: A=(7,2,3)A = (7,2,3). But now P0P_0 needs (7,2,3)(7,2,3) — exact match, execute: A=(7,3,3)A = (7,3,3). P4P_4: (4,3,1)(7,3,3)(4,3,1) \leq (7,3,3) — OK, execute: A=(7,3,5)A = (7,3,5). P2P_2: (6,0,0)(7,3,5)(6,0,0) \leq (7,3,5) — OK.

Safe sequence: P3,P1,P0,P4,P2\langle P_3, P_1, P_0, P_4, P_2 \rangle. The request is granted.

Theorem 4.1. The Banker’s algorithm is correct: if it declares a state safe, there exists a Sequence of process completions that avoids deadlock.

Proof. The safety algorithm constructs an explicit sequence. Each process in the sequence can run To completion with the currently available resources plus those released by previously completed Processes. If no such sequence exists, there is a set of processes whose combined needs exceed Available resources. \blacksquare

4.4 Deadlock Detection

Allow deadlocks to occur, then detect and recover.

Detection algorithm:

1. Work = Available
Finish[i] = false if Alloc[i] != 0; true otherwise
2. Find i such that Finish[i] == false AND Request[i] <= Work
3. If found: Work = Work + Alloc[i]; Finish[i] = true; go to step 2
4. If not found: processes with Finish[i] == false are deadlocked

Recovery strategies:

  1. Process termination: Kill deadlocked processes one at a time until the cycle is broken.
  2. Resource preemption: Forcibly reclaim resources; must handle rollback of the affected process.
  3. Checkpoint and rollback: Periodically save process state; restore on deadlock.
Worked Example 4.3 — Deadlock Detection

Three processes and one resource type with 10 instances:

ProcessAllocationMaximum
P0P_059
P1P_124
P2P_227

Available = 10(5+2+2)=110 - (5 + 2 + 2) = 1.

Request0=4\mathrm{Request_0} = 4, Request1=2\mathrm{Request_1} = 2, Request2=5\mathrm{Request_2} = 5.

Detection:

  1. Work = 1. No process has Request1\mathrm{Request} \leq 1.
  2. No process can proceed. All three are deadlocked.

The system is in an unsafe state with a deadlock involving {P0,P1,P2}\{P_0, P_1, P_2\}.

Recovery: Preempt 3 units from P0P_0 (reducing its allocation to 2). Now Available = 4. P1P_1 can proceed (Request1=24\mathrm{Request_1} = 2 \leq 4). After P1P_1 finishes, Available = 4+2=64 + 2 = 6. P0P_0: Request0=46\mathrm{Request_0} = 4 \leq 6Proceeds. After: Available = 6+5=116 + 5 = 11. P2P_2 proceeds. Deadlock resolved.

4.5 Deadlock Handling in Practice

Most modern OSes (Linux, Windows) use the ostrich algorithm: ignore deadlocks and rely on process Termination or reboot. Rationale: deadlocks are rare, detection is expensive, and prevention is Overly restrictive.

5. Memory Management

5.1 Contiguous Memory Allocation

Fixed partitioning. Memory divided into fixed-size partitions at boot. Internal fragmentation.

Variable partitioning. Partitions created dynamically. External fragmentation.

Allocation strategies:

StrategyDescription
First-fitAllocate the first hole large enough
Best-fitAllocate the smallest hole large enough
Worst-fitAllocate the largest hole
Next-fitContinue searching from the last allocation

Theorem 5.1. First-fit and best-fit have roughly equivalent utilisation; first-fit is faster.

5.2 Paging

Divide physical memory into fixed-size frames and logical memory into same-size pages. A page table maps page numbers to frame numbers.

physicaladdress=PT[p]×F+f\mathrm{physical} address = \mathrm{PT}[p] \times F + f

Where PT[p]\mathrm{PT}[p] is the frame number for page pp and FF is the frame size.

Page table entry fields:

FieldPurpose
Frame numberPhysical frame containing this page
Valid bitPage is in memory (1) or not (0)
Dirty bitPage has been modified
Reference bitPage accessed recently
ProtectionRead/write/execute permissions

Multi-level page tables. A single page table for a 64-bit address space is infeasible. A Two-level scheme uses a page directory indexed by the outer page number, pointing to inner Page tables:

Address:p1outerp2innerdoffset\mathrm{Address}: \underbrace{p_1}_{\mathrm{outer} \mid \underbrace{p_2}_{\mathrm{inner} \mid \underbrace{d}_{\mathrm{offset}}}}

X86-64 with 48-bit virtual addresses uses four-level page tables.

Inverted page table. One entry per physical frame (not per virtual page). Reduces memory Overhead but makes searching for a specific virtual page expensive; solved by hashing.

5.3 Segmentation

Segments divide the address space into logical units (code, data, stack, heap). Each segment has a base and limit:

physicaladdress=base+offset,ifoffset<limit\mathrm{physical} address = \mathrm{base} + \mathrm{offset}, \quad \mathrm{if} \mathrm{offset} \lt \mathrm{limit}

Advantages: Reflects program structure; supports sharing individual segments. Disadvantages: External fragmentation (variable-size segments).

Worked Example 5.1 — Segmented Address Translation

A process has three segments with the following segment table:

SegmentBaseLimit
Code40962048
Data122884096
Stack204802048

Translate logical address (segment = 1, offset = 1500):

  1. Check: offset 1500<1500 \lt limit 40964096. Valid.
  2. Physical address = base + offset = 12288+1500=1378812288 + 1500 = 13788.

Translate (segment = 2, offset = 2500):

  1. Check: offset 2500<2500 \lt limit 20482048. Invalid — segmentation fault.

5.4 Segmented Paging

Combine segmentation and paging: the segment offset is divided into page number and page offset.

Address:ssegmentppagedoffset\mathrm{Address}: \underbrace{s}_{\mathrm{segment} \mid \underbrace{p}_{\mathrm{page} \mid \underbrace{d}_{\mathrm{offset}}}}

Used by x86 (segmentation + paging).

5.5 Virtual Memory

Virtual memory allows processes to use more memory than physically available by keeping only active Pages in RAM; the rest reside on disk (swap space).

Demand paging. Pages loaded on access, not at process start.

Page fault handling:

  1. CPU generates logical address; page table entry has valid bit = 0.
  2. Trap to OS page fault handler.
  3. Find a free frame (possibly evicting a victim page).
  4. Read the page from disk into the frame.
  5. Update page table entry (valid bit = 1, frame number).
  6. Restart the faulting instruction.

Effective access time (EAT):

EAT=(1p)×ma+p×pf\mathrm{EAT} = (1 - p) \times \mathrm{ma} + p \times \mathrm{pf}

Where pp = page fault rate, ma\mathrm{ma} = memory access time, pf\mathrm{pf} = page fault service Time.

For p=0.001p = 0.001, ma=100\mathrm{ma} = 100 ns, pf=8\mathrm{pf} = 8 ms:

EAT=0.999×100+0.001×8000000=8.1  μs\mathrm{EAT} = 0.999 \times 100 + 0.001 \times 8\,000\,000 = 8.1 \; \mu\mathrm{s}

This is roughly 80×80\times slower than pure memory access, illustrating why a low page fault rate Is critical.

5.6 Translation Lookaside Buffer (TLB)

A TLB is a hardware cache of recently used page table entries, avoiding an extra memory access Per translation.

EAT=h×(TLB+ma)+(1h)×(TLB+ma+ma)\mathrm{EAT} = h \times (\mathrm{TLB} + \mathrm{ma}) + (1 - h) \times (\mathrm{TLB} + \mathrm{ma} + \mathrm{ma})

Where hh is the TLB hit ratio.

For h=0.99h = 0.99, TLB=2\mathrm{TLB} = 2 ns, ma=100\mathrm{ma} = 100 ns:

EAT=0.99×102+0.01×202=103  ns\mathrm{EAT} = 0.99 \times 102 + 0.01 \times 202 = 103 \; \mathrm{ns}

TLB coherence. When the OS modifies a page table entry, it must invalidate the corresponding TLB entry. On x86-64: invlpg for single-entry invalidation, or reload CR3 to flush the entire TLB.

Worked Example 5.2 — TLB + Page Fault EAT Calculation

A system with TLB access time = 2 ns, memory access time = 100 ns, page fault service time = 8 ms, TLB hit ratio = 0.80, and page fault rate = 0.0005.

Case 1: TLB hit, no page fault (0.80×0.9995=0.79960.80 \times 0.9995 = 0.7996): Time = TLB+ma=2+100=102\mathrm{TLB} + \mathrm{ma} = 2 + 100 = 102 ns.

Case 2: TLB miss, no page fault (0.20×0.9995=0.19990.20 \times 0.9995 = 0.1999): Time = TLB+2×ma=2+200=202\mathrm{TLB} + 2 \times \mathrm{ma} = 2 + 200 = 202 ns.

Case 3: Page fault (0.00050.0005): Time = TLB+ma+pf=2+100+8×106=8000002\mathrm{TLB} + \mathrm{ma} + \mathrm{pf} = 2 + 100 + 8 \times 10^6 = 8000002 ns.

EAT=0.7996×102+0.1999×202+0.0005×8000102\mathrm{EAT} = 0.7996 \times 102 + 0.1999 \times 202 + 0.0005 \times 8\,000\,102 EAT=81.56+40.38+4000.05=4121.99  ns4.12  μs\mathrm{EAT} = 81.56 + 40.38 + 4000.05 = 4121.99 \;\mathrm{ns} \approx 4.12 \;\mu\mathrm{s}

This is roughly 41×41\times slower than pure memory access, driven almost entirely by the page fault Component.

5.7 Page Replacement Algorithms

When physical memory is full, a victim page must be selected for eviction.

Optimal (OPT/MIN). Replace the page not used for the longest time in the future. Provably Optimal but requires future knowledge; used only as a benchmark.

Theorem 5.2 (Belady’s Optimality). OPT yields the fewest page faults for any reference string.

Proof. If OPT replaces page xx (used furthest in future) and another algorithm replaces page yy (used sooner), the other algorithm faults at least once more by the time yy is next Referenced. \blacksquare

First-In, First-Out (FIFO). Replace the oldest page. Simple but may evict heavily used pages. Suffers from Belady’s anomaly: increasing frames can increase faults.

Example of Belady’s anomaly. Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. With 3 Frames: 9 faults. With 4 frames: 10 faults.

Least Recently Used (LRU). Replace the page not used for the longest time. Approximates OPT Well. Does not suffer from Belady’s anomaly.

Theorem 5.3. LRU is a stack algorithm: the set of pages in n+1n+1 frames is a superset of the Set in nn frames, for any reference string.

Proof. When adding a frame, LRU keeps the nn most recently used pages (same as before) plus the New frame. No page in the nn-frame set can be more recently used than a page outside it, so the n+1n+1 set must contain the nn-frame set. \blacksquare

Clock (Second Chance). Pages in a circular list with a reference bit. On replacement:

  1. If reference bit is 0, replace the page.
  2. If reference bit is 1, clear it and advance.

Approximates LRU with O(1)O(1) per operation.

LFU (Least Frequently Used). Replace the page with the smallest access count. May fail to adapt To changing access patterns.

Approximating LRU in practice. Most OSes use a variant of Clock. Linux uses an LRU-like Approximation with active and inactive lists: pages on the active list are protected; pages Not accessed are demoted to the inactive list; eviction targets the inactive list.

Worked Example 5.3 — Optimal Page Replacement

Reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1. Three frames.

RefFrame 1Frame 2Frame 3Fault?Victim
77Yes
070Yes
1701Yes
2201Yes7 (used at 18)
0201No
3231Yes0 (used at 10)
0230Yes1 (used at 14)
4240Yes3 (used at 11)
2240No
3340Yes2 (used at 13)
0340No
3340No
2240Yes3 (used at 11, already past)
1210Yes4 (used at \infty)
2210No
0210No
1210No
7710Yes2 (used at 13, already past)
0710No
1710No

Total page faults: 9. This is the theoretical minimum.

Worked Example 5.4 — LRU Page Replacement

Same reference string, three frames. LRU replaces the page whose last use was furthest in the past.

RefF1F2F3Fault?Victim (least recent)
77Yes
070Yes
1701Yes
2201Yes7
0201No
3231Yes0
0031Yes2
4041Yes3
2042Yes1
3342Yes0
0302Yes4
3302No
2302No
1102Yes3
2102No
0102No
1102No
7107Yes2
0107No
1107No

Total page faults: 12. LRU produces 33% more faults than optimal, but does not require future Knowledge.

Worked Example 5.5 — Clock (Second Chance) Replacement

Same reference string, three frames. Clock hand starts at frame 0. R = reference bit.

RefState (R bits)Fault?Action
7[7,R] [-] [-]YesLoad 7 into F0
0[7,R] [0,R] [-]YesLoad 0 into F1
1[7,R] [0,R] [1,R]YesLoad 1 into F2
2[2,R] [0,R] [1,R]YesF0: R=1, clear, advance; F1: R=1, clear, advance; F2: R=1, clear, advance; F0: R=0, replace with 2
0[2,R] [0,R] [1,R]NoSet R on F1
3[2,R] [3,R] [1,R]YesF0: R=1, clear, advance; F1: R=1, clear, advance; F2: R=1, clear, advance; F0: R=0, replace with 3
0[0,R] [3,R] [1,R]YesF0: R=0, replace with 0
4[0,R] [4,R] [1,R]YesF0: R=1, clear, advance; F1: R=0, replace with 4
2[2,R] [4,R] [1,R]YesF0: R=0, replace with 2
3[2,R] [3,R] [1,R]YesF0: R=1, clear, advance; F1: R=0, replace with 3
0[0,R] [3,R] [1,R]YesF0: R=0, replace with 0
3[0,R] [3,R] [1,R]NoSet R on F1
2[2,R] [3,R] [1,R]YesF0: R=0, replace with 2
1[2,R] [3,R] [1,R]NoSet R on F2
2[2,R] [3,R] [1,R]NoSet R on F0
0[2,R] [0,R] [1,R]YesF0: R=1, clear, advance; F1: R=1, clear, advance; F2: R=1, clear, advance; F0: R=0, replace with 0
1[0,R] [1,R] [1,R]NoSet R on F2
7[7,R] [1,R] [1,R]YesF0: R=0, replace with 7
0[7,R] [0,R] [1,R]YesF0: R=1, clear, advance; F1: R=0, replace with 0
1[7,R] [0,R] [1,R]NoSet R on F2

Total page faults: 15. Clock performs worse than LRU here but requires only O(1)O(1) per operation And no global ordering of references.

:::caution Common Pitfall Belady’s anomaly applies to FIFO but not to LRU or Optimal. Adding more memory does not always Reduce page faults for non-stack algorithms. :::

5.8 Thrashing

Thrashing occurs when a process spends more time paging than executing. This happens when the Total working set of all active processes exceeds physical memory.

Working set model. W(t,Δ)W(t, \Delta) is the set of pages referenced in the last Δ\Delta references. If Wi>\sum W_i \gt available frames, thrashing occurs.

Solutions:

  1. Working set strategy: Admit a process only if its working set fits.
  2. Page fault frequency (PFF): Adjust resident set size based on observed fault rate.
  3. Local replacement: Restrict eviction to the process’s own frames.
Worked Example 5.6 — Thrashing Analysis

A system has 64 frames of physical memory. Four processes with the following working set sizes (Δ=5\Delta = 5 references):

ProcessWorking Set Size
P1P_115
P2P_220
P3P_318
P4P_425
Total78

Total working set = 78 frames, but only 64 available. Thrashing occurs.

Solution 1 (working set admission): Suspend P4P_4 (largest working set). Remaining: 15+20+18=536415 + 20 + 18 = 53 \leq 64. Thrashing eliminated.

Solution 2 (PFF): If P3P_3‘s page fault rate exceeds the upper threshold, increase its resident Set size by stealing frames from P4P_4. If P4P_4‘s fault rate drops below the lower threshold, Decrease its allocation.

Theorem 5.4. If the sum of all working sets exceeds the number of physical frames, at least one Process must thrash.

Proof. By the pigeonhole principle, if i=1nWi>F\sum_{i=1}^{n} \lvert W_i \rvert \gt F where FF is the Total number of frames, at least one process cannot hold its entire working set in memory. That Process will repeatedly evict pages it needs, causing its page fault rate to dominate its execution Time. \blacksquare

5.9 Copy-on-Write (COW)

Copy-on-Write is an optimisation for fork(). Instead of copying all pages, parent and child Share physical frames (marked read-only). On a write to a shared page, a fault triggers a copy of Just that page.

  • fork() becomes nearly O(1)O(1) instead of O(n)O(n) where nn is the number of pages.
  • If the child immediately calls exec()No copies are ever made.

6. File Systems

6.1 File Concepts

A file is a named collection of related data on secondary storage.

File attributes: Name, identifier (inode number), type, location, size, protection, timestamps.

File operations: create``delete``open``close``read``write``seek``truncate.

Access methods:

  • Sequential access: Read bytes in order (tapes, log-structured files).
  • Direct (random) access: Read/write at arbitrary offsets (disks).
  • Indexed access: Access via an index structure (B-trees, hash tables).

6.2 File Allocation Methods

Contiguous allocation. Each file occupies a contiguous set of blocks on disk. The directory entry Stores the starting block and length.

  • Advantage: Excellent for sequential and direct access; minimal seek time.
  • Disadvantage: External fragmentation; difficulty finding space for large files; cannot grow dynamically.

Linked allocation. Each block contains a pointer to the next block. The directory entry stores The starting block and the end block (or a count).

  • Advantage: No external fragmentation; file can grow dynamically.
  • Disadvantage: No direct access (must follow pointers); pointer overhead per block; reliability (a broken pointer loses the rest of the file).

Indexed allocation. The directory entry points to an index block that contains an array of Pointers to data blocks. Supports direct access without external fragmentation.

  • Advantage: Direct access; no external fragmentation.
  • Disadvantage: Index block may be too small for large files (solved by linked index blocks or multi-level indexing, as in the inode scheme of §6.3).
MethodSequentialDirectExternal Frag.Internal Frag.
ContiguousExcellentYesYesPossible
LinkedGoodNoNoNone
IndexedGoodYesNoIndex block
Worked Example 6.1 — Allocation Comparison

A file of 10 blocks is stored on a disk. The disk has blocks at positions: 0 (free), 1 (used), 2-5 (free), 6 (used), 7-9 (free), 10-15 (free), 16 (used), 17-31 (free).

Contiguous: Needs 10 consecutive free blocks. Largest free run is 16-31 (16 blocks). File stored At blocks 17—26. Access to block kk: position 17+k17 + k.

Linked: Blocks can be scattered. E.g., 2, 3, 4, 5, 7, 8, 9, 10, 11, 12. To read block 7, must Traverse 7 pointers.

Indexed: Index block at position 0. Index block contains pointers to data blocks 2, 3, 4, 5, 7, 8, 9, 10, 11, 12. To read block 7, access index block position 7 (one seek + one read).

6.3 Directory Structures

StructureDescription
Single-levelAll files in one directory
Two-levelSeparate directory per user
TreeHierarchical tree with subdirectories
Acyclic graphTree with shared subdirectories (hard links)
General graphAllows cycles; requires garbage collection

Path resolution. A path name is resolved by traversing the directory tree from the root (/) Or the current working directory. Each component is looked up in the current directory to find the Next inode. Symbolic links (symlinks) store a path string that is substituted during resolution; Circular symlinks are detected by tracking the link count.

Hard links vs symbolic links. A hard link creates an additional directory entry pointing to the Same inode (increments the link count). A symbolic link is a special file containing a path string. Hard links cannot cross file systems; symbolic links can. Deleting the original file does not affect A symbolic link but removes one reference from a hard link (the inode is freed only when the link Count reaches zero).

6.4 Inode-Based File Systems

An inode (index node) stores metadata about a file (but not the file name, which is in the Directory entry):

FieldDescription
ModeFile type and permissions
OwnerUser and group IDs
SizeFile size in bytes
TimestampsLast access, modification, inode change
Link countNumber of hard links to this inode
Block pointersDirect, indirect, double-indirect, triple-indirect
BlocksNumber of blocks allocated

Block pointer scheme (ext4 with 4 KiB blocks, 4-byte pointers):

  • Direct blocks: 12 pointers to data blocks (48 KiB).
  • Single indirect: 1 pointer to a block of 1024 pointers (4 MiB).
  • Double indirect: 1 pointer to a block of 1024 blocks of 1024 pointers (4 GiB).
  • Triple indirect: 1 more level of indirection (4 TiB).

Maximum file size: 48  KiB+4  MiB+4  GiB+4  TiB4  TiB48\;\mathrm{KiB} + 4\;\mathrm{MiB} + 4\;\mathrm{GiB} + 4\;\mathrm{TiB} \approx 4\;\mathrm{TiB}.

6.5 Journaling File Systems

The problem. A crash during a file system update can leave inconsistent metadata (orphaned Inodes, lost blocks, incorrect free block count).

Journaling. Before modifying the file system, write a description of the modifications to a Dedicated journal. After a crash, replay the journal to restore consistency.

Journaling modes (ext4):

ModeDescription
orderedMetadata journaled; data written before metadata commit (default)
writebackMetadata journaled; data not journaled (fastest, data loss risk)
journalBoth metadata and data journaled (safest, slowest)

Write-ahead logging (WAL). The journal entry is written to stable storage before the actual Modification. Recovery scans the journal: committed entries are applied; uncommitted entries are Discarded.

Other journaling file systems: NTFS (Windows), APFS (macOS), XFS, ZFS.

6.5a RAID Levels

RAID (Redundant Array of Independent Disks) combines multiple disks for performance, reliability, Or both.

LevelMin DisksDescriptionFault ToleranceWrite Performance
01Striping only, no redundancyNoneExcellent
12Mirroring every disk1 diskModerate
23Bit-level striping with Hamming code ECC1 diskPoor
33Byte-level striping with dedicated parity1 diskModerate
43Block-level striping with dedicated parity1 diskModerate
53Block-level striping with distributed parity1 diskGood
64Block-level striping with dual parity2 disksModerate
104Mirrored sets of stripes (RAID 1 + RAID 0)1 disk per mirrorExcellent
506Parity sets of stripes (RAID 5 + RAID 0)1 disk per setGood

Effective capacity. For nn disks of capacity CC:

  • RAID 0: nCnC
  • RAID 1: nC/2nC / 2
  • RAID 5: (n1)C(n - 1)C
  • RAID 6: (n2)C(n - 2)C
  • RAID 10: nC/2nC / 2

Mean time to failure (MTTF). If each disk has \mathrm{MTTF_}{\mathrm{disk}} and the MTTR (mean time to repair) is TrepairT_{\mathrm{repair}}:

\mathrm{MTTF_}{\mathrm{RAID}\;5} \approx \frac{\mathrm{MTTF_}{\mathrm{disk}^2}{n(n-1) \cdot T_{\mathrm{repair}}}}

RAID 5 significantly improves reliability for large arrays, but the rebuild time grows with disk Capacity, increasing the window of vulnerability.

Worked Example 6.2 — RAID Capacity and Reliability

Eight 2 TiB disks. \mathrm{MTTF_}{\mathrm{disk} = 1.2 \times 10^6} hours, Trepair=24T_{\mathrm{repair} = 24} hours.

RAID 0: Capacity = 8×2=168 \times 2 = 16 TiB. No fault tolerance. MTTF = \mathrm{MTTF_}{\mathrm{disk} / 8 = 150000} hours.

RAID 5: Capacity = 7×2=147 \times 2 = 14 TiB. \mathrm{MTTF_}{\mathrm{RAID}\;5} \approx \frac{(1.2 \times 10^6)^2}{8 \times 7 \times 24} = \frac{1.44 \times 10^{12}}{1344} \approx 1.07 \times 10^9 hours.

RAID 6: Capacity = 6×2=126 \times 2 = 12 TiB. Tolerates 2 simultaneous disk failures. MTTF is orders of Magnitude higher than RAID 5.

6.6 FUSE

FUSE (Filesystem in Userspace) allows non-privileged users to create file systems without Modifying the kernel. The FUSE kernel module routes VFS operations to a user-space daemon.

Workflow:

  1. User program calls a file system operation (e.g., read).
  2. VFS routes the call to the FUSE module.
  3. FUSE sends the request to the user-space daemon via /dev/fuse.
  4. The daemon processes the request and responds.
  5. FUSE returns the result to the user program.

Examples: sshfs (mount remote directories over SSH), ntfs-3g (NTFS on Linux), bindfs (remap permissions).

6.7 Free Space Management

MethodDescription
BitmapOne bit per block; 1 = free, 0 = used. Simple and fast.
Linked listFree blocks linked together. Slow to traverse.
GroupingFirst free block stores addresses of nn free blocks.
CountingTrack contiguous free blocks as (start, count) pairs.

6.8 File System Consistency

Consistency checking (fsck). Scans the file system for:

  • Orphaned inodes: Inodes not referenced by any directory entry.
  • Incorrect link counts: Link count does not match actual directory entries.
  • Duplicate blocks: Two inodes claiming the same data block.
  • Missing/free blocks: Blocks marked as used but not referenced, or vice versa.

Journaling vs. fsck. Journaling eliminates the need for full fsck after a crash. Only the Journal needs to be replayed, which takes seconds instead of hours for large file systems.

7. I/O Systems

7.1 I/O Hardware

CategoryExamplesData rate
Human-readableKeyboard, screen, printerLow
Machine-readableDisk, tape, sensorsMedium to high
CommunicationNetwork interfaces, modemsVariable

7.2 Device Drivers

A device driver is kernel-mode software that translates OS I/O requests into device-specific Operations. It provides a uniform interface to the OS while hiding hardware details.

Driver architecture (layered):

User application
System call interface (read, write, ioctl)
File system / block layer
Device driver (kernel module)
Device controller (hardware)

Types:

  • Block device drivers: Manage fixed-size block access (disks, SSDs). Use a buffer cache to reduce physical I/O. The OS can reorder and merge block requests for performance.
  • Character device drivers: Manage byte-stream access (terminals, serial ports). No buffering or reordering by the OS.

Driver loading. Most modern OSes support loadable kernel modules (LKMs): drivers loaded at Runtime without rebooting. On Linux: insmod``modprobe. This allows third-party hardware support Without kernel recompilation.

7.3 I/O Scheduling

I/O schedulers reorder and merge requests to improve performance:

  • FCFS: Simple but may cause starvation.
  • SSTF: Service the request closest to the current head position. Minimises seek but can starve distant requests.
  • SCAN (elevator): Head moves in one direction servicing requests, then reverses.
  • C-SCAN: Like SCAN but only services in one direction; returns without servicing.
  • LOOK / C-LOOK: Optimised SCAN/C-SCAN that reverses when no requests remain ahead.

Theorem 7.1. C-SCAN provides more uniform wait times than SCAN.

Proof. SCAN services requests in both directions, so requests at the extremes of the disk wait Longer. C-SCAN treats the disk as a circular queue, ensuring every request is serviced within one Full sweep. \blacksquare

Worked Example 7.1 — Disk Scheduling Comparison

Disk with 200 cylinders (0—199). Request queue (sorted): 98, 183, 37, 122, 14, 124, 65, 67. Current head position: 53, moving toward higher cylinders.

FCFS: 53 → 98 → 183 → 37 → 122 → 14 → 124 → 65 → 67. Total seek = 5398+98183+18337+37122+12214+14124+12465+6567=45+85+146+85+108+110+59+2=640\lvert 53-98 \rvert + \lvert 98-183 \rvert + \lvert 183-37 \rvert + \lvert 37-122 \rvert + \lvert 122-14 \rvert + \lvert 14-124 \rvert + \lvert 124-65 \rvert + \lvert 65-67 \rvert = 45 + 85 + 146 + 85 + 108 + 110 + 59 + 2 = 640 cylinders.

SSTF: 53 → 65 → 67 → 37 → 14 → 98 → 122 → 124 → 183. Total seek = 12+2+30+23+84+24+2+59=23612 + 2 + 30 + 23 + 84 + 24 + 2 + 59 = 236 cylinders.

SCAN (elevator, moving right): 53 → 65 → 67 → 98 → 122 → 124 → 183 → 199 → 37 → 14. Total seek = 12+2+31+24+2+59+16+162+23=33112 + 2 + 31 + 24 + 2 + 59 + 16 + 162 + 23 = 331 cylinders.

C-SCAN (moving right): 53 → 65 → 67 → 98 → 122 → 124 → 183 → 199 → 0 → 14 → 37. Total seek = 12+2+31+24+2+59+16+199+14+23=38212 + 2 + 31 + 24 + 2 + 59 + 16 + 199 + 14 + 23 = 382 cylinders.

LOOK (stops at last request): 53 → 65 → 67 → 98 → 122 → 124 → 183 → 37 → 14. Total seek = 12+2+31+24+2+59+146+23=29912 + 2 + 31 + 24 + 2 + 59 + 146 + 23 = 299 cylinders.

AlgorithmTotal SeekMax WaitFairness
FCFS640335Fair
SSTF236130Starves
SCAN331313Good
C-SCAN382329Best
LOOK299299Good

SSTF minimises total seek but starves requests at the extremes. C-SCAN provides the most uniform Wait time at the cost of higher total seek.

7.4 Direct Memory Access (DMA)

DMA allows I/O devices to transfer data directly to/from memory without per-word CPU Intervention.

Without DMA: CPU reads each word from the device controller to memory. For a 4 KiB read, the CPU Is interrupted 4096 times.

With DMA: CPU programs the DMA controller with source, destination, and count. DMA handles the Transfer, interrupting the CPU only when complete.

  • DMA transfers are cycle-stealing: the DMA controller uses the system bus, pausing the CPU.
  • Modern systems use bus mastering: the I/O device controller performs the transfer autonomously.

8. Virtualization

8.1 Hypervisor Types

Type 1 (bare-metal). Runs directly on hardware. Examples: VMware ESXi, Xen, Hyper-V.

  • Direct hardware access; high performance.
  • Must include device drivers for all hardware.

Type 2 (hosted). Runs on a host OS. Examples: VirtualBox, VMware Workstation, QEMU.

  • Easier to develop and install.
  • Additional overhead from the host OS.

8.2 Virtual Machine Monitors

A VMM (hypervisor) presents the illusion of multiple independent physical machines to guest OSes.

Challenges:

  • Instruction emulation: Sensitive instructions must be trapped and emulated. On x86, IN OUT``HLT``CLI``STIAnd MOV to/from control registers are sensitive.
  • Memory virtualization: The VMM maintains shadow page tables mapping guest virtual to host physical addresses. Hardware support: Extended Page Tables (EPT, Intel) and Nested Page Tables (NPT, AMD).
  • I/O virtualization: Device access mediated by the VMM. Paravirtualized drivers (e.g., VirtIO) improve performance.

8.3 Paravirtualization

The guest OS is modified to make hypercalls instead of executing sensitive instructions directly.

  • Advantage: Lower overhead (no trap-and-emulate for every sensitive instruction).
  • Disadvantage: Requires modifying the guest kernel.

8.4 Containers

Containers share the host kernel but provide process isolation via:

  • Namespaces: Isolate process trees, network, file systems, IPC, UTS (hostname).
  • cgroups: Limit resource usage (CPU, memory, I/O).
PropertyVMsContainers
KernelSeparate per VMShared with host
StartupMinutesSeconds
FootprintGBsMBs
IsolationHardware-levelProcess-level
PerformanceNear-native (with EPT)Near-native (minimal)

Docker uses OverlayFS for layered images. Kubernetes orchestrates containers across clusters.

:::caution Common Pitfall Containers do not provide hardware-level isolation. A kernel vulnerability can potentially Compromise all containers on a host. For strong multi-tenant isolation, VMs are preferred. :::

9. Security

9.1 Access Control

Access control lists (ACLs). Each object (file, directory, device) has an associated list of Entries specifying which subjects (users, groups) have which permissions (read, write, execute).

ACL for /data/report.txt:
alice: rw-
bob: r--
@dev: rwx
other: ---
  • Advantage: Flexible; per-object granularity; easy to audit.
  • Disadvantage: Checking permissions requires scanning the list; ACLs can grow large.

Capabilities. Each subject (process) carries a list of capability tokens, each granting access to A specific object with specific rights. The kernel verifies that the process presents a valid Capability.

  • Advantage: Decentralised; no per-object list to scan; efficient for distributed systems.
  • Disadvantage: Capability revocation is difficult; if a process copies a capability, revoking the original does not affect copies (solved by indirection: capabilities point to a kernel-managed object table entry that can be invalidated).
PropertyACLsCapabilities
AssociationWith objectsWith subjects
RevocationEasy (modify the list)Difficult
DelegationRequires policyNatural (copy token)
ImplementationPOSIX, NTFSseL4, Capsicum, FUSE

9.2 Principle of Least Privilege

A subject should receive only the minimum privileges necessary to perform its task. Violations Create unnecessary attack surface.

Application to OS design:

  • Processes: Run with the lowest possible privileges. Web servers should not run as root.
  • System calls: chmod and chown require appropriate ownership; setuid requires root.
  • Kernel modules: Loadable kernel modules have full kernel access; restrict loading to privileged users.
  • Containers: Limit capabilities via docker run --cap-drop ALL --cap-add NET_BIND_SERVICE.

Privilege separation. Split a program into components with different privilege levels. Example: OpenSSH splits into an unprivileged monitor (handles network I/O) and a privileged child (handles Authentication and session setup). A compromise of the monitor does not grant root access.

9.3 Buffer Overflow Prevention

A buffer overflow occurs when a program writes data beyond the bounds of a buffer, potentially Overwriting return addresses, function pointers, or other control data.

Defences:

Stack canaries. A random value (canary) placed between the local variables and the saved Return address on the stack. Before returning, the function checks that the canary is unchanged. If Modified, the program aborts.

  • Implemented in GCC/Clang with -fstack-protector-all.
  • The canary value is randomised per process and stored in a segment register (%fs:40 on x86-64).

Address Space Layout Randomisation (ASLR). Randomises the base addresses of the stack, heap, Libraries, and executable code in each process invocation.

  • Defeats return-to-libc and ROP (Return-Oriented Programming) attacks that rely on known addresses.
  • Entropy: 22—28 bits on 64-bit systems, providing 2222^{22} to 2282^{28} possible layouts.
  • Limitation: information leaks (e.g., pointer disclosure) can defeat ASLR.

Data Execution Prevention (DEP / W^X). Marks memory pages as either writable or executable, Never both. A buffer overflow that injects shellcode into a writable data region cannot execute it.

  • Hardware support: the NX (No-eXecute) bit in page table entries.
  • Compiler support: -Wl,-z,noexecstack (GNU ld).

Control Flow Integrity (CFI). Verifies that indirect branches (function pointers, returns) jump Only to valid targets. Forward-edge CFI checks call targets; backward-edge CFI checks return Addresses using shadow stacks.

  • Implemented in LLVM via -fsanitize=cfi.
  • Hardware support: Intel CET (Control-flow Enforcement Technology).

:::caution Common Pitfall ASLR, stack canaries, and DEP are complementary defences. Relying on any single mechanism is Insufficient. A determined attacker who can read memory can defeat ASLR; a format string Vulnerability can leak canary values; and JIT compilers require writable-and-executable pages. :::

10. Problem Set

Problem 1 — Process States

List all possible state transitions for a process and identify which transition requires the Scheduler, which requires an I/O event, and which is initiated by the process itself.

Solution. (Revision: §2.1)

TransitionTrigger
New → ReadyOS admission
Ready → RunningScheduler (dispatch)
Running → ReadyPreemption (timer)
Running → BlockedI/O or wait request
Blocked → ReadyI/O completion
Running → Terminatedexit() system call

The scheduler triggers Ready → Running and Running → Ready (preemption). I/O events trigger Running → Blocked and Blocked → Ready. The process itself initiates Running → Terminated.

Problem 2 — Kernel Architecture Trade-offs

A microkernel-based OS adds 2 μ\muS of message-passing overhead per system call compared to a Monolithic kernel. If a web server makes 10610^6 system calls per second, what is the total overhead As a fraction of CPU time on a 3 GHz processor?

Solution. (Revision: §1.2)

Total message-passing time per second: 106×2  μs=210^6 \times 2 \;\mu\mathrm{s} = 2 seconds of CPU time per Second. This exceeds available CPU time, making the microkernel approach infeasible at this call Rate without optimisations such as batched IPC or shared-memory channels.

Problem 3 — FCFS Scheduling

Processes arrive in the order P1P_1 (burst 10, arrival 0), P2P_2 (burst 5, arrival 1), P3P_3 (burst 2, arrival 2). Compute waiting time and turnaround time for each process under FCFS.

Solution. (Revision: §2.4)

Gantt: P1(10)P2(5)P3(2)\lvert P_1(10) \rvert P_2(5) \rvert P_3(2) \rvert at times 0, 10, 15, 17.

ProcessWaitingTurnaround
P1P_1010
P2P_2914
P3P_31315
Avg7.3313

P3P_3 waits the longest despite having the shortest burst — the convoy effect.

Problem 4 — SJF Scheduling

Using the same processes as Problem 3, compute the schedule under non-preemptive SJF.

Solution. (Revision: §2.4)

At t=0t = 0Only P1P_1 is available. P1P_1 runs 0—10. At t=10t = 10, P2P_2 (burst 5) and P3P_3 (burst 2) are both ready. SJF selects P3P_3Then P2P_2.

Gantt: P1(10)P3(2)P2(5)\lvert P_1(10) \rvert P_3(2) \rvert P_2(5) \rvert at times 0, 10, 12, 17.

ProcessWaitingTurnaround
P1P_1010
P2P_21015
P3P_3810
Avg611.67

Average waiting time improves from 7.33 to 6 compared to FCFS.

Problem 5 — Round Robin Scheduling

Using the processes from Problem 3 with quantum q=2q = 2Draw the Gantt chart and compute the Average turnaround time.

Solution. (Revision: §2.4)

Gantt: P1(2)P2(2)P3(2)P1(2)P2(2)P1(2)P2(1)P1(2)P1(2)\lvert P_1(2) \rvert P_2(2) \rvert P_3(2) \rvert P_1(2) \rvert P_2(2) \rvert P_1(2) \rvert P_2(1) \rvert P_1(2) \rvert P_1(2) \rvert

Times: 0, 2, 4, 6, 8, 10, 12, 13, 15, 17.

ProcessCompletionTurnaround
P1P_11717
P2P_21312
P3P_364
Avg11

Round Robin gives the fastest turnaround for P3P_3 (4 vs 10 under SJF) at the cost of slower Average turnaround.

Problem 6 — SRTF Scheduling

Processes: P1P_1 (burst 8, arrival 0), P2P_2 (burst 4, arrival 1), P3P_3 (burst 2, arrival 2). Compute the schedule under SRTF (preemptive SJF).

Solution. (Revision: §2.4)

t=0t=0: P1P_1 starts (remaining 8). t=1t=1: P2P_2 arrives (remaining 4). SRTF: 4 <\lt 8, preempt P1P_1. P2P_2 starts. t=2t=2: P3P_3 arrives (remaining 2). SRTF: 2 <\lt 4, preempt P2P_2. P3P_3 starts. t=4t=4: P3P_3 completes. P2P_2 (remaining 4) vs P1P_1 (remaining 8). P2P_2 resumes. t=8t=8: P2P_2 completes. P1P_1 resumes (remaining 6). t=14t=14: P1P_1 completes.

Gantt: P1(1)P2(1)P3(2)P2(4)P1(6)\lvert P_1(1) \rvert P_2(1) \rvert P_3(2) \rvert P_2(4) \rvert P_1(6) \rvert

ProcessCompletionTurnaroundWaiting
P1P_114146
P2P_2873
P3P_3420
Avg7.673
Problem 7 — Critical Section

Show that the following solution to the critical section problem is incorrect (Peterson’s algorithm With the order of flag[i] = true and turn = j swapped):

// Process i:
turn = j;
flag[i] = true;
while (flag[j] && turn == j);
// critical section
flag[i] = false;

Solution. (Revision: §3.1)

Both processes can set turn to the other’s value, then set their own flag. If both execute turn = j and turn = i respectively, then turn ends up as the last writer’s value. Both then Set their flags to true. The while loop checks flag[j] && turn == j. If the interleaving is:

  1. P0P_0: turn = 1
  2. P1P_1: turn = 0
  3. P0P_0: flag[0] = true
  4. P1P_1: flag[1] = true

P0P_0 checks: flag[1] is true and turn == 1? No (turn = 0). P0P_0 enters. P1P_1 checks: flag[0] is true and turn == 0? Yes. P1P_1 waits.

Now consider:

  1. P0P_0: turn = 1
  2. P1P_1: turn = 0
  3. P1P_1: flag[1] = true
  4. P1P_1: checks flag[0] (false) → enters critical section.
  5. P0P_0: flag[0] = true
  6. P0P_0: checks flag[1] (true) and turn == 1 (no, turn = 0) → enters critical section.

Both processes are in their critical section simultaneously. Mutual exclusion is violated.

Problem 8 — Producer-Consumer with Semaphores

In the bounded buffer solution of §3.6, explain why the empty and full semaphores must be Different from the mutex. What goes wrong if we use only mutex (initialised to 1) and count (shared variable)?

Solution. (Revision: §3.6)

If we use only mutex and check count manually:

pthread_mutex_lock(&mutex);
while (count == BUFFER_SIZE) {
pthread_mutex_unlock(&mutex); // release while waiting
// ... but how to re-acquire when space is available?
}

Without empty and full semaphores, the producer must busy-wait or use condition variables. Semaphores provide blocking semantics: the producer blocks on empty when the buffer is full And is automatically woken when a consumer signals empty. Using only mutex either causes Busy-waiting (wasting CPU cycles) or requires the programmer to correctly implement the Wait/signal protocol — which is exactly what semaphores encapsulate.

Problem 9 — Deadlock: Necessary Conditions

A system has three processes and three resources. Each process holds one resource and requests a Second. Is deadlock possible? If so, identify the deadlocked set.

Solution. (Revision: §4.1)

Yes. If P0P_0 holds R0R_0 and requests R1R_1; P1P_1 holds R1R_1 and requests R2R_2; P2P_2 holds R2R_2 and requests R0R_0We have circular wait: P0R1P1R2P2R0P0P_0 \to R_1 \to P_1 \to R_2 \to P_2 \to R_0 \to P_0. All four Coffman conditions hold (mutual exclusion, hold-and-wait, no preemption, Circular wait), so deadlock exists. The deadlocked set is {P0,P1,P2}\{P_0, P_1, P_2\}.

If instead P0P_0 holds R0R_0 and requests R2R_2; P1P_1 holds R1R_1 and requests R0R_0; P2P_2 holds R2R_2 and requests R1R_1The same circular wait exists.

Problem 10 — Banker's Algorithm Safety

Three processes, three resource types. Available = (3,2,1)(3, 2, 1).

ProcessAllocationMaxNeed
P0P_0(1, 0, 0)(3, 2, 1)(2, 2, 1)
P1P_1(1, 0, 1)(2, 1, 2)(1, 1, 1)
P2P_2(1, 1, 0)(2, 1, 1)(1, 0, 1)

Determine if the system is in a safe state.

Solution. (Revision: §4.3)

Work = (3,2,1)(3, 2, 1). All Finish = false.

  1. P1P_1: Need (1,1,1)(3,2,1)(1,1,1) \leq (3,2,1). Execute. Work = (3,2,1)+(1,0,1)=(4,2,2)(3,2,1) + (1,0,1) = (4,2,2). Finish[1] = true.
  2. P2P_2: Need (1,0,1)(4,2,2)(1,0,1) \leq (4,2,2). Execute. Work = (4,2,2)+(1,1,0)=(5,3,2)(4,2,2) + (1,1,0) = (5,3,2). Finish[2] = true.
  3. P0P_0: Need (2,2,1)(5,3,2)(2,2,1) \leq (5,3,2). Execute. Work = (5,3,2)+(1,0,0)=(6,3,2)(5,3,2) + (1,0,0) = (6,3,2). Finish[0] = true.

All Finish = true. Safe sequence: P1,P2,P0\langle P_1, P_2, P_0 \rangle.

Problem 11 — Banker's Algorithm Request

Using the state from Problem 10, determine whether the request from P0P_0 for (1,0,0)(1, 0, 0) can be Granted.

Solution. (Revision: §4.3)

  1. Request0=(1,0,0)Need0=(2,2,1)\mathrm{Request_0} = (1,0,0) \leq \mathrm{Need_0} = (2,2,1). OK.
  2. Request0=(1,0,0)A=(3,2,1)\mathrm{Request_0} = (1,0,0) \leq A = (3,2,1). OK.
  3. Pretend to allocate: A=(2,2,1)A = (2,2,1), Alloc0=(2,0,0)\mathrm{Alloc_0} = (2,0,0), Need0=(1,2,1)\mathrm{Need_0} = (1,2,1).
  4. Safety: Work = (2,2,1)(2,2,1).
  • P0P_0: (1,2,1)(2,2,1)(1,2,1) \leq (2,2,1). Execute. Work = (4,2,1)(4,2,1).
  • P1P_1: (1,1,1)(4,2,1)(1,1,1) \leq (4,2,1). Execute. Work = (5,2,2)(5,2,2).
  • P2P_2: (1,0,1)(5,2,2)(1,0,1) \leq (5,2,2). Execute. Work = (6,3,2)(6,3,2).

All finish. Request granted.

Problem 12 — Page Table Address Translation

A system uses 32-bit virtual addresses with 4 KiB pages. The page table entry is 4 bytes.

(a) How many bits for the page number and offset? (b) How many entries in the page table? (c) What is the size of the page table?

Solution. (Revision: §5.2)

(a) Offset = log2(4096)=12\log_2(4096) = 12 bits. Page number = 3212=2032 - 12 = 20 bits. (b) 220=10485762^{20} = 1048576 entries. (c) 1048576×41048576 \times 4 bytes = 4 MiB.

Problem 13 — TLB and EAT Calculation

A system with a two-level page table. TLB hit ratio = 0.90, TLB access = 2 ns, memory access = 100 ns (for any level). Page fault rate = 0.001, page fault service = 6 ms. Compute the effective Access time.

Solution. (Revision: §5.6)

TLB hit, no fault (0.90×0.9990.90 \times 0.999): 2+100=1022 + 100 = 102 ns. TLB miss, no fault (0.10×0.9990.10 \times 0.999): 2+100+100=2022 + 100 + 100 = 202 ns (TLB + outer PT + inner PT + data). Fault (0.0010.001): 2+100+100+6×106=60000022 + 100 + 100 + 6 \times 10^6 = 6000002 ns.

EAT=0.8991×102+0.0999×202+0.001×6000002\mathrm{EAT} = 0.8991 \times 102 + 0.0999 \times 202 + 0.001 \times 6000002 =91.71+20.18+6000.20=6112.09= 91.71 + 20.18 + 6000.20 = 6112.09 ns 6.11\approx 6.11 μ\muS.

Problem 14 — Page Fault EAT

What page fault rate is needed to ensure the effective access time is no more than 200 ns? Memory Access time = 100 ns, page fault service = 8 ms.

Solution. (Revision: §5.5)

Assume no TLB for simplicity: EAT=(1p)×100+p×8×106\mathrm{EAT} = (1 - p) \times 100 + p \times 8 \times 10^6.

200=100100p+8×106p200 = 100 - 100p + 8 \times 10^6 p 100=7999900p100 = 7999900 p p1.25×105p \approx 1.25 \times 10^{-5}

The page fault rate must be below 0.00125%, or roughly 1 fault per 80,000 accesses. This Illustrates why effective caching is essential.

Problem 15 — LRU Page Replacement

Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. Four frames. Compute the number of page Faults under LRU and under FIFO. Does Belady’s anomaly occur?

Solution. (Revision: §5.7)

FIFO (4 frames):

RefF1F2F3F4Fault?
11Yes
212Yes
3123Yes
41234Yes
11234No
21234No
55234Yes
15134Yes
25124Yes
35123Yes
45123No (wait, 4 not in frames)

Let me redo FIFO more carefully. FIFO replaces the oldest page in memory.

Frames after each reference (FIFO queue, front = oldest):

RefQueue (oldest→newest)Fault?
1[1]Yes
2[1, 2]Yes
3[1, 2, 3]Yes
4[1, 2, 3, 4]Yes
1[1, 2, 3, 4]No
2[1, 2, 3, 4]No
5[2, 3, 4, 5]Yes
1[3, 4, 5, 1]Yes
2[4, 5, 1, 2]Yes
3[5, 1, 2, 3]Yes
4[1, 2, 3, 4]Yes
5[2, 3, 4, 5]Yes

FIFO faults: 10.

With 3 frames: faults = 9 (from Belady’s anomaly example in the text). With 4 frames: faults = 10. Belady’s anomaly occurs: more frames produce more faults.

LRU (4 frames): LRU replaces the least recently used page.

RefFramesFault?
1[1]Yes
2[1, 2]Yes
3[1, 2, 3]Yes
4[1, 2, 3, 4]Yes
1[1, 2, 3, 4]No
2[1, 2, 3, 4]No
5[5, 2, 3, 4]Yes
1[5, 1, 3, 4]Yes
2[5, 1, 2, 4]Yes
3[5, 1, 2, 3]Yes
4[4, 1, 2, 3]Yes
5[4, 5, 2, 3]Yes

LRU faults: 10. LRU does not exhibit Belady’s anomaly (it is a stack algorithm), but in this Particular reference string, 3 and 4 frames happen to produce the same count under LRU.

Problem 16 — Belady's Anomaly

Prove that FIFO can exhibit Belady’s anomaly by constructing a counterexample with 2 and 3 frames.

Solution. (Revision: §5.7)

Reference string: 1, 2, 3, 1, 2.

2 frames: | Ref | F1 | F2 | Fault? | | --- | — | — | ------ | | 1 | 1 | | Yes | | 2 | 1 | 2 | Yes | | 3 | 3 | 2 | Yes | | 1 | 3 | 1 | Yes | | 2 | 3 | 2 | Yes |

Faults: 5.

3 frames: | Ref | F1 | F2 | F3 | Fault? | | --- | — | — | — | ------ | | 1 | 1 | | | Yes | | 2 | 1 | 2 | | Yes | | 3 | 1 | 2 | 3 | Yes | | 1 | 1 | 2 | 3 | No | | 2 | 1 | 2 | 3 | No |

Faults: 3.

Here 3 frames produces fewer faults (3 vs 5), so this is not a counterexample. The original Counterexample from the text (1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 with 3 vs 4 frames) remains Valid: 3 frames → 9 faults, 4 frames → 10 faults.

Problem 17 — Working Set and Thrashing

A system has 40 frames. Process P1P_1 has a working set of 15 pages, P2P_2 has 12 pages, and P3P_3 Has 18 pages. Can all three run simultaneously without thrashing? What if P4P_4 with a working set Of 8 pages is added?

Solution. (Revision: §5.8)

Without P4P_4: total working set = 15+12+18=45>4015 + 12 + 18 = 45 \gt 40. Thrashing occurs. Only two Processes can run concurrently (e.g., P1+P2=2740P_1 + P_2 = 27 \leq 40Or P2+P3=3040P_2 + P_3 = 30 \leq 40).

With P4P_4: total = 15+12+18+8=53>4015 + 12 + 18 + 8 = 53 \gt 40. Even worse. Using working set admission, we Would run at most two processes. The best combination that fits is P1+P3=33P_1 + P_3 = 33 or P2+P3=30P_2 + P_3 = 30.

Problem 18 — File Allocation

A file system uses contiguous allocation with 512-byte blocks. A file is created at block 1000 and Grows to 5000 bytes. Blocks 1000—1009 are allocated. The file then grows by 2000 bytes, but Blocks 1010—1012 are occupied by another file. How does the file system handle this growth?

Solution. (Revision: §6.2)

The file currently occupies blocks 1000—1009 (10 blocks for 5000 bytes). To grow by 2000 bytes, it Needs 4 more blocks. Blocks 1010—1012 are occupied, so contiguous extension is impossible.

Options:

  1. Move the file: Copy blocks 1000—1009 to a new location with 14 consecutive free blocks. Expensive for large files.
  2. Extents: Use multiple extents. The file has extent 1 = blocks 1000—1009 and extent 2 = blocks 1016—1019 (wherever 4 free blocks are found). The directory stores a list of extents.
  3. Switch to linked or indexed allocation: Not practical once the file system is in use.

This illustrates the primary disadvantage of contiguous allocation: inability to grow files Dynamically without costly relocation.

Problem 19 — Disk Scheduling

A disk has 500 cylinders (0—499). The request queue contains: 55, 58, 39, 18, 90, 160, 150, 38, 184. The head is at cylinder 50, moving toward 0. Compute the total head movement for SCAN and C-SCAN.

Solution. (Revision: §7.3)

SCAN (moving left first): 50 → 39 → 38 → 18 → 0 → 55 → 58 → 90 → 150 → 160 → 184. Total = (5018)+18+184=32+18+184=234(50-18) + 18 + 184 = 32 + 18 + 184 = 234 cylinders.

Wait, let me be more precise. Head starts at 50, moving toward 0. 50 → 39: 11 39 → 38: 1 38 → 18: 20 18 → 0: 18 0 → 55: 55 55 → 58: 3 58 → 90: 32 90 → 150: 60 150 → 160: 10 160 → 184: 24 Total = 11+1+20+18+55+3+32+60+10+24=23411 + 1 + 20 + 18 + 55 + 3 + 32 + 60 + 10 + 24 = 234 cylinders.

C-SCAN (moving left, then jump right): 50 → 39 → 38 → 18 → 0 → 499 → 184 → 160 → 150 → 90 → 58 → 55. Total = 11+1+20+18+499+315+24+10+60+32+3=99311 + 1 + 20 + 18 + 499 + 315 + 24 + 10 + 60 + 32 + 3 = 993 cylinders.

SCAN is far more efficient here because the requests are clustered between 18 and 184.

Problem 20 — Security: Buffer Overflow and Defences

Consider the following C function:

void vulnerable(char *input) {
char buffer[64];
strcpy(buffer, input);
}

(a) Explain how a buffer overflow attack works against this function. (b) Which defences (from §9.3) would prevent the attack? Explain how each works in this context.

Solution. (Revision: §9.3)

(a) If input exceeds 63 characters, strcpy writes past the end of bufferOverwriting the Stack frame. On x86-64, the saved return address is located above the buffer on the stack. An Attacker can overwrite the return address with the address of injected shellcode (also placed in The buffer), gaining control of execution when the function returns.

(b) Defences:

  • Stack canary: A random value is placed between buffer and the return address. The overflow would overwrite the canary. Before returning, the function checks the canary and aborts if it has been modified.
  • ASLR: Randomises the stack base address, so the attacker cannot predict where buffer (and therefore the shellcode) will be in memory.
  • DEP (W^X): Marks the stack as non-executable. Even if the attacker overwrites the return address to point to the shellcode in bufferThe CPU will fault when trying to execute it.
  • CFI: Verifies that the return address points to a valid call site. A crafted address injected by the overflow would fail the CFI check.

Common Pitfalls

  1. Neglecting to normalise database designs, leading to data redundancy and update anomalies.

  2. Confusing an algorithm with a program — an algorithm is a step-by-step procedure, not its implementation in code.

  3. Misunderstanding the difference between a stack (LIFO) and a queue (FIFO) in data structure applications.

  4. Mixing up Big O, Big Ω\Omega, and Big Θ\Theta notation — Big O is an upper bound, not necessarily tight.

Worked Examples

Example 1: Process Scheduling — Round Robin

Problem. Three processes arrive at time 0 with burst times: P1 = 24 ms, P2 = 3 ms, P3 = 3 ms. Calculate average waiting time with Round Robin (quantum = 4 ms).

Solution. Execution order: P1(4), P2(3), P3(3), P1(4), P1(4), …, P1(4).

ProcessFinish TimeWait Time
P1306
P274
P3107

Average waiting time = 6+4+73=5.67\frac{6 + 4 + 7}{3} = 5.67 ms.

Compare with FCFS: P1 waits 0, P2 waits 24, P3 waits 27, average = 17 ms.

\blacksquare

Example 2: Banker’s Algorithm for Deadlock Avoidance

Problem. System has 3 resource types (A=10, B=5, C=7). Current allocation and maximum need for processes P0, P1, P2 are given. Can P1 request (1,2,2)?

Solution. Need for P1 = Maximum − Allocation = (1,2,2). Request (1,2,2) = Need, so valid.

After granting: available = (4,2,3). P1 can finish (need 0,0,0 ≤ available 4,2,3) → release resources. P0 can then finish. Safe sequence exists. Request granted.

\blacksquare

Summary

  • Process management: states (ready, running, waiting, terminated); context switching overhead; PCB stores process state.
  • CPU scheduling algorithms: FCFS, SJF, Round Robin, Priority; trade-offs between throughput, latency, and fairness.
  • Memory management: paging eliminates external fragmentation; virtual address → physical address via page tables and TLB.
  • Deadlock: four necessary conditions; prevention, avoidance (Banker’s algorithm), detection and recovery strategies.
  • File systems: inodes store metadata; journaling (ext4) and copy-on-write (ZFS, Btrfs) improve crash recovery.

Cross-References

TopicSiteLink
Advanced Operating SystemsWyattsNotesView
Computer NetworksWyattsNotesView
DatabasesWyattsNotesView
Operating Systems — MIT 6.S081MITView