Skip to content

File Systems

6.1 File Concepts

A file is a named collection of related data on secondary storage.

File attributes: Name, identifier (inode number), type, location, size, protection, timestamps.

File operations: create``delete``open``close``read``write``seek``truncate.

Access methods:

  • Sequential access: Read bytes in order (tapes, log-structured files).
  • Direct (random) access: Read/write at arbitrary offsets (disks).
  • Indexed access: Access via an index structure (B-trees, hash tables).

6.2 File Allocation Methods

Contiguous allocation. Each file occupies a contiguous set of blocks on disk. The directory entry Stores the starting block and length.

  • Advantage: Excellent for sequential and direct access; minimal seek time.
  • Disadvantage: External fragmentation; difficulty finding space for large files; cannot grow dynamically.

Linked allocation. Each block contains a pointer to the next block. The directory entry stores The starting block and the end block (or a count).

  • Advantage: No external fragmentation; file can grow dynamically.
  • Disadvantage: No direct access (must follow pointers); pointer overhead per block; reliability (a broken pointer loses the rest of the file).

Indexed allocation. The directory entry points to an index block that contains an array of Pointers to data blocks. Supports direct access without external fragmentation.

  • Advantage: Direct access; no external fragmentation.
  • Disadvantage: Index block may be too small for large files (solved by linked index blocks or multi-level indexing, as in the inode scheme of §6.3).
MethodSequentialDirectExternal Frag.Internal Frag.
ContiguousExcellentYesYesPossible
LinkedGoodNoNoNone
IndexedGoodYesNoIndex block
Worked Example 6.1 — Allocation Comparison

A file of 10 blocks is stored on a disk. The disk has blocks at positions: 0 (free), 1 (used), 2-5 (free), 6 (used), 7-9 (free), 10-15 (free), 16 (used), 17-31 (free).

Contiguous: Needs 10 consecutive free blocks. Largest free run is 16-31 (16 blocks). File stored At blocks 17—26. Access to block kk: position 17+k17 + k.

Linked: Blocks can be scattered. E.g., 2, 3, 4, 5, 7, 8, 9, 10, 11, 12. To read block 7, must Traverse 7 pointers.

Indexed: Index block at position 0. Index block contains pointers to data blocks 2, 3, 4, 5, 7, 8, 9, 10, 11, 12. To read block 7, access index block position 7 (one seek + one read).

6.3 Directory Structures

StructureDescription
Single-levelAll files in one directory
Two-levelSeparate directory per user
TreeHierarchical tree with subdirectories
Acyclic graphTree with shared subdirectories (hard links)
General graphAllows cycles; requires garbage collection

Path resolution. A path name is resolved by traversing the directory tree from the root (/) Or the current working directory. Each component is looked up in the current directory to find the Next inode. Symbolic links (symlinks) store a path string that is substituted during resolution; Circular symlinks are detected by tracking the link count.

Hard links vs symbolic links. A hard link creates an additional directory entry pointing to the Same inode (increments the link count). A symbolic link is a special file containing a path string. Hard links cannot cross file systems; symbolic links can. Deleting the original file does not affect A symbolic link but removes one reference from a hard link (the inode is freed only when the link Count reaches zero).

6.4 Inode-Based File Systems

An inode (index node) stores metadata about a file (but not the file name, which is in the Directory entry):

FieldDescription
ModeFile type and permissions
OwnerUser and group IDs
SizeFile size in bytes
TimestampsLast access, modification, inode change
Link countNumber of hard links to this inode
Block pointersDirect, indirect, double-indirect, triple-indirect
BlocksNumber of blocks allocated

Block pointer scheme (ext4 with 4 KiB blocks, 4-byte pointers):

  • Direct blocks: 12 pointers to data blocks (48 KiB).
  • Single indirect: 1 pointer to a block of 1024 pointers (4 MiB).
  • Double indirect: 1 pointer to a block of 1024 blocks of 1024 pointers (4 GiB).
  • Triple indirect: 1 more level of indirection (4 TiB).

Maximum file size: 48  KiB+4  MiB+4  GiB+4  TiB4  TiB48\;\mathrm{KiB} + 4\;\mathrm{MiB} + 4\;\mathrm{GiB} + 4\;\mathrm{TiB} \approx 4\;\mathrm{TiB}.

6.5 Journaling File Systems

The problem. A crash during a file system update can leave inconsistent metadata (orphaned Inodes, lost blocks, incorrect free block count).

Journaling. Before modifying the file system, write a description of the modifications to a Dedicated journal. After a crash, replay the journal to restore consistency.

Journaling modes (ext4):

ModeDescription
orderedMetadata journaled; data written before metadata commit (default)
writebackMetadata journaled; data not journaled (fastest, data loss risk)
journalBoth metadata and data journaled (safest, slowest)

Write-ahead logging (WAL). The journal entry is written to stable storage before the actual Modification. Recovery scans the journal: committed entries are applied; uncommitted entries are Discarded.

Other journaling file systems: NTFS (Windows), APFS (macOS), XFS, ZFS.

6.5a RAID Levels

RAID (Redundant Array of Independent Disks) combines multiple disks for performance, reliability, Or both.

LevelMin DisksDescriptionFault ToleranceWrite Performance
01Striping only, no redundancyNoneExcellent
12Mirroring every disk1 diskModerate
23Bit-level striping with Hamming code ECC1 diskPoor
33Byte-level striping with dedicated parity1 diskModerate
43Block-level striping with dedicated parity1 diskModerate
53Block-level striping with distributed parity1 diskGood
64Block-level striping with dual parity2 disksModerate
104Mirrored sets of stripes (RAID 1 + RAID 0)1 disk per mirrorExcellent
506Parity sets of stripes (RAID 5 + RAID 0)1 disk per setGood

Effective capacity. For nn disks of capacity CC:

  • RAID 0: nCnC
  • RAID 1: nC/2nC / 2
  • RAID 5: (n1)C(n - 1)C
  • RAID 6: (n2)C(n - 2)C
  • RAID 10: nC/2nC / 2

Mean time to failure (MTTF). If each disk has \mathrm{MTTF_}{\mathrm{disk}} and the MTTR (mean time to repair) is TrepairT_{\mathrm{repair}}:

\mathrm{MTTF_}{\mathrm{RAID}\;5} \approx \frac{\mathrm{MTTF_}{\mathrm{disk}^2}{n(n-1) \cdot T_{\mathrm{repair}}}}

RAID 5 significantly improves reliability for large arrays, but the rebuild time grows with disk Capacity, increasing the window of vulnerability.

Worked Example 6.2 — RAID Capacity and Reliability

Eight 2 TiB disks. \mathrm{MTTF_}{\mathrm{disk} = 1.2 \times 10^6} hours, Trepair=24T_{\mathrm{repair} = 24} hours.

RAID 0: Capacity = 8×2=168 \times 2 = 16 TiB. No fault tolerance. MTTF = \mathrm{MTTF_}{\mathrm{disk} / 8 = 150000} hours.

RAID 5: Capacity = 7×2=147 \times 2 = 14 TiB. \mathrm{MTTF_}{\mathrm{RAID}\;5} \approx \frac{(1.2 \times 10^6)^2}{8 \times 7 \times 24} = \frac{1.44 \times 10^{12}}{1344} \approx 1.07 \times 10^9 hours.

RAID 6: Capacity = 6×2=126 \times 2 = 12 TiB. Tolerates 2 simultaneous disk failures. MTTF is orders of Magnitude higher than RAID 5.

6.6 FUSE

FUSE (Filesystem in Userspace) allows non-privileged users to create file systems without Modifying the kernel. The FUSE kernel module routes VFS operations to a user-space daemon.

Workflow:

  1. User program calls a file system operation (e.g., read).
  2. VFS routes the call to the FUSE module.
  3. FUSE sends the request to the user-space daemon via /dev/fuse.
  4. The daemon processes the request and responds.
  5. FUSE returns the result to the user program.

Examples: sshfs (mount remote directories over SSH), ntfs-3g (NTFS on Linux), bindfs (remap permissions).

6.7 Free Space Management

MethodDescription
BitmapOne bit per block; 1 = free, 0 = used. Simple and fast.
Linked listFree blocks linked together. Slow to traverse.
GroupingFirst free block stores addresses of nn free blocks.
CountingTrack contiguous free blocks as (start, count) pairs.

6.8 File System Consistency

Consistency checking (fsck). Scans the file system for:

  • Orphaned inodes: Inodes not referenced by any directory entry.
  • Incorrect link counts: Link count does not match actual directory entries.
  • Duplicate blocks: Two inodes claiming the same data block.
  • Missing/free blocks: Blocks marked as used but not referenced, or vice versa.

Journaling vs. fsck. Journaling eliminates the need for full fsck after a crash. Only the Journal needs to be replayed, which takes seconds instead of hours for large file systems.