Memory Hierarchy
When we think of memory, we imagine a large array of bytes that the CPU can access in constant time. In reality, modern storage systems are built from several storage technologies that vary in size, cost, and access time. These technologies form a hierarchy based on how quickly the CPU can reach them and how much the underlying hardware costs: memory closer to the CPU has lower access latency but is more expensive per byte, while memory further from the CPU has higher latency but is cheaper. Let's discuss each of these technologies in detail.
// TBD - memory hierarchy diagram here
RAM (Random Access Memory)
RAM is packaged as an IC chip made up of memory cells; the basic storage unit is a cell that holds 1 bit. Many of these chips together form main memory. RAM comes in two varieties, SRAM and DRAM. Refer to the table below for the differences between them.
| SRAM | DRAM |
| --- | --- |
| 6 transistors per bit | 1 transistor per bit |
| ~10x faster than DRAM | ~10x slower than SRAM |
| ~1000x more expensive than DRAM | Much cheaper |
| A typical desktop system has no more than a few tens of megabytes of SRAM | A typical desktop system has gigabytes of DRAM |
| Used in cache memories | Used in main memory and the frame buffers of graphics cards |
Both DRAM and SRAM are volatile memories, i.e., they lose their contents when the power is switched off. Non-volatile memories are collectively referred to as ROMs (read-only memory). Disk storage is also non-volatile and comes in rotating (HDD) or flash-based (SSD) forms.
Data flows between the CPU and main memory (DRAM) over a bus interface. You can think of a bus as a collection of wires that carry addresses, data, and control signals. Here is a simplified view of how the CPU reads data from memory:
- The CPU places the address on the memory bus.
- Main memory reads the address from the bus, retrieves the data stored at that address, and places it back on the bus.
- The CPU reads the data off the bus and loads it into a register.
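To connect this to everyday code, here is a minimal C sketch. The variable name and the quoted instruction are purely illustrative, and the exact instruction depends on the compiler and the ISA, but every ordinary load ultimately drives a bus transaction like the one above.

```c
/* A plain C load is what drives the bus transaction described above.
 * On x86-64 this function typically compiles to something like
 * "movq x(%rip), %rax": the CPU puts the address of x on the memory bus,
 * main memory returns the word stored there, and the value ends up in a
 * CPU register. */
long x = 42;              /* lives in main memory (DRAM) */

long read_word(void)
{
    return x;             /* address out on the bus, data back, value into a register */
}
```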
Please refer to fig (1.1) below to see how different components are linked together. Please note that while the figure is helpful for understanding, real-world designs are far more complex and proprietary.
// TBD parts of the CPU and I/O buses
Disk Storage
Disks sit lower in the memory hierarchy and can store thousands of GBs of information, but accessing them takes on the order of milliseconds. They are roughly a hundred thousand times slower than DRAM and a million times slower than SRAM.
Imagine a disk as a stack of platters rotating around a spindle. Data is stored on each platter surface in concentric rings called tracks, which are divided into sectors. Data is accessed by an arm whose head moves radially across the tracks on a surface, and multiple arms access data from the multiple surfaces. I have added a basic diagram (fig. 1.2) of disk geometry.
// TBD - Diagram of disk geometry here
Disk access time depends on three factors:
- Seek time: Time taken by the arm head to move to the correct track on the surface.
- Rotational time: Once the head is over the correct track, it waits for the first bit of the target sector to rotate underneath it. This waiting time depends on how fast the disk spins.
- Transfer time: Time taken to read or write the remaining bits of the sector once the first bit is under the head.
Seek time and rotational time are the main bottlenecks and take roughly the same amount of time, while transfer time is comparatively tiny. Hence, disk access time is often estimated as roughly twice the seek time.
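To get a feel for the relative magnitudes, here is a small back-of-the-envelope calculation in C. The drive parameters (spindle speed, average seek time, sectors per track) are made-up but typical values, not taken from any particular drive:

```c
#include <stdio.h>

/* Rough estimate of average disk access time.
 * All drive parameters below are illustrative. */
int main(void)
{
    double rpm               = 7200.0;  /* spindle speed */
    double avg_seek_ms       = 9.0;     /* typical average seek time */
    double sectors_per_track = 400.0;

    double ms_per_rev  = 60.0 / rpm * 1000.0;            /* one full rotation */
    double avg_rot_ms  = 0.5 * ms_per_rev;                /* wait half a rotation on average */
    double transfer_ms = ms_per_rev / sectors_per_track;  /* read one sector */

    printf("avg rotational latency: %.2f ms\n", avg_rot_ms);   /* ~4.17 ms */
    printf("transfer time:          %.3f ms\n", transfer_ms);  /* ~0.021 ms */
    printf("estimated access time:  %.2f ms\n",
           avg_seek_ms + avg_rot_ms + transfer_ms);            /* ~13.2 ms */
    return 0;
}
```

With these numbers, seek time and rotational latency dominate and transfer time is barely visible, which is why reading many nearby sectors sequentially is so much cheaper than scattered reads.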
Solid State Drives (SSD)
TBD
How does the CPU access data from the Disk?
This is a broad topic, and I'll only share one technique here to give you an idea. The CPU issues commands to I/O devices using a technique called memory-mapped I/O. In a memory-mapped I/O system, memory and peripheral devices share the same address space, and a block of addresses is reserved for I/O devices. This is a simplified flow of how the CPU reads data from disk:
- The CPU writes three commands to the disk controller: first, a command to initiate a read from disk; second, the logical block number of the sector to be read; and third, the main memory address where the contents should be stored after the read. The CPU then proceeds with other instructions while the disk performs the read.
- The disk controller reads the data from the disk and copies the contents into main memory without involving the CPU. This transfer is known as a Direct Memory Access (DMA) transfer.
- After the data is in memory, the disk controller sends an interrupt to the CPU, known as an I/O interrupt. An interrupt is a signal emitted by an external device or a software program. The CPU stops what it's working on, acknowledges the interrupt, and handles it. We discussed Signals in detail here.
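Here is a rough C sketch of what "issuing commands" looks like under memory-mapped I/O. The register addresses and layout are invented purely for illustration; real disk controllers define their own interfaces, and code like this lives inside an OS driver, not an application:

```c
#include <stdint.h>

/* Hypothetical memory-mapped disk controller registers. The addresses and
 * register layout are made up for illustration only. */
#define DISK_CMD_REG ((volatile uint32_t *)0xA0000000u) /* command: 1 = read    */
#define DISK_LBA_REG ((volatile uint32_t *)0xA0000004u) /* logical block number */
#define DISK_DST_REG ((volatile uint64_t *)0xA0000008u) /* destination address  */

void start_disk_read(uint32_t logical_block, void *dst_buffer)
{
    /* The three "commands" from the list above are just stores to addresses
     * that the hardware has mapped to controller registers. */
    *DISK_LBA_REG = logical_block;                    /* which sector to read      */
    *DISK_DST_REG = (uint64_t)(uintptr_t)dst_buffer;  /* where DMA should copy it  */
    *DISK_CMD_REG = 1u;                               /* kick off the read         */

    /* The CPU returns immediately; the controller performs the DMA transfer
     * and raises an I/O interrupt once the buffer has been filled. */
}
```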
Caching
Caching is a general idea that can be applied to all levels of the memory hierarchy. A cache works as a staging area for data objects: if a program needs a data object that lives at level k+1 in the memory hierarchy, it first looks for it at level k.
If the object is found at level k, it's called a cache hit. If it is not found at level k, it's called a cache miss; in this case, the cache at level k fetches the object from level k+1. If there are no empty blocks at level k, an existing block is overwritten by the new one. This is called block eviction and is governed by the cache replacement policy, the most common being Least Recently Used (LRU).
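To make the replacement policy concrete, here is a small, self-contained C sketch of a fully associative cache with LRU eviction. The number of lines and the block numbers are illustrative; real hardware caches add sets, tags, and fixed block sizes, but the eviction logic is the same idea:

```c
#include <stdio.h>

#define NUM_LINES 4

typedef struct {
    int  valid;
    long block;      /* which level-(k+1) block this line holds */
    long last_used;  /* logical timestamp for LRU               */
} CacheLine;

static CacheLine cache[NUM_LINES];
static long now = 0;

/* Returns 1 on a cache hit, 0 on a miss (after loading the block). */
int access_block(long block)
{
    now++;

    /* Hit: the block is already cached at level k. */
    for (int i = 0; i < NUM_LINES; i++) {
        if (cache[i].valid && cache[i].block == block) {
            cache[i].last_used = now;
            return 1;
        }
    }

    /* Miss: use an empty line, or evict the least recently used one. */
    int victim = 0;
    for (int i = 0; i < NUM_LINES; i++) {
        if (!cache[i].valid) { victim = i; break; }
        if (cache[i].last_used < cache[victim].last_used) victim = i;
    }
    cache[victim].valid     = 1;
    cache[victim].block     = block;   /* "fetch" from level k+1 */
    cache[victim].last_used = now;
    return 0;
}

int main(void)
{
    long refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2};  /* block 5 evicts block 3 (the LRU one) */
    int n = (int)(sizeof refs / sizeof refs[0]);
    for (int i = 0; i < n; i++)
        printf("block %ld: %s\n", refs[i], access_block(refs[i]) ? "hit" : "miss");
    return 0;
}
```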
Storage devices become cheaper, slower, and larger as we go from higher to lower levels. Each level k serves as a cache for data from level k+1, and data is transferred back and forth between adjacent levels. The block size between any two consecutive levels is fixed, but it varies as we move through the hierarchy. For example, transfers between registers and L1 cache use word-size blocks (32 or 64 bits), while transfers between L4 (main memory) and L5 (disk) use much larger blocks, typically hundreds or thousands of bytes.
Well-written programs tend to access the same data items repeatedly, or data items stored near one another, and are said to show good locality. This is known as the Principle of Locality, and we have discussed it here. It is one of the most important concepts for optimizing a program's performance.
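As an illustration, the two functions below compute the same sum over the same array; the only difference is traversal order. The row-major version walks memory in the order it is laid out and makes good use of each cache block, while the column-major version strides a full row apart on every access and misses far more often. The array size is arbitrary:

```c
#include <stdio.h>

#define ROWS 1024
#define COLS 1024

static int a[ROWS][COLS];

/* Good spatial locality: visits the array in the same row-major order it is
 * laid out in memory, so each cache block is fully used before the next one
 * is fetched. */
long sum_row_major(void)
{
    long sum = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}

/* Poor spatial locality: strides through memory a full row apart on every
 * access, so it misses in the cache far more often even though it performs
 * exactly the same arithmetic. */
long sum_col_major(void)
{
    long sum = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}

int main(void)
{
    printf("%ld %ld\n", sum_row_major(), sum_col_major()); /* same result, different speed */
    return 0;
}
```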