TLB stands for translation-lookaside buffer.
A TLB is part of the chip’s memory-management unit (MMU
), and is simply a hardware cache
of popular virtual-to-physical address translations.; thus, a better name would be an address-translation cache
Upon each virtual memory reference, the hardware first checks the TLB to see if the desired translation is held therein; if so, the translation is performed (quickly) without having to consult the page table (which has all translations). Because of their tremendous performance impact, TLBs in a real sense make virtual memory possible. 有了 TLB,在访问 memory 的时候,如果 desired translation 在 TLB,则直接 translation,而不用经过 page table 的转义。
TLB Basic Algorithm
假设:
- linear page table
- a hardware-managed TLB
The algorithm the hardware follows works like this: first, extract the virtual page number (VPN
) from the virtual address (Line 1 in Figure 19.1), and check if the TLB holds the translation for this VPN (Line 2). If it does, we have a TLB hit, which means the TLB holds the translation. Success! We can now extract the page frame number (PFN) from the relevant TLB entry, concatenate that onto the offset from the original virtual address, and form the desired physical address (PA), and access memory (Lines 5–7), assuming protection checks do not fail (Line 4).
If the CPU does not find the translation in the TLB (a TLB miss
), we have some more work to do.
Example: Accessing An Array
assume we have an array of 10 4-byte integers in memory, starting at virtual address 100. Assume further that we have a small 8-bit virtual address space, with 16-byte pages; thus, a virtual address breaks down into a 4-bit VPN (there are 16 virtual pages) and a 4-bit offset (there are 16 bytes on each of those pages).
As you can see, the array’s first entry (a[0]) begins on (VPN=06, offset=04); only three 4-byte integers fit onto that page. The array continues onto the next page (VPN=07), where the next four entries (a[3] … a[6]) are found. Finally, the last three entries of the 10-entry array (a[7] … a[9]) are located on the next page of the address space (VPN=08).
Now let’s consider a simple loop:
The elements of the array are packed tightly into pages (i.e., they are close to one another in space), and thus only the first access to an element on a page yields a TLB miss.
CISC computers use hardware-managed TLB
RISC computers use software-managed TLB
TLB Contents: What’s In There?
A TLB entry might look like this:
$$VPN | PFN | other bits$$
The TLB is known as a fully-associative cache
). The hardware searches
the entries in parallel
to see if there is a match
TLB Issue: Context Switches
With TLBs, some new issues arise when switching between processes (and hence address spaces).
The TLB contains virtual-to-physical translations that are only valid for the currently running process; these translations are not meaningful for other processes. As a result, when switching from one process to another, the hardware or OS (or both) must be careful to ensure that the about-to-be-run process does not accidentally use translations from some previously run process.
Assuming two process run in CPU and the TLB will be like following:
In the TLB above, we clearly have a problem: VPN 10 translates to either PFN 100 (P1) or PFN 170 (P2), but the hardware can’t distinguish which entry is meant for which process.
- One approach is to simply flush the TLB on context switches, thus emptying it before running the next process.
However, there is a cost: each time a process runs, it must incur TLB misses as it touches its data and code pages. If the OS switches between processes frequently, this cost may be high - To reduce this overhead, some systems add hardware support to enable sharing of the TLB across context switches. In particular, some hardware systems provide an
address space identifier (ASID)
field in the TLB.
Thus, with address-space identifiers, the TLB can hold translations from different processes at the same time without any confusion.