Cache
Introduction
Memory is an important resource in every embedded system: any program executing on a core needs memory for its instructions and data. Memory transactions play a major role in a program's execution time; the faster the memory transaction, the shorter the execution time. A typical system architecture contains CPU registers, cache, DDR (RAM), and flash. These memories are arranged in the design so that program execution is as fast as possible at a reasonable cost. This article focuses on cache memory: we will see how the cache works and how it plays an important role in speeding up execution.
Cache
Cache memory is one of the fastest memories in the system and plays an important role in faster execution. Caches are closely coupled with the core. Cache memory is expensive, so to keep the design cost effective the cache is divided into three levels.
L1 is the smallest and fastest level, L2 is larger and a bit slower, and even L3, the largest and slowest cache level, is still faster than main memory (DDR/RAM).
As shown in the picture below, the caches are placed on the SoC (system on chip). L1 & L2 are private and tightly coupled with each core, whereas L3 is shared among multiple cores. The L3 cache connects to DDR/RAM through the system bus, which is governed by the memory controller.
Now that we know what caches are and where they sit in the system, let's see how a cache works to reduce latency during execution. (The time needed to access data from memory is called "latency".)
How does a cache help reduce latency?
Programs are stored in secondary memory, i.e. an HDD or flash drive. When a program is invoked for execution, the first few code segments are pulled into main memory, i.e. RAM. Similarly, when the processor starts processing the code/data, those code/data segments are pulled into the cache.
When the processor wants to re-access the same data, it fetches it from the cache, NOT from RAM. Since the cache is faster, the access time and hence the latency are lower. When the processor finds the data in the cache, it is called a cache hit. There will be cases when the needed data segment is not present in the cache; that is a cache miss, and the needed code/data segments are then fetched from RAM and placed into the cache.
For good performance, the design should ensure that the probability of a cache miss is low.
Hit Ratio = Number of cache hits / Number of accesses
Miss Ratio = 1 - Hit Ratio
Note: For a good design, the cache hit ratio should be more than 90%.
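As a quick worked example (the numbers are hypothetical): if a program makes 1,000 memory accesses and 950 of them are served from the cache, Hit Ratio = 950/1000 = 0.95 (95%) and Miss Ratio = 1 - 0.95 = 0.05. Assuming a cache hit costs about 2 ns and a miss adds roughly 100 ns to reach RAM, the average access time is about 2 + 0.05 × 100 = 7 ns, far better than paying the RAM latency on every access.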
Structural design of a cache (memory-wise)
Caches are structured as shown in the figure below.
A cache line is the maximum unit of storage that can be transferred between the cache and RAM. Each cache line has a valid bit to indicate whether its contents are valid or not. Multiple cache lines are stacked together and selected by an index; such a stack forms one unit, and multiple such units together form a cache set. Along with each cache line there are tag bits (the tag RAM), which identify the RAM address whose data the line currently holds.
Let's evaluate how one memory location is stored in the cache.
Example:
Consider a 32-bit RAM addressing system, which can address 4 GB of space. Our maximum address will be 32 bits wide, and the cache must be able to represent that full address as well.
Note: All addresses here are logical, not physical, because the processor works on logical addresses only; the physical address is computed by the MMU.
RAM address: 0x12341200
Tag (stored in the tag RAM) = 0x12341 (the most significant bits)
Index = 0x2
Offset = 0x00
Based on the division above, one cache line can store 256 bytes (the 8-bit offset). The next line starts at address 0x12341300, where the index becomes 3, i.e. a new cache line. Depending on the actual cache line size and number of sets, the widths of the tag and index fields will vary. Please refer to the cache allocation figure below for clarification.
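A minimal C sketch of how such an address could be split into tag, index, and offset is shown below. The bit widths are assumptions chosen only to match the example above (256-byte lines give 8 offset bits, 16 sets give 4 index bits, and the remaining 20 bits form the tag); real hardware derives them from the actual line size, set count, and associativity.

#include <stdint.h>
#include <stdio.h>

/* Assumed field widths, matching the worked example above. */
#define OFFSET_BITS 8u   /* 256-byte cache line  */
#define INDEX_BITS  4u   /* 16 sets              */

static void split_address(uint32_t addr)
{
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1u);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    printf("addr=0x%08X  tag=0x%05X  index=0x%X  offset=0x%02X\n",
           addr, tag, index, offset);
}

int main(void)
{
    split_address(0x12341200);  /* tag=0x12341, index=0x2, offset=0x00 */
    split_address(0x12341300);  /* next line: index becomes 0x3        */
    return 0;
}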
In the above example we used set-associative mapping. This address mapping can be done in other ways too.
Mapping types of cache
From the information above we can see that caches are relatively small memories. When multiple applications run concurrently, their different memory requirements can easily fill up the cache, which ultimately increases the chance of cache misses. More cache misses means higher latency. To help with this without increasing the cache size, TLBs (translation look-aside buffers) are used.
(Just to Know)
A translation look-aside buffer (TLB) is a memory cache that stores recent translations of virtual memory to physical memory, i.e. page table entries. It is used to reduce the time taken to access a memory location. Whenever we need a translation, we can search for the address in the TLB; if it is present there, we can fetch the data using the physical address stored in the TLB. This reduces the latency of searching for the address in the page table, which is stored in RAM.
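To make the idea concrete, here is a minimal C sketch of a software model of a direct-mapped TLB lookup. The page size, number of entries, structure names, and the example translation are all assumptions for illustration; a real TLB is a hardware structure inside the MMU and is usually set- or fully associative.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT   12u   /* assumed 4 KB pages  */
#define TLB_ENTRIES  64u   /* assumed entry count */

struct tlb_entry {
    bool     valid;
    uint32_t vpn;   /* virtual page number   */
    uint32_t pfn;   /* physical frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Returns true on a TLB hit and fills *paddr; on a miss the caller would
 * walk the page table in RAM (slower) and then refill this entry. */
static bool tlb_lookup(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn  = vaddr >> PAGE_SHIFT;
    uint32_t slot = vpn % TLB_ENTRIES;

    if (tlb[slot].valid && tlb[slot].vpn == vpn) {
        *paddr = (tlb[slot].pfn << PAGE_SHIFT) |
                 (vaddr & ((1u << PAGE_SHIFT) - 1u));
        return true;   /* TLB hit: no page-table walk needed */
    }
    return false;      /* TLB miss: walk page table, then refill tlb[slot] */
}

int main(void)
{
    uint32_t phys;
    /* Pre-load one translation: page 0x12341 -> frame 0x56789 (made-up values). */
    tlb[0x12341 % TLB_ENTRIES] =
        (struct tlb_entry){ .valid = true, .vpn = 0x12341, .pfn = 0x56789 };

    if (tlb_lookup(0x12341200, &phys))
        printf("TLB hit: 0x12341200 -> 0x%08X\n", phys);
    else
        printf("TLB miss: would walk the page table in RAM\n");
    return 0;
}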
Problem with caches in multi-core execution
We now know how caches are structured and where they are located on the SoC. Though the cache helps to reduce latency, it introduces a few problems too if the software is not designed well. From fig. 1 we know that every core has its own L1 & L2 cache, where L2 can hold both code and data. Please follow the scenario below to understand the cache coherency problem.
Scenario: consider an application with 2 threads, T1 & T2. Both access "global_var", protected by locks. Ideally there should be no corruption and the final value should be 0 (a sketch of such code is shown below).
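The original code is not reproduced here, so the following is a plausible reconstruction of the scenario (the thread bodies, iteration count, and use of pthreads are assumptions): T1 increments global_var and T2 decrements it the same number of times under a lock, so the final value is expected to be 0.

#include <pthread.h>
#include <stdio.h>

static int global_var = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

#define ITERATIONS 100000

static void *t1_increment(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&lock);
        global_var++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *t2_decrement(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&lock);
        global_var--;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, t1_increment, NULL);
    pthread_create(&t2, NULL, t2_decrement, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("global_var = %d (expected 0)\n", global_var);
    return 0;
}

With a proper lock and hardware cache coherency (as on most modern SoCs) this does print 0; the corruption described next assumes a system where each core keeps operating on its own cached copy of global_var without the caches being kept coherent.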
If the variable is not volatile, its read/write operations do not go all the way to main memory; the operations happen in the cache, and the values are committed to main memory (RAM) only later. This creates a problem in the multi-core case. Consider that the current value of the global variable is 23, and both cores are executing threads that modify the shared variable global_var.
The shared variable's value gets corrupted here; the data is no longer correct.
Note: with similar code you might not actually see the cache coherency problem, because hardware cache coherency protocols are usually enabled on your device.
Why did the data corruption happen?
Ans: The shared variable is allowed to be cached in each core's cache. The shared variable was not volatile, so all frequent modifications happen in the cache (RAM is not updated on every operation). If we went to main memory for every data access, it would kill the purpose of having a cache to reduce latency.
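A minimal sketch of the software-side point made above, with the usual caveat that volatile alone is not a complete fix:

/* volatile tells the compiler to perform an actual load/store for every
 * access instead of keeping the value in a register. It does not make
 * global_var++ atomic and does not replace the hardware cache coherency
 * protocols discussed in the next section. */
volatile int global_var = 0;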
Solution to the cache coherency Problem
The cache coherency problem can be solved by hardware as well as by software. The problem occurs because there is one copy of the data in main memory and one in each cache. When one copy of the shared data is changed, the other copies must also be changed; otherwise the copies become inconsistent, and that is the cache coherence problem. The solution is a discipline that ensures that changes in the values of shared data are propagated throughout the system in a timely fashion.
We saw that the cache structure has a valid bit to indicate whether the stored memory is valid or not. Cache coherency protocols use a similar approach to track the current status of each cache line, i.e. whether it needs to be fetched again or not.
Coherency mechanisms:
- Directory-based – In a directory-based system, the data being shared is placed in a common directory that maintains the coherence between caches. The directory acts as a filter through which the processor must ask permission to load an entry from primary memory into its cache. When an entry is changed, the directory either updates or invalidates the other caches holding that entry.
- Snooping – Snooping is a process where the individual caches monitor the address lines for accesses to memory locations that they have cached. This is called a write-invalidate protocol: when a write operation is observed to a location that a cache has a copy of, the cache controller invalidates its own copy of the snooped memory location.
- Snarfing – This is a mechanism where a cache controller watches both address and data in order to update its own copy of a memory location when a second master modifies a location in main memory. When a write operation is observed to a location that a cache has a copy of, the cache controller updates its own copy of the snarfed memory location with the new data.
With the above coherency mechanisms, coherency protocols (such as MESI/MOESI) are implemented. Each cache line is tracked in one of the following states:
Modified – The value in the cache is dirty, i.e. the value in the current cache differs from the one in main memory.
Exclusive – The value present in the cache is the same as that in main memory, i.e. the value is clean.
Shared – The cache holds the most recent copy of the data, and that copy is shared among the other caches and main memory as well.
Owned – The current cache holds the block and is the owner of that block, i.e. it has all rights to that block.
Invalid – The current cache block is invalid and must be fetched again from another cache or from main memory.
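The C sketch below is only a software illustration of these states and of the write-invalidate rule used by snooping; the type names, functions, and simplified transitions are assumptions, since real coherency protocols are implemented inside the cache controller hardware.

#include <stdint.h>
#include <stdio.h>

/* Software model (sketch only) of the MOESI-style states listed above. */
enum line_state { INVALID, SHARED, EXCLUSIVE, OWNED, MODIFIED };

static const char *state_name[] =
    { "Invalid", "Shared", "Exclusive", "Owned", "Modified" };

struct cache_line {
    uint32_t        tag;    /* which RAM block this line currently holds */
    enum line_state state;  /* coherency state of that copy              */
};

/* This core writes to a line it already holds: the copy now differs from
 * main memory, so the line becomes MODIFIED (dirty). */
static void local_write(struct cache_line *line)
{
    line->state = MODIFIED;
}

/* This cache snoops a write by another core to an address it also holds:
 * the local copy can no longer be trusted, so it is marked INVALID and
 * re-fetched on the next access (write-invalidate behaviour). */
static void snoop_remote_write(struct cache_line *line, uint32_t written_tag)
{
    if (line->state != INVALID && line->tag == written_tag)
        line->state = INVALID;
}

int main(void)
{
    struct cache_line line = { .tag = 0x12341, .state = SHARED };

    local_write(&line);                  /* Shared   -> Modified */
    snoop_remote_write(&line, 0x12341);  /* Modified -> Invalid  */
    printf("final state = %s\n", state_name[line.state]);
    return 0;
}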