Introduction to Cache Memory
Memory hierarchy


Microcontrollers are the heart of every embedded system, and over the last decades their capabilities have grown from 32-bit and 64-bit cores to controllers with multiple levels of cache. Because cache memory was not widely used in microcontrollers before, many embedded software engineers are not familiar with it. In this article we will try to answer the following questions: Why do we need a cache memory? What is a cache memory? How does cache memory work? What makes cache memory work? How is data organized between the cache memory and the processor's main memory? And what are the cache memory types?

1-What is cache memory?

A cache memory is a fast memory of limited size: it is much faster than the CPU's main memory and more cost efficient than CPU registers. It is used as a buffer between a high-speed CPU and the RAM. But why do you think we need this type of memory in our systems?

2-Why do we need a cache memory?

To understand the need for a cache memory, let's take this example as a use case. Your system consists of a microcontroller whose processor runs on a 90 MHz clock; this means the processor can perform up to 90 million cycles per second, with a clock period of roughly 11 nanoseconds. You run a program on this processor that contains many load and store instructions to and from the RAM. The RAM takes multiple clock cycles to move data to or from the processor, and this latency puts the processor in a state where it wastes its time waiting for the RAM to finish its job without actually doing anything, burning clock cycles in a waiting state. This wait comes mainly from the fact that processors have become very fast compared to the speed of memories, which makes it inefficient for the processor to spend all this time waiting for the memory to finish its operation.

So what could be a solution for this problem? You might think that memories are slow, but what about registers? We know that registers are very fast, so what if we could build our memory from the same technology as registers; wouldn't that solve the problem? Technically yes, but economically no: building large memories from the same technology as registers would lead to a huge construction cost, which means an increase in the price of every electronic product, so this solution is not practical. What about a solution that sits between both sides: a fast memory of limited size that can be used to buffer data between slow memories like the RAM and the processor? This actually solves the problem, and this is what we call a cache memory. Now we know what cache memories are and why we need them, but how do they actually work and how does the processor use them?
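As a quick check on those numbers (the RAM latency below is a made-up figure, used only to illustrate the cost of a stall):

clock period = 1 / 90 MHz = 1 / (90 x 10^6) s ≈ 11.1 ns
assumed RAM access latency = 5 clock cycles
time wasted per stalled access ≈ 5 x 11.1 ns ≈ 55 ns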

3- How cache memory works?

As explained above, the cache memory is a buffer between the RAM and the processor. This means the cache holds the data and instructions most frequently used and needed by the processor, and when the processor needs to read or write this data it interfaces with the cache instead of the RAM. Since the cache is faster than the RAM, the read and write speed between memory and processor is optimized, allowing the processor to save clock cycles for more useful tasks rather than wasting its time waiting for the RAM. So now we know roughly where the cache sits inside our system, but let's get into more detail on how reading and writing in the cache actually work. Every time the CPU needs to read a piece of data or an instruction, it first goes to the cache and checks whether the data is there or not. If the data is in the cache, this is known as a cache hit, and the data is read directly from the cache. But what if the data is not in the cache? Then we have what we call a cache miss, and the data is read from the main memory and placed inside the cache for further reading. Here we have defined a new parameter, the cache hit ratio, which is given by the following formula; the higher the hit ratio, the more efficient the cache, so the engineering problem is to push the hit ratio as close to 100% as possible.

Hit ratio = hits / (hits + misses) = number of cache hits / total memory accesses
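For example (the numbers here are made up, purely to illustrate the formula): if a program performs 1000 memory accesses and 950 of them are found in the cache, then

Hit ratio = 950 / (950 + 50) = 0.95 = 95%

If we further assume a hit costs 1 clock cycle and a miss costs 10 cycles, the average access time becomes 0.95 x 1 + 0.05 x 10 = 1.45 cycles, instead of 10 cycles for every access going all the way to the RAM.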

We discussed how the cache memory is used for reading, but what about writing? Writing starts the same as reading: the processor checks whether the address is present in the cache (a cache hit) and then writes the data into the cache. The part that differs from reading is how the cache synchronizes the data that has been written to it with the actual main memory, so that the cache and the main memory do not end up out of sync. Writing in a cache has two modes, and they are the following.

3.1 Write-Through

In this mode the main memory is written immediately after the cache is written. This means the main memory always holds the correct data, but it reduces efficiency, since every write to the cache also causes a write to the RAM. This takes us to the next writing mode, the copy-back mode.

3.2 Copy-Back

In this mode the write from the cache to the main memory is postponed until the cache block is about to be replaced by new data. This is done using a flag (often called the dirty bit): when this flag is set and the cache block is selected for replacement, the data is copied from the cache back to the main memory.
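A minimal C sketch of the difference between the two write modes, using a made-up single-line "cache" (the struct, names and sizes are illustrative only, not a real cache controller):

#include <stdint.h>
#include <string.h>

#define LINE_SIZE 16   /* bytes per cache line (illustrative) */

/* One simplified cache line: a copy of LINE_SIZE bytes of RAM plus bookkeeping. */
struct cache_line {
    uint32_t tag;              /* which memory block this line currently holds     */
    uint8_t  data[LINE_SIZE];  /* cached copy of the data                          */
    int      valid;            /* line holds valid data                            */
    int      dirty;            /* line was modified, RAM is stale (copy-back only) */
};

/* Write-through: update the cache line and the main memory on every write. */
void write_through(struct cache_line *line, uint8_t *ram, uint32_t addr, uint8_t value)
{
    line->data[addr % LINE_SIZE] = value;   /* write into the cache         */
    ram[addr] = value;                      /* and immediately into RAM too */
}

/* Copy-back: update only the cache line and remember that RAM is out of date. */
void copy_back_write(struct cache_line *line, uint32_t addr, uint8_t value)
{
    line->data[addr % LINE_SIZE] = value;
    line->dirty = 1;
}

/* Called only when the line is replaced: flush the dirty data back to RAM. */
void copy_back_evict(struct cache_line *line, uint8_t *ram, uint32_t block_base)
{
    if (line->valid && line->dirty) {
        memcpy(&ram[block_base], line->data, LINE_SIZE);
        line->dirty = 0;
    }
}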

Now we know roughly how the CPU reads and writes through the cache, but there is one thing in the cache architecture itself that we haven't talked about yet: the cache line, or cache block, size.

3.3 Cache line/Cache Block

Earlier we said that when a cache miss occurs, the cache fetches the missing data from main memory and places it inside the cache. This is not the whole story: what actually happens is that the cache fetches the requested address together with a number of consecutive addresses, adding up to a certain number of bytes pre-defined by the architecture of the cache. This unit is known as the cache line or cache block. Typical cache line sizes are 16, 32 or 64 bytes, so with every miss the cache fetches the block of 16, 32 or 64 consecutive bytes that contains the address which caused the miss. We can therefore picture the cache as a memory divided into cache lines/blocks: for example, an 8 KB cache with a 16-byte cache line is divided into 8 KB / 16 B = 512 cache blocks or lines. How main memory addresses are assigned to those lines is the cache mapping, with direct mapping being the simplest form, a topic we will explain further in this article.
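As a quick worked example with the sizes mentioned above:

number of cache lines = cache size / cache line size = 8192 bytes / 16 bytes = 512 lines
bits needed to address one byte inside a 16-byte line = log2(16) = 4 bits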

Now we understand how caches work with the processor, but what actually makes this architecture work? We said that the cache buffers the data and instructions most frequently used and needed by the processor; how can it do that? This is the next topic in this article.

4-Locality of References

Every program we write exhibits two important properties. The first is that every piece of data or instruction accessed by the processor has a high probability of being used again by the processor within a short period of time; this is because our code normally contains loops, so a lot of data is accessed multiple times. This is known as temporal locality of reference. The other property our programs normally exhibit is that when a piece of data or an instruction is accessed, the data or instructions at the following consecutive addresses in memory are also likely to be accessed by the CPU within a short period of time; this is known as spatial locality of reference. But how do these two properties enable caches to work? Let's see.

4.1-How caches use locality of references

We said that the cache memory buffers the data most frequently used and needed by the processor, so how does the cache know which data that is? The answer is that it takes advantage of the two properties that any program exhibits: temporal and spatial locality. Caches exploit temporal locality by keeping in the cache any data or instruction that was requested by the CPU and not found there, since, following the principle of temporal locality, it is likely to be needed again by the processor soon. Caches exploit spatial locality by fetching a whole cache line/block every time a miss occurs, so the memory addresses consecutive to the one that was requested are also saved in the cache; in this way spatial locality is used to keep in the cache the data the processor is likely to need next. The result is an increase in the cache hit ratio, leading to an increase in overall performance and speed.
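A small C example of both properties (illustrative code only):

#include <stddef.h>

#define N 1024

/* Sums an array; this simple loop shows both kinds of locality that caches exploit. */
long sum_array(const int a[N])
{
    long sum = 0;

    for (size_t i = 0; i < N; i++) {
        /* 'sum' and 'i' are touched on every single iteration: temporal locality. */
        /* a[0], a[1], a[2], ... sit at consecutive addresses: spatial locality,   */
        /* so one cache line fetch serves several of the following accesses.       */
        sum += a[i];
    }
    return sum;
}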

Up till now we know what caches are, how they work and which principles make them work, but there is a problem we haven't discussed yet. We said that caches are small memories, typically in the range of 8 KB to 64 KB, while main memories are a lot bigger than that, so how can every piece of data in the main memory be stored in this small memory? This is the next topic in this article: data organization in caches.

5- Cache Organization

The cache is a small memory, and main memories like the RAM will always be bigger, so how can the cache be organized so that it can hold any data requested from it? This is what we mean by data organization, and we will mainly explain two types: direct mapping and set-associative mapping.

5.1- Direct mapping

In direct mapping, every address in main memory is mapped directly to exactly one cache line/block inside the cache. This is done as follows: take for example the picture below, which shows direct mapping from main memory to the cache, and take the address 0x0000 as an example. In this type of mapping the address is divided into three main parts: the word offset, the cache line offset (index) and the tag. Let's explain every part in detail.

1- Word offset: this part consists of the number of bits that determine the offset of the data to be read or written inside the cache line. For example, if the cache line is 16 bytes and we want to read only one byte from it, we need 4 bits to determine which part of the cache line we want to read/write.

2- Cache line offset: this part determines into which cache block/line the data should be placed. The number of bits needed depends on how many lines the cache has; for example, the 8 KB cache with 16-byte lines from the earlier example has 512 lines, so 9 bits of the address are needed to select the line. By using these bits we determine which cache block a given address maps to.

3- Tag: the tag is the remaining bits of the address, and it distinguishes the address from the other addresses that map to the same line. With direct mapping many addresses map to the same cache line; for example, in the 8 KB cache above, the two addresses 0x0000 and 0x2000 both map to cache line 0, so we need a part of the address to differentiate them, and this is the purpose of the tag.

[Image: direct mapping of main memory addresses to cache lines, and the word offset / line offset / tag address fields]


So mathematically we can describe direct mapping by the following equation:

i = j mod m
where
i = cache line number
j = main memory block number (the address divided by the block size)
m = total number of cache lines/blocks in the cache
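A minimal sketch of how an address is split under the 8 KB cache / 16-byte-line example used above (the field widths follow from those example sizes, not from any particular device):

#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE   16u    /* bytes per cache line            */
#define NUM_LINES   512u   /* m: 8 KB / 16 B                  */
#define OFFSET_BITS 4u     /* log2(LINE_SIZE)                 */
#define INDEX_BITS  9u     /* log2(NUM_LINES)                 */

int main(void)
{
    uint32_t addr = 0x2000;

    uint32_t word_offset = addr & (LINE_SIZE - 1u);            /* byte inside the line   */
    uint32_t block_num   = addr / LINE_SIZE;                   /* j in the formula       */
    uint32_t line_index  = block_num % NUM_LINES;              /* i = j mod m            */
    uint32_t tag         = addr >> (OFFSET_BITS + INDEX_BITS); /* remaining address bits */

    printf("offset=%u index=%u tag=0x%X\n",
           (unsigned)word_offset, (unsigned)line_index, (unsigned)tag);
    return 0;
}

For addr = 0x2000 this prints offset=0 index=0 tag=0x1, which is exactly why 0x0000 and 0x2000 land on the same line and must be told apart by the tag.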

Now we know how direct mapping works, but what happens when two memory addresses map to the same cache line? In direct mapping the cache line is simply replaced, and this raises a new problem: imagine, for example, that inside a loop I keep accessing two addresses that map to the same cache line; we would be replacing that cache line over and over again, which is not efficient. There must be a way to solve this problem, and that is the set-associative cache organization.

5.2-Set-Associative cache organization

Set-associative organization is a type of cache organization that aims to solve the cache line replacement problem that arises when two or more memory addresses are mapped directly to the same cache line/block. The set-associative organization simply divides the cache into a number of sets, and every set consists of a number of cache lines. For example, if we have a 2-set cache with 8 cache lines, then we have two sets, 0 and 1, and inside every set we have 4 cache lines. By doing that, up to 4 memory blocks that map to the same set can sit in the cache at the same time without conflict, and this is the main idea of the set-associative organization (in the conventional naming this example would be called a 4-way set-associative cache, since associativity is usually counted as the number of lines per set). In this type of organization we can have a 2-set, 4-set or 8-set organization, which basically means we divide the cache into 2, 4 or 8 sets, where every set contains a number of cache lines, as in the following example and formulas.

Example: a cache of 16 KB, 4 sets, 16-byte cache line
16 KB / 16-byte cache line = 1024 cache lines
1024 lines / 4 sets = 256 cache lines per set

m = v * k
i = j mod v

where
i = cache set number
j = main memory block number
v = number of sets
m = total number of lines in the cache
k = number of lines in each set


The main memory address is divided into the following fields: word offset, set offset and tag, where the tag and word offset play the same role as in direct mapping, and the set offset is the number of bits of the address that say into which set the address will be mapped inside the cache. For example, if this is a 4-set cache, we need a 2-bit set offset.
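Following the same 16 KB / 4-set / 16-byte-line example, a small sketch of how an address would be split (again, the values are illustrative, not those of a specific device):

#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE   16u    /* bytes per cache line               */
#define NUM_SETS    4u     /* v: the 4-set example from the text */
#define OFFSET_BITS 4u     /* log2(LINE_SIZE)                    */
#define SET_BITS    2u     /* log2(NUM_SETS)                     */

int main(void)
{
    uint32_t addr = 0x1234;

    uint32_t word_offset = addr & (LINE_SIZE - 1u);          /* byte inside the line   */
    uint32_t block_num   = addr / LINE_SIZE;                  /* j                      */
    uint32_t set_index   = block_num % NUM_SETS;              /* i = j mod v            */
    uint32_t tag         = addr >> (OFFSET_BITS + SET_BITS);  /* remaining address bits */

    printf("offset=%u set=%u tag=0x%X\n",
           (unsigned)word_offset, (unsigned)set_index, (unsigned)tag);
    return 0;
}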



By using the set-associative organization we minimize the effect of overlapping cache lines and cache line replacement, but what happens if a conflict still occurs inside a set? In that case there is no way around replacing a cache line, and to choose which one, an algorithm that evicts the least recently used data is applied; it is known as LRU (least recently used).
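A very small sketch of one common way to track LRU in software, using an age counter per line in a set (this only illustrates the idea; real caches implement the policy in hardware and often only approximate it):

#include <stdint.h>

#define WAYS 4   /* lines per set, matching the example where every set holds 4 lines */

struct line {
    uint32_t tag;
    int      valid;
    uint32_t age;   /* 0 = most recently used, larger = older */
};

/* Mark one line as most recently used and age all the others in the set. */
void touch(struct line set[WAYS], int used)
{
    for (int i = 0; i < WAYS; i++)
        if (set[i].valid && i != used)
            set[i].age++;
    set[used].age = 0;
}

/* Pick the victim line: an invalid line if any, otherwise the oldest one. */
int choose_victim(const struct line set[WAYS])
{
    int victim = 0;
    for (int i = 0; i < WAYS; i++) {
        if (!set[i].valid)
            return i;               /* empty line, use it directly            */
        if (set[i].age > set[victim].age)
            victim = i;             /* oldest (least recently used) so far    */
    }
    return victim;
}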


6-Cache memory types.

After understanding how the cache works and how it is organized, let's take a look at cache types. A cache hierarchy normally consists of three levels: L1, L2 and L3. The L1 cache is the closest cache to the CPU and is divided into an i-cache, used for instructions (code), and a d-cache, used for data. The L2 cache is also inside the CPU, but it acts as a unified cache for both instructions and data. The L3 cache is shared among all CPU cores. The following image shows an example of the caches inside an Intel i7 processor.

[Image: cache levels (L1 i-cache/d-cache, L2, shared L3) inside an Intel i7 processor]

This marks the end of the first part of the cache memory article; in the next part we will talk about cache-friendly code.


