CPU works. Oh really? But how?
We briefly talked about the CPU last time. We already know that the CPU (Central Processing Unit) is the brain of devices such as laptops and desktop computers. This doesn't mean that the CPU is the most important part of a computer. Think of a human brain: we humans cannot function without a brain, but we cannot live without lungs, a heart and other parts of our body either. Like father, like son, like human, like computer. Still, the CPU plays an integral role in performing all the operations and controlling almost everything. Now let's find out how.
According to uncle Google, the CPU consists of six main components.
Arithmetic Logic Unit (ALU)
The Arithmetic Logic Unit (ALU) is mainly responsible for two tasks: arithmetic operations and logical operations. The arithmetic unit and the logic unit cannot be separated from each other, because when you do a math calculation, both of them work together to give you the result.
Now open your calculator, type "2 + 2" and press that equal sign (=). This should give you "4". If you see "5", don't panic, it's okay XD. The ALU just calculated the result of "2 + 2". We will come back to this calculation later and understand what is happening at a lower level. Let's find out more about arithmetic and logical operations first.
The ALU supports four kinds of arithmetic operations: addition (+), subtraction (-), multiplication (*), and division (/). Interestingly, the simplest ALUs cannot multiply or divide directly. Wait, how do they multiply and divide then? They build both out of addition and subtraction. (Many modern CPUs do have dedicated multiplier and divider circuits, but the basic idea is still worth knowing.) Let's see the multiplication case first. The ALU uses repeated addition to achieve multiplication. Repeated addition means adding the number x to itself y times. For example, instead of "2 * 3 = 6", it calculates "2 + 2 + 2 = 6". Division works the other way around: the ALU uses repeated subtraction. Repeated subtraction means subtracting the divisor from the dividend again and again until the remainder is zero or smaller than the divisor. For example, instead of "6 / 2 = 3", it calculates "6 - 2 - 2 - 2 = 0". We subtract 2 three times to reach zero, so the answer is 3.
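To make the idea concrete, here is a minimal Python sketch of repeated addition and repeated subtraction. The function names multiply and divide are my own, and the sketch assumes non-negative whole numbers only. It illustrates the idea, not how any real ALU is wired.

```python
def multiply(x, y):
    """Multiply two non-negative integers using only addition."""
    result = 0
    for _ in range(y):  # add x to the running total, y times
        result += x
    return result


def divide(dividend, divisor):
    """Divide non-negative integers using only subtraction.

    Returns a (quotient, remainder) pair.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    quotient = 0
    # Keep subtracting until what is left is smaller than the divisor.
    while dividend >= divisor:
        dividend -= divisor
        quotient += 1
    return quotient, dividend


print(multiply(2, 3))  # 6
print(divide(6, 2))    # (3, 0)
print(divide(7, 2))    # (3, 1)
```

The last call shows the case where the remainder is not zero: after three subtractions only 1 is left, which is smaller than 2, so the loop stops.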
The ALU also performs logical operations. Logical operations use bitwise operators such as AND, OR, NOT and XOR. Bitwise operators work at the binary level, so the CPU, and more specifically the ALU, can execute them directly. If you don't know much about bitwise operators, the rest of the process will be hard to follow, so I recommend that you at least read the wiki about them. You can see the images below to learn more about how these gates work.
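If you want to try the four operators yourself, here is a small Python sketch. The values a and b are arbitrary 4-bit examples that I picked. Because Python integers are not limited to 4 bits, the NOT result is masked back to 4 bits.

```python
a = 0b1100  # 12 in decimal
b = 0b1010  # 10 in decimal
MASK = 0b1111  # keep only the lowest 4 bits

print(format(a & b, "04b"))      # AND: 1000 (1 only where both bits are 1)
print(format(a | b, "04b"))      # OR:  1110 (1 where at least one bit is 1)
print(format(a ^ b, "04b"))      # XOR: 0110 (1 where the bits differ)
print(format(~a & MASK, "04b"))  # NOT: 0011 (every bit of a flipped)
```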
Now we will talk about why the ALU needs both arithmetic and logical operations and how it calculates numbers. Combinations of the logic gates above make arithmetic possible. Let's go back to our "2 + 2 = 4" calculation. In the binary system, the decimal number 2 is written as one-zero (10). To add 2 and 2, the ALU uses a series of OR, AND and XOR gates. The result of these logical steps for "2 + 2" is one-zero-zero (100), which is the binary code for 4. If you are interested in how these steps are performed, you should read this amazing article about building a 32-bit ALU with 2 control lines. You might be wondering, what is a control line? According to Erik Eidt, who previously worked as a Senior Architect at Apple, the two control lines are Ainvert and Bnegate. They can be used to invert values before combining them. So we already know that the ALU calculates numbers, but we still do not know how data is transported to and from the ALU. This is a sign that we need to learn about the Control Unit (CU) right away.
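Here is a hedged sketch of how OR, AND and XOR can be combined into an adder. It models the standard textbook full adder chained bit by bit (a ripple-carry adder). The names full_adder and ripple_carry_add and the default 4-bit width are my assumptions for illustration, and the Ainvert and Bnegate control lines are left out.

```python
def full_adder(a, b, carry_in):
    """Add three single bits using only XOR, AND and OR."""
    partial = a ^ b                              # sum of a and b, ignoring carry
    sum_bit = partial ^ carry_in                 # final sum bit
    carry_out = (a & b) | (partial & carry_in)   # carry to the next position
    return sum_bit, carry_out


def ripple_carry_add(x, y, width=4):
    """Add two unsigned integers one bit at a time, lowest bit first."""
    result = 0
    carry = 0
    for i in range(width):
        bit_x = (x >> i) & 1
        bit_y = (y >> i) & 1
        sum_bit, carry = full_adder(bit_x, bit_y, carry)
        result |= sum_bit << i
    # The final carry is dropped, as in a fixed-width adder.
    return result


print(format(ripple_carry_add(0b10, 0b10), "03b"))  # 100
```

For "2 + 2", the middle bit position adds 1 and 1, which gives a sum bit of 0 and a carry of 1. That carry becomes the leading 1 of the result 100.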
Control Unit (CU)
The Control Unit (CU) helps the CPU direct operations. It tells the memory, the ALU, and the input/output devices of the computer how to respond to instructions received from programs. In short, it controls those parts of the computer by sending control signals. Computer architecture has a classic design called the RISC pipeline. RISC stands for Reduced Instruction Set Computer. It was designed to complete about one instruction per cycle by splitting the work into five stages: Fetch, Decode, Execute, Memory, and Write Back. There are five types of instructions: load, store, branch, jump, and R-type. We will talk about R-type instructions, also known as ALU instructions. An R-type instruction uses only four stages of the RISC pipeline: fetch, decode, execute, and write back. When working with the ALU, the CU follows the same four steps as its primary functions: fetch, decode, execute and write back.
We know that the ALU has data inputs and an output, and it can only work with them. The CU makes sure that the inputs of the ALU are loaded so that it can perform operations on them. Once the ALU finishes the operation, the CU sends the output via a data bus to a register or to RAM. If we look at the cycle, we can see:
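The four steps for R-type instructions can be sketched as a tiny loop in Python. Everything here is a toy that I made up for illustration: the instruction format (operation, destination, source 1, source 2), the register names r0 to r3 and the ALU_OPS table do not belong to any real instruction set.

```python
# Each instruction is (operation, destination, source1, source2).
program = [
    ("add", "r2", "r0", "r1"),  # r2 = r0 + r1
    ("sub", "r3", "r2", "r0"),  # r3 = r2 - r0
]
registers = {"r0": 2, "r1": 2, "r2": 0, "r3": 0}

# The operations our toy ALU understands.
ALU_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}


def run(program, registers):
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        instruction = program[pc]                               # fetch
        op, dest, src1, src2 = instruction                      # decode
        result = ALU_OPS[op](registers[src1], registers[src2])  # execute
        registers[dest] = result                                # write back
        pc += 1
    return registers


print(run(program, registers))  # {'r0': 2, 'r1': 2, 'r2': 4, 'r3': 2}
```

A real pipeline overlaps these stages for several instructions at once. This loop handles one instruction at a time to keep the four steps easy to see.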
Can you find another sign here? Data bus? Buses? Exactly! It's time to know about buses, not exactly ISUZU or Mercedes buses you see, but CPU buses XD.
Buses
Buses are circuits on the motherboard that connect the CPU to other components. A bus moves instructions and data around the system. Buses are not part of the CPU, but a gateway between the CPU and other components. You can think of a bus system as an internal connection on the motherboard. There are 3 types of buses: the address bus (carries memory addresses), the data bus (carries the data itself) and the control bus (carries control signals).
No more signs this time, let's jump straight to the Registers now.
Registers
Registers also play an integral role within the CPU. They are a small amount of very fast storage that the CPU can access quickly. Registers can hold different kinds of data, such as an instruction, a memory address, or a bit sequence. Now you might have the following questions: Why do we need registers? Why does the CPU not just use RAM instead? The answers are easy! Registers are faster than RAM, so the CPU reaches data in a register much sooner than data in RAM. Moreover, the registers hold the instructions and values that the CPU is processing right now, while RAM holds the data of the currently running programs that the CPU may need. A register usually holds a small amount of data, typically 32 or 64 bits. Here is another question: what does 32-bit or 64-bit mean? This is the capacity of a register. We all know that 1 bit is either zero or one. 32-bit and 64-bit can be considered word lengths, meaning that a 32-bit register can store a sequence of 32 bits. The same goes for 64-bit. You can see this in the picture below.
The amount of RAM your computer can use depends on this width, because memory addresses are stored in registers. An N-bit value has 2^N possible states, so a 32-bit address can point to 2^32 = 4,294,967,296 different bytes, which is exactly 4 GiB (roughly 4 GB). Having a 32-bit computer with 8 GB of RAM is therefore mostly useless, because a 32-bit system can generally address only 4 GB. As a result, your computer will use only 4 GB of your 8 GB of RAM. However, things are very different with 64-bit registers. A 64-bit address can in theory cover up to 16 exabytes (roughly 16 billion gigabytes) of RAM. There are several types of registers. I will leave researching them to you, so I will not cover them here.
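You can check these numbers with a few lines of Python. The helper name addressable_bytes is mine, and the sketch assumes that each address points to exactly one byte.

```python
GIB = 2 ** 30  # bytes in one gibibyte
EIB = 2 ** 60  # bytes in one exbibyte


def addressable_bytes(address_bits):
    """An N-bit address has 2^N possible values, one per byte."""
    return 2 ** address_bits


print(addressable_bytes(32))         # 4294967296
print(addressable_bytes(32) // GIB)  # 4
print(addressable_bytes(64) // EIB)  # 16
```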
Clock
The clock speed tells you how many cycles your CPU can execute per second. It is measured in GHz (gigahertz). The speed of older CPUs was measured in megahertz, which means millions of cycles per second. Generally, the higher the clock speed of your CPU, the faster it works. However, this is not always true, according to Intel, the dominant CPU manufacturer. For example, a CPU with a higher clock speed from five years ago might be outperformed by a new CPU with a lower clock speed, because the newer architecture deals with instructions more efficiently.
1 GHz means 1 billion cycles per second.
1 MHz means 1 million cycles per second.
In the picture above, you can see how many cycles my laptop's CPU can execute per second.
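The conversion from GHz to cycles is simple arithmetic, sketched below in Python. The 2.5 GHz figure is a made-up example, not my laptop's real clock speed, and the function names are my own.

```python
def cycles_per_second(ghz):
    """1 GHz means 1 billion cycles per second."""
    return ghz * 1_000_000_000


def nanoseconds_per_cycle(ghz):
    """How long one cycle lasts, in nanoseconds."""
    return 1 / ghz


clock_ghz = 2.5  # hypothetical clock speed
print(cycles_per_second(clock_ghz))      # cycles in one second
print(nanoseconds_per_cycle(clock_ghz))  # duration of a single cycle in ns
```

At 2.5 GHz a single cycle lasts less than half a billionth of a second, which shows why a short trip to a register matters so much compared with a long trip to RAM.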
Cache
The cache is a small amount of fast memory built into the CPU. It temporarily stores data and instructions that the processor is likely to reuse. This makes processing faster, because the CPU does not have to wait for data to be fetched from RAM. It can quickly fetch a reusable piece of data from the cache instead. The CPU can access the cache 10 to 100 times faster than RAM.
There are 3 levels of cache: Level 1 (L1), Level 2 (L2) and Level 3 (L3). L1 is the main cache, the smallest and the fastest. Even sophisticated CPUs today have only up to around 1 MB of L1 cache. L2 is bigger and slower than L1, but still faster than RAM. It is mostly used as a secondary cache. Modern CPUs have up to 8 MB of L2 cache. L3 is the largest and the slowest cache your computer has. The latest L3 caches are up to 32 MB, and sometimes up to 64 MB on server CPUs. In modern CPUs all caches are usually located on the CPU itself, but in some (mostly older) systems L2 and/or L3 sit on the motherboard between the CPU and RAM.
Now let's talk about how data travels between the CPU, the caches and RAM. Data flows from RAM to the L3 cache, then to the L2 cache and finally to the L1 cache. When the CPU looks for data, it first checks the L1 cache, then L2 and then L3. If the CPU finds the data it needs in a cache, it is called a cache hit. However, if the CPU cannot find the data in any of these caches, it has to go to RAM to get it. This is known as a cache miss. The cache is also volatile memory like RAM, meaning that all the data in it is erased when your computer is shut down.
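The lookup order can be sketched in Python with plain dictionaries standing in for the caches. This is a heavily simplified model under my own assumptions: the caches never fill up, nothing is ever evicted, and the addresses and values are invented.

```python
def lookup(address, l1, l2, l3, ram):
    """Check L1, then L2, then L3, then RAM. Returns (value, where_found)."""
    for name, cache in (("L1", l1), ("L2", l2), ("L3", l3)):
        if address in cache:
            return cache[address], name + " hit"
    # Cache miss: fetch from RAM and fill the caches on the way back
    # (RAM -> L3 -> L2 -> L1), so the next lookup is fast.
    value = ram[address]
    l3[address] = value
    l2[address] = value
    l1[address] = value
    return value, "miss"


ram = {0x10: "a", 0x20: "b"}
l1, l2, l3 = {}, {}, {}

print(lookup(0x10, l1, l2, l3, ram))  # ('a', 'miss')
print(lookup(0x10, l1, l2, l3, ram))  # ('a', 'L1 hit')
```

The first lookup misses because the caches start empty. The second one hits in L1 because the first lookup left a copy there.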
We have now talked about all the parts. As a bonus, we will talk about CPU cores. What is a CPU core? A CPU core is itself a CPU. Years ago, we only had single-core CPUs that could work on one task at a time. Now we have multi-core processors, and each of those cores can perform a different task simultaneously. A core can only work on a single task, so if you need to work on multiple tasks at the same time, you need multiple cores. Generally, the more cores your CPU has, the more work it can get done. CPUs come with anywhere from 1 to N cores. You must've seen something like single-core, dual-core and so on when reading system information about computers and laptops. Here is a brief list so that you do not get confused by these names.
Single-core = 1 core
Dual-core = 2 cores
Quad-core = 4 cores
Hexa-core = 6 cores
Octa-core = 8 cores
The list goes on...
Now you might have the following question: I have a dual-core CPU, but does it have one shared Arithmetic Logic Unit, Control Unit, set of registers, etc., or does each core have its own computing units? The answer is: each core has its own computing units and caches. In most designs, the L3 cache is shared between all CPU cores and is not located inside the core itself, unlike the L1 and L2 caches.
Many CPUs support a technique called multithreading. Multithreading is the ability of a processor core to work on multiple tasks concurrently. It does this by presenting one physical core as several virtual cores, and these virtual cores are called threads. Typically, each core has two threads. In that case a single-core CPU has 2 threads, a dual-core CPU has 4 threads and a quad-core CPU has 8 threads.
However, multithreading has pros and cons too. Speaking of the advantages: if one thread suffers a lot of cache misses and has to wait for data from RAM, another thread can use the core in the meantime. As a result, it leads to faster execution overall. The major disadvantage is that multiple threads get in each other's way when sharing hardware resources such as caches. This can hurt performance, in some designs even when only one thread is active. Since we talked about multithreading, let me mention a slightly different approach, which is multiprocessing. Multiprocessing is the use of two or more CPUs within a single computer. Just imagine a dual-CPU machine as a computer that is roughly twice as powerful. We have dual-CPU and multi-CPU motherboards to thank for that.
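The thread counts above are just cores multiplied by threads per core, as this Python sketch shows. Two threads per core is an assumption that matches the text but does not hold for every CPU, and hardware_threads is a name I made up. The last line asks the operating system how many logical CPUs it sees through os.cpu_count(), so you can compare it with your own machine.

```python
import os

THREADS_PER_CORE = 2  # assumption: two threads per core


def hardware_threads(cores, threads_per_core=THREADS_PER_CORE):
    """Number of threads exposed by a CPU with the given core count."""
    return cores * threads_per_core


for name, cores in [("single-core", 1), ("dual-core", 2), ("quad-core", 4)]:
    print(name, "->", hardware_threads(cores), "threads")

# Logical CPUs reported by the operating system (may be None if unknown).
print(os.cpu_count())
```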
Let's find an answer to the most interesting question, now!
How can you love your CPU? This is very interesting! Maybe it's time for us to find and love some CPU? Maybe marry one? Bro, are you still calling yourself a man after playing CS:GO for long hours without having a GPU? You still don't know how to love your CPU? Oh come on, buy it a GPU bro, your CPU needs a pair. Joking, joking. The actual question is: why do we need a GPU, and why is the CPU alone not enough? In my previous article, I mentioned that a GPU (Graphics Processing Unit) has a parallel structure, which makes it more efficient than a general-purpose CPU at rendering images and running algorithms that process large blocks of data. CPUs have fewer but more powerful cores, while GPUs have many but weaker cores. Communication between a CPU and a GPU happens via PCI Express. The GPU is built for the massive parallelism that rendering graphics requires. Thanks to this parallelism, a GPU can complete more of this kind of work than a CPU in the same amount of time. You can think of a CPU as a Lamborghini and a GPU as a train when it comes to the amount of work completed. No matter how fast the Lamborghini is, if the task is to transport 400 people over a distance of 100 km, a 2-seat Lamborghini has to make the trip 400 times to finish the task. On the other hand, a train can transport all of them at once and finish sooner than the Lamborghini. However, a CPU can never be fully replaced by a GPU, because a GPU cannot be what a CPU is. Why would you bother a slow train to transport just one person when the Lamborghini can do it way faster? The GPU has its own limitations. It works well with a massive amount of parallel calculations, but it is much slower at ordinary, sequential tasks. Thus, it can never replace the CPU. Let it be used the way it was designed to be.
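The Lamborghini and train comparison can be turned into rough numbers with a short Python sketch. The distance and the 400 people come from the example above. The speeds, the train capacity and the assumption that the car must drive back empty after each trip are figures I made up purely for illustration.

```python
DISTANCE_KM = 100
PEOPLE = 400

# Hypothetical figures, chosen only for illustration.
CAR_SPEED_KMH = 300
CAR_PASSENGERS = 1      # a two-seater carries the driver plus one passenger
TRAIN_SPEED_KMH = 100
TRAIN_PASSENGERS = 400


def total_hours(people, passengers_per_trip, speed_kmh, distance_km):
    """Hours needed to move everyone, including empty return journeys."""
    trips = -(-people // passengers_per_trip)  # ceiling division
    legs = trips * 2 - 1  # every trip except the last needs a return leg
    return legs * distance_km / speed_kmh


# One passenger: the car wins (low latency, like a CPU).
print(total_hours(1, CAR_PASSENGERS, CAR_SPEED_KMH, DISTANCE_KM))
print(total_hours(1, TRAIN_PASSENGERS, TRAIN_SPEED_KMH, DISTANCE_KM))

# 400 passengers: the train wins (high throughput, like a GPU).
print(total_hours(PEOPLE, CAR_PASSENGERS, CAR_SPEED_KMH, DISTANCE_KM))
print(total_hours(PEOPLE, TRAIN_PASSENGERS, TRAIN_SPEED_KMH, DISTANCE_KM))
```

With these made-up numbers the car needs 799 legs of 100 km, while the train needs a single one, even though the car is three times faster on each leg.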
Cheers)