How does Bitcoin / Blockchain Mining work?

From logistics to healthcare, from social media to real estate, from the energy sector to the global economy — Blockchain is predicted to transform almost every single industry in the next ten years. Often described as even more revolutionary than Artificial Intelligence, this technology is entering our lives at mind-blowing rates.

While inspiring overall, this extreme pace also comes with a negative side: it means that education simply cannot keep up. The absence of quality training programs around blockchain means that many people will either miss out on opportunities in this new and exciting field, or even worse — will make premature career choices based on an erroneous understanding of what is possible and what isn’t.

There are lots of concepts that allow Blockchain-powered projects and ideas to exist. It would take a whole online course to cover all of these topics (see end of this post for more details). That’s why today we are going to laser-in on just one concept. For this article I’ve picked perhaps the most used and, at the same time, most misunderstood topic: Mining.

We’ve all heard about Bitcoin mining and miners. We’ve probably even used these terms. But what exactly do these miners do? What is mining all about? Those are the questions we will be answering today, and we will do this in three parts:

  • Part 1: What’s a cryptographic hash?
  • Part 2: The cryptographic puzzle
  • Part 3: Block configuration

Note: We’re going to look at the example of Bitcoin. Other cryptocurrencies such as Ethereum may use different ideas (e.g. a different type of hash function), so the specifics will vary, but the underlying concepts remain the same.

___

Part 1: What’s a cryptographic hash?

You may have already heard that a blockchain is a series of successive blocks cryptographically linked together.

But what does this mean and how is this connected to mining? Let’s have a closer look.

An individual block in a blockchain contains the following elements: block number, data stored in the block, hash of the previous block and hash of the current block.

A hash (or cryptographic hash) is a long number which acts as a digital fingerprint of any collection of data. In Bitcoin the SHA256 hashing function is used which generates a 64-digit hexadecimal number. For example, the cryptographic hash of the words in this paragraph is:

C019286295F2CDEC9958BEE25B9603B5F94C76B2CCC69A59CE54872ED26DC479

Note: in the images here hashes are shortened for illustration purposes.

Hashing algorithms have many interesting properties, but today we are most interested in three: 1) the SHA256 function is deterministic: you will always get the same hash output if you recalculate the function with the same input; 2) the SHA256 function is practically impossible to reverse-engineer, meaning that you cannot work backwards from a hash to its input, and you can never know in advance what hash value you will get until you actually calculate it; and 3) if you feed the SHA256 function two even slightly different inputs (for example, you change a dot for a comma), you will get wildly different outputs.
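Properties 1 and 3 are easy to check for yourself. Here is a minimal sketch using Python's standard hashlib module; the helper name sha256_hex and the sample strings are made up for illustration. Property 2 cannot be shown by running code, but it is the reason miners have to try inputs one at a time.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA256 hash of a string as 64 hexadecimal digits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

a = sha256_hex("Hello, world.")
b = sha256_hex("Hello, world.")   # exactly the same input again
c = sha256_hex("Hello, world,")   # the final dot changed for a comma

print(a)
print(c)
print(len(a))    # 64 hexadecimal digits
print(a == b)    # True: the function is deterministic
print(a == c)    # False: a tiny change in the input gives a different hash
```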

From our example above, we would input the current block’s number, the data stored in the block and the hash of the previous block into the SHA256 function to get the value of the current block’s hash:

SHA256(Block Number, Data, Previous Block’s Hash) -> Hash

Now we can see how the blocks are linked — not only does each block reference the previous block’s cryptographic hash — but, in fact, that hash directly affects the value of the current block’s hash. Therefore, if anyone were to tamper with any given block’s data, such action would render not only that specific block’s hash invalid — but also all of the following blocks’ hashes invalid.
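To see the linking in action, here is a small sketch of a three-block chain. It follows the simplified SHA256(Block Number, Data, Previous Block's Hash) formula above and is not Bitcoin's real block format; the function names, the "|" separator and the sample payments are assumptions made for illustration.

```python
import hashlib

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the very first block

def block_hash(number: int, data: str, prev_hash: str) -> str:
    """Hash the three block fields together."""
    payload = f"{number}|{data}|{prev_hash}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(entries):
    """Build a list of blocks, each one storing the hash of the block before it."""
    chain, prev_hash = [], GENESIS_PREV_HASH
    for number, data in enumerate(entries, start=1):
        current = block_hash(number, data, prev_hash)
        chain.append({"number": number, "data": data,
                      "prev_hash": prev_hash, "hash": current})
        prev_hash = current
    return chain

def first_invalid_block(chain):
    """Return the number of the first block that fails verification, or None."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        recomputed = block_hash(block["number"], block["data"], block["prev_hash"])
        if block["prev_hash"] != prev_hash or block["hash"] != recomputed:
            return block["number"]
        prev_hash = block["hash"]
    return None

chain = build_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dave 1"])
print(first_invalid_block(chain))   # None: the chain is intact

chain[0]["data"] = "Alice pays Bob 500"   # tamper with the first block
print(first_invalid_block(chain))   # 1: the stored hash no longer matches the data

# Even if the attacker recalculates block 1's hash, block 2 still points at the old one.
chain[0]["hash"] = block_hash(1, chain[0]["data"], chain[0]["prev_hash"])
print(first_invalid_block(chain))   # 2
```

To hide the change, the attacker would have to recalculate every block that follows, which is exactly what the next parts of this article make expensive.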

Such connection between blocks means that the Blockchain as a whole is much more tamper-proof than standard database structures and other record-keeping methods. And since a Blockchain is in essence a ledger of records, this tamper-proof property is known as the “Immutable Ledger” property.

Okay, great. That’s how blockchains work. But what has this got to do with mining? The straightforward answer is that mining is all about calculating the hash value for the newest block which is being added to the chain. However, it’s not all that simple.

The thing is that the SHA256 function only takes a fraction of a second to calculate. And yet, you may have heard of the numerous mining pools such as BTC.com and AntPool, and even industrial scale mines — all competing to generate the next Bitcoin block. So the question is — why do we need all that computing power?

___

Part 2: The Cryptographic Puzzle

This is where we start adding layers of complexity. Buckle up!

Blocks in the blockchain have another field which we have not spoken about yet. This field is called “The Nonce”, which stands for “number used only once”.

The Nonce is an integer number and along with the Block Number, Data and Previous hash the Nonce serves as an input for the SHA256 function to calculate the current block’s hash:

SHA256(Block Number, Nonce, Data, Previous Block’s Hash) -> Hash

Unlike other components of a block, the Nonce is designed to be totally under our control. This means that now we have a mechanism to vary the current block’s hash while keeping the data inside it intact. Indeed, thanks to the nature of the hash function (property #3 in our discussion above), every time we select a new Nonce for the same block the resulting hash will be a different value.
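A quick sketch of this effect: the block contents stay fixed and only the Nonce changes. The field layout, the "|" separator and the sample data are assumptions for illustration, not Bitcoin's actual serialization.

```python
import hashlib

def block_hash(number: int, nonce: int, data: str, prev_hash: str) -> str:
    """Simplified SHA256(Block Number, Nonce, Data, Previous Block's Hash)."""
    payload = f"{number}|{nonce}|{data}|{prev_hash}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

prev_hash = "0" * 64  # placeholder previous hash
hashes = [block_hash(1, nonce, "Alice pays Bob 5", prev_hash) for nonce in range(5)]

# Same block number, data and previous hash; a different hash for every Nonce.
for nonce, h in enumerate(hashes):
    print(nonce, h)
```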

Alright, that’s great. But what has any of this got to do with mining? This is where we come to the fun stuff.

There is a total of 16^64 possible SHA256 cryptographic hash values (each hexadecimal digit has 16 possible values and there are 64 of them in a hash). However, not all of them are valid hashes. Why is that? Well, roughly every two weeks the Bitcoin network sets a new target for the hash. Any hash above this target is rejected; any hash below it is accepted.

The diagram above illustrates the pool (not to be confused with ‘mining pool’) of all possible SHA256 hashes — starting at the bottom with smallest and increasing towards the largest at the top. Somewhere along the vertical we have the target. Note that this diagram is for illustrative purposes only as it is not proportionate — we’ll see why in a bit.

At the time of writing the target is:

0000000000000000005d97dc0000000000000000000000000000000000000000

What is really important in the target is the number of leading zeroes. Just like in the decimal system, leading zeros in a fixed-size number will determine its magnitude. Every leading zero reduces the number’s magnitude by a factor of 16 (ten in the decimal system, but here we’re working with hexadecimals).

There are 18 leading zeros in the current target, meaning that the total number of valid hashes is roughly 16^46 (only 64-18=46 digits remain after the leading zeros). Therefore, the probability that a randomly picked hash is valid can be estimated as:

16^46 / 16^64 = 16^(-18) ≈ 0.00000000000000000002%

In Bitcoin mining terms, this is the probability that any given Nonce value will generate a valid hash for the current block. We can now see why the diagram is out of proportion: the pool of valid hashes in reality is extremely small in comparison to the complete SHA256 pool.
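The arithmetic can be double-checked with exact fractions. In this sketch the target is rebuilt from its parts (18 zeros, then 5d97dc, then 40 zeros) so the digit count is unambiguous; the variable names are made up for illustration.

```python
from fractions import Fraction

# The target quoted above: 18 leading zeros, "5d97dc", then 40 trailing zeros.
target_hex = "0" * 18 + "5d97dc" + "0" * 40

leading_zeros = len(target_hex) - len(target_hex.lstrip("0"))
print(len(target_hex))   # 64
print(leading_zeros)     # 18

# The estimate used in the text: 16^46 valid hashes out of 16^64 possible ones.
estimate = Fraction(16 ** (64 - leading_zeros), 16 ** 64)
print(estimate == Fraction(1, 16 ** 18))    # True
print(f"{float(estimate) * 100:.1e}%")      # 2.1e-20%

# The exact share of hashes below the target is a little smaller still,
# because the first digit after the zeros is 5 rather than f.
exact = Fraction(int(target_hex, 16), 16 ** 64)
print(exact < estimate)                     # True
```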

And that’s what the cryptographic puzzle is all about: miners compete to find a Nonce (also called a Golden Nonce) which will generate a valid hash for the upcoming block. Whoever finds it first is allowed to add the block to the chain and gets their reward of 12.5 Bitcoins. At the time of writing one Bitcoin is worth around $10,000 USD, making mining a rather worthwhile activity.

How to read this diagram: the red ‘X’ marks relate to SHA256 hashes while the labels beside them illustrate which Nonce generated which hash value.

The target is defined based on the network’s hashrate (aggregate computational power of all Bitcoin miners). The more miners join the network, the lower the target will be, and therefore the harder it will be to find a suitable hash. The goal of this difficulty algorithm is to ensure that, on average, only one new block is added every 10 minutes. This is part of the Bitcoin monetary policy to control the total number of coins in circulation.
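The adjustment itself can be sketched in a few lines. This is a simplified illustration and not the real implementation: it assumes Bitcoin's 2016-block adjustment window (2016 blocks at 10 minutes each is two weeks) and its rule that a single adjustment is limited to a factor of 4, neither of which is spelled out above. The function name retarget is hypothetical.

```python
EXPECTED_SECONDS = 2016 * 10 * 60   # 2016 blocks at 10 minutes each = two weeks

def retarget(old_target: int, actual_seconds: int) -> int:
    """Scale the target by how long the last 2016 blocks actually took."""
    # Assumption: the adjustment is limited to a factor of 4 in either direction.
    clamped = max(EXPECTED_SECONDS // 4, min(actual_seconds, EXPECTED_SECONDS * 4))
    return old_target * clamped // EXPECTED_SECONDS

old_target = int("0" * 18 + "5d97dc" + "0" * 40, 16)

# More hashrate: blocks arrived twice as fast, so the target is halved (harder).
faster = retarget(old_target, EXPECTED_SECONDS // 2)
# Less hashrate: blocks arrived twice as slowly, so the target doubles (easier).
slower = retarget(old_target, EXPECTED_SECONDS * 2)

print(faster < old_target)   # True
print(slower > old_target)   # True
```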

In a nutshell, that’s what the millions and millions of mining machines are doing day and night: they are simply iterating through different values of the Nonce in the hope of being the first to find a valid hash for the next block. Once a valid hash is found, the block is added to the chain and the race starts over again, this time for the next block.
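Here is a toy version of that loop. It uses the simplified block layout from this article and a deliberately easy target (only 4 leading zeros instead of 18) so that it finishes in a moment on a laptop. The names mine and easy_target are made up for illustration.

```python
import hashlib

def block_hash(number: int, nonce: int, data: str, prev_hash: str) -> str:
    """Simplified SHA256(Block Number, Nonce, Data, Previous Block's Hash)."""
    payload = f"{number}|{nonce}|{data}|{prev_hash}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def mine(number, data, prev_hash, target, max_nonce=2 ** 32):
    """Try Nonce values in order; return (nonce, hash) for the first hash below the target."""
    for nonce in range(max_nonce):
        h = block_hash(number, nonce, data, prev_hash)
        if int(h, 16) < target:
            return nonce, h
    return None  # the whole Nonce range was tried without success

# Toy target: any hash with at least 4 leading zeros is valid.
easy_target = 16 ** 60

result = mine(1, "Alice pays Bob 5", "0" * 64, easy_target)
if result is not None:
    golden_nonce, valid_hash = result
    print(golden_nonce, valid_hash)
```

With 4 leading zeros the loop needs about 16^4 = 65,536 attempts on average. Every additional leading zero in the target multiplies that by 16.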

___

Part 3: Block configuration

Just when you thought we were done… There’s more.

The Nonce is an integer value with 32 bits of memory allocated to it, which means that it has a limited range of around 4 Billion values. This poses two problems:

First, even an average mining device can calculate up to 100 million hashes per second, and therefore will go through the Nonce range in 40 seconds. And that’s an average miner. Mining pools and industrial scale mines are able to go through the Nonce range in fractions of a second.

Secondly, the chance of finding a valid hash is so small that even with 4 billion tries the probability of success is still extremely low:

4 * 10^9 * 0.00000000000000000002% = 0.00000000008% ≈ 0.0000000001%
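Both problems can be reproduced with a few lines of arithmetic. This sketch uses the exact 32-bit range (2^32) instead of the rounded 4 Billion, so the results differ slightly from the rounded figures in the text. The hashrate is the 100 million hashes per second quoted above and the variable names are made up for illustration.

```python
from fractions import Fraction

nonce_values = 2 ** 32             # exact size of a 32-bit range
hashes_per_second = 100_000_000    # the "average mining device" from the text
p_valid = Fraction(1, 16 ** 18)    # chance that a single hash is valid

print(nonce_values)                # 4294967296

# Problem 1: how long until the whole Nonce range has been tried?
seconds_to_exhaust = nonce_values / hashes_per_second
print(round(seconds_to_exhaust))   # 43 (the text rounds this to 40)

# Problem 2: expected number of valid hashes in one full pass over the range.
# For probabilities this small it is practically the chance of finding one at all.
expected_hits = nonce_values * p_valid
print(f"{float(expected_hits) * 100:.1e}%")   # 9.1e-11%
```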

So what’s the solution?

For starters, the block contains… you guessed it — another field which we haven’t spoken about yet. This field is a timestamp representing the current Unix time (number of seconds elapsed since 1st January 1970):

The Timestamp is also included in the SHA256 calculation for the hash of the current block that’s being mined:

SHA256(Block Number, Timestamp, Nonce, Data, Prev. Block’s Hash) -> Hash

And since the timestamp is constantly refreshing (until the block is successfully mined), this effectively resets the Nonce range every second. Why? Well, as we discussed at the very start, even a slight change to the inputs of the SHA256 function causes the hash to change completely.

Therefore, if we try all 4 Billion Nonce values for a fixed combination of other inputs (block number, timestamp, data, previous block’s hash) but have no luck finding a valid hash, all we have to do is wait until the timestamp increases. A change in the timestamp means that the combination is now different, so if we try all 4 Billion Nonce values again, each one will give us a brand new hash value.
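The idea can be sketched as two nested loops: an outer one over timestamps and an inner one over the Nonce range. To keep the example quick and repeatable it makes several assumptions: the clock is simulated by counting up from an arbitrary Unix time instead of calling the real clock, the Nonce range is shrunk to 256 values, and the target is a toy one with 3 leading zeros. All names are made up for illustration.

```python
import hashlib

def block_hash(number, timestamp, nonce, data, prev_hash):
    """Simplified SHA256(Block Number, Timestamp, Nonce, Data, Prev. Block's Hash)."""
    payload = f"{number}|{timestamp}|{nonce}|{data}|{prev_hash}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def mine_with_timestamps(number, data, prev_hash, target,
                         start_time, nonce_range, max_ticks):
    """Sweep the whole Nonce range once per timestamp until a hash falls below the target."""
    for timestamp in range(start_time, start_time + max_ticks):
        for nonce in range(nonce_range):
            h = block_hash(number, timestamp, nonce, data, prev_hash)
            if int(h, 16) < target:
                return timestamp, nonce, h
        # Nonce range exhausted: the next timestamp gives a fresh set of hashes.
    return None

toy_target = 16 ** 61   # any hash with at least 3 leading zeros is valid

result = mine_with_timestamps(
    number=1, data="Alice pays Bob 5", prev_hash="0" * 64, target=toy_target,
    start_time=1_520_000_000,   # arbitrary starting Unix time
    nonce_range=256,            # tiny stand-in for the 4 Billion range
    max_ticks=1000,
)
if result is not None:
    timestamp, nonce, h = result
    print(f"ticks waited: {timestamp - 1_520_000_000}, nonce: {nonce}, hash: {h}")
```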

The timestamp solves the problem for the average miner since it will reset before they get to the end of the Nonce range (reminder: average miner takes 40 seconds to do 4 Billion passes). However, for a mining pool or industrial scale miner even one second is too long — as we discussed, they would get through the Nonce range in fractions of a second. So how do they solve the problem? This is where block transaction configuration comes in.

Participants of the Bitcoin network transact with each other all the time. However, a new block is only added once every ten minutes. So where do the transactions go before they are added to a block? New entries are added to a staging area called the mempool. It is then the miners’ job to pick up a batch of these transactions from the mempool and add them to the new block they are mining.

Block size is limited and not all transactions from the mempool will fit into the new block. This means that miners get to pick which transactions will go into the next block. What this also means is that miners can change the configuration of transactions at will (before the block has been successfully mined).

And this is how miners get additional control over the hash. Well… Control isn’t the right word since the hash cannot be reverse engineered or predicted. Variability is a better term here: changing the configuration of transactions creates additional variability in the hash function inputs.

Similar to the timestamp situation, whenever we try out all 4 Billion possible values in the Nonce range and have no luck, all we have to do is slightly alter the combination of transactions which we have selected from the mempool.

The main difference here is that we don’t have to wait. By altering the selected transactions, we can reset our Nonce range at will — therefore we can do this as many times per second as we want. Of course, all this is done algorithmically. This way even mining pools and industrial scale miners can test new hash values continuously without any idle time.
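As a sketch of this last trick, the loop below keeps the timestamp fixed and switches to a different selection of transactions whenever the Nonce range runs out. It is heavily simplified: the transactions are plain strings joined together (the real block format is more involved), the mempool holds 8 made-up payments, the Nonce range is shrunk to 256 values and the target is a toy one with 2 leading zeros. All names are hypothetical.

```python
import hashlib
from itertools import combinations

def block_hash(number, timestamp, nonce, transactions, prev_hash):
    """Simplified block hash where Data is the list of selected transactions."""
    data = ";".join(transactions)
    payload = f"{number}|{timestamp}|{nonce}|{data}|{prev_hash}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def mine_with_configurations(number, timestamp, mempool, block_size,
                             prev_hash, target, nonce_range):
    """Sweep the Nonce range once for each selection of transactions from the mempool."""
    for selection in combinations(mempool, block_size):
        for nonce in range(nonce_range):
            h = block_hash(number, timestamp, nonce, selection, prev_hash)
            if int(h, 16) < target:
                return selection, nonce, h
        # No luck: alter the selection and start the Nonce range again, no waiting needed.
    return None

mempool = [f"payment #{i}" for i in range(8)]   # made-up pending transactions
toy_target = 16 ** 62                           # at least 2 leading zeros

result = mine_with_configurations(
    number=1, timestamp=1_520_000_000, mempool=mempool, block_size=4,
    prev_hash="0" * 64, target=toy_target, nonce_range=256,
)
if result is not None:
    selection, nonce, h = result
    print(selection, nonce, h)
```

Choosing 4 transactions out of 8 already gives 70 different selections, and each one comes with a fresh pass over the Nonce range. A real mempool holds far more transactions, so the supply of configurations is practically endless.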

___

Let’s sum up

We’ve covered a lot of ground. Let’s recap the Bitcoin / Blockchain mining process to ensure we haven’t missed anything:

1. A hash is a digital fingerprint of any collection of data. Blocks in a blockchain are cryptographically linked because each one includes the preceding block’s hash in the calculation of its own hash. Tampering with data in any one block will render its and all following blocks’ hashes invalid.

2. The cryptographic puzzle requires miners to find a hash smaller than the set target for it to be valid. Miners search for a valid hash by iterating through a designated parameter within the block called the Nonce. Whoever is first to find a valid hash gets to add the block and collect the reward.

3. The Nonce range contains 4 Billion possible values, which is insufficient to find a valid hash with a high degree of certainty. Resetting of the Nonce range is achieved by including the current timestamp and by varying the configuration of transactions included in the block.

Join us

In March this year we successfully funded a Kickstarter project to create the most comprehensive online course on blockchain the World has ever seen.

Today this course is Live on Udemy.com where over 3,000 students have already signed up to master the concepts of Blockchain, Bitcoin, Smart Contracts and more. If you’d like to become a Blockchain pioneer, then join us on this incredible adventure and take your career to the next level:

https://www.udemy.com/build-your-blockchain-az/?couponCode=MEDIUM90

The link above already includes a special coupon for our Medium readers. Use it to get 90% OFF.

See you inside!

  • Kirill Eremenko