What's Behind a Bitcoin Address?
The Process of Generating a Bitcoin Address
A Bitcoin address, which we use to receive or send funds (*), is essentially an integer represented in a special alphanumeric encoding called Base58.
Here’s an example of an address (**):
18JE1cT3GpUKrF2A4Ah2zs2DkR5Q2edPhM
* Strictly speaking, we’re not sending or receiving anything and coins are not “moving” anywhere; we’re only adding entries to the ledger (the blockchain) using transactions, but that’s a different subject. For simplicity, we often just say that we are sending or “spending” coins.
** This is a legacy P2PKH (pay-to-public-key-hash) address. There is another type called P2SH (pay-to-script-hash), which supports more complex conditions (e.g., multi-signature), as well as the newer Bech32 encoding scheme, but here we will keep things simple.
But this is just the tip of the iceberg, as the story of how an address is generated is truly intriguing. Ready? Let’s dive in!
Step 1: The Private Key
A private key is essentially just a number too. It is randomly generated and kept secret by the owner of the key, that is, the wallet owner.
A private key is 256 bits long, so there are close to 2^256 possible private keys. Since an address is built from a 160-bit hash of the public key (see Step 3), the number of distinct addresses those keys can map to (the effective “key space” of an address) is 2^160, resulting in:
1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976
It is hard for the human brain to even grasp a number this large, but here’s an attempt to put it in terms we might understand: it corresponds to the number of sand grains in a square of 131,072 x 131,072 earths next to each other.
That’s quite a lot of sand grains, to say the least, so trying to guess a private key is rather futile.
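As a quick sanity check, the large figure above is simply 2^160 written out, and picking a private key amounts to drawing a random number. The sketch below assumes the secp256k1 curve that Bitcoin uses, whose group order N (just under 2^256) bounds the valid keys. The helper name `generate_private_key` is made up for this illustration and is not taken from any wallet implementation.

```python
import secrets

# Order of the secp256k1 group; valid private keys lie in the range [1, N - 1].
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def generate_private_key() -> int:
    """Draw a random integer in [1, N - 1] from the OS's secure random source."""
    return secrets.randbelow(N - 1) + 1


# The 160-bit figure quoted in the text, printed with thousands separators.
address_space = 2 ** 160
print(f"{address_space:,}")

k = generate_private_key()
print(hex(k))  # different on every run
```

Note the use of `secrets` rather than `random`: a private key has to come from a cryptographically secure source, otherwise the huge key space offers no real protection.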
Step 2: Deriving a Public Key
The private key is then used to derive a public key. This is done by multiplying the private key by a fixed, publicly known value called the Generator Point.
    K (Public Key) = k (Private Key) * G (Generator Point)
The thing to note here is that we’re not using simple grade-school arithmetic for this multiplication, but rather a special kind called elliptic curve multiplication, used in Elliptic Curve Cryptography, or ECC for short.
ECC itself is a fascinating subject in cryptography, but in order to keep things simple we won’t go into it here.
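For the curious, here is a minimal sketch of what K = k * G looks like in code. It assumes the secp256k1 curve parameters that Bitcoin uses and the textbook double-and-add method. The function names are hypothetical, and the code is meant only for illustration: it is not constant-time and should not be used to handle real keys.

```python
# secp256k1 parameters (curve: y^2 = x^3 + 7 over the integers modulo P).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a point plus its mirror image gives infinity
    if a == b:
        # Slope of the tangent line (point doubling).
        slope = 3 * x1 * x1 * pow((2 * y1) % P, -1, P) % P
    else:
        # Slope of the line through the two distinct points.
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point=G):
    """Compute k * point by scanning the bits of k (double-and-add)."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def serialize_compressed(point) -> bytes:
    """Encode a point as 33 bytes: a parity prefix followed by the x coordinate."""
    x, y = point
    prefix = b"\x02" if y % 2 == 0 else b"\x03"
    return prefix + x.to_bytes(32, "big")


public_key = scalar_mult(12345)  # 12345 is a toy private key, never use one like it
print(serialize_compressed(public_key).hex())
```

The "multiplication" is really repeated point addition, which is why going from k to K is fast while recovering k from K is believed to be infeasible.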
Step 3: The Double Hash
So now that we have a public key, we can move on to the next step where we transform it twice using two hashing algorithms, namely SHA-256 and RIPEMD-160.
A hashing algorithm is a function that takes in data of any size (length) and produces a result of a fixed size.
One property of a hash function is that it is deterministic, meaning it will produce the same output given the same input.
Another interesting property of a hash function is that while it is easy to go in one direction, i.e. calculate a hash from an input, it is practically impossible to go back, i.e. calculate the original input from the hash.
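These properties are easy to observe with Python's standard `hashlib` module. The short sketch below uses SHA-256 only; the helper name `sha256_hex` is invented for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short_input = b"abc"
long_input = b"abc" * 1000

# Fixed size: both digests are 64 hex characters, whatever the input length.
print(len(sha256_hex(short_input)), len(sha256_hex(long_input)))

# Deterministic: hashing the same input twice gives the same digest.
print(sha256_hex(short_input) == sha256_hex(short_input))

# A one-character change in the input yields an unrelated-looking digest.
print(sha256_hex(b"abc"))
print(sha256_hex(b"abd"))
```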
We now execute the next step of double hashing. Given our public key K:
    K` = SHA256(K) → produces a 64-character hexadecimal value (256 bits or 32 bytes)
    K`` = RIPEMD160(K`) → produces a 40-character hexadecimal value (160 bits or 20 bytes)
SHA-256 is used first, to guard against a potential weakness if RIPEMD-160 were applied directly to the elliptic curve public key (K). Then RIPEMD-160 is applied to the previous result (K`) to reduce it to a more manageable length of 20 bytes (40 hexadecimal characters), resulting in K``.
We now have K``, the public key hash. In the next and final step, it is encoded/transformed into its final form, the Bitcoin address.
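The double hash can be sketched in a few lines with `hashlib`. Two assumptions are worth flagging: the function name `hash160` is just a common nickname for this combination, and RIPEMD-160 is only available through `hashlib.new("ripemd160")` when the underlying OpenSSL build provides it, so the example guards against that case. The input is the compressed public key that corresponds to the toy private key k = 1 (the generator point itself).

```python
import hashlib


def hash160(public_key: bytes) -> bytes:
    """Return RIPEMD160(SHA256(public_key)), the 20-byte public key hash."""
    first = hashlib.sha256(public_key).digest()      # K`  (32 bytes)
    return hashlib.new("ripemd160", first).digest()  # K`` (20 bytes)


# Compressed serialization of the secp256k1 generator point (private key 1).
example_key = bytes.fromhex(
    "0279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798"
)

try:
    print(hash160(example_key).hex())
except ValueError:
    # Some Python/OpenSSL builds ship without the legacy RIPEMD-160 digest.
    print("RIPEMD-160 is not available in this Python build")
```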
Step 4: The Bitcoin Address
Since a Bitcoin address is often seen and sometimes transcribed by users, it made sense to keep the final address short, to add a layer of protection against typos and other human errors, and to include a version field that easily distinguishes between different types of addresses (e.g., Bitcoin address, Bitcoin Testnet address).
The hash is prefixed with the version field. The result is then hashed twice with SHA-256, and the first 4 bytes of that double hash are appended as a checksum for error detection (e.g., a malformed address due to a typo). Finally, everything is encoded using a special scheme called Base58.
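Here is a small sketch of the version-plus-checksum step, before any Base58 encoding takes place. It assumes the version byte 0x00 used for mainnet P2PKH addresses; the function names and the all-zero public key hash are placeholders chosen for the example.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def add_checksum(pubkey_hash: bytes, version: bytes = b"\x00") -> bytes:
    """Return version + hash + the first 4 bytes of their double SHA-256."""
    payload = version + pubkey_hash
    return payload + double_sha256(payload)[:4]


def checksum_ok(data: bytes) -> bool:
    """Recompute the checksum over everything but the last 4 bytes and compare."""
    payload, checksum = data[:-4], data[-4:]
    return double_sha256(payload)[:4] == checksum


placeholder_hash = bytes(20)  # 20 zero bytes standing in for a real K``
raw_address = add_checksum(placeholder_hash)
print(len(raw_address), checksum_ok(raw_address))

# Simulate a typo by changing a single byte: the checksum no longer matches.
corrupted = bytearray(raw_address)
corrupted[5] ^= 0x01
print(checksum_ok(bytes(corrupted)))
```

A wallet performs the same check in reverse when you paste an address, which is how it can reject a mistyped one before any coins are sent.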
What’s interesting about the Base58 encoding scheme is that it produces a string which can only consist of characters from the following alphabet:
    123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
Notice that some characters are missing here, namely 0 (number zero), O (capital o), l (lowercase L) and I (capital i). These were intentionally removed to avoid mistakes/confusion when reading the address off a screen or paper.
The result is the Bitcoin address, which can safely be shared and used.
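To make the encoding concrete, here is a minimal Base58 sketch built on the alphabet above. It treats the input bytes as one large integer and repeatedly divides by 58; each leading zero byte is written as a leading "1", which is why legacy addresses with version 0x00 start with that character. The function names are hypothetical, and the sketch covers only the encoding, not the checksum.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as a Base58 string."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = ALPHABET[remainder] + encoded
    # Leading zero bytes vanish in the integer, so they are restored as "1"s.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def base58_decode(text: str) -> bytes:
    """Decode a Base58 string back into bytes."""
    number = 0
    for char in text:
        number = number * 58 + ALPHABET.index(char)
    leading_ones = len(text) - len(text.lstrip("1"))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_ones + body


# The example address from the top of the article, viewed as raw bytes.
raw = base58_decode("18JE1cT3GpUKrF2A4Ah2zs2DkR5Q2edPhM")
print(len(raw), raw.hex())
print(int.from_bytes(raw, "big"))  # the address as one big integer
```

Decoding the example address yields 25 bytes: 1 version byte, the 20-byte public key hash, and the 4-byte checksum.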
Final Thoughts
If you were brave (or bored) enough to read this far, then chances are the next time you see an address you might smile to yourself, thinking “wow, this thing is deep”.
Once you understand the path it takes just to produce a Bitcoin address, it is hard not to admire the amount of thought and consideration that was put into the process.
Considering that addresses are only one technical aspect of Bitcoin, the entire system is truly an amazing technological feat and a thing of beauty, at least in my eyes.
For those who'd like to dive in deeper, I can highly recommend Andreas Antonopoulos' book Mastering Bitcoin.
-Zacky