Introduction to Error Detection and Correction #1: Parity to Hamming
Simon Southwell
Semi-retired logic, software and systems designer. Technical writer, mentor, educator and presenter.
Introduction
I first came across error correction codes (ECC), back in the 1990s, whilst working on integrated circuits for Digital Data Storage (DDS), based on digital audio tape (DAT) technology, at Hewlett Packard. I wasn’t working on ECC implementation (I was doing, amongst other things, data compression logic) and it seemed like magic to me. However, I had to learn how this worked to some degree as I’d pitched an idea for a test board, using early FPGAs, to emulate the mechanism and channel for the tape drive for testing the controller silicon and embedded software. That is, it could look, from the controller point of view, like a small section of tape that could be written to and read and moved forward and back. To make this useful, the idea was to create a fully encoded set of data, with any arbitrary error types added, which was loaded into a large buffer for the tape data emulation. Therefore, I would have to be able to compile this data, including the encoding of ECC. Fortunately, I had the engineer on hand who was doing the ECC implementation, and he’d usefully written some C models for encoding and decoding which I could adapt for my purposes. This all worked well and was used in earnest for testing hardware and software in all sorts of error scenarios. Indeed, when a colleague from our partner development company was returning home, after secondment to analyse tape errors, we had the idea for his last analysis to be a ‘fake’ error pattern, generated by the tool, that said ‘best wishes’.
Since then, access to information on ECC has become more readily available, with the expansion of the internet (in its infancy when I was working on DDS), but I have found most of the information on this subject to be quite mathematical (necessarily so when proving the abilities and limits of algorithms), with very little discussion of how this is turned into actual logic gates. In this article I want to assume that the proof of functionality is well established and concentrate on how to get from these mathematical concepts to actual gates that encode and decode data. I want to take this one step at a time so that I can map the mathematical concepts to real-world logic and hopefully dispel the idea that it’s too complicated. I will start at the beginning with concepts that many of you will already be familiar with, and you may skip these sections if you wish, but it is in these sections that I will introduce the logic that fits the terminology used in the following sections, as we build to more complex ideas.
Once we get to Hamming and Reed-Solomon coding there will be logic implementations of the examples that can be accessed on github, along with accompanying test benches, so that these implementations can be explored as executing logic. The logic is synthesisable, but has no state, which it would need for practical use, so it is ‘behavioural’ in that sense, but is just logic gates, with no high-level Verilog used. Also, there will be an Excel Spreadsheet for exploring Hamming codes where any data byte value can be input, and errors added to the channel to see what the Hamming code does to correct or detect these.
ECC has been around for some time (even before the 90s when I was introduced to it), so you may wonder how relevant it is to today’s design problems. Well, Reed-Solomon codes are used, for example, in QR codes (see diagram below), now ubiquitous in our lives, and the newly launched PCI Express 6.0 specification adds forward error correction to its encoding (see my PCIe article).
So, let’s get to it, and the beginning seems like a good place to start.
Basic Error Detection
In this section I want to go through two of the most basic error detection methods that you may already be familiar with—parity and checksums. I want to do this firstly for completeness (some people may be just starting out and not be as familiar with these concepts), but also to introduce some terms and concepts used later.
In all the algorithms we will discuss, though, there is a trade-off between the amount of data we have to add to the basic information (i.e., the amount of redundancy) versus the effectiveness of the encoding for error detection or correction. The engineering choices made rely on an understanding of the channel that the data is transported through (whether wire, wireless or even magnetic or optical media) and the noise and distortion likely to be encountered. None of these methods is 100% effective, and all can be defeated so that errors get through. The idea is to bring the observed error rate below a certain probability. Often more than one mechanism is employed. This article will not be discussing the analysis of channels and the design choices to make in choosing the ECC scheme, but we will look at the effectiveness of each method.
Parity
This is perhaps the simplest encoding for detecting errors but, as we shall see, it is the foundation of many of the more complex algorithms we will discuss.
Parity, fundamentally, is “adding up ones” to see if there are an odd number of them or an even number of them. For example, taking an 8-bit byte, if there are an even number of ones (say, 4), then it has even parity, else it has odd parity. We can encode the data into a codeword with either even or odd parity by adding an extra bit and setting that bit to one or zero as required. We still need to ‘add up’ the ones from the data to know how to set this bit. This is done using modulo 2 arithmetic. That is, we add 1s and 0s with no carry, so 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, but 1 + 1 = 0 as it is modulo 2—the equivalent in C is (x + y) % 2. Note that in modulo 2 arithmetic, adding and subtracting are the same thing, so 1 – 1 is 0, just like 1 + 1. This is useful to know, as I’ve read articles that mention subtraction and then give examples where addition is being used.
You may have noticed that this is the same output as an exclusive OR gate (XOR). Therefore, we can use XOR or XNOR gates to add up the number of ones and either set or clear the added bit for even or odd parity as desired. The Verilog code to calculate the added bit is shown below for odd and even codes:
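(The original listing is an image, so here is a minimal Verilog sketch of the same idea; the module and port names are mine, for illustration.)

// Parity generation over an 8-bit byte
module parity_gen (
  input  [7:0] byte_in,      // data byte
  output       parity_even,  // bit to append for an even parity codeword
  output       parity_odd    // bit to append for an odd parity codeword
);

  // A reduction XOR adds up the ones modulo 2
  assign parity_even = ^byte_in;

  // Odd parity is simply the inverse (equivalent to a reduction XNOR)
  assign parity_odd  = ~parity_even;

endmodule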
Here we see the even parity code as the byte bits all XORed together, whilst the odd parity code uses the inverse of this. The byte and parity bit code can then be transmitted, over a UART say, and the receiver performs a modulo 2 sum over the whole codeword. If the sum is 0, then no error occurred. If the sum is 1, then an error has been detected. This is true whether even or odd parity is used, so long as the same XOR or XNOR is used by both transmitter and receiver.
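The receive side check is just as small; a sketch for even parity (again with illustrative names) might be:

// Modulo 2 sum over the whole 9-bit codeword (byte plus its even parity
// bit); a result of 1 flags an error
module parity_check (
  input  [7:0] byte_in,
  input        parity_in,
  output       parity_error
);

  assign parity_error = ^{parity_in, byte_in};

endmodule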
This method is very limited in protection. It can’t correct any errors and can only detect a single bit in error. A two-bit error will make the code ‘valid’ once more, and the bad data will be considered good. Therefore, it is suitable for channels where bit errors are rare, and where the probability of two-bit errors within the space of a codeword length (9 bits in the example) is below the required rate for the system.
In the example a byte was used, so adding a parity bit has a redundancy of 12.5%. This can be reduced by encoding over a larger data size, so 16 bits has a redundancy of 6.25%, but the probability of two-bit errors within the codeword now increases. This is why I said an understanding of the channel is important in making design choices. Increasing to 16 bits might be a valid choice for increasing efficiency if the raw bit error rate is low enough to meet the design criteria.
Checksum
A checksum is an improvement to basic parity but is still only a detection algorithm. In this case the data is formed into words (bytes, 16-bit words, 32-bit double words etc.), and then a checksum is done over a block of these words, by adding them all up using normal arithmetic, to form a new word which is appended to the data. The checksum word will be of a fixed number of bits, and so is doing modulo 2^n summation, where n is the number of bits.
The checksum word might be bigger than the word size of the data so, for example, 8-bit byte data might be used with a 16-bit checksum word. In all cases all the data words are added up as unsigned numbers, modulo the width of the checksum word. I.e., (D0 + D1 + D2 … + Dn-1) % 2^chksum_width. This sum is then two’s complemented and appended to the data. On reception, the same summation is made, including the appended word, and if zero, the checksum detected no errors, else an error occurred. Alternatively, the checksum is appended unmodified, and then subtracted from the data summation at the receiver to give 0 for no error—it depends where it is most convenient to do the two’s complement. The diagram below shows some code for generating a checksum, with the data buffer assumed loaded with data.
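That code is shown as an image; a minimal C sketch along the same lines (byte data, a 16-bit checksum word, and names chosen purely for illustration) could be:

#include <stdint.h>
#include <stddef.h>

/* Generate a 16-bit checksum over a block of bytes: sum the data
   modulo 2^16 and return the two's complement of the sum, ready to
   be appended to the data. */
uint16_t gen_checksum(const uint8_t *data_buf, size_t len)
{
    uint16_t sum = 0;

    for (size_t idx = 0; idx < len; idx++)
        sum += data_buf[idx];               /* unsigned sum, modulo 2^16 */

    return (uint16_t)(~sum + 1);            /* two's complement */
}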
Like for parity, the checksum adds redundancy, and the trade-off is between the size of the checksum word, the length of the data block and the probability of errors. In general, checksums perform more efficiently than parity. It comes down to the probability of corrupted data generating the same checksum value as good data. If the checksum word is narrow, say a byte, then the chance of a corrupted message still generating a ‘valid’ checksum is 1 in 256. The protection improves as the word gets wider but drops again as the data block size is increased.
Another failing of checksums is that there is no position information in the encoding. In the example above, all the data in the buffer could be re-arranged into any order and still give a valid checksum, though only one ordering is correct. This can be somewhat alleviated using cyclic redundancy checks, which we’ll deal with in the next section.
Before leaving checksums, though, I want to briefly mention MD5, which is sometimes referred to as a checksum. Actually, it is a kind of hash function using cryptographic techniques. Its original intention was for it to be infeasible to have two distinct data patterns (of the same size—512-bit blocks for MD5) that produce the same MD5 value (128 bits), which would give some indication that the data received, matching a given MD5 number, was the message intended to be sent. However, this has long since been shown not to be true for MD5, so it is not really used for this purpose anymore. It is still seen in many places though, as it does give high confidence in detecting data being corrupted by channel noise (as opposed to malicious interference) and, in this sense, is used as a checksum. As it’s not really a checksum, I won’t discuss this further here.
Cyclic Redundancy Checks
For cyclic redundancy checks we need to introduce the concept of a linear feedback shift register (LFSR), or twisted ring counter. This is like a normal shift register, but the output is fed back to various stages and combined with the shifting data using XOR gates. The useful property of this arrangement is that, if the feedback points are chosen carefully, and the shift register is initialised to a non-zero value and then clocked, it will go through all possible non-zero values for its length (i.e., 2^n − 1 values, where n is the number of bits in the shift register), without repeating, until it reaches the start point value again. Depending on the feedback points, the values at each clock cycle will follow a pseudo-random pattern and, indeed, these types of shift register are used for pseudo-random number generation, with the start value being the ‘seed’.
The set of numbers and their pattern are known as a Galois field (after Évariste Galois, an interesting character worth looking up), or sometimes a finite field. These can be (and often are) described using a polynomial. For example, if we have a 16-bit shift register and XOR the output into the shift register bit inputs at 0, 5 and 12, with the actual output at 16, then this can be described with the polynomial:

x^16 + x^12 + x^5 + x^0
(Note that x^0 is 1, and you will see this written as 1 quite often.) By default, this will cycle through its Galois field, but if we add a data input, then this modifies the shift values dependent on that data. The diagram below shows this arrangement of the polynomial with a data input:
If we have a block of data and shift each data bit into the shift register at each clock, then the shift register will end up with a value which, mathematically, is the remainder when dividing the data by the polynomial (using modulo 2 arithmetic). By appending this remainder (the CRC code) to the data at transmission, then at the receiver, pushing the data, including the CRC codeword, through the same LFSR, with the same starting ‘seed’, should return 0—i.e., there is no remainder. If there is a remainder, then the data has been corrupted.
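As a concrete illustration (a bit-serial software model rather than the gate-level diagram, with names of my own choosing), the CCITT polynomial above can be run one bit at a time like this:

#include <stdint.h>

#define CCITT_POLY 0x1021   /* x^16 + x^12 + x^5 + x^0, the x^16 term implied */

/* Shift one data bit into the 16-bit LFSR. The incoming bit is XORed with
   the bit fed back from the top of the register, and the feedback taps are
   applied by XORing in the polynomial. */
uint16_t crc_ccitt_bit(uint16_t crc, int data_bit)
{
    int feedback = ((crc >> 15) & 1) ^ (data_bit & 1);

    crc <<= 1;

    if (feedback)
        crc ^= CCITT_POLY;

    return crc;
}

/* Convenience: shift a whole byte through, most significant bit first.
   Starting from the chosen seed, running all the data through leaves the
   remainder (the CRC code) in the register. */
uint16_t crc_ccitt_byte(uint16_t crc, uint8_t byte)
{
    for (int i = 7; i >= 0; i--)
        crc = crc_ccitt_bit(crc, (byte >> i) & 1);

    return crc;
}

Pushing the data followed by its 16-bit remainder back through the same function, from the same seed, leaves the register at zero, which is exactly the receiver’s check described above.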
The polynomials used are often standardised. The example polynomial is the CRC-CCITT 16-bit standard, but

x^16 + x^15 + x^2 + x^0
is the CRC-16 standard. Larger width CRCs are used for better protection, such as this 32-bit polynomial:

x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x^1 + x^0
This polynomial is used in, amongst other things, Ethernet, PKZIP, and PNG. The advantage of the CRC over a checksum, even though it might be the same width, is that the CRC codeword is a function of the data and the order in which the data was input.
The arrangement shown in the above diagram works well if shifting data in serial form, but often data is being processed as words (bytes, 16-bit words, double words etc.). Serialising this to calculate the CRC codeword is not practical without sacrificing bandwidth. What is required is a way to calculate the CRC in a single cycle as if all n bits of the data had been shifted through the LFSR. For a given set of data bits, from a given CRC starting point, the remainder will be fixed. Therefore, we can generate a table of all possible remainders from dividing each possible data value by the polynomial. For 8-bit data, this is 256 remainders. These can be pre-calculated and placed in a lookup table (LUT). The C code fragment below generates the LUT for 8-bit data for the 32-bit polynomial shown above:
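The original fragment isn’t reproduced here, but a sketch along the same lines might look like the following. It builds the 256-entry table in the bit-reversed (‘reflected’) form, which is what the shift-down, bottom-byte-indexed update described next expects:

#include <stdint.h>

#define POLYNOMIAL 0x04c11db7UL      /* CRC-32 polynomial, x^32 term implied */

static uint32_t crc_lut[256];

/* Bit-reverse a 32-bit word. Reversing POLYNOMIAL gives 0xedb88320, the
   form needed for a table that is indexed by the bottom byte of the CRC. */
static uint32_t reverse32(uint32_t x)
{
    uint32_t r = 0;

    for (int i = 0; i < 32; i++)
        r |= ((x >> i) & 1UL) << (31 - i);

    return r;
}

/* Build the LUT: entry i holds the remainder left in the shift register
   after the byte value i has been run through it. */
void build_crc_lut(void)
{
    uint32_t poly_rev = reverse32(POLYNOMIAL);

    for (uint32_t i = 0; i < 256; i++)
    {
        uint32_t rem = i;

        for (int bit = 0; bit < 8; bit++)
            rem = (rem & 1UL) ? (rem >> 1) ^ poly_rev : rem >> 1;

        crc_lut[i] = rem;
    }
}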
Note that the set bits in the POLYNOMIAL definition correspond to the polynomial terms, except for an implied x^32. To calculate the CRC codeword using the LUT, a 32-bit CRC state is kept and initialised (usually to 0xffffffff, the inverse of a remainder of 0). The block of data bytes is run through the CRC, with each byte XORed with the bottom 8 bits of the CRC state to index into the LUT. This LUT value is XORed with the CRC state shifted down by a byte, which becomes the new CRC state. When all the bytes have been processed like this, the CRC codeword is appended as the inverse of the final CRC state.
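Putting that description into code (again as an illustrative sketch, using the table built above):

#include <stddef.h>
#include <stdint.h>

/* Calculate the CRC-32 codeword for a block of bytes: start at 0xffffffff,
   index the LUT with the byte XORed with the bottom 8 bits of the state,
   combine with the state shifted down a byte, and finally invert. */
uint32_t calc_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xffffffffUL;

    for (size_t i = 0; i < len; i++)
        crc = (crc >> 8) ^ crc_lut[(crc ^ data[i]) & 0xff];

    return ~crc;    /* this value is appended as the CRC codeword */
}

As a sanity check, the nine ASCII characters “123456789” give the widely quoted CRC-32 check value of 0xcbf43926 with this arrangement.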
Note that data wider than a byte can be processed in this way, but the LUT length, for the arrangement above, is a function of 2^DATA_WIDTH, so 16-bit data needs a LUT of 65536 entries and 32-bit data a LUT of 4294967296 entries. The calculation can instead be pipelined, with intermediate CRC values fed forward and multiple ports into the smaller LUT, or multiple copies of the LUT for each stage, at the expense of more gates and other resources. These, though, are optimisations for speed, and do not change the result of the basic CRC shift register.
It should be noted that a given CRC polynomial, with the protection properties that it brings, gives the same protection if inverted or reversed. For example, the 32-bit polynomial is represented as the value 0x04c11db7 in the LUT generation code. If this 32-bit value is inverted (0xfb3ee248) or bit reversed (0xedb88320), or both (0x12477cdf), then the characteristics remain the same, even if the generated values are different, and this 32-bit polynomial is often seen in these alternative forms.
Error Correction
All of the above methods are for error detection only. In this section, and the next article, I want to start talking about error correction and look at a couple of methods that build on what we have looked at so far. These methods are known as forward error correction codes. That is, information is passed forward to allow the receiver to recover the data. This is in contrast with, say, a retry mechanism, where errors are detected and a message passed back to the transmitter (a NAK, for example) to resend the data. We will look first at Hamming codes and then, in the next article, Reed-Solomon codes.
Hamming Codes
Hamming codes build upon the concept of parity: by careful placement of multiple parity bits they can locate the position of an error, and thus correct it. Since we are using ones and zeros, knowing the position of an error means we can correct it by simply inverting the bit. The basic principle is that data is encoded into codewords, which are a sub-set of a larger set of values, such that valid codewords do not become other valid codewords in the presence of fewer than n bits in error. This value n, the minimum number of bit positions by which any two valid codewords differ, is known as the Hamming distance.
For example, a 4-bit code with valid codes being 0000b, 0011b, 0101b, 0110b, 1001b, 1111b, would have a Hamming distance of 2, as it would take 2 bits in error to move from one valid code to another. The other ten codes are invalid, and thus this example is only 37.5% efficient. With more distance these codes can be used to correct errors by moving invalid received codes back to the nearest valid code.
Let’s look at a more practical example of Hamming code usage. To keep the example manageable, we will encode 8-bit bytes into codewords with the ability to correct a single bit in error. The addition of another parity bit will allow for the detection of a second bit in error. The Hamming code will need four parity bits and, with a fifth detection parity bit, the codeword will be 13 bits in total, an overhead of 5 bits on the 8 data bits (62.5%). This is not so good, but after going through the example we will see how this might be scaled for encoding larger words with better efficiency. Below is a view of a spreadsheet that encodes an 8-bit byte into a 12-bit Hamming code and then adds a detection parity bit, and then allows the code to be decoded, through a channel that can add errors. This spreadsheet is available on github, along with the Hamming and Reed-Solomon Verilog source code, so this can be experimented with.
From the top, the data is defined as a set of 8 bits that can be set to any byte value. To construct the Hamming codeword we need to calculate 4 different parity values (even parity in this case) over different bits of the data. The codeword bits are numbered 1 to 12, with a P detection parity bit (more on this later). The parity bits P1, P2, P3, and P4 are positioned at codeword locations 1, 2, 4 and 8. These are the power of 2 locations. The rest of the 12 bits are filled with the data byte bits. It actually doesn’t matter where the data bits are placed in the remaining spaces, so long as they are extracted in the same manner on decode, but placing them in order seems like a good idea.
So, the line marked D has the values of the data byte, and the parity bits are calculated on the next four lines. P1 is calculated for even parity across D0, D1, D3, D4, and D6, as marked by the cells with an x. The other three parity values are calculated in a similar manner, but choosing different data bits for inclusion, as shown. We will discuss the choice of data bit inclusion shortly.
The code line, then, contains the final codeword. Over this whole codeword the detection parity bit is calculated for even parity. The spreadsheet then has an error ‘channel’ where errors can be added (none in the example above), before getting the received codeword (marked rx code). To decode this, the parity is calculated for the same bit positions as during encoding, giving four results marked e0 to e3 on the spreadsheet. When these are all zero, as in the above example, there is no error, and the data bits can be extracted from the codeword as good.
Notice the pattern of x positions on the e0 to e3 decoding (I left this until now, as it is easier to see here, I think). Looking down the cells, the x positions (reading an x as 1 and a space as 0) encode the values 1, 2, 3, and so on, in binary, for the 12 positions of the codeword. This is how the location of an error will be detected. Let’s introduce a single bit error.
In the above example D1 is corrupted at position 5, and on decode e0 and e2 are set, giving a value of 5. This is the index of the corrupted bit, so now the bit can be inverted and the corrected code at the bottom is the same as that transmitted. This works for any position, including the parity bits. So, by careful encoding and positioning of the parity bits, the decoder can move an invalid codeword back to the correct valid codeword in the presence of a single bit error. What if we have two errors?
In the above example an additional error is introduced at position 7. The error bits now index position 2, which has no error, but it is inverted anyway as there is no indication that this isn’t correct at this point. The corrected codeword is now wrong, but the parity is now odd, which is incorrect, and so the double error is detected. It can’t be corrected but at least its presence is known. There is another failure mode where two errors decode to give an error index that is greater than 12—i.e., it indexes off the end of the codeword. This is also detecting a double bit error. I won’t show a spreadsheet example here but get hold of the spreadsheet and try for yourself.
So, what does the Verilog look like for this? An encoder Verilog module is shown below.
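The github module isn’t reproduced verbatim here, but a minimal sketch of such an encoder (port names illustrative, following the bit positions from the spreadsheet) might be:

// Hamming (12,8) encoder with an added overall detection parity bit
module hamming_enc (
  input  [7:0]  data,       // input data byte (D0 is bit 0)
  output [12:0] codeword    // {P, code[12:1]}
);

  wire [12:1] code;
  wire        p1, p2, p3, p4, p;

  // Even parity over the data bits covered by each parity position
  assign p1 = data[0] ^ data[1] ^ data[3] ^ data[4] ^ data[6];
  assign p2 = data[0] ^ data[2] ^ data[3] ^ data[5] ^ data[6];
  assign p3 = data[1] ^ data[2] ^ data[3] ^ data[7];
  assign p4 = data[4] ^ data[5] ^ data[6] ^ data[7];

  // Parity bits at the power of 2 positions (1, 2, 4, 8), data in the rest
  assign code[1]  = p1;       assign code[2]  = p2;
  assign code[3]  = data[0];  assign code[4]  = p3;
  assign code[5]  = data[1];  assign code[6]  = data[2];
  assign code[7]  = data[3];  assign code[8]  = p4;
  assign code[9]  = data[4];  assign code[10] = data[5];
  assign code[11] = data[6];  assign code[12] = data[7];

  // Detection parity calculated over the entire 12-bit Hamming code
  assign p = ^code;

  assign codeword = {p, code};

endmodule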
This code is purely combinatorial (one could register the output if required) and simply calculates the four parity bits from the relevant input data positions. The 12-bit Hamming code is then constructed, placing the parity and data in the correct locations, and then the detection parity bit is calculated over the entire Hamming code. The output is then the parity bit and Hamming code concatenated. The diagram below shows the decoder.
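Again as an illustrative sketch rather than the github listing, a matching decoder could look like this:

// Matching Hamming decoder: corrects a single bit error and flags a double
module hamming_dec (
  input  [12:0] codeword,   // {P, code[12:1]} received from the channel
  output [7:0]  data,       // extracted (and corrected) data byte
  output        error       // set for an uncorrectable double bit error
);

  wire        p    = codeword[12];
  wire [12:1] code = codeword[11:0];

  // Recalculate each parity over its codeword positions (which include the
  // corresponding parity bit), giving the error index bits e0 to e3
  wire [3:0] e;
  assign e[0] = code[1] ^ code[3] ^ code[5] ^ code[7] ^ code[9]  ^ code[11];
  assign e[1] = code[2] ^ code[3] ^ code[6] ^ code[7] ^ code[10] ^ code[11];
  assign e[2] = code[4] ^ code[5] ^ code[6] ^ code[7] ^ code[12];
  assign e[3] = code[8] ^ code[9] ^ code[10] ^ code[11] ^ code[12];

  // Flip mask: a one shifted up to the indexed position (bit 0 unused)
  wire [15:0] flip_mask = 16'h0001 << e;

  // Invert the indexed bit to give the corrected code
  wire [12:1] corrected = code ^ flip_mask[12:1];

  // Overall parity over the corrected code plus the received parity bit
  wire parity_bad = ^{p, corrected};

  // Extract the data bits from their codeword positions
  assign data = {corrected[12:9], corrected[7:5], corrected[3]};

  // Double bit error: parity still wrong after a non-zero index correction,
  // or the error index falls beyond position 12
  assign error = (parity_bad & (e != 4'd0)) | (|flip_mask[15:13]);

endmodule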
In this code, the error bits are calculated from the relevant input code positions. From this error index a ‘flip mask’ is generated by shifting a 1 by the index. This is a 16-bit number to cover all possible values of e. A corrected code is then calculated by XORing the received code with bits 12 down to 1 of the flip mask. Bit 0 isn’t used, as an error code of 0 is the no-error case. If the index is greater than 12, then this is an invalid code, which we will deal with shortly. The corrected code, plus the parity bit, is then used to generate the parity for the entire message. The output data is extracted from the relevant positions within the code, before an error status is calculated as either a parity error (when the error index is non-zero) or an invalid index (a flip_mask bit above 12 being set). And that’s all there is to it. This code is available on github, along with a simple test bench to run through all possible data and one and two bit errors.
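That test bench isn’t shown here, but a cut-down sketch of the idea, sweeping a single bit error across every position for every data value using the two modules sketched above, might be:

// Sweep all data values and all single bit error positions through the
// encoder and decoder, checking that each error is corrected with no
// error flag raised
module hamming_tb;

  reg  [7:0]  data;
  reg  [12:0] err_mask;
  wire [12:0] codeword;
  wire [7:0]  rx_data;
  wire        error;

  hamming_enc enc (.data(data), .codeword(codeword));
  hamming_dec dec (.codeword(codeword ^ err_mask), .data(rx_data), .error(error));

  integer d, b;

  initial begin
    for (d = 0; d < 256; d = d + 1) begin
      for (b = 0; b < 13; b = b + 1) begin
        data     = d;
        err_mask = 13'h0001 << b;
        #1;
        if (rx_data !== data || error)
          $display("FAIL: data %h, error at bit %0d", data, b);
      end
    end
    $display("Single bit error sweep complete");
    $finish;
  end

endmodule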
It was stated before that encoding a byte this way is not too efficient. In more practical usage, such as protecting RAM data, 64-bit data can be encoded with 7 bits of Hamming parity and an 8th detection parity bit, for 72 bits in total. Hopefully it can be seen from the index pattern on the spreadsheet how this may be expanded for 64-bit data by simply following the indexing in binary, with the parity bits at the power of 2 locations. Many RAM components come in multiples of 9, 18 or 36 bits just so 72-bit words (or multiples thereof) can be constructed. For example, the Micron 576Mb RLDRAM 3 comes in 18- and 36-bit wide versions.
I have implemented Hamming codecs for DDR DRAM in just this manner for memory systems in high performance computers, where error rates over the system skyrocket as the amount of RAM used becomes so much larger, with thousands of compute nodes.
Conclusions
In this article we have gone from simple parity for low-level error detection, to a more useful way of employing multiple parity bits for error correction, with a detection fallback. I have tried to simplify the mathematical concepts into practical logic-based implementations, with example logic that can be used in real synthesisable modules. Checksums give a simple method for improving detection over parity but have no position protection. CRCs introduce the concept of polynomials and Galois fields and then reduce this to practical logic, with methods to improve encoding efficiency. For the Hamming code, a spreadsheet implementation is used for exploring the codes more immediately and visually, revealing the pattern of the encoding process that allows indexing to error locations and hence the means for correction. (Verilog and spreadsheet available on github.) Each of the methods discussed trades complexity against the expected channel error rates and their nature, and the appropriate method depends on the design requirements.
In the next article, Reed-Solomon codes will be explored. These take parity even further, also using the Galois fields we saw for CRC, to be tolerant of burst errors as well as random (Gaussian) errors.