Wednesday, March 3, 2010

Wednesday Math, Vol. 111: Checksum

Way back in the fall of 2008, I wrote about error correcting codes, the clever idea of Richard Hamming that could take the kinds of mistakes machines make when sending data and figure out what the correct message was. This means that any old string of data is not a legitimate message. Another way to detect errors, though not to correct them, is called checksum, where we take some of the data, massage it a little, add it up and check the sums, where only certain sums are legitimate.

Consider a credit card number, which is 16 digits long. If every every 16 digit number was a legitimate credit card number, there would be 10 quadrillion possible card numbers, more than a million each for every man, woman and child on the planet. Even the least responsible person on the planet doesn't have a million credit cards. A lot of number patterns do not correspond to legitimate credit card numbers, either now or in the future. What the checksum part of the credit card will do is catch the kinds of mistakes humans are prone to.

Let's consider an 8 digit number instead.

3481 7717

The kinds of mistakes humans are prone to make are switching two numbers that are next to each other (adjacent transpositions), like typing 3841 instead of 3481, or typing a mis-duplication, like 7117 when we meant 7717. This checksum method will split the digits into two groups by alternating back and forth with each number.

3481 7717

So this splits the number into two groups of digits, 3 8 7 1 and 4 1 7 7. Here is one way we could do a check sum.

Step 1. Double each digit. If double the digit becomes more than 9, add the digits together.

3 8 7 1 becomes 6 16 14 2. 16 and 14 are too big, so they become 1+6 = 7 and 1+4 = 5, respectively. 6 7 5 2 is the new set of digits.

4 1 7 7 becomes 8 2 14 14 becomes 8 2 5 5.

Step 2. Add the digits in each group.
6+7+5+2 = 20
8+2+5+5 = 20

If both sums are divisible by 10, the number I typed in was valid. If I switch two adjacent numbers or type a xxy instead of xyy, this will mess up one or both of the checksums.

The way checksum usually works is that one digit in each of the groups is the checksum digit. Effectively, we could say the last two digits in the number are forced to be certain values by the first six digits' values. This means that instead of 100 million different legal 8 digit codes, there are only 1 million legal 6 digit codes, and the last two digits are the checksum digits for each of the alternating groups. (For credit cards, losing two digits to the checksum means 10 quadrillion patterns are shrunk down to 100 trillion, still a very big number.) This loss of legal codes means that we can't have an accidental identity theft when someone types in a credit card number, at least by the way people usually mistype a string of digits.