It is officially on, now.#
This challenge isn't conceptually hard, but it involves actual error-prone coding. The other challenges in this set are there to bring you up to speed. This one is there to qualify you. If you can do this one, you're probably just fine up to Set 6.
There's a file here. It's been base64'd after being encrypted with repeating-key XOR.
- Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.
- Write a function to compute the edit distance/Hamming distance between two strings. The Hamming distance is just the number of differing bits. The distance between:
this is a test
is 37. Make sure your code agrees before you proceed.
- For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.
- The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.
- Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
- Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.
- Solve each block as if it was single-character XOR. You already have code to do this.
- For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.
This code is going to turn out to be surprisingly useful later on. Breaking repeating-key XOR ("Vigenere") statistically is obviously an academic exercise, a "Crypto 101" thing. But more people "know how" to break it than can actually break it, and a similar technique breaks something much more important.
No, that's not a mistake.#
We get more tech support questions for this challenge than any of the other ones. We promise, there aren't any blatant errors in this text. In particular: the "wokka wokka!!!" edit distance really is 37.
💭 Sounding a little overwhelming. Let's take a look.
💭 Trying all the 256 possible ASCII characters combination for key lengths 2 to 40 will result in successful cracking, but it looks like an utterly stupid way of attacking the problem, so we'll not use that approach, especially since there is a saner recommended approach.
Let's begin by writing a Hamming distance calculator.
"Always work with binary, see how the machines see."
OK, I got
37, but I wonder if there is a slicker way of finding the number of differing bits than doing
diff.toString(2).match(/1/g).length. My method feels "analogous", yo quiero algo "digital".
After some research, I came up with this alternative implementation that works at the bit level rather than use costly string and regex operations.
On performance testing, the bit-level implementation was found to be significantly faster - as expected.
Back to the problem in hand.
Let's give a try at implementing the solution based on the instructions, it is a very clear explanation about how to go about guessing the key size. Once we have the key size, the journey starts.
There we have the probable key size. The fact that we can crack the encryption if we know the key size feels pretty miraculous.
Let's further work on the instructions and update the code. I have captured my thought process in the code comments, don't miss them.
The approach is pretty logical - yet magical. Despite knowing how it works, I still find the outcome pretty amazing - like a sneaky BJJ submission technique. Props to Friedrich Kasiski (29 November 1805 – 22 May 1881).
- The algorithm used in the solution is based on the Kasiski examination.
- Charles Babbage (26 December 1791 – 18 October 1871) seems to have independently broken the Vigenère cipher too.
- Following the instructions in the challenge to a tee did the reverse of helping. Tip: use guidelines but don't be bound by them.