"The question is not whether we will be extremists, but what kind of extremists we will be. . . The nation and the world are in dire need of creative extremists." -- Martin Luther King Jr.
Published by: schrockblock, on 2009-07-09 00:33:58
So first off, for those of you looking for the very basics of cryptography, I would like to recommend taking a look at Arrexels article "Cryptography- The VERY Basics." It is a nice little article on a simple form of encrypting messages, though we will briefly go over this method here as well.
Alright then! Cryptography is essentially the art of protecting information. It does so by changing the information into an unreadable form (as opposed to Steganography, which simply hides the existence of such information). One of the simplest ways to do this is what is called "Monoalphabetic Substitution."
Monoalphabetic Substitution
This form is what is described in Arrexels article. Essentially, the idea is to replace letters with other letters in a predetermined fashion. For instance, we can make a lookup table for encrypting and decrypting in this system like so:
a b c d e f g h i j k l m n o p q r s t u v w x y z
b c d e f g h i j k l m n o p q r s t u v w x y z a
If we want to encrypt the word "plaintext," we first take the first letter, p, and look it up in the first row of our table, and find the letter right below it: q. So "p" becomes "q." Doing this lookup all the way through gives the encrypted form: "qmbjoufyu." To reverse the process, look at the second row of the table, and find the corresponding letter in the first row.
This type of code is vulnerable to frequency analysis, and, more interestingly, things called Markov Chain Monte Carlo Methods (a very interesting article on them found here, though not terribly accessible to newbies). For computer security, it is therefore rather impractical.
Polyalphabetic Substitution
In the above example, only one ciphertext alphabet (as the second row above is called) was used. In the case of Polyalphabetic Substitution, more than one is used. One way to do this is to apply one alphabet to every other letter starting with the first one, and another alphabet for every other letter starting with the second. This may be a little confusing, so here is an example. First, let us make a pair of look up tables:
a b c d e f g h i j k l m n o p q r s t u v w x y z
b c d e f g h i j k l m n o p q r s t u v w x y z a
and:
a b c d e f g h i j k l m n o p q r s t u v w x y z
c d e f g h i j k l m n o p q r s t u v w x y z a b
where we want to encode "plaintext" again. So, we take the first letter "p", and look it up in the first lookup table. As before, it turns into "q." For the second letter, we use the second lookup table, turning "l" into "n." For the third letter, we use the first table again, and so on. Proceeding in this way we get: qnbkovfzu.
As you can see, the letter "t" is encoded as a "v" in one place and a "u" in another. Naturally, this sort of code is more secure than Monoalphabetic Substitution. It is still, however, vulnerable to (modified) frequency analysis, and a version of the MCMC Methods. So again, it is rather impractical.
Transposition
Transposition is an entirely different type of beast altogether. With a transposition cipher, rather than changing letters to different letters with a lookup table, you just move the letters around. For instance, say we want to encrypt, as per usual, "plaintext." Let us write the word in, say, two columns:
p t
l e
a x
i t
n
Now, we simply read off the rows as our encrypted text. In this case, it would be: "pt le ax it n."
This is just an example. As you can imagine, there are many, many ways of scrambling the letters (362,880 different ways for "plaintext" as a matter of fact). However, computers are fast enough to unscramble such things without too much trouble, so, in addition to some other considerations (which we will get to), modern cryptography does not employ this method either.
A special note on hashing
For hackers, one of the most important types of cryptography you may encounter is that of hashing. Hashing refers to encrypting something SO WELL that no one, I mean no one, can get the original information back again. Think of it as one way coding, that is, once you ENcode it, you cannot DEcode it. You may be asking yourself, "Now why on earth would you want to do such at thing? Encrypt information so that no one can ever read it again? Preposterous!" But fear not! there is a rational reason to do this.
Say you have a password that only you know, and you want to keep it in a not-so-secure place. This password is very important to you though, as it allows you to log on to your computer. How can your computer compare the password you enter when logging on, to the password you stored in a not-so-secure place, without letting anyone see what your password actually is? The answer is hashing. Consider this algorithm:
1. Hash your stored password.
2. Put in the not-so-secure place.
3. Next time you log on, hash the password you enter.
4. Compare the entered password hash to the stored password hash.
5. If they are the same, you get logged on, otherwise, you stay locked out.
As you can see, the computer can check to see if your password matches without storing the password in a readable state. Some common hashing algorithms are MD5, SHA 1, and Blowfish (I have heard tell, though, that it was recently shown that two different passwords may have the same hash in MD5, but that could just be a rumor).
"But wait," you say, "I've done several several Realistic challenges where I have decoded hashes! What is going on??" Well, when you "decoded" those hashes what you were actually doing is is taking every possible letter combination, encrypting that as a hash, and comparing it to the hash you saved. So you were not decoding per se, but rather checking to see what text gets encoded as your saved hash. Basically, you cheated.
AES/Rijndael*
Rijndael (prounounced rain-doll) is the encryption standard used by the United States government. As you can imagine, it is very complicated. I will briefly explain (in VERY basic terms), but I will have to assume some knowledge of some Group Theory, binary, and XORing. Anyone who is familiar with AES, understand, I am about to GREATLY oversimplify the algorithm for the sake of clarity, and you will have to forgive me. I know this is not exactly the way it works, but I think this explanation covers the underlying process in an edifying and understandable way.
First, start out with your message, and a password to encrypt it with. Convert it to binary. Call the binary "m." XOR your "m" with your password "k." Call the result "y."
Now for the meat of the AES. AES involves a few functions I will call "MixColumns," "ShiftRows," and "SubBytes." Do not worry about what they do for now, we will get to that in a second. Lets call the AES output "c." Then:
c = k XOR MixColumns(ShiftRows(SubBytes(y)))
Great! Now, for what those functions actually do... SubBytes
For this, we will be working in the field Z_2[x]/<x^8+x^4+x^3+x+1>. Z_2[x] is the ring of polynomials with coefficients being either 0 or 1. Since x^8+x^4+x^3+x+1 is an irreducible polynomial, we know that Z_2[x]/<x^8+x^4+x^3+x+1> will be a field. We will call this field "F." Nuff said.
Here comes the binary part (other than XORing, of course). Take one byte of "y" and make every zero or one the coefficient in a polynomial in F. For instance, say your byte is 10110101. Then your polynomial will be
x^7 + 0x^6 + x^5 + x^4 + 0x^3 + x^2 + 0x + 1
= x^7 + x^5 + x^4 + x^2 + 1
Now, since F is a field, we know that all elements in it (except for zero, of course) have an inverse. SubBytes finds this inverse, and replaces the original byte with the one generated by the inverse polynomial (just like we found the polynomial from the byte, we can do the reverse and find a byte from a polynomial). ShiftRows
ShiftRows step is a transposition step. It writes the bytes in blocks of 16, and "shifts the rows":
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
turns into
1 5 9 13
6 10 14 2
11 15 3 7
16 4 8 12
Then it reads off the columns as the new order in each block. MixColumns
MixColumns is like SubBytes, except this time, we are working in the field F[t]/<t^4+1>. Moreover, you do not use single bytes, you use groups of four bytes. The first byte determines the coefficient of t^3, the second determines the coefficient of t^2 and so on. Then you multiply this by a given polynomial also in this field, and use the result as your new set of four bytes.
And there you have it! AES/Rijndael in basic, abbreviated form. If you want some more details, you can try the Wikipedia page on AES, though that is a little thick. Otherwise, PM me if you have basic questions, and if there is enough demand, I will write an article just on AES.
Public Key Cryptography
And finally, here is the mainstay of cryptography today. Since I do not want this article to be a book, I will cover the Diffie-Hellman Key exchange, and if you guys want to know about RSA, PM me or let me know and I might do an article on that. The Problem
In all the past sections, I have described crypto-systems that have a fatal flaw: the require a code book, of some sort or another. With substitution ciphers, you have the lookup tables. With Transposition, you have how you scrambled the letters. With AES, you have the password. And the problem with code books is, both the sender and the receiver of the coded message have to have a copy. If two people are communicating over insecure channels, and they wish to begin exchanging coded messages, how can they do so? If one of them sends a code book over the insecure channel, an eavesdropper will have a copy as well. What to do, what to do... The Solution
Public Key Cryptography. Say Alice and Bob wanna exchange password protected emails using AES. How do they arrange to get the same password without sending it to each other? One good way is called the Diffie-Hellman Key Exchange. Now, this gets a little complicated, so I recommend you write it down as you read, like taking notes, so you can really understand it. I think you will find that while it looks scary, its really quite simple.
Alice picks a number "e," a number "p" and a number d. Then she calculates e^d mod p, or in english, the remainder of e raised to the power of d when divided by p (if you do not understand this, look up modular arithmetic). Let us call this number "a" for Alice. Then, Alice sends e, p, and a to Bob, but keeps "d" a secret. Bob then picks a number "c" to keep secret for himself. After that, he finds e^c mod p (let us call it "b" for Bob), and sends it to Alice. Next, Alice raises b to the d (mod p), and Bob raises a to the c (mod p). So, written out, we have:
a^c = (e^d mod p)^c = e^(dc) mod p
for Bob, and
b^d = (e^c mod p)^d = e^(dc) mod p
for Alice. You will notice that in the end, both Alice and Bob have the same number. This is their very own secret password, that no one but they know.
At first, this may not seem to be secure. After all, you're broadcasting what a and b are right? So shouldnt an eavesdropper be able to figure out what d and c are? Technically yes. But this problem is referred to as "The Discrete Log Problem" in mathematics, and is notoriously difficult. Let me give you an example.
Let e = 7, p = 11, and 7^c mod 11 = 10. What is c? PM me for the answer.
Conclusion
Ok then, thus concludes my overview of cryptography. I hope it was basic enough for those just starting out, and that those more advanced learned something too. If you have any questions, enjoyed this article, or would like an article on some other topic, either comment, or PM me, I will try to answer as best I can. Thank you!
* This section was compiled using notes from a talk I gave on AES, which in turn made use of some notes on the subject by Professor Susan Loepp at Williams College.[sys][/sys]
Cast your vote on this article 10 - Highest, 1 - Lowest
Regarding "(I have heard tell, though, that it was recently shown that two different passwords may have the same hash in MD5, but that could just be a rumor).": this is not a rumour. It's actually easy to prove.
Each md5 hash is 32 hexadecimal digits long. This gives a finite number of combinations, but each input can be of any length. This means there's a finite amount of output but an infinite amount of input which means that every output has an infinite amount of input. This is called collisions, when two input strings gives the same output.
(Sorry if it's not entirely clear, I'm tired (yawn).)
sry but u lost me at AES/Rijndael* :(
it's prolly coz i don't know any of those AES and XOR and i go to high school so my math isn't astonishing.
some links with basic tutorials about those stuff would be helpful. thx
schrockblock - 03:05 am Thursday August 06th, 2009
B4d3-- hmm. Without a more solid math background, understanding AES will be difficult. It depends on what you would like to do... if you want to do programming involving AES, I recommend looking up someone else's implementation. Otherwise, here's a link to a site that will give you the basics of the theory behind AES:
http://www.math.niu.edu/~beachy/aaol/contents.html#index
Well I'm new to hacking but I'm familiar with the basics of encryption and found this a really good article for building on what I know, that said I did get a bit lost when you went in to detail on AES since I don't know anything about Group Theory or XORing but I'm guessing the reason you didn't include them was because they are complicated topics worthy of articles in themselves... so if anyone is reading this and thinks they can explain it then feel free to write one, otherwise I'll have to look it up else where I guess.
schrockblock - 07:27 pm Monday December 12th, 2011
Unfortunately, iskilled, we're looking for a whole number (hence the term 'discrete'). So c is not an irrational number, and it is also not negative.
Moreover, I think if you tried that c, you would find that the remainder of 7^c when dividing by 11 would not be 10.
HackThisSite is is the collective work of the HackThisSite staff, licensed under a CC BY-NC license.
We ask that you inform us upon sharing or distributing.
Nice Article :) Very good explanation