Hey guys,

I just submitted an article detailing the probability and statistics behind the birthday attack. While I understand the concept behind it, I still have a question.

For MD5, a 128-bit hash, we must generate 3.1e19 hashes for a 75% chance of finding a collision. However, since the whole point of it is to check the hashes against one another, that means we have to store each hash so that we can compare it later. So with 3.1e19 hashes at 16 bytes per hash, that works out 430 exabytes. That isn't remotely feasible to store the hashes, much less the strings that generated each hash, so how do they do it? Do they use some form of data compression to accomplish this? Any information about this would be greatly appreciated.

EDIT: Never mind, I think I figured it out. Rainbow tables, perhaps?