"If you think technology can solve your security problems, then you don't understand the problems and you don't understand the technology." -Bruce Schneier
Editors Note:
I know there is already another article about this challenge here, but in my opinion the approach chosen there has some flaws, which I am going to explain later on. Also, this is my first article submitted on HTS and, given the fact that English is not my mother tongue, I hope that there will not be too much to criticise in here.
Introduction:
The goal of this article is not to give a full solution to the challenge, instead I will try to explain the idea behind one "best practice" solution, the software-engineers point of view, so to speak. After reading carefully, you should be able to implement a well-working solution yourself, in a programming language of your choice. (I used Qt/C++ with the Qt-Creator)
Scientific Approach:
Yes, programming is science, if you think first, before starting to write your code :)
a few points to think about:
- how do I manage input/output
- what can be calculated on startup, to keep the actual calculations short, so minimize time between in- and output (this is a quite important point, since the time to submit your solution is rather short.)
- what has to be done to get from input to output?
the last one is easy: we have a scrambled word and have to look up in a list, what the original word looks like and return the original word. Im going to leave in- and output definition to you, since I dont know which programming language you will use.
But there are some problems. for example: how do I check, if a word is an anagram of another one?
The naive approach would be to check every permutation of the letters and look if it fits an entry of the wordlist. Its not fast and not very beautiful, but it would work, fair enough. But lets think about this some more: another possibility to check for anagrams would be to count the different letters and check if the numbers for every letter are the same. As you can easily see, this approach is a lot better, especially for longer words where comparing each combination of letters would take you forever.
I think you can see where these thoughts lead:
if we can find a way to generate a unique integer for every element of our wordlist, we can generate these before our actual input/output phase which would save us lots of time.
So, the task to solve now would be the following: find en efficient way to calculate a unique integer for a word, where this integer has the useful quality, that it is equal for anagrams, which can easily be achieved by calculation it from counting the different letters and taking these numbers into calculation.
A harder part is the unique. How can wie ensure, that the integers, that are euqal for anagrams are NOT equal for words that are no anagrams? This is one of the most obvious flaws of the other article, since the answer to this question there would be: we dont care, because the probability is small. Well, for a software engineer small is only small enough if it is zero.
So let us think about this point some more:
We know about two unique representations of integers. the integer itself (which is really no use for us right now) and the prime factorisation (which is more promising)
We could for example represent the word abcd as (2 * 3 * 5 * 7)
A little explanation:
we define an alphabet, which contains avery letter, we may have in our wordlist, you can choose yourself if you want to have it case sensitive or not. next, we need a differnent prime for each of the letters in the alphabet, I recommend using the smallest you can find ;)
That would give us the following mapping:
{ a => 2, b => 3, c => 5, d => 7 }
now count the letters in the word you want to represent as integer, and build you result int as follows:
pseudocode:
CODE :
result = 1
for each letter:
....result *= power(prime[letter], count(letter,word)
some examples:
aaaa => 2^4
aabb => 2^2 * 3^2 = bbaa = baba = ...
and so on.
as you can see, the required equalness of integers for words that are anagrams is given. If you give it some thought, you will notice that the uniqueness that we required is also given. If you dont want to spend the time neccessary, you will just have to believe me :)
another point that comes to mind now, is that, if we choose to calculate these numbers for every item in our wordlist on programm startup, we could sort them after their integer value, which we could just call Gödelnumber, since the process I explained before is also known as gödelisation.
After sorting the words after their gödelnumber, all we have to do on runtimg is calculate the gödelnumber of the input words an look up the value in our sorted list. A maximum of efficiency and thus a minimum of wasted time for runtime calculations.
in fact, this was everything you need to know th solve this challenge the "best practise" way and I thank you very much for reading and (hopefully) understanding :)
now, our program could look something like this:
pseidocode:
CODE :
inputString = doInput()
for each word:
....inputInteger calculateIntegerValue(word)
....outputString += lookUp(inputInteger)
return outputString;
Outro:
this was a very simplified way of saying "it took me longer to write the article than it took me to solve the challenge" and I hope this on the one hand helps those of you that could not have solved it otherwise and on the other hand is a nice introduction into the scientific way of doing things for those who read just out of interest.
Thanks,
TheJokeR
Cast your vote on this article 10 - Highest, 1 - Lowest
Comments: Published: 24 comments.
HackThisSite is the collective work of the HackThisSite staff, licensed under a CC BY-NC license.
We ask that you inform us upon sharing or distributing.