Hack This Site

Editor's Note:
I know there is already another article about this challenge here, but in my opinion the approach chosen there has some flaws, which I will explain later on. Also, this is my first article submitted to HTS and, given that English is not my mother tongue, I hope there will not be too much to criticise in here.

Introduction:
The goal of this article is not to give a full solution to the challenge. Instead, I will try to explain the idea behind one "best practice" solution: the software engineer's point of view, so to speak. After reading carefully, you should be able to implement a well-working solution yourself, in a programming language of your choice. (I used Qt/C++ with Qt Creator.)

Scientific Approach:
Yes, programming is science, if you think first before starting to write your code :)
A few points to think about:
- How do I manage input/output?
- What can be calculated on startup to keep the actual calculations short, and so minimize the time between input and output? (This is quite an important point, since the time to submit your solution is rather short.)
- What has to be done to get from input to output?

The last one is easy: we have a scrambled word, have to look up in a list what the original word looks like, and return the original word. I'm going to leave the definition of input and output to you, since I don't know which programming language you will use.
But there are some problems. For example: how do I check whether a word is an anagram of another one?
The naive approach would be to check every permutation of the letters and see if it matches an entry of the wordlist. It's not fast and not very beautiful, but it would work, fair enough. But let's think about this some more: another way to check for anagrams is to count how often each letter occurs and check whether the counts for every letter are the same. As you can easily see, this approach is a lot better, especially for longer words, where the number of permutations grows factorially with the word length and trying each one would take you forever.
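The letter-counting check can be sketched in a few lines. This is only an illustration in Python (the article itself leaves the language open); the function name is_anagram is made up for this example.

```python
from collections import Counter

def is_anagram(a: str, b: str) -> bool:
    """Return True if a and b contain the same letters with the same counts."""
    # Counter builds a letter -> count mapping; two words are anagrams
    # exactly when these mappings are equal.
    return Counter(a) == Counter(b)

print(is_anagram("listen", "silent"))  # True
print(is_anagram("aabb", "abbb"))      # False
```

Each comparison only needs one pass over both words, instead of one pass per permutation.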
I think you can see where these thoughts lead:
If we can find a way to generate a unique integer for every element of our wordlist, we can generate these before our actual input/output phase, which would save us lots of time.
So the task to solve now is the following: find an efficient way to calculate a unique integer for a word, where this integer has the useful property that it is equal for anagrams. That part is easily achieved by calculating it from the letter counts.
The harder part is the "unique". How can we ensure that the integers, which are equal for anagrams, are NOT equal for words that are not anagrams? This is one of the most obvious flaws of the other article, since its answer to this question would be: we don't care, because the probability is small. Well, for a software engineer, small is only small enough if it is zero.
So let us think about this point some more:
We know two unique representations of an integer: the integer itself (which is really no use to us right now) and its prime factorisation (which is more promising).
We could, for example, represent the word abcd as (2 * 3 * 5 * 7).
A little explanation:
We define an alphabet which contains every letter we may have in our wordlist; you can choose yourself whether you want it to be case sensitive or not. Next, we need a different prime for each of the letters in the alphabet. I recommend using the smallest you can find ;)
That would give us the following mapping:
{ a => 2, b => 3, c => 5, d => 7 }
Now count the letters in the word you want to represent as an integer, and build your result int as follows:

pseudocode:
CODE :
result = 1
for each distinct letter in word:
....result *= power(prime[letter], count(letter, word))

Some examples:
aaaa => 2^4 = 16
aabb => 2^2 * 3^2 = 36 (the same for bbaa, baba, ...)

and so on.

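As you can see, the required equality of the integers for words that are anagrams is given. If you give it some thought, you will notice that the uniqueness we required is also given: every positive integer has exactly one prime factorisation, so two words get the same integer only if every letter occurs equally often in both, which means they are anagrams. If you don't want to spend the time necessary, you will just have to believe me :)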
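As a minimal sketch of this mapping, here is one way it could look in Python. The names first_primes, PRIME_OF and signature are invented for this example, and the alphabet is assumed to be the 26 lowercase ASCII letters; a real wordlist may need a larger alphabet.

```python
import string

def first_primes(n: int) -> list[int]:
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no smaller prime divides it
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

# Assumption: the wordlist only uses lowercase ASCII letters.
ALPHABET = string.ascii_lowercase
PRIME_OF = dict(zip(ALPHABET, first_primes(len(ALPHABET))))

def signature(word: str) -> int:
    """Multiply the primes of all letters; anagrams give the same product."""
    result = 1
    for letter in word:
        # Multiplying once per occurrence equals prime ** count(letter).
        result *= PRIME_OF[letter]
    return result

print(signature("aaaa"))                       # 16
print(signature("aabb"), signature("baba"))    # 36 36
print(signature("abcd"))                       # 210
```

Python integers grow as needed, so the product never wraps around. In a language with fixed-width integers the product can overflow for long words, and after an overflow the uniqueness argument no longer holds, so a big-integer type (or a different key) would be needed there.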

Another point that comes to mind now is that, if we choose to calculate these numbers for every item in our wordlist on program startup, we could sort the words by their integer value, which we could just call the Gödel number, since the process I explained before is also known as Gödelisation.
After sorting the words by their Gödel number, all we have to do at runtime is calculate the Gödel number of each input word and look up the value in our sorted list. A maximum of efficiency and thus a minimum of wasted time for runtime calculations.
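The startup and runtime steps could be sketched as follows, again in Python and again only as an illustration. The names godel_number, build_index and look_up, the three-word sample list, and the restriction to lowercase ASCII letters are all assumptions of this example; the lookup in the sorted list is done with a binary search from the standard bisect module.

```python
from bisect import bisect_left

# Assumption: lowercase ASCII letters only, mapped to the first 26 primes.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def godel_number(word: str) -> int:
    """Product of the primes assigned to the letters of word."""
    result = 1
    for letter in word:
        result *= PRIMES[ord(letter) - ord("a")]
    return result

def build_index(wordlist):
    """Startup step: pair every word with its number and sort by number."""
    pairs = sorted((godel_number(w), w) for w in wordlist)
    keys = [k for k, _ in pairs]
    words = [w for _, w in pairs]
    return keys, words

def look_up(scrambled, keys, words):
    """Runtime step: binary search for the scrambled word's number."""
    target = godel_number(scrambled)
    i = bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return words[i]
    return None  # no word in the list is an anagram of the input

keys, words = build_index(["password", "dragon", "monkey"])
print(look_up("gradon", keys, words))  # dragon
```

If the wordlist contains two entries that are anagrams of each other, they share one number and this sketch simply returns the first of them.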

In fact, this is everything you need to know to solve this challenge the "best practice" way, and I thank you very much for reading and (hopefully) understanding :)

Now, our program could look something like this:

pseudocode:
CODE :
inputString = doInput()
outputString = ""
for each word in inputString:
....inputInteger = calculateIntegerValue(word)
....outputString += lookUp(inputInteger)
return outputString

Outro:
This was a very simplified way of saying "it took me longer to write the article than it took me to solve the challenge". I hope it helps those of you who could not have solved it otherwise, and that it is a nice introduction to the scientific way of doing things for those who read just out of interest.

Thanks,

TheJokeR

Comments:
Published: 24 comments.

oasis - 08:25 pm Friday October 29th, 2010

Interesting... Think I will try to complete it the easy way first, and then try this way :P

fashizzlepop - 04:23 am Saturday October 30th, 2010

I think it's easier to just sort them ASCIIbetically then compare.

TheGuiTarJokeR - 11:28 am Saturday October 30th, 2010

the article is not supposed to provide the "easiest" solution ;)
but going this way would theoretically enable you to solve a similar challenge with about 1000 scrambled words of input in also about 15 seconds.
and i just like the idea of achieving the "best" solution ;)

SpecialCharacter - 03:18 pm Saturday October 30th, 2010

Nice Article! I learned something. I solved this the dirty regexp/charArray way...

Avery17 - 05:54 pm Sunday October 31st, 2010

This is exactly how I did mine. I did it in C and PHP. I think this is the best method in my opinion. I've yet to get a collision and I set it up to parse 200 words and it got them all within seconds. This is how professional anagram finders do it.

Just added a timer to my C program, it parses 200 words correctly in less than a second. It generates 2 UNIX timestamps, one before the program runs and one after.

Got 200/200 words.
START 1288547861 : END 1288547861

at 1000 words I noticed a difference of 1 second.

Got 1000/1000 words.
START 1288548880 : END 1288548881

yourmysin - 08:18 pm Tuesday November 09th, 2010

This article should be cleaned up. It's extremely hard to read because of the poorly designed paragraphs.

TheGuiTarJokeR - 04:52 pm Friday November 12th, 2010

I'm not sure if this is possible after submission, but I'll see what I can do about that :)

yoho139 - 01:46 pm Monday November 15th, 2010

It helped me to highlight while I was reading if that makes a difference to you :P

zarpa - 07:53 am Monday November 22nd, 2010

Good solution, though a bit academic for me :-)

I did it a bit differently, with just a self-written permute program and some bash shell scripting....

mutantsrus - 06:27 pm Saturday December 11th, 2010

I'm glad I wasn't the only one who used the prime number/anagram theory to solve this. *looks up* It also seems I wasn't the only one to use shell scripting either. :D

ThaKing - 12:19 am Monday December 20th, 2010

used an array of integer arrays to compare
eg.
Wordlist(0)(255) = 1 means first word in wordlist has 1 character with ascii val 255.
check if the integer array for each word in the list is equal to an integer array generated for the scrambled word and return the original.

sh3llz - 01:34 am Tuesday January 18th, 2011

I used a similar method when I completed this challenge using flash/as3. Originally I was planning on using exactly the same method, but decided not to due to the limitations of the scripting language. Instead I had the word list broken into an array and the scrambled words also made into an array. I then looped through the scrambled words and compared them to only words in the word list that were the same length/similar. The comparison was also a loop statement, which instead of converting the letters into to numbers, determined if the words were the same using the as3 search function. It probably wasn't best practice, but the script runs damn fast.

sam_2108 - 04:07 am Thursday February 10th, 2011

Maybe I missed something in the article, but what if you have two strings like aabb and baba? Both are different strings, but won't their values be the same? Also, I recall at least one word had special characters. What about it?
I solved the program with a different approach (in C) where you create a structure of the word and a character array of 256 (ASCII chart). For every character in the word chart, set a bit in the array and increment it for every other 2 count. Once the array is complete, simply generate the same structure for the input word and compare the array.

Duncskunk - 07:02 pm Monday March 28th, 2011

I enjoyed your article a lot... I agree, the best answer is the one to go for ;)

deathclaw - 11:38 am Monday April 18th, 2011

Probably, if you want to save some time, you should look at the eleventh extbasic exercise.

Prince Of The Elite - 01:45 pm Friday June 17th, 2011

Hmmm... Indeed a great idea and indeed will solve the challenge in few seconds but don't you think it'll take much time for development and coding ??! I mean that will take much work. I think I'll do it my way first. Good article though !

acidaddict - 03:16 am Wednesday July 13th, 2011

you found a better solution than me. I was using the recursive method.

You have a better big O than me. Your method is O(n) and mine is O(n^2) :(

modwind - 09:19 am Saturday July 16th, 2011

Hashing is a way to solve the challenge. But using prime numbers with standard integer types may lead to overflow for long words like in extbasic 11. Especially if there are both letters, numbers and special symbols. I used the same approach as fashizzlepop - sorted letters of each word alphabetically and then used a simple string search function.

adigahacker - 04:26 pm Thursday July 21st, 2011

the fastest way i was able to do it in using PHP is to take the words list in array, split each word to an array and sort it, implode the split and sorted word array and store them in another array,

did the same with those words given by the challenge,

do 2 foreach loops with a counter that resets after each word is check, on match to return the word that has the index of the counter in the very first array from the original words array.

lines of code:27
time to get all words in pure processing (without submitting the result or getting the words): 0.19 seconds.

tried other things but most are just a waste of time.

adigahacker - 04:27 pm Thursday July 21st, 2011

damn, i did not understand my own comment because of the "Array" word!

nDoped - 09:14 pm Sunday October 09th, 2011

i'm fairly new to all of this. I solved this in challenge in java by reading each file into a char array, then used a for loop nested in another for loop to sort each element of each array, set them to a string, compare the strings, then save the corresponding word from the wordlist into a solution array when there was a match. My question, is java looked down on? I mean, it seems more respectable to complete this in C/C++ or another more 'techy' language. Is this true? I want to be more competent in other languages, but java is all i know at the moment...well, and some matlab...if that counts.

ggoodie - 03:52 am Tuesday January 17th, 2012

Mine looped through the list of scrambled words and made a product and a sum of the characters' ASCII codes which it then compared to words on the list.

Also, I don't have much programming experience so I wrote mine in javascript XD.

stabf - 04:23 pm Friday August 03rd, 2012

I did the same as nDoped! I really felt like that was the best compromise between coding/prep time and speed to completion.

KthProg - 03:07 am Monday January 28th, 2013

mine removed each letter in the scrambled word from the wordlist, the empty entries were the only ones which could have been matches, i checked each of these possible matches to make sure they were the same length as the scrambled word. whichever one was the same length was the match.

it runs in under one second.

reading the file line by line and storing it in an array i think helps with speed too.

HackThisSite is the collective work of the HackThisSite staff, licensed under a CC BY-NC license.
We ask that you inform us upon sharing or distributing.
