Large Arrays

Post by ColdwaterQ on Wed Jun 23, 2010 4:46 pm
([msg=40686]see Large Arrays[/msg])

I am trying to use a 140,000-word wordlist in an unscrambling program like the one in the first programming challenge, and I have two dynamic character arrays that hold the words. The problem is that when the program reads the file into the arrays, it crashes. I am using c++.exe from g++. I was wondering if there is something I can do to fix this so the program will work on any computer, maybe just taking longer on some. This is just for the heck of it, so I have no real constraints; I just want to learn some new techniques.

Thanks, and if you need any more information, just holler.

P.S. I am using bubble search and selection sort, so it actually worked pretty fast on the 1,000-word wordlist and 10 words.


Re: Large Arrays

Post by msbachman on Wed Jun 23, 2010 5:24 pm
([msg=40689]see Re: Large Arrays[/msg])

So I'm guessing you're getting seg faults?

I coded this just now and got a segmentation fault when I tried to run it:

Code: Select all
#include <stdio.h>

int main()
{
    /* a ~1.5 GB automatic array lives on the stack, which is far smaller than that */
    char arr[1500000000];
    int i;
    for (i = 0; i < sizeof(arr); ++i)
        arr[i] = 't';
    return 0;
}


The solution's easy enough, though: that's not how you should do it. If you're using a wordlist, you can read a line (or even several) into a smaller array and work with it however you need to.

If it's something like programming challenge one, like you said, I'd sort the word you fetch from the web page (not the ones from the wordlist), then read each word from the wordlist in turn into an array with something like fgets, sort it, and compare.

This might be slower (and you'll have to call rewind(FILE *stream) after you find each word), but it should solve the problem you're having.
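
Something like this rough sketch, for instance (untested; the filename and buffer size are just placeholders):

Code: Select all
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fp = fopen("wordlist.txt", "r");   /* placeholder filename */
    char line[64];                           /* holds one word per line */

    if (fp == NULL)
        return 1;

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        line[strcspn(line, "\r\n")] = '\0';  /* strip the trailing newline */
        /* sort the letters of 'line' here and compare against the scrambled word */
    }

    rewind(fp);   /* back to the start, ready to scan for the next scrambled word */
    fclose(fp);
    return 0;
}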

Good luck.
"I'm going to get into your sister. I'm going to get my hands on your daughter."
~Gatito
User avatar
msbachman
Contributor
Contributor
 
Posts: 681
Joined: Mon Jan 12, 2009 10:22 pm
Location: In the sky lol
Blog: View Blog (0)


Re: Large Arrays

Post by ColdwaterQ on Wed Jun 23, 2010 9:26 pm
([msg=40707]see Re: Large Arrays[/msg])

I see what you are saying, but then I have to do a linear search, which is much slower than a binary search. Is there a way to access a specific line of a file, so I can bounce around to different lines and still do a binary search?

Thanks for the suggestion. I'll screw around with it when I get home if no one replies in the next few hours.


Re: Large Arrays

Post by Skiddie Killer on Thu Jun 24, 2010 6:16 am
([msg=40718]see Re: Large Arrays[/msg])

You can use the seekg() and seekp() functions to move the read/write pointer inside a file. You can also use tellg() and tellp(), which return the current position of the pointer; they might be useful here.
I'm not sure if this is what you're looking for; maybe there is a better way to do it. I don't have much experience with file I/O. Hope this helps.
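
For example, if every word were padded to a fixed length, you could seek straight to the middle of the file and binary-search it in place. A rough, untested sketch (the 16-byte record width and the wordlist.txt name are just assumptions, not something from this thread):

Code: Select all
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    const long REC = 16;                         // assumed fixed record width
    std::ifstream f("wordlist.txt", std::ios::binary);
    if (!f)
        return 1;

    f.seekg(0, std::ios::end);
    long count = static_cast<long>(f.tellg()) / REC;     // number of records

    std::string target = "example";
    long lo = 0, hi = count - 1;
    while (lo <= hi)
    {
        long mid = (lo + hi) / 2;
        f.seekg(mid * REC);                      // jump straight to record 'mid'
        char buf[REC + 1] = {0};
        f.read(buf, REC);
        std::string word(buf);
        word.erase(word.find_last_not_of(" \r\n") + 1);   // trim the padding

        if (word == target) { std::cout << "found at record " << mid << "\n"; break; }
        else if (word < target) lo = mid + 1;
        else hi = mid - 1;
    }
    return 0;
}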


Re: Large Arrays

Post by andro1d on Tue Jul 13, 2010 6:57 am
([msg=41720]see Re: Large Arrays[/msg])

msbachman wrote:So I'm guessing you're getting seg faults?

I coded this just now and got a segmentation fault when I tried to run it:

Code: Select all
#include <stdio.h>
#include <string.h>

int main()
{
char arr[1500000000];
int i;
for(i=0;i<sizeof(arr);++i)
arr[i]='t';
}




It would need to be signed char arr, although using a one-dimensional array of this size is silly anyway.


Re: Large Arrays

Post by thetan on Tue Jul 13, 2010 9:34 am
([msg=41726]see Re: Large Arrays[/msg])

ColdwaterQ wrote:P.S. I am using bubble search and selection sort, so it actually worked pretty fast on the 1,000-word wordlist and 10 words.

LOL, dude, at bare minimum use quicksort: http://www.manpagez.com/man/3/qsort/

Anyway, seeing how it's C++, I wouldn't even use that.

If you insist on using linear data (arrays), then you can use std::vector plus std::sort(). Depending on your implementation, std::sort() is typically introsort: quicksort that falls back to heapsort, with insertion sort for small ranges. (Code not tested, so excuse any errors.)
Code: Select all
#include <vector>
#include <string>
#include <fstream>
#include <algorithm>

int main(int argc, char **argv)
{
    std::vector<std::string> v;
    std::fstream f ("test.txt", std::fstream::in);

    // read one word per line until the stream runs out
    std::string s;
    while (std::getline(f, s))
    {
        v.push_back(s);
    }

    std::sort(v.begin(), v.end());

    // if you wish to do something with the ordered results
    for (std::vector<std::string>::iterator it = v.begin();
        it != v.end();
        ++it
    )
    {
        // do something with the results in order
    }

    // or if you wish to perform a binary search on the ordered results
    // NOTE: this assumes all values are unique. if not, it's no biggie
    std::string t = "some string to match";
    if (std::binary_search(v.begin(), v.end(), t))
    {
        // we have an exact match in O(log n) time
    }
}


However, seeing how it is indeed C++, I wouldn't sort at all; instead I'd take advantage of the STL's native O(log n) search trees. std::set is one of these. So this is more along the lines of what I would do:
Code: Select all
#include <set>
#include <string>
#include <fstream>

int main(int argc, char **argv)
{
    std::set<std::string> st;
    std::fstream f ("test.txt", std::fstream::in);

    // read one word per line until the stream runs out
    std::string s;
    while (std::getline(f, s))
    {
        st.insert(s);
    }

    // sorting is done on insertion, so no need to sort again.

    // if you wish to do something with the ordered results
    for (std::set<std::string>::iterator it = st.begin();
        it != st.end();
        ++it
    )
    {
        // do something with the results in order
    }

    // search the tree in O(log n) time
    std::string t = "some string to match";
    std::set<std::string>::iterator it = st.find(t);
    if (it != st.end())
    {
        // we found a match in O(log n) time
    }
}


To make it even better, you can cut out those costly string comparisons. The typical way to do this is to implement a hashing algorithm that digests the string down to an int or a long long; done right, that can make a lookup an O(1) operation. There's tons of detailed research on hashing, so I'm going to leave that for you to look into. From there you can either implement a hash table, or (what I would do) implement a lookup tree with hashed keys. That way the lookup is still O(log n), but instead of comparing strings on every iteration you'd be comparing a simple integer, which is a few orders of magnitude faster.
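
A rough sketch of the hashed-key idea (this uses an FNV-1a hash purely as an example; a real program would still have to deal with the rare case of two different words hashing to the same key):

Code: Select all
#include <set>
#include <string>
#include <fstream>

// FNV-1a: digest a word down to a 64-bit integer key
unsigned long long fnv1a(const std::string &s)
{
    unsigned long long h = 14695981039346656037ULL;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        h ^= static_cast<unsigned char>(s[i]);
        h *= 1099511628211ULL;
    }
    return h;
}

int main()
{
    std::set<unsigned long long> keys;   // tree of integer keys instead of strings
    std::ifstream f("test.txt");
    std::string word;
    while (std::getline(f, word))
        keys.insert(fnv1a(word));

    // membership test: each tree node now compares one integer, not a string
    if (keys.count(fnv1a("example")) > 0)
    {
        // probably present; verify against the wordlist to rule out a collision
    }
    return 0;
}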
"If art interprets our dreams, the computer executes them in the guise of programs!" - SICP

Image

“If at first, the idea is not absurd, then there is no hope for it” - Albert Einstein
User avatar
thetan
Contributor
Contributor
 
Posts: 657
Joined: Thu Dec 17, 2009 6:58 pm
Location: Various Bay Area Cities, California
Blog: View Blog (0)


Re: Large Arrays

Post by tgoe on Wed Jul 14, 2010 4:16 am
([msg=41786]see Re: Large Arrays[/msg])

I wouldn't do it that way, guys. The sorting method is actually irrelevant because the dictionary is static, i.e. one sort amortized over n searches (and invocations). Given that, you can hardcode a jump table based on word length and/or word prefix to greatly speed up the search by slicing the dictionary into relatively tiny pieces per input, something like a trie. I learned this technique while learning jQuery, by creating a basic word-search game, "Bungle": http://widgets.opera.com/widget/14431/ (made for Opera, sorry!). Finding words out of jumbled letters is quite fast this way. If you expect your wordlist to change at all, though, thetan's method is better.
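
One way to picture the jump-table idea, bucketing by word length only (just a sketch, not the actual Bungle code; the filename and sample input are made up):

Code: Select all
#include <map>
#include <vector>
#include <string>
#include <fstream>

int main()
{
    // Slice the static dictionary once, keyed by word length, so each
    // scrambled input is only checked against words of the same length.
    std::map<std::string::size_type, std::vector<std::string> > byLength;

    std::ifstream f("wordlist.txt");   // placeholder filename
    std::string w;
    while (std::getline(f, w))
        if (!w.empty())
            byLength[w.size()].push_back(w);

    std::string scrambled = "lpepa";   // placeholder input
    std::vector<std::string> &candidates = byLength[scrambled.size()];
    // 'candidates' now holds only the words worth testing against 'scrambled',
    // e.g. by comparing sorted copies of the letters.
    return 0;
}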


Re: Large Arrays

Post by thetan on Wed Jul 14, 2010 11:39 am
([msg=41804]see Re: Large Arrays[/msg])

tgoe is absolutely correct.

It's easy to overlook what's really wrong with the performance of this implementation (as even I did). The problem isn't that your search algorithm is a bad performer; the problem is that you're reading a huge chunk of data off the disk (disk I/O is the slowest computer operation) and storing it in limited memory (RAM).

IMO, the best solution would be to do the entire operation on disk instead of loading anything into memory.

In terms of performance, the preferable method would be to use an embedded DB like BerkeleyDB, or a non-embedded server implementation like MySQL (with proper indexing), to store the words indexed on disk in a B-tree or B+tree (read about their performance here: http://en.wikipedia.org/wiki/B-tree#The ... se_problem ). This way all you have to do is query the DB; the entire lookup operation is handled for you, and it's really fast.
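
For a flavour of the embedded-DB route, here's a rough sketch with the classic Berkeley DB C API (written from memory, so treat the exact calls as an approximation and check the docs for your BDB version; "words.db" is a made-up filename):

Code: Select all
#include <db.h>
#include <string.h>

int main(void)
{
    DB *dbp;
    DBT key, data;

    /* create a B-tree database file on disk */
    if (db_create(&dbp, NULL, 0) != 0)
        return 1;
    if (dbp->open(dbp, NULL, "words.db", NULL, DB_BTREE, DB_CREATE, 0664) != 0)
        return 1;

    /* store one word as a key (empty value); repeat for the whole wordlist */
    memset(&key, 0, sizeof(DBT));
    memset(&data, 0, sizeof(DBT));
    key.data = (void *)"apple";
    key.size = 5;
    dbp->put(dbp, NULL, &key, &data, 0);

    /* later: a lookup is a single on-disk B-tree search handled by the library */
    memset(&data, 0, sizeof(DBT));
    if (dbp->get(dbp, NULL, &key, &data, 0) == 0)
    {
        /* the word is in the dictionary */
    }

    dbp->close(dbp, 0);
    return 0;
}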
"If art interprets our dreams, the computer executes them in the guise of programs!" - SICP

Image

“If at first, the idea is not absurd, then there is no hope for it” - Albert Einstein
User avatar
thetan
Contributor
Contributor
 
Posts: 657
Joined: Thu Dec 17, 2009 6:58 pm
Location: Various Bay Area Cities, California
Blog: View Blog (0)


Re: Large Arrays

Post by kaffeine on Sun Sep 19, 2010 5:16 am
([msg=46090]see Re: Large Arrays[/msg])

The following line will cause a stack overflow:
Code: Select all
char array[10000000];

That is because arrays created that way are placed on the stack, and the default stack size with g++ is about 2 MB (correct me if I'm wrong).

If that's the case, the solution is dynamic allocation:
Code: Select all
char *array = new char[10000000];

Then you can access 'array' the way you normally access arrays.
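
A minimal sketch of that, plus the std::vector alternative, which saves you from having to match every new[] with a delete[]:

Code: Select all
#include <vector>

int main()
{
    // heap allocation: ~10 MB is fine here, unlike a stack array of the same size
    char *array = new char[10000000];
    array[0] = 't';
    delete[] array;              // manual cleanup is required with new[]

    // or let std::vector manage the storage automatically
    std::vector<char> v(10000000);
    v[0] = 't';
    return 0;
}                                // v releases its memory here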


