sync350 wrote: nathandelane, what language would be best for the web crawler program? I've been using Perl, but I'm trying to ease my way back into C++. I'm thinking that Perl would be easier to write it in, but could it be done in C++?
I've programmed web crawlers in Ruby, C#, VBScript, and Java. The best language, in my opinion, would be one that provides an HTTP request class of some kind, unless you want to program your own on top of the socket-level classes. .NET does this very well with System.Net.HttpWebRequest. Java, Ruby, and VBScript offer something similar. Perl probably does too, but I don't know Perl very well. Basically, look for a Net namespace, package, or library, and in it look for an Http object, a WebRequest object, or something similar. Besides that, you have to parse the HTML somehow. Regular expressions work well but are relatively slow. In C# I use a library named HtmlAgilityPack, which parses the page for you and lets you look for specific tags by using XPath.
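To make the two pieces above concrete (an HTTP request class plus an HTML parser), here is a rough sketch in Python using only its standard library. It is not the code from any of my crawlers; the names LinkParser, extract_links, and crawl, and the max_pages limit, are made up for illustration. It swaps in Python's html.parser where I would use HtmlAgilityPack and XPath in C#.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute http(s) links found in html, in order, without duplicates."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the page URL and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")) and absolute not in links:
            links.append(absolute)
    return links


def crawl(start_url, max_pages=10):
    """Breadth-first crawl from start_url, visiting at most max_pages pages."""
    queue, seen = [start_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            with urlopen(url, timeout=10) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip pages that fail to load
        queue.extend(extract_links(html, url))
    return seen
```

Calling crawl("http://example.com/") would return the set of URLs it tried to visit. A real crawler would also want to respect robots.txt, pause between requests, and probably stay on one domain; I left those out to keep the sketch short.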
I hope that helps you know where I'm coming from.
You could also easily program a web crawler in a web language like JavaScript, PHP, ASP, or ASP.NET (which uses either VB.NET or C#, depending on your personal preference).