web crawlers

Post by pretentious on Tue Sep 17, 2013 11:57 pm

I made a Python script that requests URLs, parses each response for more URLs, adds them to a list and moves on to the next one. I started with Yahoo and pulled about 600 links from their main page, and I've now got more than 60,000 links in my database. I figure some of you guys have played around with similar stuff: should I be careful about sending hundreds of HTTP requests to a domain in automation? Is this just a day in the life of a network admin, or could someone actually get angry at me for this?
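The loop described above (fetch a page, pull out the links, queue any new ones, move on) can be sketched with nothing but the Python standard library. This is a minimal illustration, not the original poster's script: the names `crawl`, `extract_links`, `fetch_html` and `LinkParser`, the `hobby-crawler/0.1` User-Agent string and the `max_pages` cap are all hypothetical choices for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) URLs found in html, resolved against base_url."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so /a and /a#top count once.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https"):  # skips mailto:, javascript:
            links.append(absolute)
    return links


def fetch_html(url, timeout=10):
    """Download a page and decode it (falls back to UTF-8 if no charset is sent)."""
    req = Request(url, headers={"User-Agent": "hobby-crawler/0.1"})  # hypothetical UA
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl: fetch a URL, queue the links not seen yet, move on."""
    queue = deque([start_url])
    seen = {start_url}   # every URL ever queued, so nothing is queued twice
    visited = []         # URLs actually fetched, in order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception as exc:  # network errors, HTTP errors, bad pages...
            print(f"skipping {url}: {exc}")
            continue
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, seen
```

Passing a fake `fetch` (for example a function that looks pages up in a dict) lets you test the link handling without touching the network. The `seen` set is what keeps a 60,000-link frontier from re-queuing URLs it already knows about.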
Goatboy wrote: Oh, that's simple. All you need to do is dedicate many years of your life to studying security.

If you feel like exchanging ASCII arrays, let me know ;)
pretentious wrote: Welcome to bat country
pretentious
Contributor
Posts: 607
Joined: Wed Mar 03, 2010 12:48 am


Re: web crawlers

Post by -Ninjex- on Wed Sep 18, 2013 12:35 am

If you draw too much noise, they will probably ban your IP.
Usually, if you look at the site's TOS (terms of service), it will explain in detail what you are and are not allowed to do.

Google, for instance, asks you to use their API, which limits the number of pages you can pull per day. If you instead decide to scrape their search engine and request hundreds of thousands of URLs a day, you will end up banned.

(I believe, if I remember correctly)
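One common way to avoid drawing that kind of noise is to space out requests to each host. The sketch below assumes a fixed minimum gap per host; the `HostThrottle` class and its 2-second default are hypothetical, not a limit published by Google, Yahoo or anyone else, so check the site's TOS or any `Crawl-delay` it advertises.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a minimum gap (in seconds) between requests to the same host."""

    def __init__(self, min_delay=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # hypothetical default, tune per site
        self.clock = clock          # injectable so tests need not really wait
        self.sleep = sleep
        self.last_request = {}      # host -> clock() value of its last request

    def wait(self, url):
        """Sleep until url's host may be hit again; return the seconds slept."""
        host = urlparse(url).netloc.lower()
        last = self.last_request.get(host)
        delay = 0.0
        if last is not None:
            elapsed = self.clock() - last
            delay = max(0.0, self.min_delay - elapsed)
        if delay > 0:
            self.sleep(delay)
        self.last_request[host] = self.clock()
        return delay
```

In a crawl loop you would call `throttle.wait(url)` right before each fetch. Different hosts do not slow each other down, so a crawler can stay polite per site while still making progress across many sites.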
If you're not willing to learn, no one can help you. If you're determined to learn, no one can stop you.⠠⠵
The absence of evidence is not evidence of absence.
I can explain it for you, but I can't understand it for you.
-Ninjex-
Addict
Posts: 1221
Joined: Sun Sep 02, 2012 8:02 pm


Re: web crawlers

Post by tgoe on Wed Sep 18, 2013 1:55 am

At least pretend to honor the robots.txt. Some sites are more specific than others, LOL.
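Python ships a robots.txt parser in `urllib.robotparser` (`RobotFileParser` with `read`, `parse`, `can_fetch` and `crawl_delay`). A minimal sketch, assuming one cached parser per host and a hypothetical `hobby-crawler` user-agent token, might look like this; `RobotsCache`, `robots_url_for` and the `fetch_text` hook are illustrative names, not part of the standard library.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "hobby-crawler"  # hypothetical; use whatever name your crawler sends


def robots_url_for(url):
    """robots.txt always lives at the root of the scheme + host."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


class RobotsCache:
    """Fetch each host's robots.txt once and answer questions from the cache."""

    def __init__(self, user_agent=USER_AGENT, fetch_text=None):
        self.user_agent = user_agent
        # Hypothetical hook: fetch_text(robots_url) -> str. When None,
        # RobotFileParser.read() downloads the file itself.
        self.fetch_text = fetch_text
        self.parsers = {}  # robots.txt URL -> RobotFileParser

    def _parser(self, url):
        robots_url = robots_url_for(url)
        if robots_url not in self.parsers:
            rp = RobotFileParser(robots_url)
            if self.fetch_text is None:
                try:
                    rp.read()
                except OSError:
                    pass  # never read: can_fetch() then returns False (conservative)
            else:
                rp.parse(self.fetch_text(robots_url).splitlines())
            self.parsers[robots_url] = rp
        return self.parsers[robots_url]

    def allowed(self, url):
        """True if robots.txt lets our user agent fetch url."""
        return self._parser(url).can_fetch(self.user_agent, url)

    def crawl_delay(self, url):
        """Crawl-delay for our agent in seconds, or None if the site sets none."""
        return self._parser(url).crawl_delay(self.user_agent)
```

Check `allowed(url)` before queuing or fetching a URL. `Crawl-delay` is a non-standard directive, so many sites will not set it and `crawl_delay()` will return `None`; `RobotFileParser.crawl_delay` needs Python 3.6 or later.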
tgoe
Contributor
Posts: 638
Joined: Sun Sep 28, 2008 2:33 pm
Location: q3dm7
