Published by: thedotmaster, on 2009-08-18 09:10:19
Web Interaction Using Python
Introduction
In a number of the HTS programming missions you are asked to interact with the site from a program that you have written, as opposed to using a web browser. There are plenty of other applications for web interaction, however. I have written a few Python scripts to download various data from websites (e.g. http://python.pastebin.com/f268e6319 )
I will cover two ways of getting data from a website (and in fact, sending data too). If there are any problems with the article, leave a comment.
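All examples have been written in Python 2.6. There are quite a few differences between 2.6 and 3.0. The most visible one in these snippets is print, which became a function, but the URL libraries were also reorganised in 3.0 (urllib2 became urllib.request, and urlencode moved to urllib.parse), and sockets send and receive bytes rather than plain strings.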
In Python 2.6 a simple hello world is this:
CODE :
print "Hello World"
In Python 3.0 it looks like this:
CODE :
print("Hello World")
Making print a function is a good idea, and I will switch to 3.0 when it is finally worn in, but for the moment I'm sticking with 2.6.
If there are problems running any of the code under 3.0, try using the 2to3 script (it came preinstalled with Xubuntu for me; I'm not sure about Windows etc).
Anyway, now that's all covered, on with the article.
The URL Libraries
First of all we will start with a tutorial on the URL libraries. These are urllib and urllib2.
Let's immediately get started with some code.
CODE :
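A minimal sketch of such a fetch, assuming Python 3 (where urllib2's urlopen lives in urllib.request). The helper name fetch and the 10 second timeout are illustrative choices, not anything the libraries require:

```python
import urllib.request


def fetch(url):
    """Open the URL and return everything the server sent back, as bytes."""
    website = urllib.request.urlopen(url, timeout=10)
    try:
        return website.read()
    finally:
        website.close()
```

Calling print(fetch("http://example.com")) should then print the page source, given network access.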
Pretty simple code really, and for a lot of things it's all you need to know. It fetches the website "http://example.com" and stores the response as an instance, on which we call the read() function to return the data retrieved from the site. Here are the functions:
instance.read() - Returns the data retrieved from the site.
instance.info() - Returns the HTTP headers sent by the server. There is a lot of useful information in there, including cookie info and server type.
instance.geturl() - Returns the URL that was actually retrieved. It seems pointless, but we'll cover it in a second and you'll see why there is a point.
instance.getcode() - Returns the HTTP status code (e.g. 404, 200).
It's worth messing around with those a bit, rather than just taking my word for what they do.
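One way to mess around with them is to dump all four at once. This is a rough sketch assuming Python 3; the helper name describe is made up for illustration:

```python
import urllib.request


def describe(website):
    """Collect what the four methods report for an open response."""
    return {
        "url": website.geturl(),                  # URL actually retrieved
        "code": website.getcode(),                # HTTP status code
        "headers": dict(website.info().items()),  # response headers
        "body": website.read(),                   # raw bytes of the page
    }

# Usage (needs network access):
# website = urllib.request.urlopen("http://example.com")
# for name, value in describe(website).items():
#     print(name, "->", value)
```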
I'll now just show a use of the geturl() function:
CODE :
import urllib2
url = "http://google.com" # After google, try 'http://example.com'
website = urllib2.urlopen(url)
if url == website.geturl():
    print "Website not redirected."
else:
    print "Website redirected you."
Why you'd want to do that, I don't know, but there's bound to be a use for it sometime. But that is one application of the geturl() function anyway.
Let's do an HTTP POST request now. They can look a little complicated, but they're pretty easy really, so don't worry.
Before you look at the code, you might want to set up a server (or get some webspace) so you can test this out. A little PHP script like the one below will do the trick:
CODE :
<?php
echo $_POST['test'];
?>
And before anyone says anything about XSS - get lost - it's a test page that will be up for 10 minutes on a server that no one cares about. But if you really are that bothered, you can wrap the output in htmlspecialchars() or strip_tags(). (I say this because I can tell there'll be someone who will try and pipe up with a clever comment).
Now then, we'll be introducing a new module for this, urllib (it isn't strictly necessary, but it's the best way, I reckon). I will import just the urlencode function, as we don't need anything else from the module.
Okay, let's go:
CODE :
import urllib2
from urllib import urlencode # new module and function
url = "http://localhost/test.php"
data = {'test':'lolwut'}
# you can add as much info as you want to this dictionary
# "test" is the label for the data, so that PHP script above
# should display "lolwut".
encoded_data = urlencode(data)
# remember that this is from that imported module, normally you'd
# use this: urllib.urlencode(data) if you used a normal import.
website = urllib2.urlopen(url, encoded_data)
print website.read() # That was pretty easy, right?
Pretty straightforward, right?
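To see what urlencode() actually produces, here is a small sketch assuming Python 3, where the function lives in urllib.parse and the POST body must be bytes. The extra "note" field is invented to show how awkward characters get escaped:

```python
import urllib.request
from urllib.parse import urlencode

url = "http://localhost/test.php"
data = {"test": "lolwut", "note": "a b&c"}  # "note" is a made-up extra field

encoded_data = urlencode(data)
print(encoded_data)  # test=lolwut&note=a+b%26c

# Python 3 wants bytes for the body, so encode the string first.
request = urllib.request.Request(url, data=encoded_data.encode("ascii"))
print(request.get_method())  # POST, because a body is attached

# website = urllib.request.urlopen(request)  # uncomment with a server running
```

Supplying data is what turns the request into a POST; without it the same call would be a GET.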
Let's go on to HTTP Basic Authentication. This is trickier. Here's the skeleton code for opening more advanced things, including HTTP authentication.
CODE :
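One possible shape for such a skeleton, assuming Python 3 (in 2.6 the same classes live in urllib2). The URL, the credentials and the User-Agent string are placeholders:

```python
import urllib.request

url = "http://localhost/protected/"  # hypothetical password-protected page

# The password manager remembers which credentials belong to which URL.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, "username", "password")  # None = any realm

# Each handler teaches the opener one extra trick; this one answers
# HTTP Basic Authentication challenges.
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# Build the handlers into an opener, optionally set default headers,
# then install it so that plain urlopen() calls use it from now on.
opener = urllib.request.build_opener(auth_handler)
opener.addheaders = [("User-Agent", "my-script/0.1")]
urllib.request.install_opener(opener)

# website = urllib.request.urlopen(url)  # uncomment with a server running
```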
Okay, that's a lot more complicated. Note the "openerDirective"s (urllib2 itself calls the pieces you pass to build_opener() handlers, and the finished opener an OpenerDirector). They are basically a way of adding headers and extra behaviour to the urlopen requests.
You can have numerous opener directives, or just the one. You build them into an opener using the build_opener() function, then install it using install_opener(). After that, you can request a site and it will include the headers that you have specified.
Let's look at creating an HTTP Basic Authentication header.
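The header itself is just "Basic " followed by the base64 encoding of username:password. A sketch of building it by hand, assuming Python 3; the helper name and the URL are made up:

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value for an Authorization header using the Basic scheme."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


request = urllib.request.Request("http://localhost/protected/")
request.add_header("Authorization", basic_auth_header("username", "password"))

# website = urllib.request.urlopen(request)  # uncomment with a server running
```

Bear in mind that base64 is an encoding, not encryption: anyone who can see the request can read the password.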
I plan to write another article soon about cookies in Python, both as part of CGI and as part of requests with urllib2.
Now I will move onto sockets and raw HTTP requests, and include cookies in that.
Socket Programming in Python
Socket programming is a really useful thing to learn - it's a must really, especially if you want to learn about security.
Again, we'll get some code out there straight away:
CODE :
import socket
s = socket.socket()
host = "www.example.com"
port = 80
addr = (host, port)
s.connect(addr)
s.send("Something to send..")
print s.recv(1024)
# 1024 is the buffer size, you don't need to worry about it
# much right now.
s.close()
There we are. We've created a socket, connected to "www.example.com" on port 80 then sent "Something to send.." and received something back, which has been printed out. Then we closed the socket, which isn't strictly necessary - but good practice.
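A single recv(1024) returns at most 1024 bytes, so longer replies need a loop. Here is a sketch assuming Python 3 (where sockets deal in bytes); it uses socket.socketpair() to get two connected sockets so it runs without a network, and the helper name recv_all is made up:

```python
import socket


def recv_all(sock, bufsize=1024):
    """Keep calling recv() until the other side closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # an empty result means the peer has closed
            break
        chunks.append(chunk)
    return b"".join(chunks)


# Two sockets already connected to each other, standing in for
# "our script" and "the server".
ours, theirs = socket.socketpair()
theirs.sendall(b"x" * 3000)  # more than one buffer's worth
theirs.close()

data = recv_all(ours)
ours.close()
print(len(data))  # 3000
```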
Here's some better stuff to send, however:
CODE :
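GET /index.html HTTP/1.1\r\n
Host: www.example.com\r\n
\r\n
That's a simple HTTP GET request, asking for "index.html". The final bare \r\n is the blank line that tells the server the headers are finished; without it the server just keeps waiting for more.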
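As a sketch of both halves of the exchange, assuming Python 3: one function assembles the GET request as bytes, the other picks apart a raw reply. Both function names are made up, and the "Connection: close" header is an addition that asks the server to hang up once it has answered. The reply is canned so the example runs offline:

```python
def build_get_request(host, path="/"):
    """Assemble a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Connection: close",
        "",  # the two empty strings produce the blank line
        "",  # that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split a raw HTTP reply into (status code, headers dict, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])  # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


request = build_get_request("www.example.com", "/index.html")

# A canned reply, standing in for what s.recv() would hand back.
canned = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
status, headers, body = split_response(canned)
print(status, headers, body)
```

The request bytes can be passed straight to s.sendall() on a connected socket.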
Here's a POST request:
CODE :
POST /index.php HTTP/1.1\r\n
Host: www.example.com\r\n
Content-Type: application/x-www-form-urlencoded\r\n
Content-Length: 11\r\n
\r\n
hello=world
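Content-Length has to match the body byte for byte, so it is safer to compute it than to count by hand. A sketch assuming Python 3; the function name is made up, and the Content-Type line is what lets a PHP script find the fields in $_POST:

```python
from urllib.parse import urlencode


def build_post_request(host, path, fields):
    """Assemble a form-encoded HTTP/1.1 POST request as bytes."""
    body = urlencode(fields).encode("ascii")
    head = "\r\n".join([
        "POST %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Content-Type: application/x-www-form-urlencoded",
        "Content-Length: %d" % len(body),  # length of the body only
        "Connection: close",
        "",  # blank line between
        "",  # headers and body
    ]).encode("ascii")
    return head + body


request = build_post_request("www.example.com", "/index.php", {"hello": "world"})
print(request.decode("ascii"))
```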
Now let's add a cookie to an HTTP GET:
CODE :
GET /index.html HTTP/1.1\r\n
Host: www.example.com\r\n
Cookie: hello=world\r\n
\r\n
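Cookies travel in two different headers: the server hands them out with Set-Cookie, and the client sends them back with Cookie. A sketch of both directions, assuming Python 3 and its http.cookies module; the two helper names are made up:

```python
from http.cookies import SimpleCookie


def cookie_header(cookies):
    """Build the Cookie request header line from a dict of name/value pairs."""
    pairs = ["%s=%s" % (name, value) for name, value in cookies.items()]
    return "Cookie: " + "; ".join(pairs)


def parse_set_cookie(header_value):
    """Pull the name/value pairs out of a Set-Cookie response header value."""
    jar = SimpleCookie()
    jar.load(header_value)
    return dict((name, morsel.value) for name, morsel in jar.items())


print(cookie_header({"hello": "world"}))        # Cookie: hello=world
print(parse_set_cookie("hello=world; Path=/"))  # {'hello': 'world'}
```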
There are other socket modes that can be set; this article is only a very basic introduction. I would recommend reading this article if you want to learn more: http://www.amk.ca/python/howto/sockets/
Conclusion
Hopefully this article will help you begin to interact with the Internet using Python. It's just the beginning and I will work on follow-up articles. Good luck and thanks for reading.
dotty.
HackThisSite is the collective work of the HackThisSite staff, licensed under a CC BY-NC license.
We ask that you inform us upon sharing or distributing.