
Saturday, May 28, 2011

Fetch Web Pages in Python using urllib2

urllib2 is an extensible library for opening URLs. Most of the time we use it to fetch data from URLs over HTTP. It can handle plain websites, HTTP authentication, web proxies, and more. urllib2 is included in the Python standard library.

Example:

import urllib2

# Normal usage: open a URL and read the response body
f = urllib2.urlopen('http://www.google.com')
print f.read()

# Use behind a proxy that requires basic authentication
# (assumes the proxy address itself is picked up from the http_proxy
#  environment variable; otherwise add a urllib2.ProxyHandler as well)
auth_handler = urllib2.ProxyBasicAuthHandler(urllib2.HTTPPasswordMgrWithDefaultRealm())
auth_handler.add_password(realm=None, uri='proxy-name', user='username', passwd='password')
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)  # every later urlopen() call goes through this opener
f = urllib2.urlopen('http://www.bing.com')
print f.read()
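
urllib2 can also handle sites protected with HTTP Basic Authentication. Here is a minimal sketch; the URL, username, and password below are just placeholders:

import urllib2

# HTTP Basic Authentication (placeholder URL and credentials)
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://example.com/', 'username', 'password')
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
f = opener.open('http://example.com/protected/')
print f.read()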

In the example, the requests are sent with GET. If you want to send a POST instead, pass the request body through the data argument.

The basic usage is "urllib2.urlopen(url[, data][, timeout])".
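
For instance, here is a minimal POST sketch; the URL and form fields are made up, and urllib is only used to encode the form data:

import urllib
import urllib2

# Passing a data argument makes urlopen() send a POST request
data = urllib.urlencode({'q': 'python', 'lang': 'en'})  # placeholder form fields
f = urllib2.urlopen('http://example.com/search', data)  # placeholder URL
print f.read()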

More details at: http://docs.python.org/library/urllib2.html
