How to use urllib2 in Python - (2023)


While the title of this post is urllib2, we will also show a few
examples that use urllib, as the two modules are often used together.

This is an introductory post on urllib2, focusing on fetching URLs,
requests, POSTs, user agents, and error handling.

For more information, see the official documentation.

Also, this article was written for Python version 2.x

HTTP is based on requests and responses - the client makes requests and
servers send responses.

A program on the Internet can work as a client (accessing resources) or as
a server (providing services).

A URL identifies a resource on the Internet.

What is Urllib2?

urllib2 is a Python module that can be used to fetch URLs.

It defines functions and classes to support URL actions (Basic and Digest
authentication, redirects, cookies, etc.)

The magic starts with importing the urllib2 module.

What is the difference between urllib and urllib2?

While both modules do URL-request-related things, they have different capabilities.

urllib2 can accept a Request object to set the headers for a URL request;
urllib accepts only a URL.

urllib provides the urlencode method, which is used to generate
GET query strings; urllib2 does not have such a function.

Because of this, urllib and urllib2 are often used together.

See the documentation for more information.
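As a side note (not from the original article): in Python 3 the two modules were merged, with urllib.request taking over urllib2's role and urllib.parse providing urlencode. A minimal sketch of the division of labor described above, written for Python 3 with a hypothetical example.com URL:

```python
# Python 3 sketch: urllib.parse supplies urlencode (urllib's old job),
# while urllib.request.Request carries headers (urllib2's old job).
from urllib.parse import urlencode
from urllib.request import Request

query = urlencode({'q': 'python'})  # builds the GET query string
req = Request('http://example.com/?' + query,
              headers={'User-Agent': 'Mozilla 5.10'})

print(query)                         # q=python
print(req.get_header('User-agent')) # Mozilla 5.10
```

In Python 2, the same split requires importing both urllib (for urlencode) and urllib2 (for Request).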


What is Urlopen?

urllib2 provides a very simple interface in the form of the urlopen function.

This function is capable of fetching URLs using a variety of different protocols
(HTTP, FTP, …)

Just pass the URL to urlopen() to get a file-like handle to the remote data.


Additionally, urllib2 provides an interface to handle common situations,
like basic authentication, cookies, proxies, and so on.

These are provided by objects called handlers and openers.

Get URLs

This is the most basic way of using the library.

Below is how to make a simple request using urllib2.

Start by importing the urllib2 module.

Store the response in a variable (response)

The response is now a file-like object.

Read the data from the response into a string (html)

Do something with this string.

Note: if the URL contains a space, you must encode it with urlencode.

Let's see an example of how this works.

import urllib2

response = urllib2.urlopen('http://python.org/')
html = response.read()
# do something with the html string
response.close()  # best practice for closing the file

Note: you can also use a URL starting with "ftp:", "file:", etc.

The remote server accepts the incoming values and formats a plain text response
to send back.

The return value of urlopen() gives access to the headers of the HTTP server
via the info() method, and to the data of the remote resource via methods like
read() and readlines().

The file-like object returned by urlopen() is also iterable.
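These methods can be exercised without a network connection by fetching a local file through a file: URL. The sketch below uses Python 3's urllib.request (the method names on the response object are the same in urllib2) and a temporary file as a stand-in for a remote resource:

```python
import os
import pathlib
import tempfile
import urllib.request

# Write a small local file to serve as a stand-in for a remote resource.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('line one\nline two\n')

# A file: URL is handled by the same urlopen interface as http:.
response = urllib.request.urlopen(pathlib.Path(path).as_uri())
print(response.info())   # header-like metadata (content type, length, ...)
data = response.read()   # the resource data (bytes in Python 3)
response.close()
os.remove(path)
print(data)
```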

Simple urllib2 script

Let's show another example of a simple urllib2 script

import urllib2

response = urllib2.urlopen('http://python.org/')
print "Response:", response

# Get the URL. This gets the real URL.
print "The URL is: ", response.geturl()

# Get the status code
print "This gets the code: ", response.code

# Get the headers.
# This returns a dictionary-like object describing the retrieved page,
# specifically the headers sent by the server.
print "The headers are: ", response.info()
print "The date is: ", response.info()['date']

# Get the server part of the headers
print "The server is: ", response.info()['server']

# Get all the data
html = response.read()
print "Get all data: ", html

# Get only the length
print "Get the length :", len(html)

# Shows that the file object is iterable
for line in response:
    print line.rstrip()
# Note that rstrip removes the trailing newlines and carriage returns
# before printing the output.

Download files with Urllib2

This little script downloads a file from a website.

import urllib2

# file to write to
file = "downloaded_file.html"
url = "http://python.org/"

response = urllib2.urlopen(url)

# open the file for writing
fh = open(file, "w")
# read from the request while writing to the file
fh.write(response.read())
fh.close()

# Alternatively, you can use the with statement:
# with open(file, 'w') as f:
#     f.write(response.read())

The difference in this script is that we use "wb", which opens the
file in binary mode.

import urllib2

mp3file = urllib2.urlopen("http://example.com/song.mp3")  # hypothetical URL
output = open('test.mp3', 'wb')
output.write(mp3file.read())
output.close()

Urllib2 requests

The Request object represents the HTTP request that you make.

In its simplest form, you create a Request object that specifies the URL you want to fetch.

Calling urlopen with this Request object returns a response object for the requested URL.

The Request constructor in urllib2 accepts both a URL and a data parameter.


If you don't include the data (and only pass the URL), the request made
is actually a GET request.

If you include the data, the request made is a POST request, where the
URL is your POST URL and the data parameter is the HTTP POST content.

Let's look at the example below

import urllib2
import urllib

# Specify the url
url = 'http://python.org/'

# This packs the request (it doesn't send it)
request = urllib2.Request(url)

# Send the request and catch the response
response = urllib2.urlopen(request)

# Extract the response body
html = response.read()

# Print it out
print html

You can set the outgoing data on the request to be sent to the server.

In addition, you can pass extra information ("metadata") about the data or
about the request itself to the server - this information is sent as HTTP headers.

If you want to POST data, you must first create the data in a dictionary.

Make sure you understand what the code is doing.

import urllib
import urllib2

url = 'http://python.org/'

# Prepare the data
query_args = { 'q':'query string', 'foo':'bar' }

# This urlencodes your data (that's why we need to import urllib above)
data = urllib.urlencode(query_args)

# Send HTTP POST request
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
html = response.read()

# Print the result
print html
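The GET-versus-POST rule above can be checked directly on the Request object without sending anything. The sketch below uses Python 3's urllib.request and a hypothetical URL; urllib2's Request has the same get_method() behavior:

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'http://example.com/form'  # hypothetical URL
data = urlencode({'q': 'query string'}).encode()  # POST bodies are bytes in Python 3

# No data -> GET; with data -> POST.
print(Request(url).get_method())        # GET
print(Request(url, data).get_method())  # POST
```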

User agents

The way a browser identifies itself is through the User-Agent header.

By default, urllib2 identifies itself as Python-urllib/x.y,
where x and y are the major and minor version numbers of the Python release.

This could confuse the website or simply not work.

urllib2 lets you add your own headers to a request.

The reason you would want to do that is that some websites don't like
being browsed by programs.

When you create an application that accesses other people's web resources,
It's polite to include real user agent information in your requests,
so that they can more easily identify the source of the hits.

When you create the Request object, you can pass your headers in a dictionary,
or use add_header() to set the user-agent value before opening the request.

That would look something like this:

# Import the module
import urllib2

# Define the url
url = 'http://python.org/'

# Add your headers
headers = {'User-Agent' : 'Mozilla 5.10'}

# Build the request
request = urllib2.Request(url, None, headers)

# Get the response
response = urllib2.urlopen(request)

# Print the headers
print response.headers

You can also add headers with add_header().

Syntax: Request.add_header(key, val)


The example below uses Mozilla 5.10 as the user agent, and that's something too
appears in the web server log file.

import urllib2

req = urllib2.Request('http://python.org/')
req.add_header('User-agent', 'Mozilla 5.10')

res = urllib2.urlopen(req)
html = res.read()
print html

This is shown in the log file:

"GET / HTTP/1.1" 200 151 "-" "Mozilla 5.10"
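The effect of add_header() can be verified without opening the request at all. This Python 3 sketch (urllib2's Request behaves the same way; the URL is hypothetical) shows the header being stored and read back:

```python
from urllib.request import Request

req = Request('http://example.com/')  # hypothetical URL
req.add_header('User-agent', 'Mozilla 5.10')

# Request normalizes the header name's capitalization internally.
print(req.get_header('User-agent'))  # Mozilla 5.10
print(req.has_header('User-agent'))  # True
```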


Urlparse

The urlparse module provides functions for parsing URL strings.


It defines a standard interface for breaking Uniform Resource Locator (URL)
strings into several optional parts, called components:
scheme, location, path, query, and fragment.

Suppose you have a URL such as http://www.example.com/index.html:

The scheme would be http.

The location would be www.example.com.

The path is index.html.

We have no query and no fragment.

The most common functions are urljoin and urlsplit

import urlparse

url = "http://python.org:80/index.html"  # example URL with a port
domain = urlparse.urlsplit(url)[1].split(':')[0]
print "The domain name of the URL is: ", domain
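The same split is clearer with urlsplit's named attributes. Here is the Python 3 spelling with a hypothetical URL (the module moved to urllib.parse, but the fields match urlparse's):

```python
from urllib.parse import urlsplit

parts = urlsplit('http://www.example.com:80/index.html?q=python#top')
print(parts.scheme)    # http
print(parts.hostname)  # www.example.com (the port is stripped)
print(parts.path)      # /index.html
print(parts.query)     # q=python
print(parts.fragment)  # top
```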

For more information on urlparse, see the official documentation.


URL encoding

If you're passing information through a URL, you need to make sure it uses only
certain allowed characters.

Allowed characters are all alphabetic characters, digits, and a few special
characters that have meaning in the URL string.

The most commonly encoded character is the space character.

You'll see this character whenever you see a plus sign (+) in a URL;
the plus sign acts as a special character that represents a space.

Arguments can be passed to the server by encoding and appending them
to the URL.

Let's look at the example below.

import urllib
import urllib2

query_args = { 'q':'query string', 'foo':'bar' }  # You must pass a dictionary

encoded_args = urllib.urlencode(query_args)
print 'Encoded:', encoded_args

url = 'http://python.org/?' + encoded_args
print urllib2.urlopen(url).read()

If you print encoded_args, you get an encoded string such as q=query+string&foo=bar.

Python's urlencode takes variable/value pairs and creates a properly escaped
query string:

from urllib import urlencode

artist = "Kruder & Dorfmeister"
artist = urlencode({'ArtistSearch': artist})

This sets the artist variable to:

Output : ArtistSearch=Kruder+%26+Dorfmeister
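The encoding rules above can be reproduced in Python 3, where urlencode and quote_plus live in urllib.parse; the escaping is identical:

```python
from urllib.parse import urlencode, quote_plus

# Spaces become '+', and '&' becomes %26 so it cannot be mistaken
# for an argument separator.
print(quote_plus('Kruder & Dorfmeister'))              # Kruder+%26+Dorfmeister
print(urlencode({'ArtistSearch': 'Kruder & Dorfmeister'}))
```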

Error handling

This section of error handling is based on the information from the great article:
Urllib2 - The missing manual


urlopen raises URLError if it cannot handle a response.

HTTPError is the subclass of URLError raised in the special case of HTTP URLs.
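This subclass relationship is easy to confirm (Python 3 moved both classes to urllib.error, but the hierarchy is unchanged):

```python
from urllib.error import HTTPError, URLError

# HTTPError is a URLError, so an "except URLError" clause catches both;
# order except clauses from most specific to least specific.
print(issubclass(HTTPError, URLError))  # True
```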

URLError

URLError is often raised because there is no network connection,
or because the specified server does not exist.

In this case, the raised exception has a "reason" attribute,
which is a tuple containing an error code and a textual error message.

Example of URLError

import urllib2
from urllib2 import URLError

req = urllib2.Request('http://www.nosuchserver.example/')  # a server that does not exist
try:
    urllib2.urlopen(req)
except URLError, e:
    print e.reason
# (4, 'getaddrinfo failed')
HTTPError

Each HTTP response from the server includes a numeric "status code".

Sometimes the status code indicates that the server cannot fulfill
the request.

The default handlers handle some of these responses for you (e.g.
if the response is a "redirect" asking the client to fetch the document
from another URL, urllib2 will do that for you).

For those it can't handle, urlopen throws an HTTPError.

Typical errors are "404" (page not found), "403" (request forbidden),
and "401" (requires authentication).

If an error is thrown, the server responds by returning an HTTP error code
and an error page.

You can use the HTTPError instance as a response object for the page returned.

This means that in addition to the code attribute, it also has the read,
geturl, and info methods.

import urllib2

req = urllib2.Request('http://python.org/no-such-page')  # hypothetical URL returning an error
try:
    urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print e.code
    print e.read()
from urllib2 import Request, urlopen, URLError

req = Request(someurl)
try:
    response = urlopen(req)
except URLError, e:
    if hasattr(e, 'reason'):
        print 'We could not reach a server.'
        print 'Reason: ', e.reason
    elif hasattr(e, 'code'):
        print 'The server could not fulfill the request.'
        print 'Error code: ', e.code
else:
    # everything is fine
    pass

Please take a look at the links below to learn more about Urllib2

Sources and further reading




1. Python HTTP Request (urllib2)
2. Python 3 Programming Tutorial - urllib module
3. Python Urllib Introduction For HTTP Task | Send Request | Get Response
(Parwiz Forogh)
4. Open url with python 🐍
(Mr Code)
5. [SOLVED] python urllib2- Python3 error: "Import error: No module name urllib2"
(World Crawler)
6. Python: code to open a URL using urllib library
(The Indian Geek)
Article information

Author: Madonna Wisozk

Last Updated: 03/19/2023

