Python HTTP at Lightspeed ⚡ Part 2: urllib3 and requests (2023)

In my previous post I covered how to use the base http module. Now let's go a level higher and see how to use urllib3. Then we will reach even higher horizons as we learn about requests. But first, a quick explanation of urllib and urllib3.

The backstory

Once upon a time, when people rocked Python 2, there were these libraries called httplib and urllib2. Then Python 3 happened.

In Python 3, httplib was refactored into http.client, which you saw in Part 1, and urllib2 was split across several submodules of a new package called urllib. urllib2 and urllib provide a high-level HTTP interface that doesn't bother you with the details of http.client (formerly httplib). However, the new urllib was still missing a long list of important features, such as:

  • Thread safety
  • Connection pooling
  • Client-side SSL/TLS verification
  • File uploads with multipart encoding
  • Helpers for retrying requests and handling HTTP redirects
  • Support for gzip and deflate encoding
  • Proxy support for HTTP and SOCKS

urllib3 was created by the community to address these issues. It's not a core Python module (and probably never will be), but that also means it doesn't need to maintain compatibility with urllib.

urllib is not covered here because urllib3 can do almost everything it does and has some additional features, and the vast majority of programmers use urllib3 and requests.

Now that you know the difference between urllib and urllib3, here is a urllib example (the only one here) that uses the http.cookiejar.CookieJar class from Part 1:

```
>>> import urllib.request
>>> import http.cookiejar
>>> policy = http.cookiejar.DefaultCookiePolicy(
...     blocked_domains=["ads.net", ".ads.net"])
>>> cj = http.cookiejar.CookieJar(policy)
>>> opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
>>> r = opener.open("http://example.com")
>>> str(type(r))
"<class 'http.client.HTTPResponse'>"
```

Installation

Neither urllib3 nor requests is included in a standard Python installation (if your Python was packaged by a distribution, they may already be present), so they need to be installed with pip. pip3 install 'urllib3[secure,socks]' 'requests[socks]' should install them for you. The secure extra installs certificate-related packages used by urllib3, and the socks extra installs packages for the SOCKS protocol. Recent urllib3 releases have deprecated the secure extra, so on a current version you may only need 'urllib3[socks]'.

urllib3

Of course you have to import it first (import urllib3), and for those of you who read Part 1, this is where it gets interesting. Instead of creating a connection directly, you create a PoolManager object, which handles connection pooling and thread safety for you. There is also a ProxyManager object for routing requests through an HTTP/HTTPS proxy, and a SOCKSProxyManager for SOCKS4 and SOCKS5 proxies. It looks like this:

```
>>> import urllib3
>>> from urllib3.contrib.socks import SOCKSProxyManager
>>> proxy = urllib3.ProxyManager('http://localhost:3128/')
>>> proxy.request('GET', 'http://google.com/')
>>> proxy = SOCKSProxyManager('socks5://localhost:8889/')
```

Note that HTTPS proxies cannot connect to HTTP websites.

urllib3 also has a logger that emits many messages. You can adjust its verbosity by importing the logging module and calling logging.getLogger("urllib3").setLevel(your_level).
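As a minimal sketch (the levels chosen here are only examples), this is how you might turn urllib3's log output up while debugging and back down afterwards. It uses only the standard logging module, keyed on the "urllib3" logger name mentioned above.

```python
import logging

# Send log records to stderr with a simple format (no-op if logging is already configured).
logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")

# urllib3 logs under the "urllib3" name; child loggers such as
# "urllib3.connectionpool" inherit this level unless they set their own.
urllib3_logger = logging.getLogger("urllib3")

# Show every connection/request debug message while troubleshooting...
urllib3_logger.setLevel(logging.DEBUG)

# ...or hide everything below WARNING once you are done.
urllib3_logger.setLevel(logging.WARNING)
```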

Like HTTPConnection in the http module, urllib3 has a request() method. It is called as poolmanager.request('GET', 'http://httpbin.org/robots.txt'). As with http, this method also returns a class named HTTPResponse. But don't be fooled! This is not an http.client.HTTPResponse; it is a urllib3.response.HTTPResponse. The urllib3 version has some methods not defined in http, and these will prove very useful and convenient.

As explained above, the request() method returns an HTTPResponse object. Its data member holds the raw response body as bytes; for the httpbin endpoints used here, that is a JSON string encoded as UTF-8. To inspect it you can use:

```python
import json

print(json.loads(r.data.decode('utf-8')))
```
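To try request() and data without depending on httpbin.org, here is a sketch that starts a tiny local HTTP server from the standard library and queries it through a PoolManager. The EchoHandler class and the JSON it returns are hypothetical stand-ins for httpbin's /get endpoint, not part of urllib3.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for httpbin.org/get."""

    def do_GET(self):
        body = json.dumps({"method": self.command, "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0 lets the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/get?arg=value"

pool = urllib3.PoolManager()
r = pool.request("GET", url)

# r is a urllib3.response.HTTPResponse; r.data holds the raw body bytes.
payload = json.loads(r.data.decode("utf-8"))
print(type(r).__module__, r.status, payload)

server.shutdown()
server.server_close()
```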

Creating a query parameter

A URL with a query parameter looks like http://httpbin.org/get?arg=value. The easiest way to construct one is to take a string containing everything up to and including the question mark, pass the argument/value pairs as a dictionary to urllib.parse.urlencode() (yes, urllib), and concatenate the result with your original string.
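A short sketch of that recipe using only the standard library; the parameter names and values are made up for illustration. Note that urlencode() percent-encodes reserved characters such as & and turns spaces into +.

```python
from urllib.parse import urlencode

base = "http://httpbin.org/get?"  # everything up to and including the "?"
params = {"arg": "value", "q": "a&b c"}

url = base + urlencode(params)
print(url)  # http://httpbin.org/get?arg=value&q=a%26b+c
```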

Here is a summary of where each request() parameter shows up in httpbin's response. The headers and fields parameters must be dictionaries; body takes bytes or a string. The response contains several JSON keys, some of which are:

request() parameter | JSON key in the response
--- | ---
N/A | "origin"
headers | "headers"
fields (HEAD/GET/DELETE) | "args"
query string encoded into the URL (POST/PUT) | "args"
fields (POST/PUT) | "form"
encoded body with Content-Type application/json in headers | "json"
'filefield': (filename, filedata, mime_type) in the fields parameter | "files"
binary data in body with any content type in the headers parameter | "data"
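Most of these encodings happen inside request(), but two of them are easy to inspect offline. This sketch calls urllib3.filepost.encode_multipart_formdata, the helper urllib3 uses to encode fields for POST/PUT, and builds a JSON body by hand. The field names and file contents are hypothetical.

```python
import json

from urllib3.filepost import encode_multipart_formdata

# One ordinary field plus one file field; the (filename, filedata, mime_type)
# tuple has the same shape request() accepts in fields=.
fields = {
    "comment": "hello",
    "filefield": ("example.txt", b"file contents\n", "text/plain"),
}
body, content_type = encode_multipart_formdata(fields)

# A JSON body is just encoded bytes plus a matching Content-Type header;
# these are the values you would pass as body= and headers= to request().
json_body = json.dumps({"key": "value"}).encode("utf-8")
json_headers = {"Content-Type": "application/json"}

print(content_type)
```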

HTTPS in urllib3

There is some additional boilerplate needed to use certificates, and hence HTTPS, with a PoolManager, but it has the benefit of raising an error if the connection can't be secured for some reason:

```
>>> import certifi
>>> import urllib3
>>> pool = urllib3.PoolManager(
...     cert_reqs='CERT_REQUIRED',
...     ca_certs=certifi.where())
>>> pool.request('GET', 'https://google.com')
(No exception)
>>> pool.request('GET', 'https://expired.badssl.com')
(raises urllib3.exceptions.MaxRetryError caused by SSLError)
```

Some extra goodies

As with http, urllib3 connections support request timeouts. For even more control, you can use a Timeout object to specify separate connect and read timeouts (all exceptions live in urllib3.exceptions):

```
>>> pool.request(
...     'GET', 'http://httpbin.org/delay/3', timeout=2.5)
MaxRetryError caused by ReadTimeoutError
>>> pool.request(
...     'GET',
...     'http://httpbin.org/delay/3',
...     timeout=urllib3.Timeout(connect=1.0))
<urllib3.response.HTTPResponse>
>>> pool.request(
...     'GET',
...     'http://httpbin.org/delay/3',
...     timeout=urllib3.Timeout(connect=1.0, read=2.0))
MaxRetryError caused by ReadTimeoutError
```
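Here is an offline sketch of separate connect and read timeouts, using a hypothetical local SlowHandler that waits one second in place of httpbin.org/delay/3. The short-timeout call passes retries=False so the underlying ReadTimeoutError is raised directly instead of being wrapped in MaxRetryError after several attempts.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3
from urllib3.exceptions import ReadTimeoutError


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for httpbin.org/delay/N: waits before answering."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client already gave up

    def log_message(self, format, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/delay/1"

pool = urllib3.PoolManager()

# Generous read timeout: the 1 s delay fits, so we get a normal response.
ok = pool.request("GET", url, timeout=urllib3.Timeout(connect=1.0, read=3.0))

# Tight read timeout with retries disabled: the raw ReadTimeoutError surfaces.
try:
    pool.request("GET", url, timeout=urllib3.Timeout(connect=1.0, read=0.2), retries=False)
    timed_out = False
except ReadTimeoutError:
    timed_out = True

server.shutdown()
server.server_close()
print(ok.status, timed_out)  # 200 True
```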

Something that http does not have is request retries. urllib3 has them because it is a high-level library. Its documentation couldn't explain it better:

urllib3 can automatically retry idempotent requests. This same mechanism also handles redirects. You can control the retries using the retries parameter to request(). By default, urllib3 will retry requests 3 times and follow up to 3 redirects.

To change the number of retries, just specify an integer:

```
>>> pool.request('GET', 'http://httpbin.org/ip', retries=10)
```

To disable all retry and redirect logic, specify retries=False:

```
>>> pool.request(
...     'GET', 'http://nxdomain.example.com', retries=False)
NewConnectionError
>>> r = pool.request(
...     'GET', 'http://httpbin.org/redirect/1', retries=False)
>>> r.status
302
```

To disable redirects but keep the retry logic, specify redirect=False:

```
>>> r = pool.request(
...     'GET', 'http://httpbin.org/redirect/1', redirect=False)
>>> r.status
302
```

Similar to Timeout, there is also a Retry object for setting the maximum retries and redirects separately. It's done like this: retries=urllib3.Retry(3, redirect=2). The request raises MaxRetryError when the retry or redirect limit is exceeded.

Instead of passing a Retry object with each request, you can also pass a Retry object to the PoolManager constructor so that it applies to all requests. The same applies to Timeout.
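A sketch combining both ideas: a PoolManager whose default Retry and Timeout apply to every request, plus a per-request redirect=False override. The local RedirectHandler is a hypothetical stand-in for httpbin.org/redirect/1, and the limit values are arbitrary.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for httpbin.org/redirect/1."""

    def do_GET(self):
        if self.path == "/redirect/1":
            self.send_response(302)
            self.send_header("Location", "/get")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Length", "4")
            self.end_headers()
            self.wfile.write(b"done")

    def log_message(self, format, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/redirect/1"

# Defaults for every request through this pool: 3 retries (at most 2 of them
# redirects) and separate connect/read timeouts.
pool = urllib3.PoolManager(
    retries=urllib3.Retry(3, redirect=2),
    timeout=urllib3.Timeout(connect=2.0, read=5.0),
)

followed = pool.request("GET", url)                      # follows the 302
not_followed = pool.request("GET", url, redirect=False)  # per-request override

server.shutdown()
server.server_close()
print(followed.status, followed.data, not_followed.status)  # 200 b'done' 302
```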

Requests

Requests uses urllib3 under the hood and makes it even easier to make requests and retrieve data. For one, keep-alive is 100% automatic, compared to urllib3 where it is not. It also has event hooks that call a callback function when an event occurs, e.g. when a response is received (but that is an advanced feature and is not covered here).

With requests, each request method has its own function. So instead of creating a connection or pool, you call (for example) get() on a URL directly. Many of the keyword parameters used in urllib3 (see the table above) can also be used in the same way with requests. All exceptions live in requests.exceptions.

```python
import requests

r = requests.get('https://httpbin.org/get')
r = requests.post('https://httpbin.org/post', data={'key': 'value'})
r = requests.put('https://httpbin.org/put', data={'key': 'value'})
r = requests.delete('https://httpbin.org/delete')
r = requests.head('https://httpbin.org/get')
r = requests.options('https://httpbin.org/get')

# You can disable redirects if you want
r = requests.options('https://httpbin.org/get', allow_redirects=False)
# Or set a timeout for the number of seconds a server has to respond
# (0.001 s is so short that this will almost always raise Timeout)
r = requests.options('https://httpbin.org/get', timeout=0.001)
# Set connect and read timeouts at the same time
r = requests.options('https://httpbin.org/get', timeout=(3.05, 27))
# How to pass query parameters (keys whose value is None are not added to the request):
r = requests.get('https://httpbin.org/get', params={'key1': 'value1', 'key2': 'value2'})
# If a key has a list value, a key/value pair is added for each value in the list:
r = requests.get('https://httpbin.org/get', params={'key1': 'value1', 'key2': ['value2', 'value3']})
# Headers can also be added:
r = requests.get('https://httpbin.org/get', headers={'User-Agent': 'my-app/0.0.1'})
# And only requests (not urllib3) has a cookies keyword argument.
r = requests.get('https://httpbin.org/get', cookies=dict(cookies_are='working'))
```
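If you want to check what requests will actually send without contacting a server, you can build a requests.Request and call prepare() on it. This sketch reuses the parameters from the examples above, plus a hypothetical None-valued key3 to show that such keys are dropped.

```python
import requests

# Build (but do not send) a GET request to see how requests encodes it.
req = requests.Request(
    "GET",
    "https://httpbin.org/get",
    params={"key1": "value1", "key2": ["value2", "value3"], "key3": None},
    headers={"User-Agent": "my-app/0.0.1"},
    cookies={"cookies_are": "working"},
)
prepared = req.prepare()

print(prepared.url)
# https://httpbin.org/get?key1=value1&key2=value2&key2=value3
print(prepared.headers["User-Agent"], prepared.headers["Cookie"])
# my-app/0.0.1 cookies_are=working
```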

The value returned by these calls is yet another type of response object. This time it's a requests.Response (at least it isn't another HTTPResponse 🙂). This object contains a wealth of information, e.g. the time the request took, the JSON of the response, whether the page was redirected, and even its own CookieJar type. Here is a running list of the most useful members:

  • r.status_code and r.reason: Numeric status code and human-readable reason.
  • url: The canonical URL used in the request.
  • text: The text retrieved from the request.
  • content: The bytes version of text.
  • json(): Attempts to return the JSON of text. Raises ValueError if this is not possible.
  • encoding: If you know the correct encoding for text, set it here so that text can be decoded properly.
  • apparent_encoding: The encoding that requests guessed.
  • raise_for_status(): Raises requests.exceptions.HTTPError if the request returned an HTTP error status.
  • ok: True if status_code is less than 400, otherwise False.
  • is_redirect and is_permanent_redirect: Whether the status code was a redirect, and whether it was a permanent one.
  • headers: Headers in the response.
  • cookies: Cookies in the response.
  • history: All response objects from redirected URLs on the way to the current URL, sorted from oldest to newest.
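A sketch exercising several of these members against a hypothetical local DemoHandler server that permanently redirects /old to a small JSON document and answers 404 for everything else.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /old redirects to /json, anything else is 404."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/json")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/json":
            body = json.dumps({"hello": "world"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

r = requests.get(base + "/old", timeout=5)
print(r.status_code, r.reason, r.ok)       # 200 OK True
print(r.url.endswith("/json"), r.json())   # True {'hello': 'world'}
print([h.status_code for h in r.history])  # [301]
print(r.history[0].is_permanent_redirect)  # True

missing = requests.get(base + "/nope", timeout=5)
try:
    missing.raise_for_status()
except requests.exceptions.HTTPError as exc:
    print("HTTPError:", exc.response.status_code)  # HTTPError: 404

server.shutdown()
server.server_close()
```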

Here's how you would save the response output to a file:

```python
r = requests.get(url, stream=True)
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)
```

Here's how to stream uploads without reading the entire file:

```python
with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)
```
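Here is a self-contained sketch of both patterns against a hypothetical local FileHandler server: it downloads a 10,000-byte payload in 128-byte chunks with stream=True, then streams the saved file back up in a POST. The paths, sizes, and handler are illustrative assumptions.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = b"x" * 10_000  # hypothetical file contents


class FileHandler(BaseHTTPRequestHandler):
    """Hypothetical server: GET returns PAYLOAD, POST replies with the byte count received."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def do_POST(self):
        received = self.rfile.read(int(self.headers["Content-Length"]))
        reply = str(len(received)).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), FileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/file"

with tempfile.TemporaryDirectory() as tmp:
    download_path = os.path.join(tmp, "download.bin")

    # Download: stream=True defers the body so iter_content() reads it piece by piece.
    with requests.get(url, stream=True, timeout=5) as r:
        r.raise_for_status()
        with open(download_path, "wb") as fd:
            for chunk in r.iter_content(chunk_size=128):
                fd.write(chunk)

    # Upload: an open file object as data is sent in blocks, not read into memory first.
    with open(download_path, "rb") as f:
        upload = requests.post(url, data=f, timeout=5)

    downloaded_size = os.path.getsize(download_path)

server.shutdown()
server.server_close()
print(downloaded_size, upload.text)  # 10000 10000
```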

In the event of a network error, requests raises ConnectionError. When the request times out, it raises Timeout. And when too many redirects have been made, it raises TooManyRedirects.
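As a quick offline check of these exceptions, this sketch requests a local port that (very probably) has nothing listening on it, which surfaces as ConnectionError. All of these exceptions derive from requests.exceptions.RequestException. The free-port trick is an assumption of this example, not something requests provides.

```python
import socket

import requests

# Find a local port with (very probably) nothing listening on it:
# bind to port 0 so the OS picks a free port, then release it.
with socket.socket() as s:
    s.bind(("127.0.0.1", 0))
    free_port = s.getsockname()[1]

# Handler order matters: ConnectTimeout subclasses both ConnectionError and
# Timeout, so it would be caught by whichever matching clause comes first.
try:
    requests.get(f"http://127.0.0.1:{free_port}/", timeout=2)
    outcome = "no error"
except requests.exceptions.ConnectionError:
    outcome = "ConnectionError"
except requests.exceptions.Timeout:
    outcome = "Timeout"
except requests.exceptions.TooManyRedirects:
    outcome = "TooManyRedirects"

print(outcome)  # ConnectionError
```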

Proxies

HTTP, HTTPS, and SOCKS proxies are supported. Requests also honors the HTTP_PROXY and HTTPS_PROXY environment variables; when they are set, requests automatically uses their values as proxies. Within Python, you can specify the proxies to use in the proxies parameter:

```python
# Instead of socks5 you could also use http and https.
proxies = {
    'http': 'socks5://user:pass@host:port',
    'https': 'socks5://user:pass@host:port',
}
requests.get('http://example.org', proxies=proxies)
```

Session objects

A Session can persist cookies and some parameters across requests, and it reuses the underlying HTTP connection for those requests. It uses a urllib3 PoolManager, which significantly increases the performance of HTTP requests to the same host. It also has all the methods of the main requests API (all the request methods you saw above). Sessions can also be used as context managers:

```python
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
```
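A minimal offline sketch of session-level state: headers and cookies set on the Session are merged into every request it prepares. The cookie is set by hand here as a stand-in for what httpbin's /cookies/set endpoint would return.

```python
import requests

with requests.Session() as s:
    # Anything set on the session is merged into every request it prepares.
    s.headers.update({"User-Agent": "my-app/0.0.1"})
    # Hand-set stand-in for the cookie /cookies/set/sessioncookie/123456789 would set.
    s.cookies.set("sessioncookie", "123456789")

    prepared = s.prepare_request(requests.Request("GET", "https://httpbin.org/cookies"))

print(prepared.headers["User-Agent"])  # my-app/0.0.1
print(prepared.headers["Cookie"])      # sessioncookie=123456789
```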

And we're done

This concludes the Python HTTP series. Are there any mistakes here? Let me know so I can fix them.

Article information

Author: Maia Crooks Jr

Last Updated: 04/07/2023
