How to use python-requests PreparedRequest

Requests is a Python module that you can use to send all kinds of HTTP requests. It is a library with many features, ranging from passing parameters in URLs to sending custom headers and SSL verification.

HTTP defines a set of request methods that indicate the desired action to be performed on a given resource. Although they can also be nouns, these request methods are sometimes referred to as HTTP verbs. Each of them implements different semantics, but some common features are shared among them; for example, a request method can be safe, idempotent, or cacheable.

In this article we will take a closer look at Python Requests. By now we have often heard of, or even used, the HTTP methods GET and POST. Did you know that HTTP has at least 7 methods? What are they?

Requests is an elegant and simple Python library built to handle HTTP requests in Python easily. It allows you to make GET, POST, PUT and other types of requests and process the received response in a flexible, Pythonic way.

Contents

  1. Introduction to Requests Library
  2. What is a GET and POST request?
  3. GET Method
  4. Status Code
  5. Contents of the Response Object
  6. The Content
  7. Full HTML source as Text
  8. Retrieving an image from website
  9. Headers
  10. How to set Query String Parameters
  11. POST Method
  12. PUT Method
  13. DELETE Method
  14. PATCH Method
  15. HEAD Method
  16. Request Header
  17. Inspecting the request made
  18. Authentication
  19. Time out
  20. SSL Certification

Introduction to Requests Library

Requests is an elegant and simple Python library built to handle HTTP requests in Python easily. But what is an HTTP request? HTTP is a protocol designed to enable communication between clients and servers.

A client is typically a local computer or device similar to the one you are using to view this page. An HTTP request is the message sent from your local computer to a web server, typically one hosting a website.

For example, when you go to any website from your browser, the browser sends an HTTP request and receives an appropriate ‘response’ from the host server.

Requests is an easy-to-use library with a lot of features ranging from passing additional parameters in URLs, sending custom headers, SSL Verification, processing the received response etc.

What is a GET and POST request?

A GET request is used to request data from a specific server. It is the most common type of request.

This is equivalent to visiting the homepage of a website from your browser. Another common type of request is the POST request, which is used to send data to a host server for further processing, for example to update a resource such as a database.

What is this equivalent to in the real world?

For example, most data that you submit through forms in various websites is sent and processed as a POST request. Besides this, using requests you can add additional content like header information, form data, multipart files, and parameters via simple Python libraries. You don’t need to manually append the query strings to your URLs.

What does that practically mean?

For example, if you search for something, say the string ‘babies’ in Google, your browser sends a GET request to the Google server by appending the query string to the URL.

So if you check the URL in your address bar, you should see something like: https://www.google.com/search?q=babies. Sometimes there is more information, which makes the query strings complex to construct.

With the requests library, you don’t have to explicitly construct such query strings. Instead, you pass them as an additional parameter to requests.get().

What makes requests really stand out is that the received response is packaged as a standardized Response object. It will contain all the response data (status, content, cookies etc). This makes further inspection and manipulation reliable and convenient.

Let’s start by installing the requests library using pip install requests.

!pip install requests

Then import the library into your program using the import statement.

import requests
from pprint import pprint  # prettyprint

Now let’s look into the details of the requests library and some of its important features.

GET Method

Let’s try to get some information from the official Python website, https://www.python.org/. You can use any public site. All you need to do is call requests.get() with the URL.

When you ping a website or portal for information, it is considered ‘making a request’. requests.get() is used exactly for this purpose. You need to specify the web address that you want to ping as an argument to the function.

r = requests.get('https://www.python.org/')

The information that we got from the website is stored in the Response object we created, r. You can extract many features from this response object; for example, if you need the cookies that the server sent, all you need to do is print r.cookies.

Now that I have requested the data from the website, let’s check whether it worked properly by printing the value of r.

print(r)
<Response [200]>

You can see that we got Response [200]. Let’s look into what this means.

Status Code

Status codes are issued by a server in response to a client’s request made to the server. Use the r.status_code attribute to get the status code for your request.

print(r.status_code)
200

We got a response of 200, which means the request succeeded. Status codes in the 200 range mean success, the 300 range means redirection, the 400 range means a client error, and the 500 range means a server error. A response of 404, for example, means Page Not Found. In general, any status code less than 400 means the request was processed successfully.

If it is 400 or above, some sort of error occurred. You may need an if-else statement to decide whether to proceed further depending on the status code you received from the server, since there is no point in running the rest of your program after an error. Use the following code in that case:

if r.status_code == 200:
    print('Success')
elif r.status_code == 404:
    print("Page not found")
Success
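
The ranges described above can be sketched as a small helper. This is only an illustration: the function name describe_status and the labels it returns are my own and are not part of the requests library.

```python
def describe_status(code: int) -> str:
    """Map an HTTP status code to its broad class (illustrative helper)."""
    if 200 <= code < 300:
        return "Success"
    if 300 <= code < 400:
        return "Redirection"
    if code == 404:
        return "Page not found"
    if 400 <= code < 500:
        return "Client error"
    if 500 <= code < 600:
        return "Server error"
    return "Other"


# Try the helper on a few sample codes (no request is sent here).
for code in (200, 301, 404, 500):
    print(code, describe_status(code))
```

With a real response you would call describe_status(r.status_code). If you only want to stop on errors, requests also provides r.raise_for_status(), which raises an HTTPError for 4xx and 5xx responses.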

 

 

Contents of the Response object

The dir() function is used to see what useful information we can get from the data we retrieved.

r_attribs = [c for c in dir(r) if not c.startswith("_")]
r_attribs

You can see that there are several attributes available, such as headers, status_code, content, cookies etc. You can also use the help() command to get more info about each of these.

The Content

The output from requests.get(), that is, the Response object, contains a lot of useful information. Use the r.content attribute to get access to the raw data we received as output.

This is the raw output of the html content behind the URL we requested, which in this case is https://www.python.org/.

While r.content gives you access to the raw bytes of the response, you need to convert them into a string using a character encoding such as UTF-8.

You get that directly by using another stored attribute in the response called r.text.

The Full HTML source as Text

Use the r.text attribute to get the content from the website as a Unicode string. Unicode is a standard for encoding characters. A Unicode string is a Python data structure that can store zero or more Unicode characters. This output should match what you see when you right-click on the webpage and view its page source.
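
To make the difference between raw bytes and text concrete, here is a small sketch that does not touch the network. The sample HTML string is made up; with a real response, the bytes would come from r.content and the decoded string from r.text.

```python
# Bytes like the ones found in r.content (sample value, not fetched from a server).
raw = "<title>Café</title>".encode("utf-8")

# Decoding with the right character encoding gives a str, which is what r.text holds.
text = raw.decode("utf-8")

print(type(raw).__name__, len(raw))    # length in bytes
print(type(text).__name__, len(text))  # length in characters
```

The byte length is larger than the character length because the accented character takes two bytes in UTF-8. If requests guesses the wrong encoding for a page, you can set r.encoding yourself before reading r.text.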

Retrieving an image from the website

Use the same requests.get() command to retrieve an image, passing the image’s URL. The received response is also a Response object. The image is stored in r.content, which you can write to a file.

This means that whatever the content of the received response, be it text or an image, it is stored in the content attribute.

The image from the website will be downloaded to the folder in which you are running the program.
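
A minimal sketch of saving an image could look like the following. The function names write_bytes and download_image are hypothetical, and you would have to supply an image URL of your own, since none is assumed here.

```python
import requests


def write_bytes(content: bytes, filename: str) -> int:
    """Write raw bytes to filename in binary mode; returns the number of bytes written."""
    with open(filename, "wb") as f:
        return f.write(content)


def download_image(url: str, filename: str, timeout: float = 10) -> int:
    """Fetch url and save the response body (r.content) unchanged to filename."""
    r = requests.get(url, timeout=timeout)
    r.raise_for_status()  # do not save an error page as if it were an image
    return write_bytes(r.content, filename)
```

The file is opened in "wb" mode because r.content holds bytes, not text. Calling download_image with an image URL and a file name such as "image.png" would write the file to the current working directory.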

Headers

Most webpages you visit will contain a header, which holds various metadata. Use the r.headers attribute to access the information in the header of the page. What does r.headers contain? If you look at the output from r.headers, you’ll see that it is a dictionary-like object whose printed form resembles JSON.

Metadata about the response is stored in the headers. They give you information such as the content type of the response payload, a time limit on how long to cache the response, and more. r.headers returns a dictionary-like object, allowing you to access header values by key.

The headers give information about the content type, last modified date, age of the website etc. You can access each of these by treating the output as a dictionary.
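
The dictionary-like object that requests uses for headers is a CaseInsensitiveDict, so header names can be looked up in any letter case. The sketch below builds one by hand with made-up header values instead of fetching a page.

```python
from requests.structures import CaseInsensitiveDict

# Sample header values for illustration; a real response fills these in for you.
headers = CaseInsensitiveDict({
    "Content-Type": "text/html; charset=utf-8",
    "Age": "1200",
})

print(headers["content-type"])                   # lookup ignores letter case
print(headers.get("AGE"))
print(headers.get("Last-Modified", "not sent"))  # default for a missing header
```

With a real response you would write r.headers["content-type"] in exactly the same way.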

Advanced Functions

Now that we have looked into the basics of the requests library, let’s dive into some of the advanced functions. From now on, I will be using the website http://httpbin.org/ to retrieve as well as send and store information. Let’s try the commands that you have learned so far.

How to Set Query String Parameters

Often the response given by the server differs based on the query you send. For example, you may want to view the 2nd page of a 10-page article instead of the first page.

Or you may want to search for a particular term on a website. In such cases, you send additional parameters as part of the URL as a query. For example, https://www.google.com/search?q=babies will return the search results for ‘babies’.

Depending on the device you are using, location, referring source, user etc, these queries can easily get complicated. So instead of adding them to the URL directly, with requests.get() you can pass them as a separate parameter using the params argument. Let’s add a couple of parameters to the query string, that is, page and count, of httpbin.org.

This essentially translates to “http://httpbin.org/?page=5&count=10”.

You can see that I first created a dictionary for the parameters and then passed it into the get() function. And we got a JSON response back from the httpbin website.

To check whether the parameter passing has worked properly, use r.url to check the parameters we have passed.

It has worked properly, as page is set to 5 and count is set to 10. You can also pass the parameters as a list of tuples or as bytes, which gives the same output.
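
One way to see how params ends up in the URL, without sending anything, is to prepare the request and look at its url attribute. This is a sketch using requests.Request and prepare(); it shows both the dictionary form and the list-of-tuples form.

```python
import requests

url = "http://httpbin.org/"

# Same parameters, once as a dictionary and once as a list of tuples.
as_dict = requests.Request("GET", url, params={"page": 5, "count": 10}).prepare()
as_tuples = requests.Request("GET", url, params=[("page", 5), ("count", 10)]).prepare()

print(as_dict.url)
print(as_tuples.url)
```

Both lines print the URL quoted earlier, with page=5 and count=10 appended as the query string. When you actually send the request with requests.get(url, params=...), the same URL is available afterwards as r.url.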

POST Method

The POST method is used to submit data to be further handled by the server. The server typically understands the context and knows what to do with the data.

Generally it’s used when submitting a web form or when uploading a file to the server. The requests.post() function allows you to do this. Let’s look at an example with the httpbin.org website.

You can see that the submitted values have been recorded in the form field of the response. If you need to pass some form values into it, then you need to look into the source of the URL and find out what kind of values the form expects.

To process the received JSON response, iterate through the contents of r.json().

Or, if you know what the contents are, you can access them directly as you would with a dict.

The post function can be used to send large amounts of data (text or binary).
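
As a sketch of what a form submission looks like on the wire, the snippet below prepares a POST request without sending it. The field names and values are hypothetical, and the httpbin /post endpoint is my assumption about where such a form would be sent.

```python
import requests

form = {"name": "abcd", "email": "xyz@gmail.com"}  # hypothetical form fields
prepared = requests.Request("POST", "http://httpbin.org/post", data=form).prepare()

print(prepared.method)
print(prepared.headers["Content-Type"])  # form data is URL-encoded by default
print(prepared.body)
```

To actually submit the form you would call requests.post("http://httpbin.org/post", data=form) and then read the echoed values from r.json().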

PUT Method

The PUT method requests that the data you are sending be stored under the supplied URL. If the URL refers to an already existing resource, it is modified, and if the URL does not point to an existing resource, the server creates the resource with that URL. As you can see, PUT is somewhat similar in functionality to POST.

So what is the difference between PUT and POST? With the POST method, data is sent to a URI and the receiving resource understands the context and knows how to handle the request. With the PUT method, if there is a file at the given URI, it gets replaced, and if there isn’t one, a file is created.

Besides, no matter how many times you execute a given PUT request, the resulting action is always the same. This property is called idempotence. For a POST method, on the other hand, the response need not always be the same. This makes the PUT method idempotent, whereas the POST method is not. To make a PUT request, use the requests.put() method.

Generally, in practice, the put() function is used for update operations and the post() function is used for create operations.
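
Here is a sketch of a PUT request carrying a complete resource as JSON, prepared but not sent. The payload is made up and the httpbin /put endpoint is an assumption on my part.

```python
import json

import requests

payload = {"name": "abcd", "email": "xyz@gmail.com"}  # hypothetical full resource
prepared = requests.Request("PUT", "http://httpbin.org/put", json=payload).prepare()

print(prepared.method, prepared.url)
print(prepared.headers["Content-Type"])
print(json.loads(prepared.body))  # the JSON body decodes back to the payload
```

Sending it for real would be requests.put("http://httpbin.org/put", json=payload). Because the body describes the whole resource, repeating the call leaves the resource in the same state, which is the idempotence described above.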

DELETE Method

The delete() method sends a DELETE request to the specified url. DELETE requests are made for deleting the specified resource (file, record etc). A successful response should be:

  1. 200 (OK) if the response includes an entity describing the status.
  2. 202 (Accepted) if the action has not yet been enacted.
  3. 204 (No Content) if the action has been enacted but the response does not include an entity.

The requests.delete() function will request the server to delete the resource that you have specified in the URL. The client, however, cannot be guaranteed that the operation has been carried out.
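
The three success codes listed above can be turned into a small lookup. The helper name describe_delete_status and its wording are my own; it only interprets a status code and does not send a request.

```python
def describe_delete_status(code: int) -> str:
    """Explain the status code of a DELETE response (illustrative helper)."""
    meanings = {
        200: "deleted; the response body describes the status",
        202: "accepted; the deletion has not been carried out yet",
        204: "deleted; the response has no body",
    }
    return meanings.get(code, "not one of the listed success codes")


for code in (200, 202, 204, 404):
    print(code, describe_delete_status(code))
```

After r = requests.delete(url) you would pass r.status_code to the helper.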

PATCH Method

The PATCH method is a request method supported by the HTTP protocol for making partial changes to an existing resource. The main difference between the PUT and PATCH methods is that the PUT method uses the request URL to supply a modified version of the requested resource, which replaces the original version of the resource.

The PATCH method, in contrast, only supplies a set of instructions to modify the resource. This means the PATCH request needs to contain only the changes that need to be applied to a resource, not the entire resource. Although it resembles PUT, it typically contains a set of instructions that tell how a resource residing at a URI should be modified to produce a new version. Use the requests.patch() command to implement this. So when should you use PATCH? Whenever you want to make only partial changes to the resource.
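
The difference between replacing and partially changing a resource can be sketched with plain dictionaries. This is a simplification under my own assumptions: real servers decide how to apply a PATCH, and the merge-style update below is only one common interpretation. The function names and the sample record are made up.

```python
def apply_put(current: dict, new_version: dict) -> dict:
    """PUT semantics: the stored resource is replaced by the version sent."""
    return dict(new_version)


def apply_patch(current: dict, changes: dict) -> dict:
    """PATCH semantics (merge style): only the fields sent are changed."""
    return {**current, **changes}


resource = {"name": "abcd", "email": "xyz@gmail.com"}  # made-up record
changes = {"email": "new@gmail.com"}

print(apply_put(resource, changes))    # the name field is gone
print(apply_patch(resource, changes))  # the name field is kept
```

With requests, the partial payload would be sent as requests.patch(url, json=changes).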

HEAD Method

The requests.head() function is useful for retrieving only the meta-information written in the response headers, without having to transport the entire content as the get() command does.

This method is often used for testing hypertext links for validity, accessibility, and recent modification. You can do this by using the requests.head() command with the web address as the argument.

If you run the same request as a GET, you will see the difference: with requests.head() we receive only the header content, and the rest of the content is ignored. So it saves time and resources if you are interested only in the header content.
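
Link checking with HEAD could be sketched as below. The helper name link_is_valid, the 5-second timeout and the choice to follow redirects are my own assumptions, not something the library prescribes.

```python
import requests


def link_is_valid(url: str, timeout: float = 5) -> bool:
    """Return True if a HEAD request to url succeeds with a status below 400."""
    try:
        # HEAD does not follow redirects by default, so ask for it explicitly.
        r = requests.head(url, allow_redirects=True, timeout=timeout)
    except requests.exceptions.RequestException:
        return False  # malformed URL, DNS failure, refused connection, timeout...
    return r.status_code < 400
```

For example, link_is_valid("https://www.python.org/") would check the Python homepage without downloading its body.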

Request Header

A request header is an HTTP header that can be used in an HTTP request, and that doesn’t relate to the content of the message.

To customize headers, you pass a dictionary of HTTP headers to get() using the headers parameter. Through the ‘Accept’ header, the client tells the server what content types your application can handle.
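
A small sketch of setting the Accept header follows. The request is only prepared, not sent, and the httpbin /get endpoint and the chosen content type are assumptions for illustration.

```python
import requests

headers = {"Accept": "application/json"}  # content type the client can handle
prepared = requests.Request("GET", "http://httpbin.org/get", headers=headers).prepare()

print(prepared.headers["Accept"])
```

To send it, pass the same dictionary to the call: requests.get("http://httpbin.org/get", headers=headers).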

Inspecting the request made

When you make a request, the requests library prepares the request before actually sending it to the destination server. Request preparation includes things like validating headers and serializing JSON content.

Only after the request has been prepared is it sent to the destination server. You can view the PreparedRequest by accessing the request attribute of the response.
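
You can also build a PreparedRequest yourself and inspect it before anything is sent. The sketch below uses Session.prepare_request(); the httpbin /post endpoint and the JSON payload are made up for the example.

```python
import requests

session = requests.Session()
req = requests.Request("POST", "http://httpbin.org/post", json={"key": "value"})

# prepare_request() merges session defaults and returns a PreparedRequest.
# Nothing is sent at this point.
prepared = session.prepare_request(req)

print(type(prepared).__name__)
print(prepared.method, prepared.url)
print(prepared.headers["Content-Type"])
print(prepared.body)

# To actually send it:
# response = session.send(prepared, timeout=5)
# response.request then holds the prepared request that was sent.
```

Preparing through the session, rather than calling req.prepare() directly, means session-level settings such as the default User-Agent header are included in what you inspect.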

Inspecting the prepared request gives you access to information such as the payload, URL, headers, authentication, and more.

You can also store the parsed JSON output of a response in a dictionary and access the different pieces of information from it separately.

Authentication

Authentication helps a service understand who you are. You provide your credentials to a server by passing data through the Authorization header or a custom header defined by the service. You need to use the auth parameter to do this.

What we are doing is giving our data to the server by passing it through the Authorization header. If we go to the httpbin website, you can see that the basic authentication format for the website is of the form http://httpbin.org/basic-auth/user/password. Here the user and password will be what we have specified.

The authentication output comes out to be ‘true’, which means that our username and password are correct. If our password is wrong, we won’t get any output for authentication.

If you use the wrong username, you get a 401 error. When you pass your username and password in a tuple to the auth parameter, requests applies the credentials using HTTP’s Basic access authentication scheme under the hood.
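
To see what “under the hood” means here, the sketch below prepares a request with placeholder credentials and compares the generated Authorization header with a hand-built Basic value. Nothing is sent, and the credentials are made up.

```python
import base64

import requests

user, password = "user", "pass"  # placeholder credentials
url = f"http://httpbin.org/basic-auth/{user}/{password}"

prepared = requests.Request("GET", url, auth=(user, password)).prepare()
print(prepared.headers["Authorization"])

# Basic auth is "Basic " followed by base64 of "user:password".
expected = "Basic " + base64.b64encode(f"{user}:{password}".encode()).decode()
print(prepared.headers["Authorization"] == expected)
```

Note that base64 is an encoding, not encryption, so Basic authentication should only be used over HTTPS.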

Time out

When you send a request to a server, your system typically waits for a certain amount of time for the server to respond. If this takes too long, there is a possibility that your system will hang.

A timeout is set to make sure that if the website does not respond within a certain amount of time, the request stops waiting. If we don’t set a timeout, the request will wait forever when the server is not responding. Use the timeout parameter to set the time limit in seconds.

For example, passing timeout=5 sets the time limit to 5 seconds.

You can also pass a tuple to timeout, with the first element being a connect timeout (the time allowed for the client to establish a connection to the server) and the second being a read timeout (the time it will wait for a response once your client has established a connection).

For example, timeout=(3, 7) means that the request should establish a connection with the server within 3 seconds, and the data must be received within 7 seconds after the connection is established.

If the request times out, the function raises a Timeout exception. You can handle this exception by importing Timeout from requests.exceptions.
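
Handling the exception could look like the sketch below. The function name fetch is hypothetical, and the (3, 7) tuple stands for a 3-second connect timeout and a 7-second read timeout.

```python
import requests
from requests.exceptions import Timeout


def fetch(url: str, timeout=(3, 7)):
    """Return the response, or None if the connect or read timeout expires."""
    try:
        return requests.get(url, timeout=timeout)
    except Timeout:
        # Raised for both connect timeouts and read timeouts.
        print("The request timed out")
        return None
```

Catching Timeout covers both cases because ConnectTimeout and ReadTimeout are subclasses of it.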

SSL Certificate Verification

SSL Certificates are small data files that digitally bind a cryptographic key to an organization’s detail. An organization needs to install the SSL Certificate onto its web server to initiate a secure session with browsers.

Once a secure connection is established, all web traffic between the web server and the web browser will be secure. If the data you are trying to receive or send is sensitive, it should be transferred over an encrypted connection using SSL. The requests library verifies SSL certificates for you by default. If you don’t want this, you can set the parameter verify to False in the get() function.
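
As a sketch, turning verification off looks like this. The function name fetch_without_verification is my own, and the last line only shows that a fresh session has verification switched on by default.

```python
import requests


def fetch_without_verification(url: str, timeout: float = 5):
    """GET url with certificate verification turned off (use with care)."""
    return requests.get(url, verify=False, timeout=timeout)


print(requests.Session().verify)  # verification is on by default
```

When verify=False is used, the underlying urllib3 library emits an InsecureRequestWarning, which is a reminder that the connection is no longer protected against impersonation of the server. It is best kept for testing against servers you control.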