How to generate user-agent header for web requests in Python
In a previous post, we saw how to modify the user-agent header in the wget, curl and httpie programs. In this post, I will show you how to modify the user-agent header in Python’s popular requests module. There are several reasons to modify the user agent, one of which is to trigger a different response from a website. Many websites offer different content based on the user-agent header. You can find user-agent header details in the MDN reference listed at the end of this post.
In Python, one of the most popular libraries for querying web servers is the requests module. It allows you to pass header information using the headers parameter –
1. Simplest use case, without a header
import requests
requests.get('http://linuxfreelancer.com/status')
And this is how it is logged on the web server side, Apache in this case –
76.1.2.3 [20/May/2018:01:08:37 -0400] "GET /status HTTP/1.1" 200 359 "-" "python-requests/2.18.4" 1798
The user-agent simply shows up as “python-requests/2.18.4”, and some websites might even block it to keep out web crawlers. So the next step is to modify it.
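To see which user-agent requests will send without checking the server logs, the default can be inspected locally. The sketch below assumes only that the requests package is installed. It prepares a request without sending it, so the example URL is never contacted.

```python
import requests
from requests.utils import default_user_agent

# The string requests uses when no User-Agent header is supplied,
# e.g. "python-requests/2.18.4" (the version part depends on your install).
default_ua = default_user_agent()
print(default_ua)

# A Session carries the same default. Preparing a request shows the
# headers that would be sent, without actually sending anything.
session = requests.Session()
prepared = session.prepare_request(
    requests.Request("GET", "http://linuxfreelancer.com/status")
)
print(prepared.headers["User-Agent"])
```

Both print statements should show the same python-requests/<version> string.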
2. Modify user-agent header
headers = {'User-Agent': 'Mozilla/5.0 (Android 5.1; Tablet; rv:50.0) Gecko/50.0 Firefox/50.0'}
requests.get('http://linuxfreelancer.com/status', headers=headers)
And this is what the access log entry looks like on the web server side –
76.1.2.3 [20/May/2018:01:11:29 -0400] "GET /status HTTP/1.1" 200 359 "-" "Mozilla/5.0 (Android 5.1; Tablet; rv:50.0) Gecko/50.0 Firefox/50.0" 1289
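When several requests should share the same user-agent, one option is to set the header once on a requests.Session instead of passing headers on every call. This is a minimal sketch rather than part of the walkthrough above; the request is only prepared, not sent, so the header can be verified locally.

```python
import requests

# Same user-agent string as in the example above.
USER_AGENT = "Mozilla/5.0 (Android 5.1; Tablet; rv:50.0) Gecko/50.0 Firefox/50.0"

session = requests.Session()
# Headers set on the session are merged into every request it makes.
session.headers.update({"User-Agent": USER_AGENT})

# Prepare (but do not send) a request to inspect the final headers.
prepared = session.prepare_request(
    requests.Request("GET", "http://linuxfreelancer.com/status")
)
print(prepared.headers["User-Agent"])

# To actually send it: session.get("http://linuxfreelancer.com/status")
```

A header passed directly to an individual call still takes precedence over the session-level value for that call.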
As you can see above, the user-agent string has several identifiers that are not easy to remember. A better approach is to programmatically generate valid user-agents for different platforms.
3. Generate valid user-agents
The user_agent module generates random yet valid web user agents. You can install it with ‘pip install user_agent’.
This module generates user-agent strings for different device types such as desktop, smartphone and tablet, as well as OS types (Windows, Linux, Mac, Android …). Let us try it in a virtual environment –
virtualenv /tmp/venv
source /tmp/venv/bin/activate
pip install user_agent
Now run Python in interactive mode (the prompts below are from IPython) –
import requests
from user_agent import generate_user_agent

In [8]: generate_user_agent()
Out[8]: 'Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2803.5 Safari/537.36'

In [9]: generate_user_agent()
Out[9]: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:49.0) Gecko/20100101 Firefox/49.0'

In [11]: generate_user_agent(os='linux')
Out[11]: 'Mozilla/5.0 (X11; Ubuntu; Linux i686 on x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2942.3 Safari/537.36'

In [13]: generate_user_agent(device_type='tablet')
Out[13]: 'Mozilla/5.0 (Linux; Android 4.4; HTC Desire 616 dual sim Build/JDQ39) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2938.21 Safari/537.36'

In [14]: generate_user_agent(device_type='desktop', os='win')
Out[14]: 'Mozilla/5.0 (Windows NT 6.1; rv:45.0) Gecko/20100101 Firefox/45.0'

In [20]: generate_user_agent(navigator='chrome', os='linux', device_type='desktop')
Out[20]: 'Mozilla/5.0 (X11; Ubuntu; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2894.26 Safari/537.36'
This lets us generate anything from fully random to narrowly specified, yet still valid, user-agent strings. We can then pass a generated user-agent string to the requests module through the headers parameter, and check our web server logs to validate –
from user_agent import generate_user_agent
import requests
requests.get('http://linuxfreelancer.com/status', headers={'User-Agent': generate_user_agent(navigator='firefox', os='linux')})
Log entry –
76.1.2.3 - - [20/May/2018:01:28:24 -0400] "GET /status HTTP/1.1" 200 359 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" 1486
References –
http://docs.python-requests.org/en/master/
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent