Recently I was trying to download numerous files from a certain website using a shell script I wrote. Within the script, I first used wget to retrieve the files, but I kept getting the following error message:
HTTP request sent, awaiting response... 403 Forbidden
2012-12-30 06:17:45 ERROR 403: Forbidden.
Then, hoping that this was just a wget problem, I replaced wget with curl. It turned out that curl would create a file with the same name as the one being downloaded, but to my surprise the file itself was not downloaded. Instead, the file contained an HTML page with a 403 Forbidden message:
403 Forbidden
Forbidden
You don't have permission to access /dir/names.txt on this server.
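As an aside, curl can be told to treat an HTTP error as a failure rather than writing the error page to disk. The following is only a minimal sketch: the function names `http_status` and `fetch_or_report` are hypothetical helpers introduced for illustration, not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: print the HTTP status code for a URL without saving the body.
# -s silences the progress meter, -o /dev/null discards the body,
# -w '%{http_code}' prints the numeric status code.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Hypothetical helper: download a URL, reporting an error instead of keeping an error page.
# -f (--fail) makes curl exit non-zero on HTTP errors (400 and above),
# -S still shows error messages despite -s, -O keeps the remote file name.
fetch_or_report() {
    if curl -f -s -S -O "$1"; then
        echo "downloaded: $1"
    else
        echo "failed: $1 (HTTP $(http_status "$1"))" >&2
        return 1
    fi
}
```

Calling `fetch_or_report` with the URL of a blocked file should then print a "failed" line with the status code, instead of leaving behind a file that silently contains the 403 page.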
What surprised me was that I could download the files using Firefox, Internet Explorer, and even the text-based browsers elinks and lynx. It seems that the website was blocking clients based on the ‘User-Agent’ header field, rejecting the default values that wget and curl send. So the trick was simply to change the User-Agent to a ‘legitimate’, browser-like one. Both curl and wget support altering the User-Agent header field. You can use the commands below to change it:
USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"
wget --user-agent="$USER_AGENT" -c http://linuxfreelancer.com/status.html
curl -A "$USER_AGENT" -O http://linuxfreelancer.com/status.html

Note that both options take only the agent string itself; wget and curl add the ‘User-Agent:’ header name on their own.

In addition to wget or curl, httpie, a much easier to use CLI HTTP client, can be used. Passing custom HTTP headers is intuitive with httpie; installation and usage details can be found in its documentation. Modifying the User-Agent header using httpie is shown below:
USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"
http http://linuxfreelancer.com/ "User-Agent:$USER_AGENT"

All commands:
USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"
wget --user-agent="$USER_AGENT" -c http://linuxfreelancer.com/status.html
curl -A "$USER_AGENT" -O http://linuxfreelancer.com/status.html
http http://linuxfreelancer.com/ "User-Agent:$USER_AGENT"
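Since the original goal was to download many files from a script, the curl command can be wrapped in a loop. This is a rough sketch under stated assumptions, not the script I actually used: the function name `download_list` and the idea of keeping the URLs in a text file, one per line, are both hypothetical.

```shell
#!/bin/sh
# Browser-like agent string (the value only, without the "User-Agent:" header name).
USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"

# Hypothetical helper: download every URL listed in a file, one URL per line.
# Returns non-zero if at least one download failed.
download_list() {
    list_file=$1
    failures=0
    while IFS= read -r url; do
        # Skip blank lines and lines starting with '#'.
        case "$url" in ''|'#'*) continue ;; esac
        # -f fails on HTTP errors such as 403 instead of saving the error page,
        # -A sets the User-Agent, -O keeps the remote file name.
        if ! curl -f -s -S -A "$USER_AGENT" -O "$url"; then
            echo "failed: $url" >&2
            failures=$((failures + 1))
        fi
    done < "$list_file"
    [ "$failures" -eq 0 ]
}
```

With a file such as urls.txt containing one URL per line, running `download_list urls.txt` fetches each file into the current directory and prints a "failed" line for any URL the server still refuses.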