Archive for the ‘Linux’ Category

This is my first attempt at answering a question posed on one of the StackExchange sites for Unix/Linux – how do you compare two folders and copy the difference to a third folder? The script compares the latest directory (first argument) against an old directory (second argument), creates the difference directory (third argument) if it does not exist, and copies into it the files and directories that exist only in the latest directory. It also copies files that exist in both trees but differ in the latest directory. Make sure to pass the arguments in the right order – latest directory first, old directory next, and the difference directory last.

Sample usage:

daniel@linubuvma:~/scripts/python$ python copy_difference.py /tmp/test/current /tmp/test/old /tmp/test/difference

(Silent output is good).

daniel@linubuvma:~/practice/python$ ls -1R /tmp/test/current/
/tmp/test/current/:
dirc
extra
newone
one
three
two

/tmp/test/current/dirc:

/tmp/test/current/extra:
extra2
fourth

/tmp/test/current/extra/extra2:

/tmp/test/current/newone:
file2
fileone
daniel@linubuvma:~/practice/python$ ls -1R /tmp/test/old
/tmp/test/old:
extra
newone
one
two

/tmp/test/old/extra:

/tmp/test/old/newone:
file2
daniel@linubuvma:~/practice/python$ ls -1R /tmp/test/difference
ls: cannot access /tmp/test/difference: No such file or directory
daniel@linubuvma:~/practice/python$ python copy_difference.py /tmp/test/current /tmp/test/old /tmp/test/difference
daniel@linubuvma:~/practice/python$ ls -1R /tmp/test/difference
/tmp/test/difference:
extra
newone
three
two

/tmp/test/difference/extra:
fourth

/tmp/test/difference/newone:
fileone

Here is the Python script.


#!/usr/bin/env python

import os
import sys
import shutil
import filecmp

holderlist = []

def compareme(dir1, dir2):
    '''Recursively collect paths that exist only in dir1 or differ from dir2.'''
    dircomp = filecmp.dircmp(dir1, dir2)
    only_in_one = dircomp.left_only    # entries found only in dir1
    diff_in_one = dircomp.diff_files   # files present in both but different
    for entry in only_in_one + diff_in_one:
        holderlist.append(os.path.abspath(os.path.join(dir1, entry)))
    # Recurse into subdirectories that exist in both trees.
    for item in dircomp.common_dirs:
        compareme(os.path.join(dir1, item), os.path.join(dir2, item))
    return holderlist

def main():
    if len(sys.argv) > 3:
        dir1, dir2, dir3 = sys.argv[1], sys.argv[2], sys.argv[3]
    else:
        print "Usage:", sys.argv[0], "currentdir olddir difference"
        sys.exit(1)

    source_files = compareme(dir1, dir2)
    dir1 = os.path.abspath(dir1)
    dir3 = os.path.abspath(dir3)

    # Map each source path to its counterpart under the difference directory.
    destination_files = [item.replace(dir1, dir3, 1) for item in source_files]

    # Create any destination directories that do not exist yet.
    new_dirs_create = [os.path.split(item)[0] for item in destination_files]
    for mydir in set(new_dirs_create):
        if not os.path.exists(mydir):
            os.makedirs(mydir)

    # Copy the files (their parent directories were created above).
    for src, dst in zip(source_files, destination_files):
        if os.path.isfile(src):
            shutil.copyfile(src, dst)

if __name__ == '__main__':
    main()
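
The heavy lifting in this script is done by filecmp.dircmp from the standard library: ‘left_only’ lists entries that exist only in the first directory, ‘diff_files’ lists files present in both but with different contents, and ‘common_dirs’ lists the subdirectories that compareme() recurses into. A quick way to see those attributes for yourself – this snippet only prints the top-level comparison, and the paths are the example directories from above –

#!/usr/bin/env python
# Inspect what filecmp.dircmp reports for the two example directories.
import filecmp

dcmp = filecmp.dircmp('/tmp/test/current', '/tmp/test/old')

print 'Only in current :', dcmp.left_only    # new entries, copied as-is
print 'Only in old     :', dcmp.right_only   # removed entries (ignored by the script)
print 'Different files :', dcmp.diff_files   # same name, different content
print 'Common subdirs  :', dcmp.common_dirs  # compareme() recurses into these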

Getting the URLs in your favorites or bookmarks as a plain list.

I have tons of pages that I bookmarked in my Firefox browser on a Linux box and wanted to get a simple listing of these URLs with their titles.

1. Export bookmarks to a JSON file
2. Extract the URLs from the JSON file to get a simple list

1. How to export bookmarks in Firefox as JSON.
Go to Bookmarks menu
Show All Bookmarks
Import and Backup (click the down arrow to expand it)
Backup
Save (Make sure JSON is selected at the right bottom corner)

The file will be saved with a name like ‘bookmarks-2013-12-07.json’; the format is ‘bookmarks-yyyy-mm-dd.json’. Write down the path where you saved this file – we will need it in the next step.

2. Get a simple list out of the JSON format file

We are going to use Python’s json module to load the file into a Python object and print the entries containing URLs. Pass the path of the saved bookmarks file as the first argument to the script.


#!/usr/bin/env python
'''Extract a list of URLs from a Firefox bookmarks JSON export.'''

import sys
import os
import json
import io

def Usage():
    print "{0} Path-to-bookmarks-file".format(sys.argv[0])
    sys.exit(1)

if len(sys.argv) < 2:
    Usage()

bookmark_file = sys.argv[1]

# Does the file exist?
if not os.path.isfile(bookmark_file):
    print "{0} not found.".format(bookmark_file)
    sys.exit(1)

# Load the JSON file
fp_data = io.open(bookmark_file, encoding='utf-8')
try:
    jdata = json.load(fp_data)
except ValueError:
    print "{0} is not a valid JSON file".format(bookmark_file)
    sys.exit(1)
fp_data.close()


# Recursively collect the title and URL of every bookmark in the tree.
def grab_keys(bookmarks_data, bookmarks_list=None):
    if bookmarks_list is None:
        bookmarks_list = []
    if 'children' in bookmarks_data:
        for item in bookmarks_data['children']:
            bookmarks_list.append({'title': item.get('title', 'No title'),
                                   'uri': item.get('uri', 'None')})
            grab_keys(item, bookmarks_list)
    return bookmarks_list


def main():
    mydata = grab_keys(jdata)
    for item in mydata:
        myurl = item['uri']
        if myurl.startswith('http') or myurl.startswith('ftp'):
            print item['uri'], "  ", item['title']

if __name__ == "__main__":
    main()

Save this file, say as ‘get_bookmarks.py’; running it will give an output similar to the one below –

[root@localhost]# python get_bookmarks.py bookmarks-2013-12-07.json
https://www.google.com/ Google
https://aws.amazon.com/ Amazon Web Services, Cloud Computing: Compute, Storage, Database
http://docs.python.org/3/py-modindex.html Python Module Index – Python v3.3.3 documentation
http://www.linuxhomenetworking.com/wiki/#.UqMjHddn21E Linux Home Networking
http://www.zytrax.com/books/dns/ DNS for Rocket Scientists - Contents
http://www.centos.org/ Centos
http://wiki.centos.org/ Wiki
http://www.centos.org/docs/6/ Documentation
http://www.centos.org/modules/newbb/ Forums
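
The reason grab_keys() calls itself is that the exported file is a tree: every bookmark folder is an object with a ‘children’ list, and that list can hold both bookmarks (objects with ‘uri’ and ‘title’) and further folders. Here is a minimal sketch of walking such a structure – the nested dictionary below is a hand-made stand-in for real Firefox data, not an actual export –

#!/usr/bin/env python
# Hand-made stand-in for a Firefox bookmarks export: folders carry a
# 'children' list, leaf bookmarks carry 'title' and 'uri'.
sample = {
    'title': 'root',
    'children': [
        {'title': 'Google', 'uri': 'https://www.google.com/'},
        {'title': 'Linux folder',
         'children': [
             {'title': 'CentOS', 'uri': 'http://www.centos.org/'},
         ]},
    ],
}

def walk(node, found=None):
    if found is None:
        found = []
    for child in node.get('children', []):
        if 'uri' in child:
            found.append((child['uri'], child.get('title', 'No title')))
        walk(child, found)   # descend into sub-folders
    return found

for uri, title in walk(sample):
    print uri, ' ', title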

Another way of approaching the problem is to export the bookmarks as an HTML file and then dump it as a text file. Here I used ‘lynx’ (install it with ‘yum install lynx’ on CentOS/RHEL/Fedora) to dump the file and grepped for the URLs –

[root@localhost]# lynx --dump bookmarks.html | egrep '[0-9]+\.[[:space:]]+http'
3. https://www.google.com/
4. https://aws.amazon.com/
5. http://docs.python.org/3/py-modindex.html
6. http://www.linuxhomenetworking.com/wiki/#.UqMjHddn21E
7. http://www.zytrax.com/books/dns/
9. http://www.centos.org/
10. http://wiki.centos.org/
11. http://www.centos.org/docs/6/
12. http://www.centos.org/modules/newbb/

[root@localhost]# lynx --dump bookmarks.html | egrep '[0-9]+\.[[:space:]]+http' | awk '{print $2}'
https://www.google.com/
https://aws.amazon.com/
http://docs.python.org/3/py-modindex.html
http://www.linuxhomenetworking.com/wiki/#.UqMjHddn21E
http://www.zytrax.com/books/dns/
http://www.centos.org/
http://wiki.centos.org/
http://www.centos.org/docs/6/
http://www.centos.org/modules/newbb/

In order to use this script, you need to do certain things in advance –

1. Download youtube-dl, a script which allows you to download videos

https://github.com/rg3/youtube-dl

2. Install ffmpeg: an audio/video conversion tool.
Ubuntu users can run the following commands –

  apt-get install ffmpeg libavcodec-extra-53

Note: More details can be found here.

Usage example –

 ./musicdownloader.sh http://www.youtube.com/watch?v=8tHu-OwzwPg BereketMengstead-mizerey.mp3

And here is the script –

#!/bin/bash

downloader=`which youtube-dl`
ffmpeg=`which ffmpeg`
bitrate=192000

ARGC=$#
LINK=$1
FILENAME=$2
SAVEDFILE=$(basename $0)_mymusic123.mp4

if [ $ARGC -ne 2 ]; then
  echo "Usage: $(basename $0) url-link output-file"
  echo "Example: $(basename $0) http://www.youtube.com/watch?v=fQZNiMckKbI Azmari-ethio01.mp3"
  exit 1
fi

# Download the video in mp4 format, then extract the audio track as mp3.
"$downloader" -f 18 "$LINK" -o "$SAVEDFILE" && "$ffmpeg" -i "$SAVEDFILE" -f mp3 -ab "$bitrate" -vn "$FILENAME"

if [ $? -eq 0 ]; then
  echo "File saved in $FILENAME"
  rm "$SAVEDFILE"
fi

Recently I was trying to download numerous files from a certain website using a shell script I wrote. Within the script, I first used wget to retrieve the files, but I kept getting the following error message –

HTTP request sent, awaiting response... 403 Forbidden
2012-12-30 06:17:45 ERROR 403: Forbidden.

Then, hoping that this was just a wget problem, I replaced wget with curl. It turned out that curl would actually create a file with the same name as the one being downloaded, but to my surprise the actual file content was not there. Instead, it contained an HTML page with a 403 Forbidden message.

403 Forbidden
Forbidden

You don't have permission to access /dir/names.txt on this server.

What was surprising is that I could download the files using Firefox, Internet Explorer, elinks and even the text-based browser ‘lynx’. It seems the website was blocking access from clients presenting a certain ‘User-Agent’ header. So the trick was simply to change the User-Agent to a ‘legitimate’ one. Both curl and wget support overriding the User-Agent header field. You can use the commands below to change the User-Agent value –

USER_AGENT="User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"

wget --user-agent="$USER_AGENT" -c http://linuxfreelancer.com/status.html

curl -A "$USER_AGENT" -O http://linuxfreelancer.com/status.html

In addition to wget or curl, the much easier-to-use CLI HTTP client httpie can be used. Passing custom HTTP headers is intuitive with httpie; installation and usage details can be accessed here. Modifying the User-Agent header using httpie is shown below –

USER_AGENT="User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"
http http://linuxfreelancer.com/ "$USER_AGENT"

All commands –

USER_AGENT="User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"
wget --user-agent="$USER_AGENT" -c http://linuxfreelancer.com/status.html

curl -A "$USER_AGENT" -O http://linuxfreelancer.com/status.html

http http://linuxfreelancer.com/ "$USER_AGENT"
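
The same trick also works from inside a Python script; below is a minimal sketch using the standard library urllib2 module (Python 2). The URL is the same status.html used above, and the forged User-Agent is simply passed as a request header –

#!/usr/bin/env python
# Fetch a page while presenting a browser-like User-Agent header.
import urllib2

USER_AGENT = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) '
              'Gecko/20101026 Firefox/3.6.12')

request = urllib2.Request('http://linuxfreelancer.com/status.html',
                          headers={'User-Agent': USER_AGENT})
response = urllib2.urlopen(request)
print response.read()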


Amazon provides extensive tools to manage virtual machines hosted on Amazon Web Services (AWS). It is very easy to launch VMs, and even easier to destroy or terminate them! Termination can be unintentional – with just one mis-click and a second confirmation you could end up terminating a critical production server. There is no way of bringing back a terminated VM in AWS; once it is gone, it is gone forever.

So what steps should you follow to prevent unintended data loss?

1. Make sure the virtual machines are properly labelled in the EC2 dashboard – under “Name”. This can be done by simply right-clicking a VM and selecting “Add/Edit Tags”. If you have many servers without proper tags, you might unintentionally terminate the wrong one.

2. Enable Termination Protection – right-click the VM, select “Termination Protection” and make sure it is enabled. If you later do decide to terminate the VM, you first have to disable termination protection and then go back to the dashboard to terminate it. (Both steps can also be scripted – see the sketch below.)
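
Clicking through the dashboard gets tedious when you manage many instances; the snippet below is a minimal sketch of doing the same two things with the boto3 library. This is an assumption on my part – it requires boto3 to be installed and AWS credentials to be configured, and the instance ID and Name value are placeholders –

#!/usr/bin/env python
# Tag an instance with a Name and enable termination protection.
# Assumes boto3 is installed and AWS credentials are configured.
import boto3

ec2 = boto3.client('ec2')
instance_id = 'i-0123456789abcdef0'   # placeholder instance ID

# Give the instance a descriptive Name tag (shows up in the EC2 dashboard).
ec2.create_tags(Resources=[instance_id],
                Tags=[{'Key': 'Name', 'Value': 'prod-web-01'}])

# Enable termination protection; the API attribute is DisableApiTermination.
ec2.modify_instance_attribute(InstanceId=instance_id,
                              DisableApiTermination={'Value': True})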

Extract MP3 from Youtube

Download audio in mp3 format from Youtube

Got your favorite youtube video and yet you don’t have it in an audio format such as mp3 to play it offline? With open source tools, you can grab that video and convert it to mp3 at no cost.

Prepare a directory for downloading mp4 format files from youtube.

mkdir /home/youtube

Tools you need

1. youtube-dl: A python script to download videos from youtube – http://rg3.github.io/youtube-dl/download.html

Using curl –

sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl

Using wget –

sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl

Using pip –

sudo pip install --upgrade youtube_dl

2. ffmpeg: an audio/video conversion tool.

 apt-get install ffmpeg libavcodec-extra-53

Procedure
1. Get the YouTube link ready and download the mp4 using the youtube-dl script
2. Use ffmpeg to convert the mp4 to mp3

Sample download

Let us download the following youtube link

youtube-dl -f 18 -t http://www.youtube.com/watch?v=dAG2qxvYwsY

options: -f is for file format of the youtube video (check youtube-dl documentation for the whole list)

Next, convert it to mp3

ffmpeg -i Freselam_Mussie_s_Tsinih_Zeytibli-dAG2qxvYwsY.mp4 -f mp3 -ab 192000 -vn Freselam_Mussie_s_Tsinih_Zeytibli-dAG2qxvYwsY.mp3

options: -i  input file
         -f  output file format
         -ab audio bit rate
         -vn disable video recording
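
If you do this often, the two steps can also be chained from a small script; below is a minimal Python sketch using subprocess. It assumes youtube-dl and ffmpeg are already on your PATH, and the temporary file name is just a placeholder –

#!/usr/bin/env python
# Download a YouTube video with youtube-dl, then convert it to mp3 with ffmpeg.
# Assumes both tools are installed and reachable on PATH.
import os
import sys
import subprocess

url = sys.argv[1]                    # e.g. http://www.youtube.com/watch?v=dAG2qxvYwsY
mp3_file = sys.argv[2]               # e.g. song.mp3
mp4_file = 'downloaded_video.mp4'    # temporary file, removed at the end

subprocess.check_call(['youtube-dl', '-f', '18', url, '-o', mp4_file])
subprocess.check_call(['ffmpeg', '-i', mp4_file, '-f', 'mp3',
                       '-ab', '192000', '-vn', mp3_file])
os.remove(mp4_file)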

References –

http://rg3.github.com/youtube-dl/

http://rg3.github.io/youtube-dl/download.html