Getting the URLs in your favorites or bookmarks as a plain list.

I have tons of pages that i bookmarked in my Firefox browser in a Linux box and wanted to get a simple listing of these URLs with titles.

1. Export books marks to a JSON file
2. Extract JSON file to get a simple list

1. How to Export bookmars in Firefox as JSON.
Go to Bookmarks menu
Show All Bookmarks
Import and Backup (click the down arrow to expand it)
Backup
Save (Make sure JSON is selected at the right bottom corner)

The file will be saved something like ‘bookmarks-2013-12-07.json’, the format is ‘bookmarks-yyyy-mm-dd.json’. Write down the path where you saved this file, we will need it for the next step.

2. Get a simple list out of the JSON format file

We are going to use the json module for python to load the file into a python list object and print the lines containing URLs. Make sure you set the ‘bookmarks_path’ variable to the path where you saved the bookmarks file.


#!/usr/bin/env python
'''extract a list of URLs from Firefox exported bookmars JSON file '''

import sys
import os
import json
import io

def Usage():
    print "{0} Path-to-bookmarks-file".format(sys.argv[0])
    sys.exit(1)

if len(sys.argv) < 2:
    Usage()

bookmark_file = sys.argv[1]

#Does the file exist?
if not os.path.isfile(bookmark_file):
    print "{0} not found.".format(bookmark_file)
    sys.exit(1)

# Load JSON file
fp_data = io.open(bookmark_file, encoding='utf-8')
try:
    jdata = json.load(fp_data)
except ValueError:
    print "{0} not valid JSON file".format(bookmark_file)
    sys.exit(1)
fp_data.close()


#Recursive function to get the title and URL keys from JSON file

def grab_keys(bookmarks_data, bookmarks_list=[]):
  if 'children' in bookmarks_data:
    for item in bookmarks_data['children']:
      bookmarks_list.append({'title': item.get('title', 'No title'),
                             'uri': item.get('uri', 'None')})
      grab_keys(item, bookmarks_list)
  return bookmarks_list


def main():
  mydata=grab_keys(jdata)
  for item in mydata:
    myurl = item['uri']
    if myurl.startswith('http') or myurl.startswith('ftp'):
      print item['uri'], "  ", item['title']

if __name__=="__main__":
  main()

Save this file, say as ‘get_bookmars.py’, and running it will give an output similar to the one below –

[root@localhost]# python get_bookmarks.py
https://www.google.com/ Google
https://aws.amazon.com/ Amazon Web Services, Cloud Computing: Compute, Storage, Database
http://docs.python.org/3/py-modindex.html Python Module Index รข Python v3.3.3 documentation
http://www.linuxhomenetworking.com/wiki/#.UqMjHddn21E Linux Home Networking
http://www.zytrax.com/books/dns/ DNS for Rocket Scientists - Contents
http://www.centos.org/ Centos
http://wiki.centos.org/ Wiki
http://www.centos.org/docs/6/ Documentation
http://www.centos.org/modules/newbb/ Forums

Another way of approaching the problem is to export the bookmarks as HTML file and then dump it as text file. Here I used ‘lynx’ (Install it using ‘yum install lynx’ in CentOS/RHEL/Fedora) to dump the file and grepped for the URLs –

[root@localhost]# lynx –dump bookmarks.html | egrep ‘[0-9]+\.[[:space:]]+http’
3. https://www.google.com/
4. https://aws.amazon.com/
5. http://docs.python.org/3/py-modindex.html
6. http://www.linuxhomenetworking.com/wiki/#.UqMjHddn21E
7. http://www.zytrax.com/books/dns/
9. http://www.centos.org/
10. http://wiki.centos.org/
11. http://www.centos.org/docs/6/
12. http://www.centos.org/modules/newbb/

[root@localhost]# lynx –dump bookmarks.html | egrep ‘[0-9]+\.[[:space:]]+http’ | awk ‘{print $2}’
https://www.google.com/
https://aws.amazon.com/
http://docs.python.org/3/py-modindex.html
http://www.linuxhomenetworking.com/wiki/#.UqMjHddn21E
http://www.zytrax.com/books/dns/
http://www.centos.org/
http://wiki.centos.org/
http://www.centos.org/docs/6/
http://www.centos.org/modules/newbb/