Linux – fast file system search with locate and updatedb

Typically

find

command is the most commonly used search utility in Linux. GNU find searches the directory tree rooted at each given starting-point by evaluating the given expression from left to right, according to the rules of precedence.

There is an alternative and fast way of searching for files and directories in Linux though and that is the

locate

command, and it goes hand in had with the

updatedb

utility which keeps an indexed database of the files in your system. The locate tools simply reads the database created by updatedb.

Installation –

sudo apt-get -y install mlocate           [Debian/Ubuntu]
sudo yum -y install mlocate               [CentOS/Redhat]

updatedb is usually has a daily cron job to update the default database(‘/var/lib/mlocate/mlocate.db’). To manually update the database, you can manually run the ‘updatedb’ command first. That will take a while depending on the number of files you have on your system, the last time updatedb ran or other file related changes.

First time – update the default database, run any of the below command depending on your requirements. Most likely, the first and/or third command is what you need.

updatedb
updatedb -U /some/path      # if interested in indexing specific directory that you will search frequently.
updatedb -v                 # verbose mode

Time to search
locate command is the utility to search for entries in a mlocate database.

Some examples –

locate cron         # any file or directory with cron in its name
locate -i cron      # case insensitive
locate -c cron      # only print number of found entries
locate -r 'cron$'   # regex - only files or directories with names ending in cron.
locate -r '/usr/.*ipaddress.*whl$'   # regex for eg. /usr/share/python-wheels/ipaddress-0.0.0-py2.py3-none-any.whl

locate can also print the statistics on count of files, directories, size used by updatedb default directory.

root@cloudclient:/tmp# locate -S
Database /var/lib/mlocate/mlocate.db:
	28,339 directories
	185,661 files
	11,616,040 bytes in file names
	4,481,938 bytes used to store database

Customizing updatedb
updatedb can be customized to output the search database to a different file than the default db, in addition to this we can change the directory to index other than the default root tree. We can then tell locate to use the custom db.

In the below example, I am indexing the files under home directory in /tmp/home.db database, and then run locate to use this custom DB. As you can see the number of files and directories is way lower and thus the search much faster although since it has to scan specific directory.

$ updatedb -U ~ -o /tmp/home.db
$ locate -d /tmp/home.db cron
$ locate -d /tmp/home.db -S
Database /tmp/home.db:
	3,530 directories
	29,943 files
	2,635,675 bytes in file names
	762,621 bytes used to store database

References –

https://linux.die.net/man/8/updatedb

https://linux.die.net/man/1/locate