How to detect changes in a directory with Bash

Sometimes it is useful to have a script detect whether the contents of a directory have changed since the last time the script was run, optionally excluding some of the files or directories inside. One use is backups: there are situations where, instead of doing incremental backups, you might want to do a full backup, but only if something has changed. Note that I’m not talking about “monitoring” a directory in real time to detect changes (for that, check out inotify).

The stat way

To check a complete directory without excluding anything inside it, it’s probably enough to use stat.

You can use a script like this:
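What follows is a minimal sketch of such a script, assuming Bash and GNU stat. The names DIR_TO_CHECK, OLD_STAT_FILE and OLD_STAT come from the description in this section; NEW_STAT, the function name dir_changed_stat and its return codes are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch: detect changes in a directory by comparing `stat -t` output
# between runs. Assumes GNU stat (the -t "terse" option).

# dir_changed_stat DIR_TO_CHECK OLD_STAT_FILE
# Returns 0 if the stat output differs from the stored one (and stores the
# new output), 1 if it is unchanged, 2 if stat itself fails.
dir_changed_stat() {
    local DIR_TO_CHECK=$1     # directory to check
    local OLD_STAT_FILE=$2    # text file holding the previous stat output
    local OLD_STAT NEW_STAT

    # Load the previous result, or fall back to a dummy string.
    if [ -e "$OLD_STAT_FILE" ]; then
        OLD_STAT=$(cat "$OLD_STAT_FILE")
    else
        OLD_STAT="nothing"
    fi

    NEW_STAT=$(stat -t "$DIR_TO_CHECK") || return 2

    if [ "$OLD_STAT" != "$NEW_STAT" ]; then
        # Something changed: do the real work here (a backup, for example),
        # then remember the new state for the next run.
        echo "$NEW_STAT" > "$OLD_STAT_FILE"
        return 0
    fi
    return 1
}
```

It could be called as `if dir_changed_stat /path/to/dir /path/to/old_stat.txt; then echo "changed"; fi`, where both paths are placeholders. Keep in mind that stat describes the directory itself, so this notices entries being added, removed or renamed directly inside it, but not in-place edits to an existing file or changes deeper in subdirectories.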

The way it works is quite simple. The script needs to keep track of the result of running stat on the directory from one run to the next. To do this, it uses a text file (specified by the variable OLD_STAT_FILE): if the file exists, the script loads its contents into the variable OLD_STAT; if not, it sets OLD_STAT to an arbitrary string (‘nothing’, in the example). Then the script runs stat -t on the directory and compares the result with the OLD_STAT value. If they differ, something changed in the directory. In that case, you can do whatever you want (back up the directory, for instance), and then you need to update the text file with the new output of stat -t.

Excluding things from the directory

If we want to check whether the directory has changed while excluding one or more files or directories inside it, we can take a different route. The general process is the same: keep an “old status” of the directory and compare it with the “new status”, but instead of using stat we use a pipeline of commands.

Here’s the code for the new script:
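This is a minimal sketch rather than a definitive listing, assuming Bash with GNU find, xargs, du, sort and sha1sum. The pipeline and the names DIR_TO_CHECK, PATH_TO_EXCLUDE and NEW_SUM follow the description in this section; OLD_SUM, OLD_SUM_FILE, the function name dir_changed_sum and its return codes are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch: detect changes in a directory while ignoring one excluded path,
# by hashing a sorted listing of sizes, modification times and paths.
# Assumes GNU find, xargs, du, sort and sha1sum.

# dir_changed_sum DIR_TO_CHECK PATH_TO_EXCLUDE OLD_SUM_FILE
# Returns 0 if the sum differs from the stored one (and stores the new sum),
# 1 if it is unchanged.
dir_changed_sum() {
    local DIR_TO_CHECK=$1      # directory to check
    local PATH_TO_EXCLUDE=$2   # path inside it that should be ignored
    local OLD_SUM_FILE=$3      # text file holding the previous sum
    local OLD_SUM NEW_SUM

    # Load the previous sum, or fall back to a dummy string.
    if [ -e "$OLD_SUM_FILE" ]; then
        OLD_SUM=$(cat "$OLD_SUM_FILE")
    else
        OLD_SUM="nothing"
    fi

    # List entries, attach size and modification time, sort by path, hash.
    NEW_SUM=$(find "$DIR_TO_CHECK"/* \! -path "$PATH_TO_EXCLUDE" -print0 \
        | xargs -0 du -b --time --exclude="$PATH_TO_EXCLUDE" \
        | sort -k4,4 \
        | sha1sum \
        | awk '{print $1}')

    if [ "$OLD_SUM" != "$NEW_SUM" ]; then
        # Something changed: do the real work here, then store the new sum.
        echo "$NEW_SUM" > "$OLD_SUM_FILE"
        return 0
    fi
    return 1
}
```

The variables are quoted here so that the script does not break on directory names containing spaces, although the sort step still assumes that the path is the fourth whitespace-separated field.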

The structure of this script is similar to the previous one. Instead of using stat, we use a sha1sum of a file listing to tell whether things have changed or not. The key line in this script is the one where we assign NEW_SUM. Let’s dissect it:

  • find $DIR_TO_CHECK/* \! -path "$PATH_TO_EXCLUDE" -print0: First, we get a list of all the files in the directory with find. Using the parameter -path preceded by \! tells find to exclude the specified path. You may specify more than one path by repeating that same structure (\! -path $PATH_1 \! -path $PATH_2 ...). We use -print0 to force find to separate the file names with a null character instead of a newline (we need that to pipe things properly into xargs).
  • xargs -0 du -b --time --exclude=$PATH_TO_EXCLUDE: We get a list of all those files (excluding the path or paths that we want to ignore) with their disk usage (size) and the date and time of last modification. That is, each line of the output has the form “size date time path”.
  • sort -k4,4: We sort the previous list by the 4th field, which is the full path of the file or directory. This sorted list, in a way, reflects the “status” of the whole directory we want to check. If a file or directory is added, removed or renamed, or the size or modification time of any entry changes, this listing will reflect that change.
  • sha1sum: A checksum (here, a SHA-1 sum) of the previous listing encodes the “status” of our directory (excluding whatever we want to exclude) as a single short string, which is easy to store and compare.
  • awk '{print $1}': This is done just to clean up the output of sha1sum.
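The find step above mentions that more than one path can be excluded by repeating the \! -path structure. As a rough sketch of how that might be automated, the helper below builds the repeated options from a list of paths. The function name dir_state_sum and the arrays find_args and du_args are hypothetical, and GNU tools are assumed.

```bash
#!/usr/bin/env bash
# Sketch: compute the state sum of a directory while ignoring several paths.
# dir_state_sum DIR [PATH_TO_EXCLUDE ...]   (hypothetical helper)
dir_state_sum() {
    local dir=$1
    shift
    local find_args=() du_args=() p

    # Repeat "\! -path P" for find and "--exclude=P" for du, once per path.
    for p in "$@"; do
        find_args+=( \! -path "$p" )
        du_args+=( --exclude="$p" )
    done

    # Same pipeline as before, with the repeated options expanded in place.
    find "$dir"/* "${find_args[@]}" -print0 \
        | xargs -0 du -b --time "${du_args[@]}" \
        | sort -k4,4 \
        | sha1sum \
        | awk '{print $1}'
}
```

For example, `dir_state_sum /path/to/dir /path/to/dir/cache.db /path/to/dir/debug.log` (placeholder paths) would print one sum that ignores both files.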

That’s all there is to it. Each time the script is run, it calculates this key sum. If the current sum differs from the old one (that was read from the file), we can be sure that something changed in our directory.