https://www.jpablo128.com/how-to-detect-changes-in-a-directory-with-bash/
Sometimes it is useful for a script to detect whether the contents of a directory have changed since the last time the script was run, while excluding some of the files or directories inside it. One use for this is backups: there are situations where, instead of doing incremental backups, one might want to do a full backup, but only if something has changed. Note that I’m not talking about “monitoring” a directory in real time to detect changes (for that, check out inotify).
The stat way
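To check a complete directory without excluding anything inside it, it’s probably enough to use stat. Keep in mind that stat reports on the directory itself, so it notices files being added, removed or renamed directly inside the directory, but not in-place edits to an existing file or changes deeper in the tree.
You can use a script like this: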
DIR_TO_CHECK='/dir/to/check'
OLD_STAT_FILE='/home/johndoe/old_stat.txt'
# load the stat output saved by the previous run, if there is one
if [ -e "$OLD_STAT_FILE" ]
then
    OLD_STAT=$(cat "$OLD_STAT_FILE")
else
    OLD_STAT="nothing"
fi
# -t prints the stat information on a single line (GNU stat)
NEW_STAT=$(stat -t "$DIR_TO_CHECK")
if [ "$OLD_STAT" != "$NEW_STAT" ]
then
    echo 'Directory has changed. Do something!'
    # do whatever you want to do with the directory.
    # update the OLD_STAT_FILE
    echo "$NEW_STAT" > "$OLD_STAT_FILE"
fi
The way it works is quite simple. The script needs to keep track of the result of running stat on the directory from one run to the next. To do this, it uses a text file (specified by the variable OLD_STAT_FILE): if the file exists, the script loads its contents into the variable OLD_STAT; if not, it sets OLD_STAT to an arbitrary string (‘nothing’, in the example). Then the script runs stat on the directory and compares the result with the OLD_STAT value. If they differ, something in the directory has changed. In that case you can do whatever you want (back up the directory, for example), and then you need to update the contents of the text file where you keep the result of stat -t.
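To get a feel for what this kind of check notices, here is a small experiment, meant as a sketch rather than as part of the script above. It assumes GNU stat (for the -c option) and works in a throwaway directory created with mktemp; the helper name dir_mtime is made up for the example. It looks only at the directory's modification time, which is one of the fields that stat -t prints.

```shell
#!/usr/bin/env bash
# Sketch: what a directory's own modification time does and does not react to.
# Assumes GNU stat; dir_mtime is a hypothetical helper name.

dir_mtime() {
    # %y = time of last modification, with sub-second precision
    stat -c '%y' -- "$1"
}

demo_dir=$(mktemp -d)
echo 'one' > "$demo_dir/a.txt"
before=$(dir_mtime "$demo_dir")

sleep 1   # make sure any later timestamp is clearly different

# In-place edit of an existing file: the directory's entry list is untouched.
echo 'two' >> "$demo_dir/a.txt"
after_edit=$(dir_mtime "$demo_dir")

# New file: the directory gains an entry, so its mtime is updated.
touch "$demo_dir/b.txt"
after_add=$(dir_mtime "$demo_dir")

if [ "$before" = "$after_edit" ]; then
    echo 'edit of a.txt: directory looks unchanged'
fi
if [ "$before" != "$after_add" ]; then
    echo 'creation of b.txt: directory looks changed'
fi

rm -rf -- "$demo_dir"
```

On a typical Linux filesystem both messages should be printed, which suggests the stat approach fits directories where files come and go better than directories where existing files are edited in place.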
Excluding things from the directory
If we want to check whether the directory has changed while excluding one or more files or directories inside it, we can take a different route. The general process is the same: keep an “old status” of the directory and compare it with the “new status”, but instead of using stat we can use a series of commands.
Here’s the code for the new script:
DIR_TO_CHECK='/dir/to/check'
PATH_TO_EXCLUDE="/dir/to/check/tmp*"
OLD_SUM_FILE='/home/johndoe/old_sum.txt'
# load the sum saved by the previous run, if there is one
if [ -e "$OLD_SUM_FILE" ]
then
    OLD_SUM=$(cat "$OLD_SUM_FILE")
else
    OLD_SUM="nothing"
fi
# list size, modification time and path of everything not excluded, then hash the sorted listing
NEW_SUM=$(find "$DIR_TO_CHECK"/* \! -path "$PATH_TO_EXCLUDE" -print0 | xargs -0 du -b --time --exclude="$PATH_TO_EXCLUDE" | sort -k4,4 | sha1sum | awk '{print $1}')
if [ "$OLD_SUM" != "$NEW_SUM" ]
then
    echo "Directory has changed. Do something!"
    # do whatever you want to do with the directory.
    # update old sum
    echo "$NEW_SUM" > "$OLD_SUM_FILE"
fi
The structure of this script is similar to the previous one. Instead of using stat, we use the sha1sum of a listing to track whether things have changed. The key line in this script is where we assign NEW_SUM. Let’s dissect it:
find $DIR_TO_CHECK/* \! -path "$PATH_TO_EXCLUDE" -print0: First, we get a list of all the files in the directory with find. Using the parameter -path preceded by \! tells find to exclude the specified path. You may specify more than one path by repeating that same structure (\! -path $PATH_1 \! -path $PATH_2 ...). We use -print0 to force find to separate the file names with a null character instead of a newline (we need that to pipe things properly into xargs).
xargs -0 du -b --time --exclude=$PATH_TO_EXCLUDE: We get a list of all those files (excluding the path or paths that we want to ignore) with their disk usage (size) and the date and time of last modification. That is, each line of the output has the form <size> <date> <time> <path>.
sort -k4,4: We sort the previous list by the 4th field, which is the full path of the file or directory. This sorted list, in a way, reflects the “status” of the whole directory we want to check. If a file or directory is added, removed or renamed, or the size or modification time on any line changes, the listing will reflect that change.
sha1sum: A checksum (here, a SHA-1 sum) of the previous listing condenses the “status” of our directory (excluding whatever we want to exclude) into a single short value.
awk '{print $1}': This is done just to clean up the output of sha1sum, which prints the hash followed by a file name (- for standard input).
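The idea of repeating \! -path for several exclusions can be sketched as a small function. This is a variant, not the pipeline from the script: it assumes GNU find and lets find print the path, size and modification time itself through -printf, instead of piping into du. The function name dir_fingerprint and its argument layout are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: checksum of a directory listing with any number of excluded patterns.
# Assumes GNU find (-printf) and bash arrays; dir_fingerprint is a hypothetical name.
# Usage: dir_fingerprint DIR [EXCLUDE_PATTERN...]

dir_fingerprint() {
    local dir=$1
    shift
    local -a excludes=()
    local pattern
    # Turn every pattern into its own "! -path PATTERN" test for find.
    for pattern in "$@"; do
        excludes+=( ! -path "$pattern" )
    done
    # One line per entry: path, size in bytes, mtime as seconds since the epoch.
    # -mindepth 1 leaves out DIR itself, like the DIR/* glob in the script.
    find "$dir" -mindepth 1 "${excludes[@]}" -printf '%p\t%s\t%T@\n' \
        | LC_ALL=C sort \
        | sha1sum \
        | awk '{print $1}'
}
```

Calling dir_fingerprint /dir/to/check '/dir/to/check/tmp*' '/dir/to/check/cache*' (the second pattern is only an illustration) prints a single hash that can be stored and compared exactly like NEW_SUM. One caveat applies to this variant and to the original alike: if an excluded file lives inside a subdirectory that is itself listed, creating or deleting it still changes that subdirectory's modification time, and therefore the sum.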
That’s all there is to it. Each time the script is run, it calculates this key sum. If the current sum differs from the old one (that was read from the file), we can be sure that something changed in our directory.
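Both scripts share the same skeleton: read the old value, compare it with the new one, act, then store the new value. A minimal sketch of that skeleton as a reusable function follows. The name run_if_changed and its argument order are assumptions made for this example, and it makes one design choice the scripts above do not: the stored value is updated only if the action succeeds, so a failed backup is retried on the next run.

```shell
#!/usr/bin/env bash
# Sketch: run a command only when a value differs from the one stored on disk.
# run_if_changed is a hypothetical helper name.
# Usage: run_if_changed STATE_FILE NEW_VALUE COMMAND [ARGS...]

run_if_changed() {
    local state_file=$1 new_value=$2
    shift 2
    # Same value as last time: nothing to do.
    if [ -e "$state_file" ] && [ "$(cat "$state_file")" = "$new_value" ]; then
        return 0
    fi
    # Run the action; if it fails, keep the old state and pass on its exit status.
    "$@" || return
    printf '%s\n' "$new_value" > "$state_file"
}
```

With the variables from the script above, a call could look like run_if_changed "$OLD_SUM_FILE" "$NEW_SUM" echo 'Directory has changed.', with echo replaced by whatever backup command you actually use.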