FTP IIS Passive



Using Windows Firewall with non-secure FTP traffic

To configure Windows Firewall to allow non-secure FTP traffic, use the following steps:

  1. Open a command prompt: click Start, then All Programs, then Accessories, then Command Prompt.
  2. To open port 21 on the firewall, type the following syntax then hit enter:

    netsh advfirewall firewall add rule name="FTP (non-SSL)" action=allow protocol=TCP dir=in localport=21
  3. To enable stateful FTP filtering that will dynamically open ports for data connections, type the following syntax then hit enter:
    netsh advfirewall set global StatefulFtp enable
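
To verify that both settings took effect, you can display the new rule and the global stateful FTP setting (standard netsh show commands; this quick check is not part of the original steps):

    netsh advfirewall firewall show rule name="FTP (non-SSL)"
    netsh advfirewall show global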

Important Notes:

  • Active FTP connections are not necessarily covered by the above rules; an outbound connection from port 20 would also need to be enabled on the server. In addition, the FTP client machine would need to have its own firewall exceptions set up for inbound traffic.
  • FTP over SSL (FTPS) will not be covered by these rules; the SSL negotiation will most likely fail because the Windows Firewall filter for stateful FTP inspection will not be able to parse encrypted data. (Some 3rd-party firewall filters recognize the beginning of SSL negotiation, e.g. AUTH SSL or AUTH TLS commands, and return an error to prevent SSL negotiation from starting.)

FTP passive / data channel port range

netsh int ipv4 set dynamicport tcp start=10000 num=1000
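
If you restrict the passive data channel to a fixed range such as the 10000-10999 range configured above, one way to allow that range through Windows Firewall is an explicit rule. A sketch assuming the same range (the rule name is arbitrary):

netsh advfirewall firewall add rule name="FTP Passive Data" action=allow protocol=TCP dir=in localport=10000-10999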

LXD vs. Docker — or: getting started with LXD Containers


Container technology is not new: it existed long before the Docker hype around container technology started in 2013. Now, with Docker containers having reached mainstream usage, there is a high potential for confusion about the available container types like Docker, LXC, LXD and CoreOS rkt. In this blog post we will explain why LXD is not competing with Docker.

We will show in a few steps how to install and run LXC containers using LXD container management functions. For that, we will make use of an automated installation process based on Vagrant and VirtualBox.

What is LXD and why is it not a Docker Replacement?

After working with Docker for quite some time, I stumbled over another container technology: Ubuntu’s LXD (say “lex-dee”). How does it differ from Docker, and do the two really compete with each other, as an article in the German “Linux Magazin” (May 2015) states?

As the developers of LXD point out, a main difference between Docker and LXD is that Docker focuses on application delivery from development to production, while LXD focuses on system containers. This is why LXD is more likely to compete with classical hypervisors like Xen and KVM than with Docker.

Ubuntu’s web page points out that LXD’s main goal is to provide a user experience that’s similar to that of virtual machines but using Linux containers rather than hardware virtualization.

To provide a user experience similar to that of virtual machines, Ubuntu integrates LXD with OpenStack through its REST API. Although there are attempts to integrate Docker with OpenStack (project Magnum), Ubuntu comes much closer to feature parity with real hypervisors like Xen and KVM by offering features like snapshots and live migration. Like any container technology, LXD has a much lower resource footprint than virtual machines, which is why LXD is sometimes called a “lightervisor”.

One of the main remaining concerns IT operations teams have about container technology is that containers leave a larger attack surface than virtual machines. Canonical, the creator of Ubuntu and LXD, is tackling security concerns by making LXD-based containers secure by default. Still, any low-level security feature developed for LXC is potentially available to both Docker and LXD, since both are based on LXC technology.

What does this mean for us?

We have learned that Docker offers a great way to deliver applications, while LXD offers a great way to reduce the footprint of virtual-machine-like containers. What if you want to leverage the best of both worlds? One way is to run Docker containers within LXD containers. This, and its current restrictions, is described in this blog post by Stéphane Graber.

Okay, one step at a time: let us postpone the Docker-in-LXD discussion and get started with LXD now.

LXD Getting Started: a Step by Step Guide

This chapter largely follows this getting started web page. However, instead of trying to be complete, we will go through a simple end to end example. Moreover, we will add some important commands found on this nice LXD cheat sheet. In addition, we will explicitly record the example output of the commands.


Prerequisites:

  • Administration rights on your computer.
  • I have performed my tests with direct access to the Internet: via a firewall, but without an HTTP proxy. However, if you cannot get rid of your HTTP proxy, read this blog post.

Step 1: Install VirtualBox

If not already done, you need to install VirtualBox, found here. See Appendix A if you encounter installation problems on Windows with the error message “Setup Wizard ended prematurely”. For my tests, I am using the already installed VirtualBox 5.0.20 r106931 on Windows 10.

Step 2: Install Vagrant

If not already done, you need to install Vagrant, found here. For my tests, I am using an already installed Vagrant version 1.8.1 on my Windows 10 machine.

Step 3: Initialize and download an Ubuntu 16.04 Vagrant Box

In a future blog post, we want to test Docker in LXD containers. This is supported on Ubuntu 16.04 and higher. Therefore, we download the latest daily build of the corresponding Vagrant box. As a preparation, we create a Vagrantfile in a separate directory by issuing the following command:

vagrant init ubuntu/xenial64

You can skip the next command and directly run the vagrant up command if you wish, since the box will be downloaded automatically if no current version of the Vagrant box is found locally. However, I prefer to download the box first and run it later, since it is easier to observe what happens during the boot.

vagrant box add ubuntu/xenial64

Depending on the speed of your Internet connection, you can take a break here.

Step 4: Boot the Vagrant Box as VirtualBox Image and connect to it

Then, we will boot the box with:

vagrant up

Note: if you encounter an error message like “VT-x is not available”, this may be caused by booting Windows 10 with Hyper-V enabled or by nested virtualization. According to this Stack Overflow Q&A, running VirtualBox without VT-x is possible if you make sure that the number of CPUs is one. For that, try to set vb.cpus = 1 in the Vagrantfile and remove any statement like vb.customize ["modifyvm", :id, "--cpus", "2"]. If you prefer to use VT-x on your Windows 10 machine, you need to disable Hyper-V instead. Appendix B: “Vagrant VirtualBox Error message: “VT-x is not available”” describes how to add a boot menu entry that allows booting without Hyper-V enabled.

Now let us connect to the machine:

vagrant ssh

Step 5: Install and initialize LXD

Now we need to install LXD on the Vagrant image by issuing the commands

sudo apt-get update
sudo apt-get install -y lxd
newgrp lxd

Now we need to initialize LXD with the lxd init interactive command:

ubuntu@ubuntu-xenial:~$ sudo lxd init
sudo: unable to resolve host ubuntu-xenial
Name of the storage backend to use (dir or zfs) [default=zfs]: dir
Would you like LXD to be available over the network (yes/no) [default=no]? yes
Address to bind LXD to (not including port) [default=]:
Port to bind LXD to [default=8443]:
Trust password for new clients:
Do you want to configure the LXD bridge (yes/no) [default=yes]? no
LXD has been successfully configured.

I have decided to use dir as the storage backend (since zfs was not enabled), have configured the LXD server to be available via the default network port 8443, and have chosen to start without the LXD bridge, since this article points out that the LXD bridge does not allow SSH connections by default.

The configuration is written to a key-value store, which can be read with the lxc config get command, e.g.

ubuntu@ubuntu-xenial:~$ lxc config get core.https_address

The list of available system config keys can be found in this Git-hosted document. However, I have not found the storage backend type “dir” that I have configured; I guess the system assumes that “dir” is used as long as the zfs and lvm variables are not set. Also, it is a little bit confusing that we configure LXD, but the config is read out via lxc commands.
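
For completeness, keys can also be written with lxc config set. A minimal sketch (core.trust_password is a documented LXD server key; the value here is just a placeholder):

ubuntu@ubuntu-xenial:~$ lxc config set core.trust_password MyNewTrustPassword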

Step 6: Download and start an LXC Image

Step 6.1 (optional): List remote LXC Repository Servers:

The images are stored in image repositories. Apart from the local repository, the default remote repositories have the aliases images, ubuntu and ubuntu-daily:

ubuntu@ubuntu-xenial:~$ sudo lxc remote list
sudo: unable to resolve host ubuntu-xenial
|      NAME       |                   URL                    |   PROTOCOL    | PUBLIC | STATIC |
| images          | https://images.linuxcontainers.org       | simplestreams | YES    | NO     |
| local (default) | unix://                                  | lxd           | NO     | YES    |
| ubuntu          | https://cloud-images.ubuntu.com/releases | simplestreams | YES    | YES    |
| ubuntu-daily    | https://cloud-images.ubuntu.com/daily    | simplestreams | YES    | YES    |
Step 6.2 (optional): List remote LXC Images:

List all available ubuntu images for amd64 systems on the images repository:

ubuntu@ubuntu-xenial:~$ sudo lxc image list images: amd64 ubuntu
sudo: unable to resolve host ubuntu-xenial
|          ALIAS          | FINGERPRINT  | PUBLIC |              DESCRIPTION              |  ARCH  |  SIZE   |         UPLOAD DATE          |
| ubuntu/precise (3 more) | adb92b46d8fc | yes    | Ubuntu precise amd64 (20160906_03:49) | x86_64 | 77.47MB | Sep 6, 2016 at 12:00am (UTC) |
| ubuntu/trusty (3 more)  | 844bbb45f440 | yes    | Ubuntu trusty amd64 (20160906_03:49)  | x86_64 | 77.29MB | Sep 6, 2016 at 12:00am (UTC) |
| ubuntu/wily (3 more)    | 478624089403 | yes    | Ubuntu wily amd64 (20160906_03:49)    | x86_64 | 85.37MB | Sep 6, 2016 at 12:00am (UTC) |
| ubuntu/xenial (3 more)  | c4804e00842e | yes    | Ubuntu xenial amd64 (20160906_03:49)  | x86_64 | 80.93MB | Sep 6, 2016 at 12:00am (UTC) |
| ubuntu/yakkety (3 more) | c8155713ecdf | yes    | Ubuntu yakkety amd64 (20160906_03:49) | x86_64 | 79.16MB | Sep 6, 2016 at 12:00am (UTC) |

Instead of the “ubuntu” filter keyword in the image list command above, you can use any filter expression. E.g. sudo lxc image list images: amd64 suse will find OpenSuse images available for x86_64.

Step 6.3 (optional): Copy remote LXC Image to local Repository:

This command is optional, since the download will be done automatically with the lxc launch command below, if the image is not found on the local repository already.

lxc image copy images:ubuntu/trusty local:
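
As a side note that is not part of the original walkthrough, lxc image copy also accepts an --alias flag, so the local copy gets a short name that can be used with lxc launch later:

lxc image copy images:ubuntu/trusty local: --alias trusty
lxc launch trusty myOtherTrustyContainer
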
Step 6.4 (optional): List local LXC Images:

We can list the locally stored images with the following image list command. If you have not skipped the last step, you will find the following output:

ubuntu@ubuntu-xenial:~$ sudo lxc image list
sudo: unable to resolve host ubuntu-xenial
| ALIAS | FINGERPRINT  | PUBLIC |                DESCRIPTION                |  ARCH  |   SIZE   |         UPLOAD DATE          |
|       | 844bbb45f440 | no     | Ubuntu trusty amd64 (20160906_03:49)      | x86_64 | 77.29MB  | Sep 6, 2016 at 5:04pm (UTC)  |
Step 6.5 (mandatory): Launch LXC Container from Image

With the lxc launch command, a container is created from the image. Moreover, if the image is not available in the local repository, it will automatically download the image.

ubuntu@ubuntu-xenial:~$ lxc launch images:ubuntu/trusty myTrustyContainer
Creating myTrustyContainer
Retrieving image: 100%
Starting myTrustyContainer

If the image is already in the local repository, the “Retrieving image” line is missing and the container starts within seconds (~6-7 sec in my case).

Step 7 (optional): List running Containers

We can list the running containers with the lxc list command, similar to docker ps -a for those who know Docker:

ubuntu@ubuntu-xenial:~$ lxc list
|       NAME        |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
| myTrustyContainer | RUNNING |      |      | PERSISTENT | 0         |
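
If you need more detail about a single container than lxc list shows (not covered in the original steps), the lxc info command prints its status, PID, IP addresses and snapshots:

lxc info myTrustyContainer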

Step 8: Run a Command on the LXC Container

Now we are ready to run our first command on the container:

ubuntu@ubuntu-xenial:~$ sudo lxc exec myTrustyContainer ls /
sudo: unable to resolve host ubuntu-xenial
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

Step 9: Log into and exit the LXC Container

We can log into the container just by running the shell with the lxc exec command:

ubuntu@ubuntu-xenial:~$ sudo lxc exec myTrustyContainer bash
sudo: unable to resolve host ubuntu-xenial
root@myTrustyContainer:~# exit

The container can be exited just by issuing the exit command. Unlike with Docker containers, this will not stop the container.

Step 10: Stop the LXC Container

The container can be stopped with the sudo lxc stop command:

ubuntu@ubuntu-xenial:~$ sudo lxc stop myTrustyContainer
sudo: unable to resolve host ubuntu-xenial
ubuntu@ubuntu-xenial:~$ lxc list
|       NAME        |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
| myTrustyContainer | STOPPED |      |      | PERSISTENT | 0         |
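
To clean up afterwards (not part of the original walkthrough), the stopped container and the cached image can be deleted; the image is referenced by the fingerprint shown in the lxc image list output above:

lxc delete myTrustyContainer
lxc image delete 844bbb45f440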


We have discussed the differences between Docker and LXD. We have found that Docker focuses on application delivery, while LXD seeks to offer Linux virtual environments as complete systems.

We have provided the steps needed to get started with LXD by showing how to

  • install the software,
  • download images,
  • start containers from the images and
  • run simple Linux commands in the containers.

Next steps:

Here is a list of possible next steps on the path to Docker in LXC:

  • Networking
  • Docker in LXC container
  • LXD: Integration into OpenStack
  • Put it all together

Appendix A: VirtualBox Installation Problems: “Setup Wizard ended prematurely”

  • Download the VirtualBox installer
  • When I start the installer, everything seems to be on track until I see “rolling back action” and I finally get this:
    “Oracle VM Virtualbox x.x.x Setup Wizard ended prematurely”

Resolution of the “Setup Wizard ended prematurely” Problem

Let us try to resolve the problem: the VirtualBox installer downloaded directly from Oracle shows the exact same error (“…ended prematurely”), so this is not a Docker bug. Playing with conversion tools from VirtualBox to VMware did not lead to the desired results either.

The solution: Google is your friend; the winner is https://forums.virtualbox.org/viewtopic.php?f=6&t=61785. After backing up the registry and changing the registry entry

HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Network -> MaxFilters from 8 to 20 (decimal)

and rebooting the laptop, the installation of VirtualBox was successful.

Note: while this workaround worked on my Windows 7 notebook, it did not work on my new Windows 10 machine. However, I managed to install VirtualBox on Windows 10 by de-selecting the USB support module during the VirtualBox installation process. I remember having seen a forum post pointing to that workaround, with the additional information that the USB drivers are installed automatically the first time a USB device is attached to the host (not yet tested on my side).

Appendix B: Vagrant VirtualBox Error message: “VT-x is not available”


If you get an error message during vagrant up telling you that VT-x is not available, a reason may be that you have enabled Hyper-V on your Windows 10 machine: VirtualBox and Hyper-V cannot share the CPU’s VT-x extensions:

$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Checking if box 'thesteve0/openshift-origin' is up to date...
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
 default: Adapter 1: nat
 default: Adapter 2: hostonly
==> default: Forwarding ports...
 default: 8443 (guest) => 8443 (host) (adapter 1)
 default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
There was an error while executing `VBoxManage`, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.

Command: ["startvm", "8ec20c4c-d017-4dcf-8224-6cf530ee530e", "--type", "headless"]

Stderr: VBoxManage.exe: error: VT-x is not available (VERR_VMX_NO_VMX)
VBoxManage.exe: error: Details: code E_FAIL (0x80004005), component ConsoleWrap, interface IConsole


Step 1: prepare your Windows machine for dual boot with and without Hyper-V

As Administrator, open a CMD and issue the commands

bcdedit /copy "{current}" /d "Hyper-V" 
bcdedit /set "{current}" hypervisorlaunchtype off
bcdedit /set "{current}" description "non Hyper-V"

Step 2: Reboot the machine and choose the “non Hyper-V” option.

Now, the vagrant up command should not show the “VT-x is not available” error message anymore.

S3cmd S3 Sync How-To


The s3cmd program can transfer files to and from Amazon S3 in two basic modes:

  1. Unconditional transfer — all matching files are uploaded to S3 (put operation) or downloaded back from S3 (get operation). This is similar to a standard unix cp command that also copies whatever it’s told to.
  2. Conditional transfer — only files that don’t exist at the destination in the same version are transferred by the s3cmd sync command. By default an MD5 checksum and the file size are compared. This is similar to a unix rsync command, with some exceptions outlined below.

Filename handling rules and some other options are common to both these methods.

Filename handling rules

sync, get and put all support multiple arguments for source files and one argument for the destination file or directory (optional in some cases of get). The source can be a single file or a directory and there can be multiple sources used in one command. Let’s have these files in our working directory:

~/demo$ find .

Obviously we can for instance upload one of the files to S3 and give it a different name:

~/demo$ s3cmd put file0-1.msg s3://s3tools-demo/test-upload.msg
file0-1.msg -> s3://s3tools-demo/test-upload.msg  [1 of 1]

We can also upload a directory with --recursive parameter:

~/demo$ s3cmd put --recursive dir1 s3://s3tools-demo/some/path/
dir1/file1-1.txt -> s3://s3tools-demo/some/path/dir1/file1-1.txt  [1 of 2]
dir1/file1-2.txt -> s3://s3tools-demo/some/path/dir1/file1-2.txt  [2 of 2]

With directories there is one thing to watch out for – you can either upload the directory and its contents or just the contents. It all depends on how you specify the source.

To upload a directory and keep its name on the remote side specify the source without the trailing slash:

~/demo$ s3cmd put -r dir1 s3://s3tools-demo/some/path/
dir1/file1-1.txt -> s3://s3tools-demo/some/path/dir1/file1-1.txt  [1 of 2]
dir1/file1-2.txt -> s3://s3tools-demo/some/path/dir1/file1-2.txt  [2 of 2]

On the other hand, to upload just the contents, specify the directory with a trailing slash:

~/demo$ s3cmd put -r dir1/ s3://s3tools-demo/some/path/
dir1/file1-1.txt -> s3://s3tools-demo/some/path/file1-1.txt  [1 of 2]
dir1/file1-2.txt -> s3://s3tools-demo/some/path/file1-2.txt  [2 of 2]

Important — in both cases just the last part of the path name is taken into account. In the case of dir1 without a trailing slash (which would be the same as, say, ~/demo/dir1 in our case) the last part of the path is dir1 and that’s what’s used on the remote side, appended after s3://s3…/path/ to make s3://s3…/path/dir1/….

On the other hand, dir1/ (note the trailing slash), which would be the same as ~/demo/dir1/ (trailing slash again), is actually similar to saying dir1/* – i.e. it expands to the list of the files in dir1. In that case the last part(s) of the path name are the filenames (file1-1.txt and file1-2.txt) without the dir1/ directory name. So the final S3 paths are s3://s3…/path/file1-1.txt and s3://s3…/path/file1-2.txt respectively, both without the dir1/ member in them. I hope it’s clear enough; if not, ask on the mailing list or send me a better wording 😉

The above examples were built around the put command. A bit more powerful is sync – the path name handling is the same as just explained. However, the important difference is that sync first checks the list and details of the files already present at the destination, compares them with the local files, and only uploads the ones that either are not present remotely or have a different size or md5 checksum. If you have run all the above examples, you’ll get output similar to the following from a sync:

~/demo$ s3cmd sync  ./  s3://s3tools-demo/some/path/
dir2/file2-1.log -> s3://s3tools-demo/some/path/dir2/file2-1.log  [1 of 2]
dir2/file2-2.txt -> s3://s3tools-demo/some/path/dir2/file2-2.txt  [2 of 2]

As you can see, only the files that we hadn’t uploaded yet, that is those from dir2, were now sync‘ed. Now modify for instance dir1/file1-2.txt and see what happens. In this run we’ll first check with --dry-run to see what would be uploaded. We’ll also add --delete-removed to get a list of files that exist remotely but are no longer present locally (or perhaps just have different names here):

~/demo$ s3cmd sync --dry-run --delete-removed ~/demo/ s3://s3tools-demo/some/path/
delete: s3://s3tools-demo/some/path/file1-1.txt
delete: s3://s3tools-demo/some/path/file1-2.txt
upload: ~/demo/dir1/file1-2.txt -> s3://s3tools-demo/some/path/dir1/file1-2.txt
WARNING: Exiting now because of --dry-run

So there are two files to delete – they’re those that were uploaded without dir1/ prefix in one of the previous examples. And also one file to be uploaded — dir1/file1-2.txt, the file that we’ve just modified.

Sometimes you don’t want to compare checksums and sizes of the remote vs local files and only want to upload files that are new. For that use the --skip-existing option:

~/demo$ s3cmd sync --dry-run --skip-existing --delete-removed ~/demo/ s3://s3tools-demo/some/path/
delete: s3://s3tools-demo/some/path/file1-1.txt
delete: s3://s3tools-demo/some/path/file1-2.txt
WARNING: Exiting now because of --dry-run

See? Nothing to upload in this case because dir1/file1-2.txt already exists in S3. With a different content, indeed, but --skip-existing only checks for the file presence, not the content.

Download from S3

Download from S3 with get and sync works pretty much along the same lines as explained above for upload. All the same rules apply and I’m not going to repeat myself. If in doubt, run your command with --dry-run. If still in doubt, ask on the mailing list for help 🙂
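
For illustration (this sketch is not from the original article but reuses the demo bucket and paths from above), a single-file download and a dry-run directory sync in the download direction look like this:

~/demo$ s3cmd get s3://s3tools-demo/some/path/dir1/file1-1.txt restored-file1-1.txt
~/demo$ s3cmd sync --dry-run s3://s3tools-demo/some/path/ ~/restore/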

Filtering with --exclude / --include rules

Once the list of source files is compiled, it is filtered through a set of exclude and include rules, in this order. That’s quite a powerful way to fine-tune your uploads or downloads — you can for example instruct s3cmd to back up your home directory but not back up the JPG pictures (exclude pattern), except those whose name begins with a capital M and contains a digit; those you do want to back up (include pattern).

S3cmd has one exclude list and one include list. Each can hold any number of filename match patterns; for instance, in the exclude list the first pattern could be “match all JPG files” and the second one “match all files beginning with letter A”, while the include list may contain just one pattern (or none, or two hundred) saying “match all GIF files”.

There are a number of options available for putting patterns into these lists.

  • --exclude / --include — standard shell-style wildcards; enclose them in apostrophes to avoid their expansion by the shell. For example --exclude 'x*.jpg' will match x12345.jpg but not abcdef.jpg.
  • --rexclude / --rinclude — regular expression versions of the above. A much more powerful way to create match patterns. I realise most users have no clue about RegExps, which is sad. Anyway, if you’re one of them and can get by with shell-style wildcards, just use --exclude/--include and don’t worry about --rexclude/--rinclude. Or read some tutorial on RegExps; such knowledge will come in handy one day, I promise 😉
  • --exclude-from / --rexclude-from / --(r)include-from — instead of having to supply all the patterns on the command line, write them into a file and pass that file’s name as a parameter to one of these options. For instance --exclude '*.jpg' --exclude '*.gif' is the same as --exclude-from pictures.exclude where pictures.exclude contains these three lines:
    # Hey, comments are allowed here ;-)
    *.jpg
    *.gif

All these parameters are equal in the sense that a file excluded by an --exclude-from rule can be put back into the game by, say, an --rinclude rule.

One example to demonstrate the theory…

~/demo$ s3cmd sync --dry-run --exclude '*.txt' --include 'dir2/*' . s3://s3tools-demo/demo/
exclude: dir1/file1-1.txt
exclude: dir1/file1-2.txt
exclude: file0-2.txt
upload: ./dir2/file2-1.log -> s3://s3tools-demo/demo/dir2/file2-1.log
upload: ./dir2/file2-2.txt -> s3://s3tools-demo/demo/dir2/file2-2.txt
upload: ./file0-1.msg -> s3://s3tools-demo/demo/file0-1.msg
upload: ./file0-3.log -> s3://s3tools-demo/demo/file0-3.log
WARNING: Exiting now because of --dry-run

The line for dir2/file2-2.txt shows a file that has a .txt extension, i.e. it matches an exclude pattern, but because it also matches the 'dir2/*' include pattern it is still scheduled for upload.

This --exclude / --include filtering is available for put, get and sync. In the future del, cp and mv will support it as well.

Duplicity + S3: easy, cheap, encrypted, automated full-disk backups for your servers


Backups are one of those things that are important, but that a lot of people don’t do. The thought of setting up backups always raised a mental barrier for me for a number of reasons:

  • I have to think about where to backup to.
  • I have to remember to run the backup on a periodic basis.
  • I worry about the bandwidth and/or storage costs.

I still remember the days when a 2.5 GB harddisk was considered large, and when I had to spend a few hours splitting MP3 files and putting them on 20 floppy disks to transfer them between computers. Backing up my entire harddisk would have cost me hundreds of dollars and hours of time. Because of this, I tend to worry about the efficiency of my backups. I only want to back up things that need backing up.

I tended to tweak my backup software and rules to be as efficient as possible. However, this made setting up backups a total pain, and made it very easy to procrastinate on backups… until it was too late.

I learned to embrace Moore’s Law

Times have changed. Storage is cheap, very cheap. Time Machine — Apple’s backup software — taught me to stop worrying about efficiency. Backing up everything not only makes backing up a mindless and trivial task, it also makes me feel safe. I don’t have to worry about losing my data anymore. I don’t have to worry that my backup rules missed an important file.

Backing up desktops and laptops is easy and cheap enough. A 2 TB harddisk costs only $100.

What about servers?

  • Most people can’t go to the data center and attach a hard disk. Buying or renting another harddisk from the hosting provider can be expensive. Furthermore, if your backup device resides in the same location as the data center, then destruction of the data center (e.g. by a fire) will destroy your backup as well.
  • Backup services provided by the hosting provider can be expensive.
  • Until a few years ago, bandwidth was relatively expensive, making backing up the entire harddisk to a remote storage service an unviable option for those with a tight budget.
  • And finally, do you trust that the storage provider will not read or tamper with your data?

Enter Duplicity and S3

Duplicity is a tool for creating incremental, encrypted backups. “Incremental” means that each backup only stores data that has changed since the last backup run. This is achieved by using the rsync algorithm.

What is rsync? It is a tool for synchronizing files between machines. The cool thing about rsync is that it only transfers changes. If you have a directory with 10 GB of files, and your remote machine has an older version of that directory, then rsync only transfers new files or changed files. Of the changed files, rsync is smart enough to only transfer the parts of the files that have changed!

At some point, Ben Escoto authored the tool rdiff-backup, an incremental backup tool which uses an rsync-like algorithm to create filesystem backups. Rdiff-backup also saves metadata such as permissions, owner and group IDs, ACLs, etc. Rdiff-backup stores past versions as well and allows easy rollback to a point in time. It even compresses backups. However, rdiff-backup has one drawback: you have to install it on the remote server as well. This makes it impossible to use rdiff-backup to backup to storage services that don’t allow running arbitrary software.

Ben later created Duplicity, which is like rdiff-backup but encrypts everything. Duplicity works without needing special software on the remote machine and supports many storage methods, for example FTP, SSH, and even S3.

On the storage side, Amazon has consistently lowered the prices of S3 over the past few years. The current price for the US-west-2 region is only $0.09 per GB per month.

Bandwidth costs have also lowered tremendously. Many hosting providers these days allow more than 1 TB of traffic per month per server.

This makes Duplicity and S3 the perfect combination for backing up my servers. Using encryption means that I don’t have to trust my service provider. Storing 200 GB only costs $18 per month.

Setting up Duplicity and S3 using Duply

Duplicity in itself is still a relative pain to use. It has many options — too many if you’re just starting out. Luckily there is a tool which simplifies Duplicity even further: Duply. It keeps your settings in a profile, and supports pre- and post-execution scripts.

Let’s install Duplicity and Duply. If you’re on Ubuntu, you should add the Duplicity PPA so that you get the latest version. If not, you can just install an older version of Duplicity from the distribution’s repositories.

# Replace 'precise' with your Ubuntu version's codename.
echo deb http://ppa.launchpad.net/duplicity-team/ppa/ubuntu precise main | \
sudo tee /etc/apt/sources.list.d/duplicity.list
sudo apt-get update


# python-boto adds S3 support
sudo apt-get install duplicity duply python-boto

Create a profile. Let’s name this profile “test”.

duply test create

This will create a configuration file in $HOME/.duply/test/conf. Open it in your editor. You will be presented with a lot of configuration options, but only a few are really important. Two of them are GPG_KEY and GPG_PW. Duplicity supports asymmetric public-key encryption, or symmetric password-only encryption. For the purposes of this tutorial we’re going to use symmetric password-only encryption because it’s the easiest.
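
If you prefer asymmetric public-key encryption instead (not used in this tutorial), the rough idea is to generate a GPG key pair and reference its key ID in GPG_KEY. A minimal sketch, assuming gpg is installed:

gpg --gen-key      # answer the interactive prompts
gpg --list-keys    # note the ID of the newly created key, then set GPG_KEY='<that ID>' in the Duply profile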

Let’s generate a random, secure password:

openssl rand -base64 20

Comment out GPG_KEY and set a password in GPG_PW:

GPG_PW='<the password you just got from openssl>'

Scroll down and set the TARGET options:

TARGET='s3://s3-<region endpoint name>.amazonaws.com/<bucket name>/<folder name>'
TARGET_USER='<your AWS access key ID>'
TARGET_PASS='<your AWS secret key>'

Substitute “region endpoint name” with the host name of the region in which you want to store your S3 bucket. You can find a list of host names at the AWS website. For example, for US-west-2 (Oregon), the TARGET line would look like this:

TARGET='s3://s3-us-west-2.amazonaws.com/<bucket name>/<folder name>'

Set the base directory of the backup. We want to back up the entire filesystem, so set SOURCE to the root directory:

SOURCE='/'

It is also possible to set a maximum time for keeping old backups. In this tutorial, let’s set it to 6 months:

MAX_AGE=6M

Save and close the configuration file.

There are also some things that we never want to back up, such as /tmp, /dev and log files. So we create an exclusion file $HOME/.duply/test/exclude with the following contents:

- /dev
- /home/*/.cache
- /home/*/.ccache
- /lost+found
- /media
- /mnt
- /proc
- /root/.cache
- /root/.ccache
- /run
- /selinux
- /sys
- /tmp
- /u/apps/*/current/log/*
- /u/apps/*/releases/*/log/*
- /var/cache/*/*
- /var/log
- /var/run
- /var/tmp

This file follows the Duplicity file list syntax. The - sign here means “exclude this directory”. For more information, please refer to the Duplicity man page.
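
The same file list syntax also supports + lines that re-include paths; rules are evaluated in order and the first match wins. A small hypothetical example (the paths are invented for illustration) that keeps one application's logs while still excluding the rest of /var/log:

+ /var/log/myapp
- /var/log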

Notice that this file excludes Capistrano-deployed Ruby web apps’ log files. If you’re running Node.js apps on your server then it’s easy to exclude your Node.js log files in a similar manner.

Finally, go to the Amazon S3 control panel and create a bucket in the chosen region, entering the bucket name of your choice.

Initiating the backup

We’re now ready to initiate the backup. This can take a while, so let’s open a screen session so that we can terminate the SSH session and check back later.

sudo apt-get install screen
screen

Initiate the backup:

sudo duply test backup

Press Ctrl-A followed by D to detach the screen session.

Check back a few hours later. Login to your server and reattach your screen session:

screen -x

You should see something like this, which means that the backup succeeded. Congratulations!

--------------[ Backup Statistics ]--------------
Errors 0

--- Finished state OK at 16:48:16.192 - Runtime 01:17:08.540 ---

--- Start running command POST at 16:48:16.213 ---
Skipping n/a script '/home/admin/.duply/test/post'.
--- Finished state OK at 16:48:16.244 - Runtime 00:00:00.031 ---

Setting up periodic incremental backups with cron

We can use cron, the system’s periodic task scheduler, to setup periodic incremental backups. Edit root’s crontab:

sudo crontab -e

Insert the following:

0 2 * * 7 env HOME=/home/admin duply test backup

This line runs the duply test backup command every Sunday at 2:00 AM. Note that we set the HOME environment variable here to /home/admin. Duply runs as root because the cron job belongs to root, but the Duply profiles are stored in /home/admin/.duply, which is why we need to set the HOME environment variable.

If you want to setup daily backups, replace “0 2 * * 7” with “0 2 * * *”.

Making cron jobs less noisy

Cron has a nice feature: it emails you with the output of every job it has run. If you find that this gets annoying after a while, then you can make it only email you if something went wrong. For this, we’ll need the silence-unless-failed tool, part of phusion-server-tools. This tool runs the given command and swallows its output, unless the command fails.

Install phusion-server-tools and edit root’s crontab again:

sudo git clone https://github.com/phusion/phusion-server-tools.git /tools
sudo crontab -e


Then prefix the Duply command in root's crontab with silence-unless-failed, i.e. change

env HOME=/home/admin duply test backup

into

/tools/silence-unless-failed env HOME=/home/admin duply test backup

Restoring a backup

Simple restores

You can restore the latest backup with the Duply restore command. It is important to use sudo because this allows Duplicity to restore the original filesystem metadata.

The following will restore the latest backup to a specific directory. The target directory does not need to exist; Duplicity will automatically create it. After restoration, you can move its contents to the root filesystem using mv.

sudo duply test restore /restored_files

You can’t just do sudo duply test restore / here because your system files (e.g. bash, libc, etc.) are in use.

Moving the files from /restored_files to / using mv might still not work for you. In that case, consider booting your server from a rescue system and restoring from there.

Restoring a specific file or directory

Use the fetch command to restore a specific file. The following restores the /etc/passwd file from the backup and saves it to /home/admin/password. Notice the lack of a leading slash in the etc/passwd argument.

sudo duply test fetch etc/passwd /home/admin/password

The fetch command also works on directories:

sudo duply test fetch etc /home/admin/etc

Restoring from a specific date

Every restoration command accepts a date, allowing you to restore from that specific date.

First, use the status command to get an overview of backup dates:

$ duply test status
Number of contained backup sets: 2
Total number of contained volumes: 2
 Type of backup set:                            Time:      Num volumes:
                Full         Sat Nov  8 07:38:30 2013                 1
         Incremental         Sat Nov  9 07:43:17 2013                 1

In this example, we restore from the November 8 backup. Unfortunately we can’t just copy and paste the time string; instead, we have to write the time in the W3 datetime format. See also the Time Formats section in the Duplicity man page.

sudo duply test restore /restored_files '2013-11-08T07:38:30'

Safely store your keys or passwords!

Whether you used asymmetric public-key encryption or symmetric password-only encryption, you must store them safely! If you ever lose them, you will lose your data. There is no way to recover encrypted data for which the key or password is lost.

My preferred way of storing secrets is to store them inside 1Password and to replicate the data to my phone and tablet so that I have redundant encrypted copies. Alternatives to 1Password include LastPass or KeePass although I have no experience with them.


With Duplicity, Duply and S3, you can setup cheap and secure automated backups in a matter of minutes. For many servers this combo is the silver bullet.

One thing that this tutorial hasn’t dealt with is database backups. While we are backing up the database’s raw files, relying on those raw files isn’t a good idea: if the database files were being written to at the time the backup was made, then the backup will contain potentially irrecoverably corrupted database files. Even the database’s journaling file or write-ahead log won’t help, because these technologies are designed only to protect against power failures, not against concurrent file-level backup processes. Luckily, Duply supports the concept of pre-scripts. In the next part of this article, we’ll cover pre-scripts and database backups.
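
To give a rough idea of what such a pre-script could look like, here is a hypothetical sketch (Duply runs an optional script named pre in the profile directory before each backup, just as it looks for the post script mentioned in the log output above). It dumps PostgreSQL to a plain file so that the file-level backup captures a consistent dump rather than raw database files:

#!/bin/sh
# Hypothetical ~/.duply/test/pre script: dump all PostgreSQL databases to a file
# that the subsequent Duplicity run will back up.
pg_dumpall -U postgres > /var/backups/postgresql-dump.sql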

I hope you’ve enjoyed this article. If you have any comments, please don’t hesitate to post them below. We regularly publish news and interesting articles. If you’re interested, please follow us on Twitter, or subscribe to our newsletter.

AWS region codes

|      CODE      |            NAME            |
| us-east-1      | US East (N. Virginia)      |
| us-east-2      | US East (Ohio)             |
| us-west-1      | US West (N. California)    |
| us-west-2      | US West (Oregon)           |
| ca-central-1   | Canada (Central)           |
| eu-central-1   | EU (Frankfurt)             |
| eu-west-1      | EU (Ireland)               |
| eu-west-2      | EU (London)                |
| eu-west-3      | EU (Paris)                 |
| ap-northeast-1 | Asia Pacific (Tokyo)       |
| ap-northeast-2 | Asia Pacific (Seoul)       |
| ap-northeast-3 | Asia Pacific (Osaka-Local) |
| ap-southeast-1 | Asia Pacific (Singapore)   |
| ap-southeast-2 | Asia Pacific (Sydney)      |
| ap-south-1     | Asia Pacific (Mumbai)      |
| sa-east-1      | South America (São Paulo)  |