Look before you leap into Disk Encryption

Thanks to http://www.debian-administration.org/article/Look_before_you_leap_into_Disk_Encryption

Think disk encryption gives you more peace of mind? Then think again. It’s well known that “failing to plan” means “planning to fail”, but when it comes to disk encryption I have not found any reasonable planning for disk failure, even after extensive googling.

Before I go into detail, let’s outline the problem we are trying to solve here — disk encryption for the *normal* *home* user. They differ from big corporations in that a big corporation will throw away a disk as soon as SMART *indicates* it is failing, while a normal home user will keep using it until it fails massively. At least I do that, and I buy cheap 3TB Seagate Barracuda drives, which are even cheaper than 2TB Western Digital drives, because I know careful planning makes all the difference.

When I asked the question on the debian-user mailing list, the first answer I got was, “have more backups”. If that’s the only answer that comes to mind, then you are either kidding yourself or not a normal home user as outlined above. Practically speaking, how would you back up a 3TB hard drive: onto spindles and spindles of DVDs, or by buying another 3TB hard drive? And what if those fail as well?

So it all boils down to assessing the risk and doing proper planning.

How big is the risk with disk encryption? With one tiny error on the hard drive, your 3TB of storage could be gone forever. If someone is thinking at the back of their mind, “I might still have a chance to salvage the situation, as I always have before”, they are simply “planning” to fail, because disk encryption is designed to defeat forensic analysis; even the pros can’t recover the data. Going blindly into full-disk encryption without knowing how to provide a proper safety net for yourself is going to be a total disaster: when I googled for answers, all I found was incident after incident where the disk was gone forever.

The solution?

I’m still investigating the situation, and have the following three strategies so far. Please jump in if you have more suggestions.

1. Know what to back up. This is very important, especially for disk encryption. So far I haven’t backed up anything on my encrypted disk (because it is impractical for me as a normal home user), but I will do the following immediately. Details are from http://lists.debian.org/debian-user/2013/05/msg00025.html:

“I guess what you are referring to can happen if you get bad sectors where the LUKS header resides. This is a single point of failure in LUKS whole-disk encryption, to plan for this you must have current backups (but most likely on another encrypted media, so there is always a tiny probability that this is going to happen there too), and backup the LUKS headers (see command “cryptsetup luksHeaderBackup”). See cryptsetup man for security good practice regarding the headers backups.”
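The command mentioned in the quote can be used roughly as follows. This is a sketch: `/dev/sdX` is a placeholder for your actual encrypted device, and the backup file path is my own choice.

```shell
# Back up the LUKS header of the encrypted device to a file.
# /dev/sdX is a placeholder -- substitute your real device.
cryptsetup luksHeaderBackup /dev/sdX \
    --header-backup-file /safe/location/sdX-luks-header.img

# If bad sectors ever damage the on-disk header, restore it
# (this overwrites the current header on the device):
cryptsetup luksHeaderRestore /dev/sdX \
    --header-backup-file /safe/location/sdX-luks-header.img
```

As the cryptsetup man page warns, keep the header backup somewhere safe and off the disk itself: anyone holding the backup plus a valid passphrase can unlock the data, even after you revoke that passphrase on the live device.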

2. Arm yourself with better tools. Having backups is the passive way to build a safety net, but what about active protection? Markus Gattol, author of a well-known guide to dm-crypt and LUKS full-disk encryption, recommends using Btrfs. He has been a Btrfs fan since as early as August 2008 (http://www.markus-gattol.name/ws/dm-crypt_luks.html). Why? Because “Btrfs is a new copy on write (CoW) filesystem for Linux aimed at implementing advanced features while focusing on fault tolerance, repair and easy administration” (https://help.ubuntu.com/community/btrfs). It is jointly developed at Oracle, Red Hat, Fujitsu, Intel, SUSE, STRATO and many others (https://btrfs.wiki.kernel.org/index.php/Main_Page). In 2008, the principal developer of the ext3 and ext4 file systems, Theodore Ts’o, stated that although ext4 has improved features, it is not a major advance; it uses old technology and is a stop-gap. Ts’o believes that Btrfs is the better direction because “it offers improvements in scalability, reliability, and ease of management” (http://en.wikipedia.org/wiki/Btrfs). The Btrfs fault-tolerance features that attract me are:

  • Online data scrubbing for finding errors and automatically fixing them for files with redundant copies.
  • Checksums on data and metadata
  • Check out the rest at http://en.wikipedia.org/wiki/Btrfs
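The online scrubbing mentioned above can also be triggered by hand. A sketch, assuming the Btrfs filesystem is mounted at `/mnt/data` (a placeholder) and you have root:

```shell
# Start an online scrub of the filesystem mounted at /mnt/data.
# With redundant copies (e.g. a RAID1 data profile), errors that
# the checksums detect are repaired from the good copy.
btrfs scrub start /mnt/data

# Check progress and the error counters afterwards:
btrfs scrub status /mnt/data
```

Running this from a periodic cron job is a common way to catch creeping bad sectors before they accumulate.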

3. Quarantine the disk failures. Let me share a secret with all the normal home users like me — you don’t need to buy expensive hard drives in the hope that they will never fail. Bad disk sectors will happen regardless, no matter how many times more expensive your disk is than mine. Hard disks will fail, no matter what; what rarely happens is a massive failure all at once. What I used to do is mark the bad sectors as bad in the filesystem’s bad-block list and stop using them. Works great. Don’t believe me? Check this out: http://www.linuxforum.com/threads/3265-bad-sectors-on-disk, “I have some bad sectors on my hard drive. What I did was to make a partition on the part which has the bad sectors. Then I just do not use that particular partition. It’s been two years now. The rest of the hard drive is still working well, 12-16 hours every day, seven days a week.”
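Marking bad sectors so the filesystem avoids them can be done with the standard e2fsprogs tools. A sketch for an ext2/3/4 partition; `/dev/sdXn` is a placeholder, and the read-only scan is the non-destructive option:

```shell
# Scan the partition read-only for bad blocks and save the list.
# Note: the block size must match the filesystem's block size,
# typically 4096 bytes.
badblocks -sv -b 4096 /dev/sdXn > badblocks.txt

# Record those blocks in the filesystem's bad-block inode so
# they are never allocated again:
e2fsck -l badblocks.txt /dev/sdXn
```

Alternatively, `e2fsck -c /dev/sdXn` runs badblocks itself with the correct block size and adds the findings in one step, which is harder to get wrong.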

Still don’t feel safe enough? Here is another trick: how I control where my disk failures occur. My old 1.5TB Seagate Barracuda is nearly 10 years old now, living way past its warranty period. If you take a look at its SMART status report, you will find the astonishing “DISK FAILURE IS IMMINENT” warning, because the reallocated-sector count is more than 100 times over the disk-failure threshold. But I’m still confident that it works fine for me. The trick is to quarantine the disk failures. I knew that my Seagate Barracuda would fail more easily than other brands, so I treat it like a rewritable CD/DVD. How do you prolong the life span of a rewritable CD/DVD? By minimizing the number of rewrites.
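The SMART figures I am talking about come from smartmontools. A sketch; `/dev/sdX` is again a placeholder for your drive:

```shell
# Overall health verdict ("PASSED", or a failure warning like the
# one my old Barracuda prints):
smartctl -H /dev/sdX

# Full vendor attribute table; watch Reallocated_Sector_Ct (ID 5),
# whose RAW_VALUE is the number of reallocated units:
smartctl -A /dev/sdX
```

Tracking how fast that raw value grows over time tells you far more than the one-off warning does.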

In brief, there are three kinds of partitions on my hard disk:

  • My caches: constantly written to and overwritten, but the content is not important. Coping mechanism: regular bad-block checks.
  • My documents: fairly constantly written and overwritten, and the content is important. Coping mechanism: triple or quadruple backups.
  • My collections: a HUGE amount, impossible for me to back up. Coping mechanism: put them into “write-once” partitions, i.e. the whole partition is only updated when new files come in; no unnecessary updates whatsoever.
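The “write-once” idea in the last bullet can be enforced at mount time. A sketch under my own naming assumptions: `/dev/sdXn` and `/mnt/collections` are placeholders.

```shell
# Keep the collections partition read-only by default; noatime
# avoids even the access-time writes. The matching /etc/fstab
# line would carry the same ro,noatime options.
mount -o ro,noatime /dev/sdXn /mnt/collections

# Only when new files arrive, flip it read-write briefly:
mount -o remount,rw /mnt/collections
cp /path/to/new/files/* /mnt/collections/
mount -o remount,ro /mnt/collections
```

Mounted read-only, the partition sees no writes at all between updates, which is exactly the rewritable-CD/DVD treatment described above.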

Thus, even though SMART tells me that my DISK FAILURE IS IMMINENT, and even though the reallocated-sector count is more than 100 times over the failure threshold, I know my files are safe.

Alright, enough babbling. What’s your idea for coping with disk failures, especially under full-disk encryption?