Sunday, May 26, 2013

Quick and dirty tests

I'm starting to test hash (sha256 and adler32) and compression (gzip and lzma) algorithms. Right now the program breaks its own executable file into pieces and writes each piece compressed with gzip, using the piece's sha256 hash as its file name.
Not much, but a beginning.
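Roughly, the idea looks like the Python sketch below. It is only an illustration, not the actual test program: the 64 KiB piece size and the function names (store_chunks, restore) are made up for the example.

```python
import gzip
import hashlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # assumed piece size; the real one may differ


def store_chunks(source, out_dir, chunk_size=CHUNK_SIZE):
    """Split `source` into pieces and write each one gzip-compressed,
    using the sha256 hex digest of the uncompressed piece as file name.
    Returns the ordered list of piece names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    with open(source, "rb") as f:
        while True:
            piece = f.read(chunk_size)
            if not piece:
                break
            name = hashlib.sha256(piece).hexdigest()
            target = out / name
            if not target.exists():  # identical pieces are stored only once
                target.write_bytes(gzip.compress(piece))
            names.append(name)
    return names


def restore(names, out_dir):
    """Rebuild the original bytes from the ordered list of piece names."""
    out = Path(out_dir)
    return b"".join(gzip.decompress((out / n).read_bytes()) for n in names)
```

The ordered list of names is the minimum needed to put the file back together, so it is a first hint of what the metadata will have to contain.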
Now I'm thinking about how metadata should be stored, and how to test for new files and for modifications inside a file already backed up.
Next, I'll try to back a file up, storing its metadata, and restore it in another directory.
In order to detect changes in a file already backed up I need a rolling checksum algorithm. I've found an implementation and asked for permission to use it, giving proper credit, of course.
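To show what "rolling" means here, this is a minimal Python sketch of an rsync-style weak checksum, a relative of adler32. It is not the implementation mentioned above; the modulus (2^16) and the function names are assumptions made for the example. The point is that sliding the window by one byte updates the checksum in constant time instead of re-reading the whole window.

```python
MOD = 1 << 16  # assumed modulus, as in the rsync-style weak checksum


def weak_checksum(block):
    """Compute the (a, b) checksum pair of a whole window from scratch."""
    n = len(block)
    a = sum(block) % MOD
    # Each byte is weighted by its distance from the end of the window.
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide a window of length n one byte to the right:
    out_byte leaves on the left, in_byte enters on the right."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def window_checksums(data, n):
    """Yield (offset, a, b) for every window of length n in data."""
    if len(data) < n:
        return
    a, b = weak_checksum(data[:n])
    yield 0, a, b
    for i in range(1, len(data) - n + 1):
        a, b = roll(a, b, data[i - 1], data[i + n - 1], n)
        yield i, a, b
```

A cheap checksum like this only says that a window might match a piece already stored; the match would still have to be confirmed with a strong hash such as sha256.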

Sunday, May 19, 2013

Lowering expectations

It's time to admit that peerbackup will never be anything other than vaporware blogware if I don't lower my expectations about it (and start producing some actual code).
The best way of getting a first version done might be to simplify its network requirements. In theory it would be great to have a truly decentralized and distributed backup system, but in real life a reduced version of it might work just as well.
To put it bluntly: node management won't be automatic. A node will add another peer manually. The number of nodes will remain low and known. I'm thinking about a scenario where a group of people (friends, family, co-workers) agree to set up a backup network.
It makes the project infinitely more boring. No real p2p, no real anonymity. 
What remains then? Once the network is set up, a distributed encrypted backup. It will be more similar to an array of disks (or a RAID) than to a BitTorrent network.
It may have some benefits, though. First of all, you should be able to tell when a node is down and warn its owner. It also solves the problem of trusting unknown nodes. You know where your data is.
But truth be told, despite some potential benefits, the decision is to sacrifice functionality in order to get anything done at all. If it works and I get it done, there's always the possibility of a better second version.