Speed up restores with different restore methods / options
The way JetBackup currently handles restores can be rather slow, and it can be optimized *a lot* in various scenarios.
Full Account Restores
Currently, a full account restore works like this:
- JetBackup asks the backup server to tar.gz the snapshot you want to restore, writing the archive locally on the backup server
- JetBackup then scps the archive to the webserver where it has to be restored
- JetBackup runs /scripts/restorepkg on the .tar.gz file
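As a rough illustration, the current flow boils down to something like the following. This is only a sketch; the hostnames, paths and exact flags are my assumptions, not JetBackup's actual commands:

```bash
# 1) On the backup server: pack the snapshot into a tar.gz
#    (extra read IO for the files, extra write IO for the archive)
ssh backup-server "tar -czf /tmp/acct.tar.gz -C /backups/snapshots acct"

# 2) Copy the archive to the destination webserver
scp backup-server:/tmp/acct.tar.gz /home/tmp/acct.tar.gz

# 3) Restore it with cPanel's restore script
/scripts/restorepkg /home/tmp/acct.tar.gz
```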
This can be optimized by rsyncing the snapshot directly, as is (uncompressed), from the backup server to the webserver, and then running /scripts/restorepkg, since restorepkg supports restoring uncompressed backups.
This will save the time it takes to write the .tar.gz file on the backup server (useless IO).
Using "rsync -> restorepkg" to transfer the files will overall be faster than "tar -> scp -> restorepkg", and put less load on the backup server (which might be doing other things).
With rsync there should be an option to enable or disable compression: in most cases the time spent compressing isn't faster than simply sending the uncompressed files over a 1 gigabit link. Compression only wins when you can get a high compression ratio, which, to be honest, rarely happens for most hosting accounts.
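A minimal sketch of the proposed "rsync -> restorepkg" path, with compression left as an opt-in flag (paths and flags are again illustrative assumptions):

```bash
# Pull the snapshot as-is; no archive is written on the backup server.
# Add -z only for accounts known to compress well (see footnote [2]).
rsync -aH --numeric-ids backup-server:/backups/snapshots/acct/ /home/tmp/acct/

# Per the note above, restorepkg can restore the uncompressed backup directly
/scripts/restorepkg /home/tmp/acct
```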
File Restores
The same principle applies to file restores:
- JetBackup asks the backup server to tar.gz a specific set of files and/or directories
- JetBackup then scps the archive
- JetBackup uncompresses the archive in a temporary location
- JetBackup fixes permissions
- JetBackup moves the files to the correct location
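Sketched out, the current file-restore flow looks roughly like this (paths, the requested file set and the ownership fix are simplified assumptions on my part):

```bash
# 1) On the backup server: archive just the requested files
ssh backup-server "tar -czf /tmp/files.tar.gz -C /backups/snapshots/acct homedir/public_html"

# 2) Copy the archive over
scp backup-server:/tmp/files.tar.gz /home/tmp/files.tar.gz

# 3) Uncompress into a temporary location
mkdir -p /home/tmp/restore
tar -xzf /home/tmp/files.tar.gz -C /home/tmp/restore

# 4) Fix ownership/permissions
chown -R acct:acct /home/tmp/restore

# 5) Move the files into their final location
rsync -a /home/tmp/restore/homedir/public_html/ /home/acct/public_html/
```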
Here we have a few additional steps: uncompressing the archive to a temporary location and fixing permissions.
Fixing permissions isn't really an issue; the filesystem handles it quickly, since the inodes are already in the disk buffers right after uncompressing the files.
Uncompressing the archive isn't an issue either; the issue is that the files are compressed in the first place. We compress something just to copy it, and then uncompress it again. This causes useless IO on the backup server, which has to read the files to compress them while writing the compressed archive at the same time, and on the destination server, which writes the archive, extracts it to a temporary location and then moves the files into place.
Once again we can use rsync (as you already do for backups) to transfer the files to the destination server.
This saves a lot of writes on the backup server, and rsync lets us transfer with or without compression. Once again, depending on the compression ratio, compressing might actually take longer than the extra network time of transferring the files uncompressed.
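A sketch of the rsync-based alternative (with assumed paths; the permission fix and the final move stay as they are today):

```bash
# Transfer the requested files straight from the snapshot; add -z only
# when the data compresses well enough to be worth the CPU time.
rsync -aH backup-server:/backups/snapshots/acct/homedir/public_html/ \
      /home/tmp/restore/public_html/

# Permissions and the move into place are unchanged, but the
# compress -> copy -> uncompress round trip is gone.
chown -R acct:acct /home/tmp/restore/public_html
rsync -a /home/tmp/restore/public_html/ /home/acct/public_html/
```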
Footnotes
[1] Support informed me that compression is used because restore times were faster with compress -> scp -> restore; this was measured in the first versions.
Note: I believe that was true at the time, but years have passed, and the majority of backup servers are connected with 1 or 10 gigabit networking these days (if not more).
A 1 gigabit connection can move 125 megabytes per second (let's be fair and say 120 megabytes to account for some TCP/IP overhead).
Most disk arrays (even RAID 6) can surely write more than 125 megabytes per second, but write speed isn't the issue; random IO is.
Most backup servers are large-capacity, high-redundancy setups with spinning disks (nearline SAS or 7.2k SATA) in RAID 5, 6, 50 or 60. These do awful random IO, so you want to limit the amount of random IO you perform. Mixing reads and writes at the same time on such servers usually isn't pretty, and the result is that you effectively cannot sustain the same bandwidth you could saturate a 1 gigabit link with using read-only IO.
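The sequential-vs-random gap described here is easy to reproduce on any given array with fio; a quick sketch, assuming fio is installed and /backup sits on the array being tested:

```bash
# Sequential reads: roughly what a restore could feed a 1 gigabit link with
fio --name=seqread --directory=/backup --rw=read --bs=1M --size=4G \
    --ioengine=libaio --direct=1 --runtime=60 --time_based

# Random 4K reads: closer to what packing hundreds of thousands
# of small files into an archive actually looks like
fio --name=randread --directory=/backup --rw=randread --bs=4k --size=4G \
    --ioengine=libaio --direct=1 --runtime=60 --time_based
```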
In the 100 megabit days, sure, that might have been the case, but it really isn't anymore. As much as I appreciate that you have data on it, measuring this in the initial versions isn't the same as measuring it in 2017; the world is very different today than it was years ago.
For the same reason, I believe we should have the option to select tar or rsync as the restore method, so people can decide what they want to use and what they find fastest.
[2] Compression (such as gzip or zlib) isn't always optimal; it all depends on the compression ratio you can get from your files. If your ratio is below roughly 1.5 (which is pretty often the case for hosting accounts), the CPU overhead, even with fast CPUs, actually costs more time than transferring the additional bytes would, whether as a .tar archive or via rsync with compression.
I suggest making compression a configurable option, since compressing the transfer or the archive can add significant restore (and backup) time.
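Whether compression would pay off for a given account can even be estimated up front; a rough sketch (the gzip level and paths are illustrative, not a proposed implementation):

```bash
# Raw size of the snapshot data
raw_bytes=$(du -sb /backups/snapshots/acct | awk '{print $1}')

# Compressed size, streamed so nothing extra is written to disk
gz_bytes=$(tar -cf - -C /backups/snapshots acct | gzip -1 | wc -c)

# Ratios much below ~1.5 rarely justify the CPU time on a 1 gigabit link
echo "compression ratio: $(echo "scale=2; $raw_bytes / $gz_bytes" | bc)"
```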
We ran multiple account restores covering all kinds of scenarios. The only time compression was actually a benefit was with an account containing very few images and *a lot* of larger text files, because the number of bytes saved was so high.
But in 99% of cases compression just added time to the whole process, sometimes seconds, sometimes many additional minutes. A few examples follow. All tests were performed after dropping the server's caches before every transfer, which gives a very realistic picture of files sitting cold on a backup server that we want to restore.
Additionally, the backup server was configured with RAID 6 using enterprise-grade SATA disks (Ultrastar 7K6000), which are known for rather good random IO and good speeds in general.
The system has also been tuned for better small-file IO (stripe size, buffer sizes, etc.), meaning our overall IO performance is about 20% faster than the stock CentOS 7 kernel and array settings.
The server's connection was 1 gigabit (shared), and the tests were performed between two datacenters on two different networks.
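For reference, the cache drop mentioned above used the standard Linux mechanism (run as root):

```bash
# Flush dirty pages, then drop the page cache, dentries and inodes
sync
echo 3 > /proc/sys/vm/drop_caches
```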
---------------
Account: 4.28 gigabytes with 244,072 files:
rsync with compression: 4 minutes and 9 seconds (speedup is 1.75)
rsync without compression: 3 minutes and 30 seconds (speedup is 1.00)
tar stream with extract on destination: 4 minutes and 39 seconds
tar.gz stream with extract on destination: 5 minutes and 30 seconds
tar on backup -> SCP: 4 minutes and 59 seconds + 42 seconds to scp
The method JetBackup uses matches the last one, except that JetBackup also gzips, which would only have added time.
---------------
Account: 5.53 gigabytes and 49,665 files
rsync with compression: 1 minute and 29 seconds (speedup is 1.36)
rsync without compression: 1 minute and 1 second (speedup is 1.00)
tar on backup -> SCP: 2 minutes and 2 seconds
---------------
Account: 13.5 gigabytes and 208,289 files
rsync with compression: 5 minutes and 9 seconds (speedup is 1.42)
rsync without compression: 2 minutes and 24 seconds (speedup is 1.00)
tar on backup -> SCP: 6 minutes and 39 seconds (due to random IO and the number of files)
---------------
Account: 23.29 gigabytes and 581,969 files
rsync with compression: 19 minutes and 23 seconds (speedup is 1.08)
rsync without compression: 11 minutes and 39 seconds (speedup is 1.00)
tar on backup -> SCP: 26 minutes and 49 seconds
Hi,
Thank you for your feedback :)
As you mentioned, we actually took the "rsync -> restorepkg" approach first, but found it was slower.
We measured restore times across 150 servers (LAN connection) and found that rsyncing just one file is much faster, even when you account for the time it takes to create the tar.gz file. Rsyncing 100k files over WAN can take a long time, while doing the same with just one file is very fast.
We can, however, consider exposing this in the settings, so you can choose which restore approach you want.
We are currently doing heavy coding on the new 3.2 version, which is a major core change with improvements; this will be considered as well.
Thanks,
Eli.