Real-time backup


#1

I saw a feature request for real-time backup from 2014 which was closed without any activity that I could find. Is that still under consideration? I’m trying to find a replacement for CrashPlan and Duplicati seems to have every feature and more except the real-time backup.

Just to be clear, what I’m asking for is a mechanism that scans for file changes and backs up each file every time it is changed.


#2

I don’t think CrashPlan has that feature either, at least not for a remote server.

Or, if you mean that it scans the files periodically and backs up any file that has changed, then I believe Duplicati also has that functionality.


#3

The article you linked talks about the source, not the destination. CrashPlan does support real-time backup to any destination, including cloud and local network.

My question is specifically about backing up local files from directly attached drives to remote destinations.


#4

I’m a CrashPlan old-timer at this point, pretty experienced with it, and relatively new to Duplicati, so I might be missing something. But it seems to me that if you set the Duplicati backup interval to every 15 minutes, it would accomplish pretty much the same thing as CrashPlan. (Didn’t the paid version of CrashPlan back up changed files every 15 minutes?)


#5

I think there are two potential problems:

  1. If the backup job is so big that it takes longer than 15 minutes, you won’t be saving a new version every 15 minutes.
  2. If you have other backup jobs, you will not get any versions while they are running.

Depending on why exactly you want real-time backup, I think the better solution would probably be to put those frequently changed files in a Dropbox folder and let Dropbox do the real-time backups (it keeps previous versions for 30 days). I guess in most use cases Dropbox can serve as your short-term backup while Duplicati takes care of the long-term backup.


#6

As the others have noted, such a feature is not currently available.
There is support for handing over the list of changed/deleted paths to avoid scanning the disk, but there is not yet a monitor that produces this list.

As for the problem with duration, there is really not much you can do if the backup takes longer than the interval (the 15 minutes mentioned above).


#7

Is there any plan to implement this feature, or is it completely on the back burner?


#8

Not sure; we have not made a real roadmap yet.


#10

A post was split to a new topic: Run backup when PC is idle


#11

I’m also a CrashPlan user, and I’m still desperately looking for a solution that will do real-time backups without scanning my entire file system. To give you an idea: my laptop has roughly 500k files in the backup set, any one of which may be modified. With CrashPlan, I had real-time backup at 15-minute intervals. With Duplicati, a scan now takes 30 minutes and a lot of CPU, which limits me to a single backup per day.

I would be happy to implement real-time support (using the FileSystemWatcher class) on Windows for Duplicati, but I would need some help with how to interface it with Duplicati. Could you give me an overview of the relevant interfaces?

  • Daniel

#12

Sounds awesome!

The implementation is based on the two options --changed-files and --deleted-files. On the command line, you can set each of them to a list of paths separated by the platform’s path separator (; on Windows and : on others). Once they are set, the file scanner is not used: Duplicati only scans the files listed in --changed-files and removes the files listed in --deleted-files.
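A minimal sketch of assembling those two options, in Python purely for illustration. It assumes the usual `--option=value` form accepted by Duplicati.CommandLine.exe and uses Python’s `os.pathsep`, which is `;` on Windows and `:` elsewhere, matching the separators above. The helper name `build_change_list_args` is hypothetical.

```python
import os
from typing import Iterable, List


def build_change_list_args(changed: Iterable[str], deleted: Iterable[str]) -> List[str]:
    """Turn change lists into --changed-files / --deleted-files arguments (hypothetical helper)."""
    changed_list = list(changed)
    deleted_list = list(deleted)
    # os.pathsep is ";" on Windows and ":" elsewhere; a path containing it
    # cannot be represented in this list format, so refuse it.
    for path in changed_list + deleted_list:
        if os.pathsep in path:
            raise ValueError(f"path contains the list separator: {path!r}")
    args: List[str] = []
    if changed_list:
        args.append("--changed-files=" + os.pathsep.join(changed_list))
        # --deleted-files is ignored unless --changed-files is also set,
        # so it is only emitted alongside it. With no changed files at all,
        # nothing is emitted and a normal full scan picks up deletions.
        if deleted_list:
            args.append("--deleted-files=" + os.pathsep.join(deleted_list))
    return args
```

The returned strings would be appended to an otherwise normal `Duplicati.CommandLine.exe backup ...` invocation.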

If you are using the USN journal, it should be possible to store the last USN (update sequence number) seen and then query the journal for the changes since then. This will work even if you restart Duplicati in the meantime.
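To make the checkpoint idea concrete, here is a sketch that works on already-decoded journal records. `JournalRecord` and `changes_since` are hypothetical. On a real NTFS volume the records are read with `DeviceIoControl` and `FSCTL_READ_USN_JOURNAL`, they carry file reference numbers rather than full paths, and the oldest still-available USN is the `FirstUsn` reported by `FSCTL_QUERY_USN_JOURNAL`. If the saved USN is older than that, the journal has been truncated and only a full scan is safe.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class JournalRecord:
    usn: int        # update sequence number, increases with every change
    path: str       # assumed already resolved to a full path (hypothetical)
    deleted: bool   # True if this record reports a deletion


def changes_since(
    records: Iterable[JournalRecord], since_usn: int, first_available_usn: int
) -> Optional[Tuple[Set[str], Set[str], int]]:
    """Return (changed, deleted, next_since_usn), or None if a full scan is needed."""
    if first_available_usn > since_usn:
        # Records between since_usn and first_available_usn were purged.
        return None
    changed: Set[str] = set()
    deleted: Set[str] = set()
    next_usn = since_usn
    for rec in sorted(records, key=lambda r: r.usn):
        if rec.usn < since_usn:
            continue  # already covered by the previous backup
        # Later records win: a file deleted after a change counts as deleted,
        # and a file recreated after a deletion counts as changed.
        if rec.deleted:
            changed.discard(rec.path)
            deleted.add(rec.path)
        else:
            deleted.discard(rec.path)
            changed.add(rec.path)
        next_usn = rec.usn + 1
    return changed, deleted, next_usn
```

The third value is what would be persisted after a successful backup and passed as `since_usn` next time.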

For the FileWatcher approach, the problem is that it is not guaranteed to keep running, so you need to handle the case where it has just been started (and thus cannot help you, so a full scan must run). If it has been running since the last backup, you can use its list and avoid the scan.
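A sketch of that decision, assuming the watcher records when it started and whether it ever lost events (.NET’s `FileSystemWatcher` raises its `Error` event when its internal buffer overflows). `WatcherState` and `can_use_watcher_list` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WatcherState:
    started_at: float          # when the watcher began observing (epoch seconds)
    overflowed: bool = False   # True if the watcher ever reported lost events


def can_use_watcher_list(
    watcher: Optional[WatcherState], last_backup_started_at: Optional[float]
) -> bool:
    """True if the watcher's list covers everything since the last backup began."""
    if watcher is None or last_backup_started_at is None:
        return False  # no watcher, or no previous backup to compare against
    if watcher.overflowed:
        return False  # some events were dropped, so the list may be incomplete
    # The watcher must already have been running when the last backup started;
    # otherwise changes made before it started were never observed.
    return watcher.started_at <= last_backup_started_at
```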

The list may be really large (if many files are changed), so perhaps you need something like a database to store the filenames. It is also possible that we need to redesign the way we pass the list of filenames to avoid storing them in memory (e.g. by using a file or similar).
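One way to keep a large list off the heap is a small SQLite table keyed by path, sketched here with Python’s built-in `sqlite3`. The `ChangeStore` class is hypothetical and not part of Duplicati. `INSERT OR REPLACE` makes the most recent event for a path win, so a file that is changed and then deleted ends up only in the deleted list.

```python
import sqlite3
from typing import Iterator


class ChangeStore:
    """Pending changed/deleted paths kept in SQLite instead of in memory."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            " path TEXT PRIMARY KEY,"
            " deleted INTEGER NOT NULL)"
        )

    def record(self, path: str, deleted: bool) -> None:
        # The newest event for a path replaces any older one.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO pending (path, deleted) VALUES (?, ?)",
                (path, int(deleted)),
            )

    def iter_paths(self, deleted: bool) -> Iterator[str]:
        # Rows are streamed from the cursor rather than loaded all at once.
        cur = self.conn.execute(
            "SELECT path FROM pending WHERE deleted = ? ORDER BY path",
            (int(deleted),),
        )
        for (path,) in cur:
            yield path

    def clear(self) -> None:
        # Call only after a successful backup has consumed the lists.
        with self.conn:
            self.conn.execute("DELETE FROM pending")
```

Committing once per event is simple but slow; at high event rates the inserts would need to be batched.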

I am not sure how best to do this, but perhaps it would make sense to start the FileWatcher when running a backup. That first run will then use the scan but keep the FileWatcher running, so that the next backup can use the data from the FileWatcher.

If this is the “right way”, I think you can inject this into the “Runner.cs” class in the Duplicati.Server project:

Let me know if you have any questions.


#13

Ken,

Thanks for the introduction. I agree that running the FileSystemWatcher in a separate process, and then passing the changed/deleted files on the command line (or via some other mechanism, such as IPC), is risky.

I’d rather favour the approach of an in-process task, with an initial scan and then updating the file list using the watcher thread.
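A minimal sketch of the in-process variant: the watcher thread calls the event methods, and the backup takes the accumulated lists atomically. `ChangeTracker` and its method names are hypothetical, and Python’s `threading.Lock` stands in for whatever synchronization the C# code would use.

```python
import threading
from typing import Set, Tuple


class ChangeTracker:
    """Collects watcher events from a background thread; the backup drains them."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._changed: Set[str] = set()
        self._deleted: Set[str] = set()

    def on_changed(self, path: str) -> None:  # file created or modified
        with self._lock:
            self._deleted.discard(path)
            self._changed.add(path)

    def on_deleted(self, path: str) -> None:
        with self._lock:
            self._changed.discard(path)
            self._deleted.add(path)

    def on_renamed(self, old: str, new: str) -> None:
        # A rename is treated as "old deleted" plus "new changed".
        with self._lock:
            self._changed.discard(old)
            self._deleted.add(old)
            self._deleted.discard(new)
            self._changed.add(new)

    def drain(self) -> Tuple[Set[str], Set[str]]:
        """Atomically take the current lists and start fresh ones."""
        with self._lock:
            changed, deleted = self._changed, self._deleted
            self._changed, self._deleted = set(), set()
        return changed, deleted
```

Events that arrive while a backup is running land in the fresh sets and are picked up by the next run. If a backup fails, the drained sets would need to be merged back in, which this sketch leaves out.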

I will look into this over the next weeks.

  • Daniel

#14

But if you do it this way, you will either

  • have to keep multiple lists for different tasks, with the first scan for each task done from scratch, or
  • on every run after that, have to check the change list and double-check that each entry is indeed newer than the last backup, because it might have been recorded before the last full scan.

Do you see what I’m trying to say? 😇
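One possible way to address both concerns is a single shared event log with a per-task cursor instead of one list per task. The sketch below is hypothetical: each task remembers the log position at the start of its last successful backup, plus a watcher session id, so that a restarted watcher (with a possible gap in coverage) forces a full scan. Entries recorded before a task’s cursor are never returned, which covers the case of a change that was already there before the last full scan.

```python
from typing import Dict, List, Optional, Set, Tuple

Cursor = Tuple[int, int]  # (watcher session id, log position)


class SharedChangeLog:
    """One event log for all backup tasks; each task keeps its own cursor."""

    def __init__(self, session: int, cursors: Optional[Dict[str, Cursor]] = None) -> None:
        self.session = session  # must change every time the watcher (re)starts
        self.events: List[Tuple[str, bool]] = []  # (path, deleted); index = position
        self.cursors: Dict[str, Cursor] = dict(cursors or {})

    def record(self, path: str, deleted: bool) -> None:
        self.events.append((path, deleted))

    def begin_backup(self, task: str) -> Tuple[int, Optional[Tuple[Set[str], Set[str]]]]:
        """Return (start position, (changed, deleted)), or (start, None) for a full scan."""
        start = len(self.events)
        cursor = self.cursors.get(task)
        if cursor is None or cursor[0] != self.session:
            # Never backed up while this watcher session was running.
            return start, None
        changed: Set[str] = set()
        deleted: Set[str] = set()
        # Only events from the start of the task's last successful backup
        # onwards are considered; anything older is ignored.
        for path, is_deleted in self.events[cursor[1]:start]:
            if is_deleted:
                changed.discard(path)
                deleted.add(path)
            else:
                deleted.discard(path)
                changed.add(path)
        return start, (changed, deleted)

    def finish_backup(self, task: str, start: int) -> None:
        """Call only after the backup that began at `start` succeeded."""
        self.cursors[task] = (self.session, start)
```

Taking the position at the start of the run means edits made during a backup are offered again next time, which costs a re-check but never misses a change. A real implementation would also prune entries older than the smallest cursor.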


#15

I’m really looking forward to this real-time backup thing.


#16

This part is very useful. I was wondering how this worked, but I still don’t completely understand how to use it. I’ve searched the forum for information about --changed-files but have only found this discussion so far.

Are there any ideas for other file systems that Duplicati can take advantage of?


#17

I believe --changed-files is a “;” (Windows) or “:” (non-Windows) separated list of file paths previously determined to have been changed. Similarly, --deleted-files is a list of file paths previously determined to have been deleted.

My guess would be that a third-party “file watcher” tool monitors the file system between Duplicati runs. Then, when Duplicati starts a backup, a --run-script-before script queries the file watcher for the lists of changed and deleted files, which are then passed to Duplicati, likely via the DUPLICATI__changed_files and DUPLICATI__deleted_files environment variables.


Here are the descriptions of the two file list parameters as of version 2.0.2.12_canary.

Duplicati.CommandLine.exe help changed-files
  --changed-files (Path): List of files to examine for changes
    This option can be used to limit the scan to only files that are known to
    have changed. This is usually only activated in combination with a
    filesystem watcher that keeps track of file changes.

Duplicati.CommandLine.exe help deleted-files
  --deleted-files (Path): List of deleted files
    This option can be used to supply a list of deleted files. This option
    will be ignored unless the option --changed-files is also set.

#18

I wish all of this usage information was easily accessible in one place.


#19

Well, technically it (like the above) IS all in one place; it just happens to be in the command-line help, so it isn’t searchable.

Hang in there - you might see a topic about it soon and there is some wonderful work on an actual manual going on!


#20

Can I get that information on macOS or Linux?


#21

Yep. Just run Duplicati.CommandLine.exe help (on macOS and Linux, run it through Mono: mono Duplicati.CommandLine.exe help).