Duplicati 2 vs. Duplicacy 2

How so?

Search to restore or the functions separately?

I explained in Edit2 here; it turned out not quite as bad as I thought, but Duplicati is clearly superior here. To start with, Duplicacy can only restore to the original location. You can’t tell it to restore a file to your desktop, for example. A big minus for me.

Then, you cannot search for a file across revisions using the GUI. If you want to search your entire archive for a particular file, you need to do it via the CLI and once you’ve found it, you need to restore it using an entirely separate command, i.e. you can’t just say: “yes, that one, please restore it.”

And if you don’t want/like/need the GUI, you cannot search to restore at all. You can search using the history command, and if you want, you can take note of what you found and restore it with the restore command. In fact, it may be that you can’t search for a file using the CLI on Windows at all, because the command proposed by the developer includes grep, which is a Linux command and does not exist on the Windows command line. In other words: the Duplicacy CLI doesn’t really offer any search itself; it merely lets you search the output of the history command with other tools.
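On Windows you would have to put something like this together yourself (a rough sketch; the revision number and file path are placeholders, and findstr stands in for grep):

duplicacy list -r 5 -files | findstr /i "report_2017"
duplicacy restore -r 5 -- projects/report_2017.xlsx

So the “search” is just filtering a listing, and the actual restore is still a second, separate command.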

Well, it seems strange to discuss Duplicacy commands on the Duplicati forum, but here we go :wink: I think this is turning into an interesting conversation.

(Keep in mind that I’m using the CLI version on Windows 10.)

This is not completely correct. I’ve set up a script that runs daily with the following steps:

  1. checks the integrity of the snapshot chunks (Duplicacy check command);
  2. randomly selects X files from the local folders to be compared with the backup;
  3. creates a new temporary folder into which to download the selected backup files (not the original local folder);
  4. downloads the selected files from the backup to the temporary folder;
  5. compares the files downloaded to the temporary folder with the files in the “official” folder via MD5 checksum;
  6. records a log and sends an email report with the result of the comparison;
  7. erases all files downloaded for testing and the temporary folder;
  8. performs new daily backups (new revision);
  9. sends an email with the result of the backup (a single email with one line per job).

So by steps 3 and 4 above you can see that it is possible to restore the files to a different folder than the original one.
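For steps 1, 4 and 5, the core of the script is something like this minimal sketch (run from the temporary folder after pointing it at the backup settings, as described further down; the storage name, revision number, paths and file name are just placeholders, and certutil is the hashing tool built into Windows):

:: step 1: verify that the chunks referenced by the snapshots exist in the storage
duplicacy check -storage "my_storage"
:: step 4: download one of the randomly selected files into the temporary folder
duplicacy restore -r 42 -storage "my_storage" -- file1.txt
:: step 5: hash the restored copy and the original, then compare the two values and log the result
certutil -hashfile d:\restore_tmp\file1.txt MD5
certutil -hashfile d:\myprojects\file1.txt MD5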

In Duplicacy you can save the backup settings to a centralized folder, they do not have to be in the original folders themselves (called repositories). So it’s easy for the script I described above to retrieve these settings.

This is a good example of what I commented some posts above, that some things that Duplicati does automatically, in Duplicacy have to be placed in the scripts.

About:

This is true (about grep), but you can easily pass the file name as an include pattern or as a parameter when calling the command.

All this reinforces what I said above: Duplicati is more user-friendly, but if you like to control things with scripts (like me), there are no major complications in using Duplicacy. But I recognize that not everyone likes scripting.

1 Like

Since we are talking about differences, there is a very useful Duplicacy command, which makes it very similar to Crashplan in terms of version maintenance:

$ duplicacy prune -keep 1:7       # Keep 1 snapshot per day for snapshots older than 7 days
$ duplicacy prune -keep 7:30      # Keep 1 snapshot every 7 days for snapshots older than 30 days
$ duplicacy prune -keep 30:180    # Keep 1 snapshot every 30 days for snapshots older than 180 days
$ duplicacy prune -keep 0:360     # Keep no snapshots older than 360 days

You can even use:

$ duplicacy prune -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7

(source: Duplicacy guide at GitHub)

I know that similar functionality is being developed for Duplicati (and I’m following it).

This is an essential point to reduce the space used in remote storage.

1 Like

The duplicacy developer said:

to restore to a different location you’ll need to initialize a new repository on the new location with the same repository id.

and I guess I overinterpreted that as basically meaning “you can’t restore to a new location”. I suppose that is what your script is doing (initializing a new repository)?

I still think this is a bit of overkill if I simply want to restore a single file, even for someone who appreciates the benefits of scripting. It may work well as part of your housekeeping script, but what if you just want that one file?

I see scripting as an additional feature that adds flexibility to the product. When scripting becomes a philosophy of simplicity that actually demands that the user be flexible (and compose the right command to achieve simple things), then I think the scripting approach has gone wrong. Scripting should not mean: “well, then you have to script everything.”

I’m not sure I understand what you mean here.

No, it is not necessary to initialize from scratch.

Duplicacy has two ways of storing the settings for each repository (the local folders) and the storages to which each repository sends its backups:

  • a “.duplicacy” folder inside each repository (impractical…)
  • a centralized folder with the configurations of all repositories.

I use the second format. In this format, there is only a small file in each repository, containing a single line with the path to its configuration in the centralized folder.

So, to make it clearer:

I have the d:\myprojects folder that I want to back up (it’s a “repository”, in Duplicacy’s nomenclature);

I have a centralized folder of settings, in which there is a subfolder for each repository:

\centralized_configs
├── my…
├── my…
├── myprojects
└── my…

In the repository (d:\myprojects) there is only one .duplicacy file with the line:

\centralized_configs\myprojects

(no key, password, nothing, just the path)

So in the script I just create this little file inside the temporary folder into which I’m going to download the test files.

You just do this:

duplicacy restore -r <revision> -storage "storage_where_my_file_is" file1.txt

(without worrying about the subfolder, etc)

If you execute this command from the original folder, it will restore the file to its original subfolder (the original location). If you run the command from a temporary folder, it will restore to that temporary folder, rebuilding the folder structure for the file.
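Putting it together, the “restore to a temporary folder” part of my script boils down to something like this (the temporary folder and the revision number are just examples):

:: create the temporary folder and the one-line pointer file described above
mkdir d:\restore_tmp
echo \centralized_configs\myprojects> d:\restore_tmp\.duplicacy
:: run the restore from inside the temporary folder; the file lands there, not in d:\myprojects
cd /d d:\restore_tmp
duplicacy restore -r 42 -storage "storage_where_my_file_is" -- file1.txt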

Or use patterns: Include Exclude Patterns

In this case (Duplicacy backups) I’m treating the philosophy of the scripts like this: put all the commands that I would have to type into a file, and schedule the execution of that file :slightly_smiling_face:
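For example, the scheduling part is handled by the built-in Windows Task Scheduler; a minimal sketch, with a placeholder task name, script path and time:

schtasks /create /tn "Duplicacy daily backup" /tr "c:\scripts\duplicacy_daily.cmd" /sc daily /st 02:00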

1 Like

FYI, the duplicacy developer made a nice summary of this topic. In particular, his statement about trusting the backend is very enlightening:

In Duplicacy we took the opposite way – we assumed that all the cloud storage can be trusted and we based our design on that. In cases where a cloud storage violates this basic assumption (for example, Wasabi, OneDrive, and Hubic), we will work with their developers to fix the issue (unfortunately not everyone is responsive) and in the meantime roll out our own fix/workaround whenever possible.

3 Likes

I’d be curious to understand the deficiencies that make these specific cloud storage services fall short of that assumption.

I’m following both topics. In the topic on the Duplicacy site you will find these links:

Wasabi issue
OneDrive issue
Hubic issue

Thanks. A little light evening reading…
:grinning:

Yes, that is actually the situation I was thinking of. The problem here is that the backend (either in Duplicati or the actual destination) is broken, and returns an incorrect result. This really should be fixed, because otherwise you may end up having a partial backup (some files are really missing).

Not checking the destination is really a dangerous workaround.

1 Like

Thanks for that link!

When I wrote Duplicati 1.x I also had the same approach: of course the storage works, otherwise they will (should?) fix it.

Having seen the error reports from the 1.x users, I know this is just not the case.

One particularly interesting problem is that WebDAV on Apache has a slight race condition, where it will list a file as “existing, downloadable, and correct” and afterwards delete the file:
https://bz.apache.org/bugzilla/show_bug.cgi?id=54793

For this reason, Duplicati uses random filenames and does not overwrite an existing file. With the CY approach it is hard to do the same, as the chunk hashes are the filenames, but not checking whether a remote chunk is really there will cause problems when you need to restore.

The OneDrive issue that @TowerBR links to is another example of a real problem that is not discovered unless you check. (It should be fixed by writing a new backend that uses the updated OneDrive API, and as such it is easy to fix for CY and TI alike.)

But I agree that there are multiple issues that cause the database in TI to break, and they should of course all be fixed asap as they create a really poor user experience.

1 Like

Hm, I find it increasingly hard to make a judgement between the DY and DT approaches. One reason is that it is really difficult to weigh the different pros and cons against each other. Another is that both sides seem to be addressing various issues, so what is true today may not be true in a few months.

For example, duplicacy seems to be addressing at least some of the trust problem:

But again, I am not really able to judge what is being fixed here and what problems will remain and perhaps can’t be addressed without a database.

Continuing the tests …

One of my goals for Duplicati and Duplicacy is to back up large files (several GB) that change in small parts. Examples: VeraCrypt volumes, mbox files, miscellaneous databases, and Evernote databases (basically an SQLite file).

My main concern is the growth of the space used in the backend as incremental backups are performed.

I had the impression that Duplicacy made large uploads for small changes, so I did a little test:

I created an Evernote installation from scratch and then downloaded all the notes from the Evernote servers (about 4,000 notes; the Evernote folder was then 871 MB, of which 836 MB was the database).

I did an initial backup with Duplicati and an initial backup with Duplicacy.

Results:

Duplicati:
672 MB
115 min

and

Duplicacy:
691 MB
123 min

Pretty much the same, but with some difference in size.

Then I opened Evernote, made a few minor changes to some notes (a few Kb), closed Evernote, and ran backups again.

Results:

Duplicati (from the log):

BeginTime: 21/01/2018 22:14:41
EndTime: 21/01/2018 22:19:30 (~ 5 min)
ModifiedFiles: 5
ExaminedFiles: 347
OpenedFiles: 7
AddedFiles: 2
SizeOfModifiedFiles: 877320341
SizeOfAddedFiles: 2961
SizeOfExaminedFiles: 913872062
SizeOfOpenedFiles: 877323330

and

Duplicacy (from the log):

Files: 345 total, 892,453K bytes; 7 new, 856,761K bytes
File chunks: 176 total, 903,523K bytes; 64 new, 447,615K bytes, 338,894K bytes uploaded
Metadata chunks: 3 total, 86K bytes; 3 new, 86K bytes, 46K bytes uploaded
All chunks: 179 total, 903,610K bytes; 67 new, 447,702K bytes, 338,940K bytes uploaded
Total running time: 01:03:50

Of course it skipped several chunks, but it still uploaded 64 out of 176 chunks!

I decided to do a new test: I opened Evernote and changed one letter of the contents of one note.

And I ran the backups again. Results:

Duplicati:

BeginTime: 21/01/2018 23:37:43
EndTime: 21/01/2018 23:39:08 (~1,5 min)
ModifiedFiles: 4
ExaminedFiles: 347
OpenedFiles: 4
AddedFiles: 0
SizeOfModifiedFiles: 877457315
SizeOfAddedFiles: 0
SizeOfExaminedFiles: 914009136
SizeOfOpenedFiles: 877457343

and

Duplicacy (remembering: only one letter changed):

Files: 345 total, 892,586K bytes; 4 new, 856,891K bytes
File chunks: 178 total, 922,605K bytes; 26 new, 176,002K bytes, 124,391K bytes uploaded
Metadata chunks: 3 total, 86K bytes; 3 new, 86K bytes, 46K bytes uploaded
All chunks: 181 total, 922,692K bytes; 29 new, 176,088K bytes, 124,437K bytes uploaded
Total running time: 00:22:32

In the end, the space used in the backend (including the 3 versions, of course) was:

Duplicati: 696 MB
Duplicacy: 1,117 MB

That is, with these few (tiny) changes, Duplicati added 24 MB to the backend and Duplicacy 425 MB.

The only problem: even with such a simple and small backup, on the second and third runs Duplicati showed me a “warning”, but when I checked the log:

Warnings: []
Errors: []

This seems to be a known Duplicati behavior (these kinds of “false positives”). What worries me is getting used to ignoring the warnings and then missing a real one.

Now I’m trying to work out the technical reason for such a big difference, thinking about how Duplicati and Duplicacy backups are structured. Any suggestions?

I agree, it is bad to report warnings when none are shown.

The current logging system is a bit split with 3 or 4 different log systems.
Most of these end up in the same place, but there must be at least one case where the warning is hidden from the output.
I hope to rewrite it and use a single logging system, so there is one place where log data is viewed.

I don’t have any ideas here. The algorithm in Duplicati is very simple: fixed offsets with fixed-size chunks. Any kind of dynamic algorithm should be able to re-use more data than Duplicati (perhaps at the expense of speed).

I see no reason that Duplicacy should not be able to achieve the same upload size.

Best guess is it has to do with compression being more aggressive with Duplicati.

For anyone interested, I’m continuing this test with other parameters (chunk size) and putting the results in this topic.
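(For reference: in Duplicacy the chunk size is fixed when the storage is initialized; if I’m not mistaken it is the -c option of the init command, so each chunk-size test needs its own storage, something like this, with a placeholder repository id and storage URL:)

duplicacy init -c 1M myprojects dropbox://Backups/myprojects-1m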

1 Like

Interestingly, @mdwyer commented on this topic in much the same way I did: “When it’s working, great, but when a problem occurs, it’s a nightmare.”

Well, this is probably my final test with the Evernote folder. Some results were strange, and I really need your help to understand what happened at the end…

Some notes:

Upload and time values were obtained from the log of each software.

Duplicati:
  • bytes uploaded = “SizeOfAddedFiles”
  • upload time = “Duration”

Duplicacy:
  • bytes uploaded = “bytes uploaded” in the “All chunks” log line
  • upload time = “Total running time”

Rclone was used to obtain the total size of each remote backend.
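(Something along these lines, with placeholder remote and folder names:)

rclone size dropbox:Backups/duplicati
rclone size dropbox:Backups/duplicacy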

In the first three days I used Evernote normally (adding a few notes a day, a few KB), and the result was as expected:

graph01

graph02

graph03

BUT, this is the first point I didn’t understand:

1) How does the total size of the Duplicacy backend grow faster if the daily upload is smaller (graph 1 vs. graph 2)?

Then on the 26th I decided to organize my tags in Evernote (something I had already wanted to do). So I standardized the nomenclature, deleted obsolete tags, rearranged things, etc. That is, I didn’t add anything (in bytes), but probably all the notes were affected.

And the effect of this was:

graph04

graph05

graph06

That is, something similar to the first few days, just greater.

Then, on day 27, I ran an internal Evernote command to “optimize” the database (rebuild the search indexes, etc.) and the result (disastrous in terms of backup) was:

(and at the end are the other points I would like your help to understand)

graph07

graph08

graph09

2) How could the Duplicati upload have been so small (57844) if it took almost 3 hours?

3) Should I also consider the “SizeOfModifiedFiles” Duplicati log variable?

4) If the last Duplicati upload was so small, why has the total size of the remote grown so much?

5) Why did Duplicacy’s last upload take so long? Was it Dropbox’s fault?

I would like to understand all of this, because this was a test to evaluate the backup of large files with minor daily modifications, and the real objective is to identify the best way to back up a 20 GB set of folders with mbox files, some several GB in size, others only a few KB.

I look forward to hearing from you all (especially @kenkendk).

(P.S.: Also posted on the Duplicacy forum)

2 Likes

Nice to know, but not easy to find. Not good. Documentation needs to be out front, so to speak, to speed up use of the program. Just my opinion. (Elsewhere, I have proposed a user wiki for this very purpose.)

As usual, great work on the research and numbers!

Normally I’d say I suspect Duplicati’s longer times are due to the encryption and compression, but I assume you “equalized” those with Duplicacy… (Sorry if you said so in your post - I’m just not seeing it).

My guess on the higher Duplicati bandwidth usage (was that really JUST uploads?) is that it is due to compacting (download, extract, re-compress, upload), which I don’t believe Duplicacy’s design “needs”. I think you could test that by turning off Duplicati’s auto-compact.
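If I remember correctly that is the no-auto-compact advanced option; from the command line it would be something like this (the storage URL and source folder are placeholders, and the same option can be set in the GUI under advanced options):

Duplicati.CommandLine.exe backup <storage-url> d:\Evernote --no-auto-compact=true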