Because of the deduplication, this is a challenge.
Great request…please let us know what those numbers are.
Find large files in backup? and Visualize backup data usage have some ideas for identifying things to trim; however, as has been mentioned, deduplication (along with compression and multiple versions) makes it challenging. Please also note the space might not be immediately reclaimed; eventually a compact will free it.
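As mentioned, compaction can also be triggered by hand. A minimal sketch with Duplicati's command-line tool (the destination URL here is a placeholder example; on Linux/macOS the binary is usually named duplicati-cli):

```shell
# Reclaim space occupied by deleted or wasted data at the destination.
# By default it only runs when waste thresholds are exceeded.
Duplicati.CommandLine.exe compact "file://D:\Backups"
```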
Ok. I will consider that if needed.
Source: 1013.31 GB
Backup: 1.42 TB / 478 versions
Is there a log I can consult to see which files are backed up each time? Because I have a lot of "static data", it could help me identify which files are backed up often…
If I uncheck files or folders in the source selection, should I then run purge and/or compact?
Keep in mind that even changed files may only generate small amounts of change data that’s backed up:
Duplicati performs a full backup initially. Afterwards, Duplicati updates the initial backup by adding the changed data only. That means, if only tiny parts of a huge file have changed, only those tiny parts are added to the backup. This saves time and space and the backup size usually grows slowly.
Feature request: List all backup sets with their size shows how to find added, modified, and deleted files.
If you’re willing to dig details out of a general-purpose log, you can set --log-file and --log-file-log-level=verbose (and if you’d rather pre-filter, you might be able to use --log-file-log-filter to catch only messages like the one below):
2019-09-10 17:45:15 -04 - [Verbose-Duplicati.Library.Main.Operation.Backup.FilePreFilterProcess.FileEntry-CheckFileForChanges]: Checking file for changes C:\PortableApps\Notepad++Portable\App\Notepad++\backup\webpages.txt@2019-09-06_154119, new: False, timestamp changed: True, size changed: True, metadatachanged: True, 9/8/2019 12:12:06 PM vs 9/6/2019 7:58:56 PM
where any of the tests turning up True means some data about the file may potentially be uploaded.
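Roughly, the log options above map onto a command-line run like this (the destination URL and paths are placeholder examples; the same two options can also be added as advanced options on a GUI job):

```shell
# Log every file-change check (like the CheckFileForChanges line above)
# to a file, so you can see which source files trigger uploads.
Duplicati.CommandLine.exe backup "file://D:\Backups" "C:\Users\mbmc\" \
  --log-file="C:\Temp\duplicati-backup.log" \
  --log-file-log-level=verbose
```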
Unchecking affects only new backups. You can either wait for older ones to age off, or purge data yourself. Space from deletions is recycled by compact, which by default is automatic, but can be forced if you wish. Depends on how much of a hurry you’re in. Living with an almost-out-of-space condition can get awkward.
OK, I tried adding this advanced option (log-file-log-level=verbose) to my backup config; I will check the log after the next backup. If I understand, the log is set to “warning” even if the advanced option isn’t added?
Only if I set a retention setting other than “Keep all”, am I right?
I’m a little confused about the difference between Compact and Purge, but if I understand correctly: if I uncheck files or folders in the source selection that I’m sure I don’t want anymore, I have to run “purge”?
Yes, I agree, but before replacing my destination disk with a bigger one, I want to be sure it is worth it. In fact, when I created my first backup config, I had a lot of space on the destination disk, so after selecting my “important” files, I added “less important static files” and then “program files” like databases…
Now that my disk is almost full, I would like to delete the bigger “less important” files and check how much space the “program files” use (not big files, but maybe they change very often and take up too much space with all the versions).
Not sure of the question. Without any advanced options there is no log file. You have to give it the path.
What you say is correct if you mean the --log-file-log-level default, so yes, you need to set it to verbose.
Right. Depends on what you want. If it helps, you might have it thin out versions to avoid filling up again; however, any file that existed only between the versions that something like a custom retention policy kept is unavailable for restore, because no remaining version contains it. There are lots of retention options possible.
You have to if you want an immediate deletion. You don’t if you want slow deletion, as new backups omit it and old backups age away. When the last version having the file is deleted by retention policy, the file is gone.
You can easily have File Explorer look for changes if you like. If the only thing there is programs that could easily be reinstalled somehow, there’s not much point backing it up at all. You want to back up YOUR data. You’re correct that in some cases versions build up. If I put
datemodified:this month in the search box then look at results, I see the VirtualBox update I just installed. If I switch view to Details and select all files then right-click and get Properties, I see it’s got about a GB of data. Longer period would certainly be more.
[quote=“ts678, post:10, topic:8002”]
Not sure of the question. Without any advanced options there is no log file[/quote]
OK, that was my question. I thought the log was the “history” listing, but so it is something different.
I didn’t. OK, I think I understand: I have to add the 2 advanced options:
log-file with a path as its value, and log-file-log-level=verbose. OK
OK, but with my retention setting “Keep all”, nothing will be deleted if I just uncheck files or folders in the source selection, am I right? And so, I have to “purge”.
I agree; it isn’t the program files that are useful, it is the program “data” files I don’t want to lose. But I think some “program data folders” contain temp files; I have to check that, and it isn’t easy.
So the better solution would be to check for changes on the source side; if I understand, it’s easier than on the backup side. OK, I will try.
Thanks for all your replies, I’ll keep you informed.
Correct, unless you want to go the hard route to identify and hand-delete all versions that saw those files.
I did a lot of analysis.
I used a Windows storage analysis application (WizTree, a “WinDirStat”-like tool) on each item selected in the “Source” tab of my backup config.
I used the “file” tab with the “folder” option ticked, sorted by modification date. With that I can see whether files or folders are modified, and how often.
So, these are my results: the items which can potentially increase my backup size, with the ones I suspect the most first:
C:\Users\mbmc\Evernote\Databases\nonobio.exb (7.2 GB; I use this app a lot and this file is modified several times per day!)
C:\Users\mbmc\AppData\Roaming\Kodi\userdata\Thumbnails\ (a lot of subfolders of more than 100 MB each; seems to change every time I start Kodi)
C:\Users\mbmc\AppData\Roaming\Kodi\userdata\Database\Addons27.db (4 MB each time I start Kodi)
C:\Users\mbmc\AppData\Roaming\Kodi\userdata\Database\Epg12.db (12 MB each time I start Kodi)
C:\Users\mbmc\AppData\Roaming\Kodi\userdata\Database\Textures13.db (8 MB each time I start Kodi)
C:\Users\mbmc\AppData\Roaming\MediaMonkey\MM.DB (47 MB): an app in permanent use. The file isn’t very big but seems to change each time I listen to a song.
After reflection, I think it could be a good solution to separate these backups, which don’t need a “Keep all” retention, from my personal data, for which I would like to keep the “Keep all” retention.
If I’m right: I can’t set different retention settings for different items in the same backup job?
If yes, the solution is to create another backup job and so have 2 jobs, for instance:
First for my personal important data with “Keep all” retention
Second for my programdata files and databases with “Keep… for instance… 6 months”
Is that a good way?
If yes (again), can I schedule these 2 jobs at the same time? In other terms: will Duplicati be smart enough to manage 2 backups at the same time without risk of conflict or excessive resource usage?
For now, I added some “Filters”: “Exclude folder” to my current backup job. Does it produce the same result as unticking items in the source list? I mean: will these excluded items be deleted after a purge command?
By the way, I tried to start a “purge” command from the web GUI. I think I made a mistake, because it finished instantly with this message: "You cannot combine filters and paths on the commandline
Return code: 200"
Thanks, thanks, and thanks again
If two jobs are scheduled to start at the same time (or if any job is scheduled to start when another job is running already), the backup operations are queued to run sequentially. So you don’t have to worry about conflicts or hammering system performance.
Also: things like your Kodi data files - I’d suggest you carefully analyze exactly what you’d need (if anything) to rebuild your system after a crash. I run a Plex server on my PC and store only the bare minimum to back up the “watched / not watched” statistics for any show/movie; anything else, I could just as easily reconfigure after a reinstall.
In my opinion the first thing you should change on your main backup set is to enable custom retention instead of “keep all” - your source / backup numbers indicate that you have as much as 400 GB wasted in old versions, and after you set a custom retention, it will immediately go through your old versions and delete unneeded versions, freeing up storage space (without needing to do advanced “purge” or “compact” operations manually).
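For reference, the custom retention in the GUI corresponds to the --retention-policy advanced option, which takes comma-separated timeframe:interval pairs. The exact values below are only an illustration, not a recommendation (and the destination URL and source path are placeholders):

```shell
# Keep all versions from the last 7 days (interval 0s), one per day up
# to 4 weeks, one per week up to 12 months, then one per month forever
# (U = unlimited timeframe).
Duplicati.CommandLine.exe backup "file://D:\Backups" "C:\Users\mbmc\" \
  --retention-policy="7D:0s,4W:1D,12M:1W,U:1M"
```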
OK, that’s perfect.
Yes, Kodi could also be reconfigured quite easily if needed, but even when it is easy, it can take a long time to reconfigure all the applications; that’s why I back up the program data of my main applications. That said, as shown in the post above, some files change often and don’t seem to be needed in case of a reinstallation (thumbnails will be recreated automatically, if I’m right). It isn’t always easy to know what to back up; even the Kodi wiki recommends backing up all of userdata…
For my personal important data (photos and videos), I’m always afraid of losing data (videos, photo albums, etc.) and noticing it too late: for instance several months or years after the loss. “Oh no, it’s out of my retention scope.” That’s why I choose “Keep forever” for this kind of data whenever possible.
But for my other, less important data (programdata files, essentially), I don’t need to keep all versions. I will usually notice quickly if a program has lost its settings, and if I eventually lose something, it won’t be too serious.
That’s why I think this data doesn’t need a “Keep all” retention setting, but maybe a “keep 6 months” one.
So I think that could be the right “first” thing to do about my issue: separate my single job into two jobs, isn’t it?
That’s correct. You’d need different jobs, and I can’t comment much on what you want to put in each, but everybody is suggesting custom retention, and I’m not seeing any sign that you’re considering it. You can keep a version for unlimited time without keeping ALL versions. It’s a progressive thinning, announced as:
New retention policy deletes old backups in a smart way and because manual hasn’t caught up, done as:
I think it’s the same as unticking, meaning it’s removed from source data on new backups but not purged from old backups. To do that you need to purge manually. You may face some challenges on splitting the existing backup. While you can purge files, I don’t think there’s a way to move the entire history of the files into a separate backup, and deduplication isn’t done across different backup jobs, so space use goes up.
I am considering it; it was just outside my scope of reflection. It’s not easy to understand, but I’m starting to.
Thanks for your explanation and the link. I read the first post and some of the following ones. I also re-read the earlier posts and your caveat. All of this helps me to understand.
So I have to choose:
Keep only one job with custom retention
Keep this job with “Keep all” only for important data and create a second job with custom retention for less important data.
Even if retention settings are clearer to me now, I have to say that the “Keep all” setting is more reassuring for me.
Do you recommend keeping only one job with custom retention, or creating a second job?
Before making a choice, I would like to run a purge to see if I gain some space after unselecting source data and excluding some things, but as I said:
Do you have an idea of what is causing this message?
I understood that; I will lose my old versions of less important data if I go this way.
I’d agree with that, plus the manual doesn’t cover it at all, and the help text isn’t quite enough to convey it.
I can’t recommend how you should treat your varying-levels-of-importance-and-update-frequencies data, however I did express space concern earlier on adding the second job while old backups exist in the first.
You can also consider your own restore history – do you ever actually restore? If so, do you need to have precise version choice? Going back for what period? How often do you back up? That limits precision too.
Reassurance is good, but unless you want to increase your storage capability you may need to trim files.
This command can either take a list of filenames or use the filters to choose which files to purge.
You said earlier that you had filters in use, probably of the --exclude variety. If you’re in GUI Commandline, options (including such filters) are automatically put on the screen. If you then modified the screen to do a purge command, you might have typed paths into
Commandline arguments instead of the --include filter, while leaving the --exclude filter, causing the “You cannot combine filters and paths on the commandline” message.
It’s hard to say, because you didn’t say what you did, but if you left an inappropriate --exclude, just delete it. Changes on the Commandline screen don’t cause changes in the saved settings. Tailoring is expected…
Yes, I am trying to think about all of this. In my own history, I once lost a lot of entire photo albums and only noticed the loss several months later. I have several backup methods, and while one (or two) didn’t work (I don’t remember why), the third restored my files. I have also needed several times to restore lost program settings, and I often found a good version among the daily versions, not very old (one or two weeks).
If I really need space for important data, I will buy a new HDD (4 TB) and transfer my backup, but before that I want to be sure my backup settings are appropriate to my needs. That is what I’m trying to check.
Yes, I tried to use the Commandline in the GUI and I just changed “Command” from “backup” to “purge” and didn’t touch anything else; this is my screen before choosing the “purge” command:
So it isn’t as simple as choosing “purge” in “Command” and clicking “Run”?
Should I just delete each exclude filter I have on this screen?
I have a doubt about my understanding: if I create a new job for my less important files (programdata files), I understand I can’t transfer the backup history of these files from my first job, OK. But if I’m OK with losing this history, and I untick all of these files from the first backup configuration and then run “purge” (when I manage to purge), they will be deleted from the first backup destination, won’t they?
Don’t purge with that config unless you mean to purge all of the listed source folders from the backup.
Purges (removes) files from remote backup data. This command can either take a list of filenames or use the filters to choose which files to purge. The purge process creates new filesets on the remote destination with the purged files removed
That might be the only thing that saved you from a self-inflicted purge disaster. Please study the manual, which also says to use --dry-run to see what an operation will do before it’s run. Very important on purge.
The manual gives --include filters as an option for selection. Deleting the list of files and using an --exclude filter would seem risky, because it deletes everything except what the filter excludes (I think). And just feeding in your current --exclude list will make sense only if you want to remove files that you’re currently excluding.
The manual suggests that versions can be given, which would be the --version option I guess, but it reads kind of like the default is all versions. I hope you haven’t deleted your backup already. Be careful.
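To make the dry-run advice concrete, a purge preview might look like this (the destination URL and path are placeholder examples, and wildcard support in purge paths is my assumption, so check the command's help output first):

```shell
# Preview which file versions WOULD be purged; nothing is deleted
# while --dry-run is present. Remove it only after checking the output.
Duplicati.CommandLine.exe purge "file://D:\Backups" \
  "C:\Users\mbmc\AppData\Roaming\Kodi\userdata\Thumbnails\*" \
  --dry-run
```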
No, I didn’t delete anything, but even after reading your explanations I don’t understand how to delete my unticked and excluded items.
Should I manually enter, after the purge command, every file I unticked and excluded? It seems hard.
I remember CrashPlan when unticking source items: if I’m right, it deleted them automatically after replying “Yes” to the question “Are you sure?”… or something like that. But maybe I’m wrong, and I know, Duplicati isn’t CrashPlan.
The now-discontinued CrashPlan Home used to, IIRC, handle deselection and purge as one operation, giving a big warning you needed to click through to confirm that you were sure. In that sense, it was less hard than the Duplicati way. I don’t know if Duplicati’s method meant to give more control, or was just a simpler first plan. There’s a lot in Duplicati that could be enhanced, but that’s to be expected at its stage.
Whatever you unticked, you probably need to copy into the box in a format similar to what’s there, with paths ending in backslash for a folder and (though you don’t show any on your limited screenshot) files without it. Remove lines you don’t want purged, and always use --dry-run to see if it looks as intended. Copying the list of files and folders before you do that will make it easier to run again as the for-real run.
As for your excluded items, if they were always excluded, then there should be nothing to delete. If you added excludes over time (and keep all versions forever) then you might have some copies before the exclude that will take up space forever. You possibly could purge those with a separate purge using the --include to include them in the purge, which is the opposite of using --exclude to exclude from backup.
But I almost never purge, so please use --dry-run to check. Purge is permanent, just like on CrashPlan.
Maybe I can retrieve from my notes what I unticked, but I don’t understand how you could see what I unticked on my screenshot: once I untick items and save my backup config, there is no indication of what has been unticked?!
Maybe you mean the entire list in “Commandline arguments”, which is reduced and matches all my source items? As for the rest, my screenshot isn’t limited: you can display all of it by clicking on it.
Some exclude filters are old but some are new (since this discussion).
Yes, I will try with --dry-run. I’ll keep you informed.
is a non-specific way of assuming you unticked items (as you said), and saying what to do as next step.
It does not save your entire history of what you did to the backup configuration. CrashPlan doesn’t either.
If you have a previous export of the configuration (which you should for safety), you can look at it either in native form which you can open in notepad or whatever and try to interpret, or import for a look, but don’t save it unchanged or you may have two identical backups clobbering each other in the same destination.
Another way of figuring out what you previously backed up is to start down restore just to inspect the tree.
The FIND command can also be used to show the files in the backup, but it’s a command-line tool again.
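A minimal sketch of that FIND command (the destination URL is a placeholder example):

```shell
# List the files recorded in the most recent backup (version 0);
# "*" matches everything, and a narrower pattern lists less.
Duplicati.CommandLine.exe find "file://D:\Backups" "*" --version=0
```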
Yes. All visible ones ended in backslash.
That gives an unreadably tall and narrow view, which I did download and open in Paint 3D to see excludes.
For example, possibly it’s the reason you’re so carefully picking what to back up. I know I’m a bit careful of cloud storage usage because it’s metered, but the same vendor (Backblaze) has an unlimited backup too, intended to simplify things so people don’t pick-and-choose and possibly omit important things by mistake. One drawback is they only keep deleted data for 30 days. Another grab-everything option might be to get a drive (maybe even your current one, if you upgrade to another) for use as an occasional image backup. Macrium Reflect Free can do a full image, and then you can maintain it with smaller differentials if you wish.