Speed-up database rebuild


#1

Some long database rebuilds are probably caused by an issue where compacting does not (or did not) create correct dindex files. A way to avoid long database recreate times at the moment you need them is described here.

I have noticed that sometimes a lot of dblock files need to be downloaded in order to recreate the database. This happens because the “list” folder is missing from some dindex files.

Note: this should be done on a completely consistent backup set. Be careful with what you are doing, and take backups of everything (local database, remote backup) before you do this…

Prerequisites:

  • No guarantees! The actions below might completely break your backup.
  • I used rclone to fetch the remote dindex files and delete the incorrectly created ones. rclone can be downloaded from rclone.org.
  • the script that checks the files is a bash script, so Linux (or another environment with bash, gpg, and unzip) is needed
  • configure rclone and update the script below so that the “remote_repo” variable points to your remote folder.
  • execute the following script. It assumes GPG encryption (not the default) but can be adjusted for AES encryption. It downloads all dindex files and checks whether each one is correct.
    #!/bin/bash
    # Download all dindex files, check each one for a "list" folder,
    # and delete the broken ones from the remote.
    remote_repo="repo:path_to_your_backup"
    rm -f result.txt rclone.txt
    rm -rf dindex

    # Fetch only the dindex files from the remote
    rclone sync "${remote_repo}" ./dindex/ --include "*dindex*"

    find dindex -name '*.zip.gpg' -print0 |
        while IFS= read -r -d $'\0' line; do
            # Decrypt and unpack the dindex volume
            gpg --output "${line%.gpg}" --decrypt "$line"
            unzip -d "${line%.zip.gpg}" "${line%.gpg}"
            if [ -d "${line%.zip.gpg}/list" ]; then
                printf '%s %s\n' "$line" keep >> result.txt
            else
                # No "list" folder: log it and delete the remote dindex file
                printf '%s %s\n' "$line" remove >> result.txt
                printf 'rclone delete %s/%s\n' "${remote_repo}" "${line##*/}" >> rclone.txt
                rclone delete "${remote_repo}/${line##*/}"
            fi
            # Clean up the extracted folder and the decrypted zip
            rm -rf "${line%.zip.gpg}"
            rm "${line%.gpg}"
        done
    rm -rf dindex
  • the rclone delete statements are logged to rclone.txt (and have already been executed by the script)
  • delete your local database (or: move to a new database)
  • recreate the database. This will download dindex files and the dblock files that are no longer being referred to by any dindex file.
  • run a backup. This will recreate the dindex files. Note: this will take a long time depending on the number of files that have to be recreated.
  • your next db recreate should take less time.
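For the default AES encryption, the gpg line in the script above needs a different decryption step. A minimal sketch, assuming the aescrypt command-line tool from aescrypt.com (Duplicati’s bundled SharpAESCrypt.exe should work similarly) and the backup passphrase in a PASSPHRASE environment variable; the find pattern then becomes '*.zip.aes' and the suffix stripping uses .aes instead of .gpg:

```shell
#!/bin/bash
# Replacement for the gpg decryption step when the backup uses the
# default AES encryption. Assumes the "aescrypt" CLI (aescrypt.com);
# PASSPHRASE must hold your backup passphrase.
decrypt_dindex() {
    # "$1" is e.g. dindex/duplicati-i0123....dindex.zip.aes;
    # write the decrypted zip next to it, dropping the .aes suffix
    aescrypt -d -p "$PASSPHRASE" -o "${1%.aes}" "$1"
}

# In the main loop, the corresponding suffix handling would be:
#   find dindex -name '*.zip.aes' ...
#   unzip -d "${line%.zip.aes}" "${line%.aes}"
```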

#2

Thanks for sharing that!

I’m curious, in the 3rd from the last step, why are you downloading (instead of deleting) unreferenced dindex and dblock files?


#3

The script deletes incorrect dindex files, but the repair process knows some data is still missing after downloading the index files. It will then start looking in the remaining, now-unreferenced dblock files (in the logs you will see it start “probing” the dblock files, i.e. downloading them until it has all the data).


#4

Got it, so this is checking for any INCORRECTLY unreferenced data so it can add it back. Cool!


#5

My Duplicati isn’t in the mood for rebuilding index files. When yours get rebuilt, what’s in the list folder?
Having no list folder seems to be the normal thing if there are no blocklists around for multiple-block files.

By the way, I put a technical explanation for one all-dblock download cause into one of your articles here. Possibly your workaround fixes a VolumeID of -1 somehow? You’d need to look for it in the Block table to see.
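To check that theory against a concrete database, counting the rows in question should work; a sketch using the sqlite3 command-line tool, with an example database path (yours is shown in the job’s Database settings):

```shell
#!/bin/bash
# Count blocks that the local database does not associate with any
# remote volume (VolumeID = -1). The path is an example; use the
# database path shown in your job's Database settings.
db="$HOME/.config/Duplicati/ABCDEFGHIJ.sqlite"

if [ -f "$db" ]; then
    sqlite3 "$db" "SELECT COUNT(*) FROM Block WHERE VolumeID = -1;"
else
    echo "database not found at $db"
fi
```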


Duplicati database rebuild - 2gb client database for 750gb of backup files
#6

The list folder is created and no longer empty. However, I now have one backup set that did create another empty one. That will need some further checking after my other sets have finished recreating in a week or two.