Recover one file from AWS Glacier

000al000 · September 14, 2022, 6:49pm

I have a backup in Amazon Glacier Deep Archive.
The backup works fine, but now I am trying to test a restore on another PC.
I downloaded all the *.dindex and *.dlist files to the local PC.
I tried to restore one file (rebuilding the local database), but that needs dblock files.
How can I find out the names of the dblock files I need to restore one file?

drwtsn32 · September 16, 2022, 5:26pm

Here is a similar thread from last year: Testing Restore from Galcier DA

Unfortunately I don’t think there’s an easy way to see what dblocks will be needed by a particular restore. It would be a nice feature for use cases like yours, though.

ts678 · September 16, 2022, 7:39pm

It probably has all the file names right here, but it will only give a count (and it might be a big count):

github.com/duplicati/duplicati/blob/666b2281032460254839fdc3b6e1055fdf7ce1db/Duplicati/Library/Main/Operation/RestoreHandler.cs#L385-L389

    volumes = database.GetMissingVolumes().ToList();

    if (volumes.Count > 0)
    {
        Logging.Log.WriteInformationMessage(LOGTAG, "RemoteFileCount", "{0} remote files are required to restore", volumes.Count);

(you can probably see the count go by at About → Show log → Live → Information, or in a log file)
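If the count is wanted in a script rather than by eye, a minimal sketch could scan log output for that message. Only the message text from the snippet above is relied on; the rest of the log-line layout (timestamp, tag) is not assumed, and the helper name `required_remote_file_count` is hypothetical.

```python
import re

# Message text taken from RestoreHandler.cs; anything before it on the
# line (timestamp, log tag) is deliberately not matched.
REMOTE_COUNT = re.compile(r"(\d+) remote files are required to restore")


def required_remote_file_count(log_lines):
    """Return the count from the first matching log line, or None."""
    for line in log_lines:
        match = REMOTE_COUNT.search(line)
        if match:
            return int(match.group(1))
    return None
```

This still gives only the count, not the names, so it is mainly a way to estimate how many Glacier retrievals a restore would trigger.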

The general question would be if you had a large restore, how would you handle a huge list of names?
Although I don’t use Glacier, it seems very awkward to (I think) need special handling and a long delay. There are other services such as Google Cloud that are fast, and only hit you financially in some ways. Low-cost S3 compatible services exist, but may give less, e.g. many are not geographically redundant.

If a developer ever volunteers, it probably needs some thinking through of the whole process including whatever the user has to do (manually?) in AWS somehow, and how to tell Duplicati to pause and wait. There’s also a programmatic Glacier API, I think, but there might be no avoiding some amount of waits.
Archive Retrieval Options suggests that waits can be reduced by paying more. I am not an S3 expert…
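For the programmatic route, a minimal sketch of requesting retrieval for a known list of dblock names might look like the following. It assumes an S3 client object exposing `restore_object` (as a boto3 S3 client does); the helper name `request_glacier_restore`, the bucket, the key list, the number of days, and the choice of the Bulk tier are all illustrative, and the wait before the objects become downloadable still applies.

```python
def request_glacier_restore(s3_client, bucket, keys, days=7, tier="Bulk"):
    """Ask S3 to stage archived objects so they can be downloaded later.

    s3_client -- an S3 client with a restore_object method (assumed: boto3)
    keys      -- object keys, e.g. the dblock file names that are needed
    days      -- how long the temporary copy should stay available
    tier      -- retrieval tier; cheaper tiers mean longer waits
    """
    requested = []
    for key in keys:
        s3_client.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest={
                "Days": days,
                "GlacierJobParameters": {"Tier": tier},
            },
        )
        requested.append(key)
    return requested
```

With boto3 installed, `s3_client` would be created with `boto3.client("s3")`. The sketch does no error handling, for example for a key whose retrieval is already in progress.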

000al000 · September 19, 2022, 6:27am

I store the *.dindex and *.dlist files in S3 Standard storage, so Duplicati can read them as easily as from other storage such as Google or MS. So theoretically I have all the information about the dblock files that I need.
Maybe there is a “test restore mode”, so I can read the log and understand which files I need?

ts678 · September 19, 2022, 10:35am

If by “test” you mean a dry-run, I’m not confident it will help but you could certainly try.

Probably so, but are you willing to write some tools to study them, or try that by hand?

Do you know how big this file is? Especially if small, studying database may be easier.
DB Browser for SQLite can help with either manual view or an attempt at an SQL tool.
Look up file (what version?) in File table. Look up its BlocksetID in BlocksetEntry table.
Look up all its BlockID in Block table. Look up those VolumeID in Remotevolume table.
Result from Name column will be the dblock file names that Duplicati will be needing…
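The lookup chain above can be sketched as a single SQL join, run here through Python's `sqlite3` module. The table names come from the steps above; the column names `Path` and `ID` are assumptions about the schema and should be checked in DB Browser for SQLite first. The sketch covers data blocks only (not metadata blocks) and does not distinguish versions, so it returns the union over all versions of the path. It is best run against a copy of the database.

```python
import sqlite3

# Assumed columns: File.Path, File.BlocksetID, BlocksetEntry.BlocksetID,
# BlocksetEntry.BlockID, Block.ID, Block.VolumeID, Remotevolume.ID,
# Remotevolume.Name. Verify against the real database before relying on it.
QUERY = """
SELECT DISTINCT Remotevolume.Name
FROM File
JOIN BlocksetEntry ON BlocksetEntry.BlocksetID = File.BlocksetID
JOIN Block ON Block.ID = BlocksetEntry.BlockID
JOIN Remotevolume ON Remotevolume.ID = Block.VolumeID
WHERE File.Path = ?
ORDER BY Remotevolume.Name
"""


def dblocks_for_path(conn, path):
    """Return the remote volume names holding the data blocks of one path."""
    return [row[0] for row in conn.execute(QUERY, (path,))]
```

A call would look like `dblocks_for_path(sqlite3.connect("copy-of-job.sqlite"), r"C:\backup source\short.txt")`, where the database file name is a placeholder.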

Are your files encrypted or already just .zip? If the latter, you can start looking inside.
How the backup process works explains some things, but does not go into the depths.
Using the database is probably quicker, but if you prefer digging in files, it’s possible…

000al000 · September 19, 2022, 11:31am

I am trying a disaster recovery scenario (only the backup on Amazon is available), with no local database.
My files were encrypted.
I used “Duplicati.RecoveryTool.exe download” on another PC to decrypt and save the files, but the *.index files were not downloaded, so I ran “Duplicati.RecoveryTool.exe index” on the downloaded files.
I got “Duplicati.RecoveryTool.exe list” working; it lists all of the files in my backup.
But as I understand it, there is no --dry-run for Duplicati.RecoveryTool.exe.

ts678 · September 19, 2022, 11:50am

RecoveryTool doesn’t use them, which is one reason why it won’t care at all if they’re corrupted.
What did it download and decrypt? I’d have expected dblock files, but they are trapped in Glacier.

What was that then? The encrypted *dlist*.aes and *dindex*.aes downloaded some other way?
They can be decrypted using various available tools, but it’s not automatic like in RecoveryTool.

000al000 · September 19, 2022, 12:20pm

I downloaded a couple of dblock files (downloading is not a free operation) just for testing.

Yes, I downloaded them with the AWS S3 console (downloading from Deep Archive means making a request, waiting 18 hours, and then downloading the files).

I used the official Duplicati disaster recovery procedure from a local datastore, with the dlist.aes and dindex.aes files and some (about 15) of the dblock files.

BTW, I tried to search for decryption tools for Windows, but did not find any.

ts678 · September 19, 2022, 12:41pm

CLI tool SharpAESCrypt.exe is in your Duplicati installation folder.
AES Crypt has GUI, but I think it has a limit of how many at once.

Do you actually have one file you need now, or are you preparing?
Large files or all files would be done differently than one small file.

You can recreate the database if all dlist and dindex files are good.
If some data is missing, the DB recreate will have to read dblocks.

Then what “test restore mode” were you speaking of? Just hoping?
Telling Glacier what files you want is going to be very cumbersome.

EDIT:

For a tiny file (smaller than your blocksize which defaults to 100KiB), it’s fairly easy to answer.
Get the filelist.json, look at it in a text editor, find your file, look for hash and metahash values,
look those up in index.txt to get needed file names. This gets harder for files bigger than that.
Processing a large file in the article I cited shows how an “indirection block” is used. They are
stored in the dblock file, but by default there is also a copy in the dindex file which saves time.

Here’s the “fairly easy” example:

From filelist.json

{"type":"File","path":"C:\\backup source\\short.txt","hash":"6ECk9xP7UfzNyw1P4SQTL941SBkxWzJqHxxHDE9MsEQ=","size":54,"time":"20220908T002325Z","metahash":"1uS1p1eGUAclUyhR1h9gYsp2V5nVhWuDkZ9s98qI3ys=","metasize":137}

From index.txt

1uS1p1eGUAclUyhR1h9gYsp2V5nVhWuDkZ9s98qI3ys=, duplicati-b9aaf7cfce95e4ea2a70f6dd2fae52c4e.dblock.zip
6ECk9xP7UfzNyw1P4SQTL941SBkxWzJqHxxHDE9MsEQ=, duplicati-b9aaf7cfce95e4ea2a70f6dd2fae52c4e.dblock.zip
manifest, duplicati-b9aaf7cfce95e4ea2a70f6dd2fae52c4e.dblock.zip
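The same lookup can be sketched in a few lines of Python. This assumes filelist.json is a JSON array of entries shaped like the one above and that index.txt holds one `hash, filename` pair per line, as in the example. The function names are hypothetical, and the sketch deliberately refuses files that carry a `blocklists` field, since those need the indirection step.

```python
import json


def parse_index(lines):
    """Map each block hash to a dblock name from 'hash, name' lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        block_hash, _, name = line.rpartition(", ")
        mapping[block_hash] = name
    return mapping


def dblocks_for_small_file(filelist_json, index_lines, path):
    """Return the dblock names holding the data and metadata of one small file."""
    index = parse_index(index_lines)
    for entry in json.loads(filelist_json):
        if entry.get("path") != path:
            continue
        if entry.get("blocklists"):
            raise ValueError("file spans several blocks; follow its blocklists")
        hashes = [entry["hash"], entry.get("metahash")]
        return {index[h] for h in hashes if h}
    raise KeyError(path)
```

For the example entry, both hashes resolve to the same dblock file, so only one file would have to be retrieved from Glacier.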

000al000 · September 19, 2022, 1:03pm

Thank you for the decryption tools.

Let me summarize my case: recover some files (one or more) from Deep Archive on another PC with minimal cost. Minimal cost = minimal dblock downloading.
The dindex and dlist files are free of charge, so I’ve got them all.

I am trying to restore one random file (to test the situation where I need to restore some files).

Thank you, I’ll try it.

P.S. I restored some files from the couple of dblock files I downloaded, using the disaster recovery instructions, but I need to know how to restore a specific file or files.
P.P.S. AWS Deep Archive is a “backup of the backup” for me, in case my main backup server is lost or destroyed.

ts678 · September 19, 2022, 3:31pm

RecoveryTool is maximal dblock downloading because there are no dindex files to give dblock content.
Regular Duplicati CLI or GUI is minimal, except for database recreate when it needs to go searching…

Regular Duplicati CLI or GUI is very easy. RecoveryTool documentation shows it can do only --exclude.
I did attempt an --include even though it’s not documented. It works sometimes but didn’t seem to here.

Do you do regular expressions well? I think one can do patterns to match everything except those files.
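As a rough illustration of such a pattern, the sketch below builds a regular expression that matches every path except the listed ones, using a negative lookahead. Whether RecoveryTool's --exclude accepts it is untested here. The escaping comes from Python's `re.escape`, which is expected but not guaranteed to carry over to the .NET regex engine, and wrapping the result in square brackets for Duplicati's filter syntax is likewise an assumption. The helper name `exclude_all_but` is hypothetical.

```python
import re


def exclude_all_but(paths):
    """Build a regex that matches every path except the given ones."""
    keep = "|".join(re.escape(p) for p in paths)
    # The negative lookahead fails only when the whole string is a kept path.
    return rf"^(?!(?:{keep})$).*$"


pattern = exclude_all_but([r"C:\backup source\short.txt"])
print(pattern)
```

Used as an exclude filter, a pattern like this would exclude everything except the wanted file, which is the effect an --include would otherwise give.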

Since they’re already on a drive, you can see how healthy DB recreation is (for now – future may differ) using something like Direct restore from backup files to see if it can build its partial temporary database without resorting to dblock downloads which will fail because those files are not on the drive. It’s a test.

Alternatively (an even better test in a way), make a dummy job in Duplicati that never does backup but exists solely to see if database recreate from drive works. Without existing DB, that uses Repair button.
Open the DB and inspect as described, or use SQL query. I have a draft, but don’t really do much SQL.

This seems by far to be the easy way as long as DB recreate works. If not, the options aren’t very good.
Unless you are seeing some tool that’s giving what look like relevant dblock names, it’s lots of digging…

The index.txt file looks like it keeps the indirection block (known as a blocklist) in its original form, which identifies the blocks of their file by their block hash. So one needs to get the blocklist, then go through it.

github.com/duplicati/duplicati/blob/666b2281032460254839fdc3b6e1055fdf7ce1db/Duplicati/CommandLine/RecoveryTool/Restore.cs#L376-L381

    public IEnumerable<string> ReadBlocklistHashes(string hash)
    {
        var bytes = ReadHash(hash);
        for (var i = 0; i < bytes.Length; i += m_hashsize)
            yield return Convert.ToBase64String(bytes, i, m_hashsize);
    }

Another option is to see if you can alter the Independent restore program to help here. It’s in Python.

drwtsn32 · September 19, 2022, 6:36pm

--dry-run didn’t work for me when I tested it in that other thread I linked. Duplicati still downloads dblocks in a dry run.

ts678 · September 19, 2022, 7:07pm

github.com/duplicati/duplicati/blob/666b2281032460254839fdc3b6e1055fdf7ce1db/Tools/Commandline/RestoreFromPython/restore_from_python.py#L94-L111

          # write to file
          with open(outPath, 'wb') as f:
              if 'blocklists' not in listEntry or not listEntry['blocklists']:
                  # small files store data in one block
                  data = getContentBlock(d, dbopts, listEntry['hash'])
                  f.write(data)
              else:
                  # large files point to a list of blockids, each of which points
                  # to another list of blockids
                  for blhi, blh in enumerate(listEntry['blocklists']):
                      blockhashoffset = blhi * opts['hashes-per-block'] * opts['blocksize']
                      binaryHashes = getContentBlock(d, dbopts, blh)
                      for bi, start in enumerate(range(0, len(binaryHashes), opts['hash-size'])):
                          thehash = binaryHashes[start: start + opts['hash-size']]
                          thehash = base64.b64encode(thehash)
                          data = getContentBlock(d, dbopts, thehash)
                          f.seek(blockhashoffset + bi * opts['blocksize'])
                          f.write(data)

is (to me) a more readable version of a restore algorithm. It shows that the challenge of the blocklist indirection is that one first needs to get the blocklist, which is stored both in the list folder of the dindex file and in the dblock file. The two emergency-restore programs seem not to use the dindex. Regular Duplicati does.
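Turning that algorithm from "restore the data" into "list the dblock files needed" could look roughly like the sketch below. It assumes a `hash_to_volume` dict (base64 block hash to dblock name, for example built from index.txt) and a caller-supplied `read_block` function that returns the raw bytes of a blocklist (for example from the dindex list folder). Both are hypothetical, as are the function name and the default 32-byte hash size.

```python
import base64


def dblocks_for_entry(entry, hash_to_volume, read_block, hash_size=32,
                      include_blocklist_volumes=False):
    """Return the set of dblock names holding every block of one file.

    entry          -- one file entry (dict) from filelist.json
    hash_to_volume -- dict: base64 block hash -> dblock file name
    read_block     -- hypothetical callable: base64 hash -> raw blocklist bytes
    """
    needed = set()
    if entry.get("metahash"):
        needed.add(hash_to_volume[entry["metahash"]])
    blocklists = entry.get("blocklists") or []
    if not blocklists:
        # Small file: the file hash is also the hash of its single block.
        needed.add(hash_to_volume[entry["hash"]])
        return needed
    for blocklist_hash in blocklists:
        if include_blocklist_volumes:
            # Only needed when the blocklist must come from the dblock itself.
            needed.add(hash_to_volume[blocklist_hash])
        raw = read_block(blocklist_hash)
        # A blocklist is a concatenation of fixed-size binary block hashes.
        for start in range(0, len(raw), hash_size):
            chunk = raw[start:start + hash_size]
            block_hash = base64.b64encode(chunk).decode("ascii")
            needed.add(hash_to_volume[block_hash])
    return needed
```

A missing hash raises `KeyError`, which is a deliberate choice here: it signals that the index is incomplete instead of silently returning too short a list.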

Arguably, doing an occasional database recreate is good practice for disaster recovery, as (especially when properly prepared with an Export To File of the job) it lets you more easily get back in operation.

Telling whether DB recreate needed to download dblock file is a little harder. Visually, it’s the last 30% range on the GUI progress bar (last 10% is the slow last-ditch attempt at finding some missing block). Logging and other messages can also reveal it, but if scripting you’d need a way to scan the output…

If you happen to be good at scripting, you could probably write something that would do a no-DB way.