My NAS Is in a Pickle: Recreating DB Stuck for 2 Days. Wait or Cancel?

fadenoir · January 14, 2024, 1:16pm

Hey tech gurus and digital superheroes!

So, picture this: I’m knee-deep in disaster recovery mode, battling a custom NAS meltdown with a dodgy HDD. The hero in shining armor? The new Synology 423+!

Now, the plot twist: my “Recreating database” quest has been on an epic week-long journey. On Jan 12th (2 days ago), it hit a snag at 62%, and since then, it’s been radio silent - no new logs, no progress, nada.

The dilemma? Do I wait for the digital deities to work their magic or hit the cancel button for a reboot?

Here’s the lowdown: both instances are rocking Duplicati in docker, dancing with B2 for storing my precious 2.5 TB of data (B2 says so!). Backups have been the bi-weekly ritual for the past three years. The Duplicati maestro on the new Synology is flexing version 2.0.7.1_beta_2023-05-25.

Settings, you ask? Hold your breath:

asynchronous-concurrent-upload-limit: 10
concurrency-max-threads: 0
number-of-retries: 15

Let’s peek behind the digital curtain with a snapshot of my system’s secret sauce:

lastEventId : 5951
lastDataUpdateId : 18
lastNotificationUpdateId : 0
estimatedPauseEnd : 0001-01-01T00:00:00
activeTask : {"Item1":6,"Item2":"e4f93b91-db68-44f2-9b0d-08f55f62cb6b"}
programState : Running
lastErrorMessage :
connectionState : connected
xsfrerror : false
connectionAttemptTimer : 0
failedConnectionAttempts : 0
lastPgEvent : {"BackupID":"e4f93b91-db68-44f2-9b0d-08f55f62cb6b","TaskID":6,"BackendAction":"Get","BackendPath":"duplicati-id6bcf7315f654aeb9c073ca4cfc4b1f0.dindex.zip.aes","BackendFileSize":37533,"BackendFileProgress":0,"BackendSpeed":-1,"BackendIsBlocking":false,"CurrentFilename":null,"CurrentFilesize":0,"CurrentFileoffset":0,"CurrentFilecomplete":false,"Phase":"Recreate_Running","OverallProgress":0.620590448,"ProcessedFileCount":0,"ProcessedFileSize":0,"TotalFileCount":0,"TotalFileSize":0,"StillCounting":false}
updaterState : Waiting
updatedVersion :
updateReady : false
updateDownloadProgress : 0
proposedSchedule : []
schedulerQueueIds : []
pauseTimeRemain : 0

And for the grand finale, the last 4 Logs (Profiling):

12 Jan 2024 16:34: ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-id6bc321cb3d4419a8beddf3c303bf812.dindex.zip.aes" took 0:00:00:00.000
12 Jan 2024 16:34: Starting - ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-id6bc321cb3d4419a8beddf3c303bf812.dindex.zip.aes"
12 Jan 2024 16:34: ExecuteScalarInt64: SELECT "VolumeID" FROM "Block" WHERE "Hash" = "b2I6TJdW/8pEsjYv7TpvQxJwu9tETUs8I7vy4fc/Wnk=" AND "Size" = 102400 took 0:00:00:00.000
12 Jan 2024 16:34: Starting - ExecuteScalarInt64: SELECT "VolumeID" FROM "Block" WHERE "Hash" = "b2I6TJdW/8pEsjYv7TpvQxJwu9tETUs8I7vy4fc/Wnk=" AND "Size" = 102400

So, dear tech savants, wizards, and coding sorcerers, I beckon thee! Should I hold out for the digital dawn or slam the cancel button for a fresh start? Your insights could be the magic spell my NAS desperately needs!

Drop your thoughts, hacks, or mystical incantations below. Let’s banish this tech turmoil together! #TechRescue #DigitalSOS #NASNightmare

ts678 · January 14, 2024, 2:38pm

So you lost the old database? If not, you could move it.

Synology 423+, first time NAS user says it’s more heroic with more memory, 18 GB up from 2.
Synology DS423+ NAS Review seems to be giving a different answer on the amount possible.

Regardless, what’s the memory and CPU use there? Little memory likely turns into high swap.
Synology tools are unknown to me, but if there’s a top command, that’s one way to check use.

Whose image? Are you good at doing CLI things using whatever tools the image-maker put in?

Is that saying they both have backups, are you trying restore to initialize the new NAS, or what?

ts678 · January 14, 2024, 2:56pm

You should have increased blocksize for that, as the default performs well only up to about 100 GB. There’s an unprocessed pull request to increase the default, and nobody knows how to auto-adjust.
One proposal is to let the user estimate it then make a guided decision near Remote volume size.
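
To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes the 102400-byte block size visible in the profiling log of the first post, decimal units (1 TB = 10^12 bytes), and no deduplication, so the counts are upper bounds. The function and constant names are made up for illustration, and the scaled block size is only what falls out of the "about 100 GB" rule of thumb, not an official recommendation.

```python
def block_count(total_bytes: int, blocksize_bytes: int) -> int:
    """Upper bound on the number of blocks the local database must track."""
    # Integer ceiling division: a partial block still needs a row.
    return -(-total_bytes // blocksize_bytes)


DEFAULT_BLOCKSIZE = 102_400        # bytes, as seen in the profiling log
COMFORT_ZONE = 100 * 10**9         # "about 100 GB" from the post above
BACKUP_SIZE = 2_500 * 10**9        # 2.5 TB as reported by B2

blocks_at_comfort = block_count(COMFORT_ZONE, DEFAULT_BLOCKSIZE)
blocks_now = block_count(BACKUP_SIZE, DEFAULT_BLOCKSIZE)

# Block size that would bring this backup back to the same block count.
scaled_blocksize = DEFAULT_BLOCKSIZE * BACKUP_SIZE // COMFORT_ZONE

print(f"blocks at ~100 GB: {blocks_at_comfort:,}")
print(f"blocks at 2.5 TB:  {blocks_now:,}")
print(f"scaled blocksize:  {scaled_blocksize:,} bytes")
```

The point is only the ratio: at the default block size, a 2.5 TB backup has roughly 25 times as many blocks to look up during a recreate as a 100 GB one.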

If you have tools to monitor disk use, they might help, but disk use would likely also show CPU use.

gpatel-fr · January 14, 2024, 3:39pm

Hello

You can try the latest Canary (2.0.7.100); it has a fix that speeds up database rebuilding for badly damaged backends.

ts678 · January 14, 2024, 5:51pm

Resource Monitor might be one thing your NAS has. I use neither Synology nor Docker, but I also see

Runtime metrics in Docker docs is talking about using docker stats for examining container resources.

These are probably competing at some priority setting with everything else happening on NAS though.

Another thing you can look at is the size and timestamp of the Duplicati database (on host, I assume).

Going out of the live log and back in might also be useful just in case lack of updates is just a log hang.

Depending on what other hardware you have, a recreate can be done elsewhere and moved onto NAS.

fadenoir · January 14, 2024, 7:22pm

Thank you for getting back so fast!

Fortunately DB is on a different volume (not the one that’s damaged) and yes, I can access it. I assumed though, that using the same DB on a different machine with different volumes, different file systems might just break things. Am I wrong? Can I just copy old DB to my new Synology and restore backup there? How about different file paths etc.?

Right now CPU and memory usage are both very low, in the 1-3% range for the whole system. When I started the process, it was ~5 MB/s volume write and around 15-20% CPU use with very little memory usage. BTW I’ve upgraded the Synology 423+, so it has 6 GB RAM.

That’s the image I used on both instances:

I’m not an expert, but I can use CLI if needed.

I had an old NAS that I intended to replace with a new Synology. While transferring files from the old NAS to the new one, an error occurred, leading to corruption in the file system of the volume containing all my stored data on the old NAS. As a solution, I opted to use Duplicati, which was set up with B2 on my old machine, to restore the missing files on my new Synology and complete the migration. Hope that explains the entire quest.

Well, yes, I’ve learned this already browsing this forum post-factum. As I understood, there is no way I can change the blocksize now, when the backup is already created, right?

I have 2.0.7.1_beta_2023-05-25 version. I guess it should contain the fix you’ve mentioned, doesn’t it?

Sounds like a good idea. Could you help me to understand how to do it? Where can I find Duplicati DB? Is it as simple as checking stats of the file containing it?

fadenoir · January 14, 2024, 7:52pm

I checked duplicati container stats. CPU is in 0.1-0.2% usage. Memory usage: 1.2 GB / 6 GB. Very little network/disk usage.

I’ve found the Duplicati-server.sqlite file. Last modification date is Jan 10th. Last access: Jan 14th. As a reminder, the last log entry was dated Jan 12th.

Is it dead? Are there any other logs than these in the web interface that I could check on what’s going on?

ts678 · January 14, 2024, 8:05pm

If you decided to change the paths, you can use “Pick location” to drop folders where they now belong.

You’re probably avoiding the OS difference pain with moving a database with intent to continue backup, because Docker suggests Linux, which means no drive letter in front, and any later slashes are forward.

User names ideally are like the prior ones (or confusion may occur if you ask to restore them), but that’s true even if the recreate was cooperating. If you try this, try a small restore first to verify it’s as desired…

Generally correct, unless it’s bad enough and you’re ambitious enough to try an experimental approach.

The fix (if I understand it correctly) kicks in if it has finished the dindex files but data is still being sought. This is at 70% on the progress bar. You’re at 62% there, but might have measured it less than perfectly, however hearing your low CPU, I kind of wonder. It should also be logging as too much time doing SQL.

It’s usually considered a single-file database, but there can be an associated file with -journal suffix.
Since this is LinuxServer, the directions you link show how /config in Docker maps somewhere else.

That’s the server config file with the job definition and the statistics. It’s not going to be changing much.
The per-job database name starts with random alphabetic characters and is typically in the same folder. “Direct restore from backup files” uses temporary database. I “think” recreate by repair uses the usual.
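
A minimal sketch of that check, assuming the databases sit in one folder (the /config path is the container-side mapping mentioned above; the host-side path depends on your own volume setup). The function name is hypothetical. It lists every .sqlite file plus any -journal companion with size and modification time, so the per-job database can be told apart from Duplicati-server.sqlite.

```python
from datetime import datetime
from pathlib import Path


def list_databases(config_dir: str) -> list[tuple[str, int, datetime]]:
    """Return (name, size in bytes, last modified) for database-related files."""
    root = Path(config_dir)
    if not root.is_dir():
        return []  # nothing to report if the folder is not there
    rows = []
    for path in sorted(root.iterdir()):
        is_db = path.suffix == ".sqlite" or path.name.endswith("-journal")
        if path.is_file() and is_db:
            info = path.stat()
            rows.append(
                (path.name, info.st_size, datetime.fromtimestamp(info.st_mtime))
            )
    return rows


# Assumed location; replace with wherever /config is mapped on your system.
for name, size, modified in list_databases("/config"):
    print(f"{modified:%Y-%m-%d %H:%M}  {size:>14,}  {name}")
```

Running it twice a few minutes apart shows whether the job database (or its journal) is still growing or being touched.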

gpatel-fr · January 14, 2024, 8:07pm

No, 2.0.7.100 was published 3 weeks ago.

ts678 · January 14, 2024, 8:39pm

You can certainly set them up for a retry, but there’s not much besides live log to turn on midway.
That would be log-file=<path> log-file-log-level=<something>. Profiling is informative but very big.
Verbose might be a good level, as it gives the counts of the files it will get, while it’s getting them.

You already found About → System info → Server state properties, and it might look concerning.

lastPgEvent : {"BackupID":"e4f93b91-db68-44f2-9b0d-08f55f62cb6b","TaskID":6,"BackendAction":"Get","BackendPath":"duplicati-id6bcf7315f654aeb9c073ca4cfc4b1f0.dindex.zip.aes","BackendFileSize":37533,"BackendFileProgress":0,"BackendSpeed":-1,"BackendIsBlocking":false,"CurrentFilename":null,"CurrentFilesize":0,"CurrentFileoffset":0,"CurrentFilecomplete":false,"Phase":"Recreate_Running","OverallProgress":0.620590448,"ProcessedFileCount":0,"ProcessedFileSize":0,"TotalFileCount":0,"TotalFileSize":0,"StillCounting":false}

Compare to one I took midway:

lastPgEvent : {"BackupID":"8","TaskID":10,"BackendAction":"Get","BackendPath":"duplicati-i3c3994a3743140a18a40c5e48b7db8af.dindex.zip.aes","BackendFileSize":5325,"BackendFileProgress":5325,"BackendSpeed":14183,"BackendIsBlocking":false,"CurrentFilename":null,"CurrentFilesize":0,"CurrentFileoffset":0,"CurrentFilecomplete":false,"Phase":"Recreate_Running","OverallProgress":0.301351368,"ProcessedFileCount":0,"ProcessedFileSize":0,"TotalFileCount":0,"TotalFileSize":0,"StillCounting":false}

Yours seems to not have a BackendFileProgress and BackendSpeed somehow, so if there were a log view that wasn’t just the last 4 lines, maybe one would see a download starting but then not completing.
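
The comparison can be sketched as a small check on the lastPgEvent JSON. This is a heuristic only: the meaning of the fields is inferred from the two samples in this thread, a single snapshot cannot prove a hang, and the function name is made up for illustration. The sample strings are trimmed to the fields the check reads.

```python
import json


def looks_stalled(last_pg_event: str) -> bool:
    """True when a Get is in flight but no bytes or speed have been reported."""
    event = json.loads(last_pg_event)
    return (
        event["BackendAction"] == "Get"
        and event["BackendFileSize"] > 0
        and event["BackendFileProgress"] == 0
        and event["BackendSpeed"] < 0
    )


# Values taken from the two snapshots quoted above.
stuck = '{"BackendAction":"Get","BackendFileSize":37533,"BackendFileProgress":0,"BackendSpeed":-1}'
healthy = '{"BackendAction":"Get","BackendFileSize":5325,"BackendFileProgress":5325,"BackendSpeed":14183}'

print(looks_stalled(stuck), looks_stalled(healthy))  # True False
```

A snapshot that stays in the first state across several refreshes, with the same BackendPath, is the more convincing sign that the download never completed.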

I suppose you can try <job> → Show log → Remote, however at least on Windows, database is locked. Restarting things can fix that, but transaction rollback might then delete the log inside DB that we want.

EDIT 1:

My live log got unreliable (which is why I asked you to start yours again), but when going, it showed me transition from dlist files to dindex files. After below, I tested the stop button in status bar, and it did stop.

It pops up a yellow box about a warning, but I think neither the job log nor the server log has the details. Database rollback seems to have erased the remote operations from the log, so that’s what there is.

The only other ways I know of to examine current state are rather extreme low-level sort of things…

EDIT 2:

… except for things already suggested, like find your job database and see what the timestamp has.
There’s a small chance that Linux (which is more permissive) might even let you copy the database (along with any associated journal file) and inspect the copy with sqlitebrowser to see its state.
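
If sqlitebrowser is not handy, a rough equivalent with Python’s built-in sqlite3 module might look like the sketch below. It assumes you run it against a copy, never the live file. The table names "Remotevolume" and "Block" come from the profiling log in the first post; the function name is hypothetical. The copy is opened read-only, which may fail if a journal file next to it still needs to be rolled back.

```python
import sqlite3


def table_counts(db_copy: str, tables=("Remotevolume", "Block")) -> dict[str, int]:
    """Count rows in the given tables of a copied job database, read-only."""
    # mode=ro keeps this from modifying the copy.
    conn = sqlite3.connect(f"file:{db_copy}?mode=ro", uri=True)
    try:
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in tables
        }
    finally:
        conn.close()


# Example call (the path is a placeholder for wherever you put the copy):
# print(table_counts("/tmp/job-database-copy.sqlite"))
```

Taking two copies some time apart and comparing the counts is one way to tell whether the recreate is still inserting rows.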

EDIT 3:

Your use of the stop button might go worse than mine, as mine was not stuck, and yours might be… Regardless, there are a number of ways to shut things down more severely if it comes down to that.

EDIT 4:

If you have lsof command you could try lsof -c mono, mostly to see what TCP connections it has, although possibly some unexpected open database connections will also appear in the big listing…

gpatel-fr · January 14, 2024, 11:23pm

Rereading your posts, it may have gotten stuck at the network level. That’s not a given, but it’s a possibility.
If you retry (as it seems you may have to), first update to the latest Canary as advised, and before starting the restore, set the option http-operation-timeout to something like 2 minutes. If you have fiber for your network connection to B2, that should be more than enough.
I never use restore from files; I always first define a job, where it’s easy to set options and check the connection, and then restore from the job.

ts678 · January 15, 2024, 4:47pm

In the “stuck on Compact” post just above that links here, we’re discussing timeout strategy.

Regardless, the lsof test will show whether there are any connections that need the timeout.

If you don’t have that command, netstat command might also find the potential connections.
Some small systems don’t have the full command. If need be, maybe we study /proc (ugh).
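
If it does come down to /proc, a minimal sketch of reading /proc/net/tcp follows. Assumptions: IPv4 only, a little-endian CPU (true for x86), and the standard column layout where state 01 means ESTABLISHED. Unlike lsof -c mono, this lists every socket in the network namespace without saying which process owns it. The function name and the sample addresses are made up for illustration.

```python
import socket
import struct


def parse_proc_net_tcp(text: str) -> list[tuple[str, int, str, int]]:
    """Return (local_ip, local_port, remote_ip, remote_port) for ESTABLISHED sockets."""

    def decode(addr: str) -> tuple[str, int]:
        ip_hex, port_hex = addr.split(":")
        # The IPv4 address is a 32-bit hex number in little-endian byte order.
        ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
        return ip, int(port_hex, 16)

    connections = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4 or fields[3] != "01":  # 01 = ESTABLISHED
            continue
        local, remote = decode(fields[1]), decode(fields[2])
        connections.append((*local, *remote))
    return connections


# Made-up sample in the /proc/net/tcp layout (trailing columns omitted).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:0035 00000000:0000 0A 00000000:00000000
   1: 0A01A8C0:C350 057100CB:01BB 01 00000000:00000000
"""

for conn in parse_proc_net_tcp(SAMPLE):
    print(conn)
```

On the real system, pass it the contents of /proc/net/tcp read from inside the container. An ESTABLISHED connection to port 443 that never goes away would fit the stuck-download theory.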

Even worse is to give mono its documented-as-“last-resort” signal to give us its stack traces.
On a normal Linux, this seems to wind up in its systemd log, but Docker is usually simplified.

fadenoir · January 15, 2024, 8:09pm

Again, thank you all for the tips!

I’ll cancel the current job, clean it up, and try again. What’s the best way moving forward?

For me, it seems like I should move the database from the old NAS to the new Synology and hopefully skip the recreation part altogether. Which version of Duplicati should I install on the target machine? Canary, as you all advised, or better to be on the safe side and install exactly the same version as I had on my old NAS (it’s from 3 years ago)?

As for the settings, I’ll add:

http-operation-timeout to prevent network issues
log-file and log-file-log-level to see what’s going on

Any other settings that may help reduce the risk of failure?

Finally, how to migrate the database? Is it as simple as replacing one DB file with another? Do you know where it is located?

ts678 · January 16, 2024, 12:25am

You already have 2.0.7.1 which is the latest Beta. I noticed Server state properties saying

so maybe that’s what the progress bar uses, and it’s still processing dindex as other logs show.
Getting on Canary channel could be done if needed, but perhaps current Beta will work as well.

This could potentially be a random networking glitch interacting with infinite download timeout…
The lsof, netstat, etc. could tell us more, but if they’re not possible, then I guess we lack info.

When you copy the database, keep the old one. As the new Duplicati is newer, it may update it.
Did you ever find your job database on new system? Random file name is different than old DB.

Regardless, either rearrange things so old database is copied to new name, or change settings.
Database screen will let you modify the path to wherever database is, if you prefer its old name.

and I don’t know where you put it on the host. Didn’t you have to set this up manually on container?

After you find it and copy it, you can do a quick sanity test with Verify files, then a small restore.