Warning in backup due to special characters

tacioandrade · October 25, 2024, 2:41pm

Hello everyone, I’m from Brazil, where we use accented characters such as á and ã. Some files with these characters in their names are causing problems on our backup server, an Ubuntu Server 20.04 machine where I centralize some backups and send them to the cloud with Duplicati.

I’m currently getting 1900 warnings in my backups because of these accented filenames:

2024-10-25 01:03:19 -03 - [Warning-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-FileAccessError]: Error reported while accessing file: /home/backup/wp-content/uploads/2011/08/Caf�-3-300x225.jpg
FileNotFoundException: Could not find file '/home/backup/wp-content/uploads/2011/08/Caf�-3-300x225.jpg'.

Does anyone know how I could fix these warnings?

PS: In locales I’m using pt_BR.UTF-8 and UTF-8

marceloduplicati · October 25, 2024, 3:35pm

Hi Tacio,
The following command should help you identify the encoding actually used in the filename:

ls /home/backup/wp-content/uploads/2011/08/Ca*-3-300x225.jpg | hexdump -C

UTF-8 ‘é’ would appear as: c3 a9
ISO-8859-1 ‘é’ would appear as: e9
CP1252 ‘é’ would appear as: e9

If the names are not UTF-8, you might need to fix the filename encoding with the Linux convmv tool.
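
To check many names at once instead of one hexdump at a time, a minimal Python sketch like the one below could list every entry whose name is not valid UTF-8. The helper names `is_valid_utf8` and `find_non_utf8_names` are made up for illustration, and the root path is only an example taken from the warning above.

```python
import os

def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def find_non_utf8_names(root: bytes):
    """Yield full paths (as bytes) of entries whose own name is not valid UTF-8."""
    # Passing bytes to os.walk makes it return raw byte names,
    # so nothing is decoded or substituted along the way.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                yield os.path.join(dirpath, name)

if __name__ == "__main__":
    # Example root only; adjust to the directory being backed up.
    for path in find_non_utf8_names(b"/home/backup"):
        print(path)  # printed as a bytes literal, so a stray E9 shows up as \xe9
```

Working on bytes rather than strings is deliberate here: it keeps the original byte values visible instead of hiding them behind replacement characters.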

Additionally, if you can, please let us know which version of Duplicati you are running.

tacioandrade · October 25, 2024, 9:22pm

Sorry, I forgot to mention the main thing! I’m using 2.0.8.1_beta_2024-05-07 and the output of the hexdump -C command was:

00000000  2f 76 61 72 2f 77 77 77  2f 74 61 62 65 6c 69 6f  
00000010  6e 61 74 6f 66 69 73 63  68 65 72 2e 6e 6f 74 2e  
00000020  62 72 2f 70 75 62 6c 69  63 5f 68 74 6d 6c 2f 62  
00000030  61 63 6b 75 70 2f 77 70  2d 63 6f 6e 74 65 6e 74  
00000040  2f 75 70 6c 6f 61 64 73  2f 32 30 31 31 2f 30 38  
00000050  2f 43 61 66 e9 2d 33 2e  6a 70 67 0a              
0000005c

marceloduplicati · October 27, 2024, 7:13am

As you can see in the dump around 66 e9 2d (66 being ‘f’ and 2d being ‘-’), the ‘é’ is stored as the single byte E9. That indicates a single-byte encoding such as CP1252 or ISO-8859-1 rather than UTF-8, most likely CP1252 (maybe the file was created by an old Windows system?).
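
The same reading of the dump can be reproduced in Python as a quick sanity check. This sketch takes the final bytes of the dump above and tries to decode them in each candidate encoding; the helper name `try_decode` is hypothetical. Note that the byte E9 alone cannot distinguish CP1252 from ISO-8859-1, since both map it to ‘é’.

```python
# Tail of the hexdump above: "/Caf?-3.jpg", where ? is the single byte E9.
raw = bytes.fromhex("2f 43 61 66 e9 2d 33 2e 6a 70 67")

def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

print(try_decode(raw, "utf-8"))       # None: E9 followed by 2D is not valid UTF-8
print(try_decode(raw, "cp1252"))      # /Café-3.jpg
print(try_decode(raw, "iso-8859-1"))  # /Café-3.jpg
```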

You can try:

convmv -f CP1252 -t utf8 filename

This will confirm it. By default convmv performs a dry run, so it does not actually rename anything until you run it again with the --notest flag.
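
For anyone who would rather script the rename, here is a rough Python sketch of the same idea. It is not a description of how convmv works internally. The function names `to_utf8_name` and `rename_tree` are hypothetical, CP1252 is assumed as the source encoding, and like convmv the sketch only reports what it would do unless told otherwise.

```python
import os

def to_utf8_name(name: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode one filename to UTF-8, leaving already-valid UTF-8 names alone."""
    try:
        name.decode("utf-8")
        return name  # already UTF-8 (plain ASCII included)
    except UnicodeDecodeError:
        # Raises UnicodeDecodeError if a byte is undefined in source_encoding.
        return name.decode(source_encoding).encode("utf-8")

def rename_tree(root: bytes, source_encoding: str = "cp1252", dry_run: bool = True):
    """Walk root bottom-up and return (old, new) path pairs; rename only if dry_run is False."""
    changes = []
    # topdown=False visits children before their parent directory,
    # so renaming a directory never invalidates paths still to be visited.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new_name = to_utf8_name(name, source_encoding)
            if new_name == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            if os.path.exists(new_path):
                continue  # never overwrite an existing entry
            changes.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return changes
```

Calling `rename_tree(b"/home/backup")` would only return the planned renames; passing `dry_run=False` applies them. Reviewing the dry-run list first is advisable, since a wrong guess at the source encoding produces wrong names.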

ts678 · November 6, 2024, 9:59pm

How does the centralize part work? It seems to be putting old 8-bit characters onto Linux.
Those possibly look pretty strange even on Linux. Can you convert names on the way in?

As a side note, that ls wasn’t exactly the one requested. Why was the directory different?