Using signed URLs for backup uploads (S3, Google Storage)

Hi,

I have a specific use case where I’d need to install Duplicati on clients’ machines and periodically back up files to my S3/Google Storage buckets (upload only, no recovery).

Reading through the manuals, I take it that both AWS S3 and Google Storage are supported, but only with public buckets or with access tokens/keys. Given that the buckets are private and the configuration will be stored on (assumed insecure) clients’ systems, I can’t keep access token/key information in plain text.

What I can do, however, is fetch presigned upload URLs from an API using said clients’ credentials. I looked at --run-script-before-required and related options, but couldn’t find a way to request the URL with them. The manual has a long list of providers, which hinted at the possibility of creating a custom provider for my use case. The process would basically be (a rough code sketch follows the list):

  1. Fetch an upload URL with an HTTP GET request using credentials
  2. Upload the file (rarely over 100 MB) to that URL
  3. (Possibly) Send a backup confirmation back to the server from step 1
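
To make that concrete, here is a minimal sketch of the three steps using .NET’s HttpClient. The API base address, the /upload-url and /confirm endpoints, and the bearer-token authentication are all hypothetical stand-ins for whatever my API actually exposes:

    using System;
    using System.IO;
    using System.Net.Http;
    using System.Threading.Tasks;

    static class PresignedUploadSketch
    {
        private static readonly HttpClient Client = new HttpClient();

        public static async Task UploadAsync(string apiBase, string apiToken, string localFile)
        {
            string name = Uri.EscapeDataString(Path.GetFileName(localFile));

            // 1. Fetch a presigned upload URL from the API using the client's credentials
            //    (endpoint and auth scheme are hypothetical).
            var urlRequest = new HttpRequestMessage(HttpMethod.Get, $"{apiBase}/upload-url?name={name}");
            urlRequest.Headers.Add("Authorization", "Bearer " + apiToken);
            var urlResponse = await Client.SendAsync(urlRequest);
            urlResponse.EnsureSuccessStatusCode();
            string presignedUrl = (await urlResponse.Content.ReadAsStringAsync()).Trim();

            // 2. PUT the file directly to the presigned URL
            //    (S3 and Google Storage both accept a plain HTTP PUT to a presigned URL).
            using (var fileStream = File.OpenRead(localFile))
            {
                var putRequest = new HttpRequestMessage(HttpMethod.Put, presignedUrl)
                {
                    Content = new StreamContent(fileStream)
                };
                (await Client.SendAsync(putRequest)).EnsureSuccessStatusCode();
            }

            // 3. Confirm the finished upload with the same API (hypothetical endpoint).
            var confirm = new HttpRequestMessage(HttpMethod.Post, $"{apiBase}/confirm?name={name}");
            confirm.Headers.Add("Authorization", "Bearer " + apiToken);
            (await Client.SendAsync(confirm)).EnsureSuccessStatusCode();
        }
    }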

I went through the source code and found the providers under duplicati/Duplicati/Library/Backend/, but I have a couple of questions.

Am I overlooking any other, simpler way to do this?
Is there any documentation on how to develop providers? Are there any providers that do simple HTTP uploads which I could use as a reference?

Welcome to the forum @dejaime

Ignoring the question of a restore plan, please note that Verifying backend files and Compacting files at the backend both do downloads. Compact also does deletes (as might file cleanup actions when an upload fails).

I consider public buckets to be a huge security risk, and hope nothing is advocating them for general use.

They’re not stored in quite plain text on Windows, but the web UI does let the user see backup configurations.

Are you talking about writing your own API and server, and hooking calls to it into Duplicati’s file uploads?
Sounds ambitious. If you’re a .NET developer, your skills are very much needed in many Duplicati areas.

Duplicati’s default remote volume size is 50 MB. That can be configured, but even then multiple files are uploaded per backup.
There is generally at least a dblock of source file data, a dindex that indexes the dblock, and a dlist listing the files.
How the backup process works
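
For instance, since your files are rarely over 100 MB, the volume size could be raised with the dblock-size advanced option (if I remember the option name right; the destination URL below is just a placeholder):

    Duplicati.CommandLine.exe backup <destination-url> C:\data --dblock-size=100MB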

I don’t use S3 or Google Cloud Storage, but have you considered S3’s IAM, which sounds very extensive?

How can I grant a user access to a specific folder in my Amazon S3 bucket?

How do I set up Wasabi for user access separation?

The IBackend interface is the minimum Duplicati interface (most backends offer other features). What a backend requires varies.

HttpClient Class and the deprecated HttpWebRequest Class are the basic APIs underneath HTTP usage.
Duplicati’s usages may actually go through OAuthHttpClient or AsyncHttpRequest, and probably other variations…
How we get along with OAuth suggests that OAuth is relevant to Google Cloud Storage, but not to Amazon S3.
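
For orientation only, here is a rough skeleton of what an upload-only backend might look like. The class name, the presigned:// protocol key, and the endpoint comments are assumptions, and the exact interface signatures differ between Duplicati versions, so check IBackend in Duplicati.Library.Interface (and the existing providers) before reusing any of it:

    using System;

    // Hypothetical upload-only backend, modeled loosely on the providers under
    // Duplicati/Duplicati/Library/Backend/.
    public class PresignedUrlBackend // : Duplicati.Library.Interface.IBackend -- verify the current definition
    {
        public string DisplayName => "Presigned URL storage";
        public string ProtocolKey => "presigned";   // would show up as presigned:// in a backup destination URL

        // Called for each remote volume: fetch a presigned URL from the API,
        // then PUT the local file to it (as in the HttpClient sketch earlier in the thread).
        public void Put(string remotename, string filename)
        {
            // 1. GET {apiBase}/upload-url?name={remotename} using the stored client credential (hypothetical endpoint)
            // 2. HTTP PUT the contents of filename to the returned presigned URL
            // 3. POST a confirmation back to the API
            throw new NotImplementedException("see the HttpClient sketch earlier in the thread");
        }

        // Verifying, compacting, and failed-upload cleanup also call Get/List/Delete,
        // so an upload-only backend would have to stub or disable those operations.
        public void Get(string remotename, string filename) => throw new NotSupportedException();
        public void Delete(string remotename) => throw new NotSupportedException();
    }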

Thank you very much for the insightful answer!

They are not plain text? That’s interesting. I’ll have another look.

Using IAM was the first thing that came to mind, as both S3 and Google Storage work with a very similar API (Storage is S3 compatible). Still, it would only solve the authentication problem; I would still need to access an API to get/update IAM access. Either that, or at least a hacky companion application to update Duplicati’s configuration with new IAM credentials and make the API calls. I figured properly adding the backend would be better.

Yes, exactly. We have a medium-volume backup system for sensitive text files (read: JSON and XML), one that we are extending to allow specific binary files. We would need some API calls upon starting and finishing an upload.

I’ll definitely try to contribute back if I add anything that’d be useful outside of our specific use case!

This is exactly what I needed!

Thank you very much!

Duplicati.Server.exe talks about

--server-encryption-key
This option sets the encryption key used to scramble the local settings database. This option can also be set with the environment variable DUPLICATI_DB_KEY. Use the option --unencrypted-database to disable the database scrambling.

and

--unencrypted-database
Disables database encryption.

which make the weak encryption slightly more secure (or at least move it off the default key), or turn it off entirely if desired.
Which tool can open encrypted DB talks about the encryption used on Windows, but that encryption is not high quality.
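
For example, on Windows either of these keeps the settings database off the default key or drops the scrambling entirely (the key value below is just a placeholder):

    rem Scramble the settings database with a non-default key (placeholder value)
    set DUPLICATI_DB_KEY=replace-with-a-long-random-value
    Duplicati.Server.exe

    rem Or disable the scrambling altogether
    Duplicati.Server.exe --unencrypted-database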

Duplicati’s security is aimed more at attacks on the remote backup, and less at attacks on its own system.
Supporting CLI use requires that the Duplicati administrator be able to see their own credentials, etc.

I’m not an IAM expert, and I’m even less an expert on IAM automation, but manual setup won’t scale well.

Still sounds ambitious, and I’m not sure how it would be worked into the general product code base.

Typically, new features go out in a Canary release in the hope that someone will try them. Server setup in conjunction with a Duplicati public test is something nobody may be willing to do. So how would this be tested?

Branding and OEM customization talks about customized packages. Updates might be a challenge.

It’s not clear to me exactly how much sharing is in mind. Are you putting all the users in one bucket?
That avoids using tons of buckets, but I assume some administration and per-user configuration is still needed.
