Building a new storage backend for Duplicati

This article covers the design considerations for creating a new backend module for Duplicati, extending the set of destinations that can store backup data. The Duplicati project is written in C#, so some knowledge of C# is required for the implementation.

The backend abstraction

The backend interface is deliberately very primitive compared to the storage APIs of many cloud providers. This ensures that diverse storage types can be supported, while Duplicati itself contains additional support code to deal with the weak guarantees and limited features of such storage.

Files and paths

The Duplicati backend provides an interface between local storage and remote storage. Paths for the local storage are native to the current operating system, so make sure the code works with both the Windows path format and the POSIX path format.

The paths given to the remote storage are always plain filenames, as the concept of folders is not currently supported for remote storage. Filenames are restricted to letters, numbers, hyphens, and periods. However, the user can provide a non-default file prefix which MAY contain special characters. If the backend does not support these special characters, it should raise an exception. Alternatively, the backend may choose to encode filenames, which works as long as it also decodes the filenames before reporting them to the caller.
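Both options can be sketched as follows. This is a minimal sketch under stated assumptions: the RemoteNames helper is hypothetical and not part of Duplicati, and it assumes a storage that accepts letters, digits, '-', '.', '_', '~' and '%' in object names. Percent-encoding with Uri.EscapeDataString leaves default Duplicati names unchanged, and Uri.UnescapeDataString restores the original name before it is reported back.

```csharp
using System;
using System.Linq;

// Hypothetical helpers, not part of Duplicati. They assume a storage that
// accepts unreserved URI characters (letters, digits, '-', '.', '_', '~')
// plus '%' in object names.
public static class RemoteNames
{
    // Characters the assumed storage accepts verbatim.
    private static bool IsAllowed(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';

    // Option 1: reject names that the storage cannot represent.
    public static void EnsureSupported(string name)
    {
        if (!name.All(IsAllowed))
            throw new ArgumentException($"Unsupported character in remote name: {name}");
    }

    // Option 2: percent-encode names before sending them to the storage.
    // Names made only of letters, digits, '-' and '.' pass through unchanged.
    public static string Encode(string name) => Uri.EscapeDataString(name);

    // Decode stored names before reporting them to the caller (e.g. in LIST).
    public static string Decode(string stored) => Uri.UnescapeDataString(stored);
}
```

Encoding keeps the default naming scheme untouched, so only backups that use a special prefix see encoded names on the remote side.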

Backend basics

The class that implements the backend currently has two constructors: a default (parameterless) one, and one that accepts the connection url and an options dictionary. The default constructor is only used to read the ProtocolKey, DisplayName, Description, and SupportedCommands properties. Future implementations will likely remove this default constructor in favor of static members on the interface.
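The two-constructor pattern might look roughly like this. The ExampleBackend class is hypothetical and deliberately does not implement the real Duplicati interface; SupportedCommands is simplified to plain option names, and IsConfigured is an illustrative extra that shows the default instance is not connected to anything.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical backend showing only the two-constructor pattern. The real
// Duplicati IBackend interface has more members and richer types.
public class ExampleBackend
{
    private readonly Uri? _url;
    private readonly Dictionary<string, string> _options = new();

    // Used only to read the metadata properties below.
    public ExampleBackend() { }

    // Used for real operations: receives the destination url and options.
    public ExampleBackend(string url, Dictionary<string, string> options)
    {
        _url = new Uri(url);
        _options = new Dictionary<string, string>(options, StringComparer.OrdinalIgnoreCase);
    }

    public string ProtocolKey => "example";
    public string DisplayName => "Example storage";
    public string Description => "Stores backup files on a hypothetical example service.";

    // Simplified to option names in this sketch.
    public IList<string> SupportedCommands => new[] { "example-timeout" };

    // Illustrative only: true when the url constructor was used.
    public bool IsConfigured => _url != null;
}
```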

The url passed to the backend follows the Duplicati convention of accepting a relaxed form of url, with support for username/password, protocol, hostname, and path components. This was originally designed to be easy to use from the commandline, but it has proven problematic due to the encoding of special characters. More recent backends use query parameters to pass arguments and only use the protocol, hostname, and path parts of the url. Some backends, such as S3, can work without the hostname part, because bucket names can be strings that are not valid hostnames.

The ProtocolKey property is used to map the url to the backend, and the url is passed to the backend unmodified. The protocol key identifies the backend and must be unique among all backends. For instance, if the protocol key is “example”, a url could be example://hostname/path?option1=value1.
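The mapping from protocol key to backend can be illustrated with a small registry. This is a sketch only; BackendRegistry is hypothetical and Duplicati's own loader works differently. It shows that the part before "://" selects the backend, that duplicate keys are rejected, and that the full url is handed over unchanged. The case-insensitive lookup is a design choice of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry mapping protocol keys to backend factories.
public class BackendRegistry
{
    private readonly Dictionary<string, Func<string, object>> _factories =
        new(StringComparer.OrdinalIgnoreCase);

    // Protocol keys must be unique among all backends.
    public void Register(string protocolKey, Func<string, object> factory)
    {
        if (!_factories.TryAdd(protocolKey, factory))
            throw new InvalidOperationException($"Protocol key '{protocolKey}' is already registered");
    }

    public object Create(string url)
    {
        int sep = url.IndexOf("://", StringComparison.Ordinal);
        if (sep <= 0)
            throw new ArgumentException($"No protocol in url: {url}");
        string key = url.Substring(0, sep);
        if (!_factories.TryGetValue(key, out var factory))
            throw new NotSupportedException($"No backend for protocol '{key}'");
        return factory(url); // the url is passed on unmodified
    }
}
```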

The constructor is responsible for parsing the url and the options. Generally, url query parameters override commandline options, since parameters in the url are more specific to the backend. Enums are parsed case-insensitively, and booleans should be parsed with the Duplicati.Library.Utility.Utility.ParseBool() method to ensure consistent parsing of the values.
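The override rule and the case-insensitive enum parsing can be sketched as follows, assuming a hypothetical StorageTier option. ExampleOptions is not a Duplicati class, and booleans are intentionally left out: as stated above, a real backend should use Duplicati.Library.Utility.Utility.ParseBool() for those.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical option used only for illustration.
public enum StorageTier { Standard, Archive }

public static class ExampleOptions
{
    // Start from the commandline options, then let url query parameters
    // override them because they are more specific to this backend.
    public static Dictionary<string, string> Merge(Uri url, IDictionary<string, string> commandline)
    {
        var result = new Dictionary<string, string>(commandline, StringComparer.OrdinalIgnoreCase);
        foreach (var pair in url.Query.TrimStart('?').Split('&', StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            string key = Uri.UnescapeDataString(eq < 0 ? pair : pair.Substring(0, eq));
            string value = eq < 0 ? "" : Uri.UnescapeDataString(pair.Substring(eq + 1));
            result[key] = value;
        }
        return result;
    }

    // Enums are parsed case-insensitively.
    public static StorageTier ParseTier(string value) =>
        Enum.TryParse<StorageTier>(value, ignoreCase: true, out var tier)
            ? tier
            : throw new ArgumentException($"Invalid tier: {value}");
}
```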

Operations on a backend

To support as many backends as possible, the Duplicati interface only requires support for 4 operations: GET, PUT, LIST, and DELETE. If the storage supports renaming files, the backend can also support a RENAME operation, but this is not currently used by Duplicati.

To implement a new backend module:

  1. Create a new .NET library project.
  2. Reference the assembly Duplicati.Library.Interface.
  3. Create a class that implements the interface IBackend.
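The shape of the four required operations can be illustrated with an in-memory backend, which is also handy for unit tests. This is a sketch only: IMinimalBackend and the local FileMissingException are hypothetical stand-ins, and the real Duplicati.Library.Interface.IBackend has different signatures (for example, LIST returns IFileEntry items).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical, simplified stand-in for the four required operations.
public interface IMinimalBackend
{
    IEnumerable<string> List();
    void Get(string remotename, string localFilename);
    void Put(string remotename, string localFilename);
    void Delete(string remotename);
}

// Local stand-in for Duplicati's FileMissingException.
public class FileMissingException : Exception
{
    public FileMissingException(string message) : base(message) { }
}

// Keeps "remote" files in memory.
public class InMemoryBackend : IMinimalBackend
{
    private readonly Dictionary<string, byte[]> _files = new();

    // Return a copy so callers can delete while iterating.
    public IEnumerable<string> List() => _files.Keys.ToList();

    public void Get(string remotename, string localFilename)
    {
        if (!_files.TryGetValue(remotename, out var data))
            throw new FileMissingException($"File not found: {remotename}");
        File.WriteAllBytes(localFilename, data);
    }

    public void Put(string remotename, string localFilename) =>
        _files[remotename] = File.ReadAllBytes(localFilename);

    public void Delete(string remotename) => _files.Remove(remotename);
}
```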

Most backends also implement the IStreamingBackend interface, which has additional methods for uploading and downloading via a Stream. Implementing these methods is recommended, as it enables the upload and download throttling mechanism and progress reporting for the transfers.
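Streaming enables progress and throttling because the caller can wrap the stream handed to the backend. The following hypothetical ProgressStream, which is not Duplicati's implementation, shows the idea for progress: every Read passes through the wrapper, which counts bytes and reports the running total. A throttle would hook into the same Read call.

```csharp
using System;
using System.IO;

// Hypothetical read-only wrapper that reports how many bytes have been read.
public class ProgressStream : Stream
{
    private readonly Stream _inner;
    private readonly Action<long> _onProgress;
    private long _total;

    public ProgressStream(Stream inner, Action<long> onProgress)
    {
        _inner = inner;
        _onProgress = onProgress;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = _inner.Read(buffer, offset, count);
        _total += n;
        _onProgress(_total); // a throttle could also delay here
        return n;
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _total;
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```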

If the backend wraps a binary executable that expects filenames for input or output, it should not implement IStreamingBackend, because the throttling would then only slow down writing the stream to a local file, which is pointless.

If the storage has the concept of folders, the backend can throw FolderMissingException, which is used to trigger folder creation. For instance, if the bucket does not exist, the S3 backend throws FolderMissingException, and its CREATEFOLDER method is then used to create the bucket.

Listing contents: LIST

The LIST operation returns the files found on the remote destination. Since the backend does not have the concept of folders, it should return only files, although the return type IFileEntry also allows marking items as folders. Folders are not used by Duplicati, but many backends return these entries anyway, in case the backends are used in another project.

Each returned entry has a Name property that is exactly the name of the file. This name is passed to the GET/PUT/DELETE methods and must be stable, so that a file stored with PUT is reported under the same name by LIST.

Some providers use pagination when listing large directories, which Duplicati does not yet support, so backends need to handle the pagination internally and return the full list. A future plan is to use AsyncEnumerable to support pagination from the backend.
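Draining all pages before returning can be sketched like this. RemoteEntry is a simplified stand-in for IFileEntry, and the continuation-token style of ListPage is an assumption about a hypothetical provider API. The loop keeps fetching until the provider reports no further page, so the caller receives the complete listing.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for IFileEntry.
public record RemoteEntry(string Name, long Size, bool IsFolder);

// One page from a hypothetical provider; NextToken is null on the last page.
public record ListPage(IReadOnlyList<RemoteEntry> Items, string? NextToken);

public static class PagedListing
{
    // Fetches every page and returns the combined list.
    public static List<RemoteEntry> ListAll(Func<string?, ListPage> fetchPage)
    {
        var all = new List<RemoteEntry>();
        string? token = null;
        do
        {
            ListPage page = fetchPage(token);
            all.AddRange(page.Items);
            token = page.NextToken;
        } while (token != null);
        return all;
    }
}
```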

Getting a file: GET

Arguably the most important operation is reading a file from remote storage, as that is what enables restoring data. The GET method is similar to an HTTP GET request and simply returns the data as a stream. To support backends that cannot work with streams, the basic IBackend interface has a GET method that takes the destination filename, and the IStreamingBackend interface adds a method that works with a stream. Most implementations implement the streaming method and support the non-streaming method by simply writing the stream to a file.

Duplicati downloads the data to a temporary file regardless of the implementation. If the streaming method is used, Duplicati can report download progress and apply throttling.

If the file is not found, the backend should throw FileMissingException, which will be handled differently than other exceptions.
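The common GET pattern can be sketched against a hypothetical in-memory store: the streaming Get does the work and throws the missing-file exception, and the filename-based Get writes that stream to a file. ExampleGetBackend and the local FileMissingException are stand-ins, not Duplicati types; the real method signatures may differ.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Local stand-in for Duplicati's FileMissingException.
public class FileMissingException : Exception
{
    public FileMissingException(string message) : base(message) { }
}

// Hypothetical backend reading from an in-memory store.
public class ExampleGetBackend
{
    private readonly Dictionary<string, byte[]> _store;

    public ExampleGetBackend(Dictionary<string, byte[]> store) => _store = store;

    // Streaming variant: copies the remote data into the caller's stream.
    public void Get(string remotename, Stream destination)
    {
        if (!_store.TryGetValue(remotename, out var data))
            throw new FileMissingException($"Remote file not found: {remotename}");
        using var source = new MemoryStream(data, writable: false);
        source.CopyTo(destination);
    }

    // Non-streaming variant built on the streaming one.
    public void Get(string remotename, string localFilename)
    {
        using var file = File.Create(localFilename);
        Get(remotename, file);
    }
}
```

Note that the filename variant creates the local file before the missing-file check runs; that is acceptable here because Duplicati downloads into a temporary file anyway.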

Storing a file: PUT

The second-most important method is PUT, as that allows the backup data to be stored. The PUT method is similar to the HTTP POST method and simply stores the data under the given name. As mentioned for the LIST operation, the name given to PUT must also be the name reported by LIST. As with GET, there is both a streaming and a non-streaming version. For either version, Duplicati saves the contents in a temporary file before processing them further.

Note that because of throttling and progress reporting, the client library must not read the stream twice (e.g., for hashing), as every read is throttled. If pre-upload hashing is required, some careful use of reflection makes it possible to sniff out the underlying stream.
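A different option, which is not the reflection approach described above, is to hash while uploading so that the stream is read only once. The sketch below only helps if the (hypothetical) provider accepts the checksum after the data, for instance in a final commit call. It feeds each chunk to IncrementalHash and writes the same chunk to the upload target. SinglePassUpload is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: reads the source exactly once, writing each chunk to
// the upload target while feeding the same bytes to a SHA-256 hash.
public static class SinglePassUpload
{
    public static byte[] CopyAndHash(Stream source, Stream uploadTarget)
    {
        using var hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
        var buffer = new byte[81920];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            hash.AppendData(buffer, 0, read);
            uploadTarget.Write(buffer, 0, read);
        }
        return hash.GetHashAndReset();
    }
}
```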

Deleting a file: DELETE

The DELETE operation simply deletes the remote file. The method can throw if the deletion fails, but generally does not have to report errors or handle repeated deletes of the same file.

Optional operations on a backend

Backends also support some additional operations. These should be implemented if possible, but they can also simply return empty results.

Testing the backend: TEST

To help the user debug a backend connection, the TEST method should attempt to connect to the remote storage. For most backends this simply calls the LIST operation, which throws FolderMissingException to indicate that the target folder should be created.

Creating a remote folder: CREATEFOLDER

Creating the remote folder does not make sense for all storage types, but it can be used where a one-time setup is needed, such as creating a storage bucket or another resource. The usual flow is that the backend throws FolderMissingException to indicate that a one-time setup is needed, after which CREATEFOLDER is called.
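The TEST and CREATEFOLDER flow can be sketched with a hypothetical bucket-based backend. TEST maps to LIST, LIST throws FolderMissingException while the bucket does not exist, and a caller-side helper performs the one-time setup and tests again. FolderMissingException here is a local stand-in, and SetupFlow only illustrates the flow, not how Duplicati's UI or backend manager actually drive it.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for Duplicati's FolderMissingException.
public class FolderMissingException : Exception
{
    public FolderMissingException(string message) : base(message) { }
}

// Hypothetical bucket-based backend.
public class ExampleBucketBackend
{
    private HashSet<string>? _bucket; // null means "bucket not created yet"

    public IEnumerable<string> List() =>
        _bucket ?? throw new FolderMissingException("Bucket does not exist");

    // TEST simply maps to LIST.
    public void Test() => List();

    // One-time setup: create the bucket if it is missing.
    public void CreateFolder() => _bucket ??= new HashSet<string>();
}

public static class SetupFlow
{
    // On FolderMissingException, run the one-time setup and test again.
    public static void TestOrCreate(ExampleBucketBackend backend)
    {
        try { backend.Test(); }
        catch (FolderMissingException)
        {
            backend.CreateFolder();
            backend.Test();
        }
    }
}
```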

DNS names

For some systems, the IP addresses can change during a backend operation. In that case, the backend should report the hostnames it uses. If any hostnames are reported and a name resolution failure is detected, the backend manager resolves all the hostnames again before retrying the operation.
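The retry behavior described above might look roughly like this on the caller side. This is a hypothetical sketch, not the backend manager's code. Two assumptions are labeled in the comments: that a name resolution failure surfaces as a SocketException with SocketError.HostNotFound (this depends on the client library), and that "resolving again" means calling a resolver such as Dns.GetHostAddresses for each reported hostname. The resolver is passed in so the sketch can be tested without network access.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;

// Hypothetical caller-side retry around a backend operation.
public static class DnsRetry
{
    public static T Run<T>(Func<T> operation, IEnumerable<string> hostnames,
                           Func<string, IPAddress[]> resolve, int maxAttempts = 3)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            // Assumption: name resolution failures surface as HostNotFound.
            catch (SocketException ex) when (ex.SocketErrorCode == SocketError.HostNotFound
                                             && attempt < maxAttempts)
            {
                // Assumption: refresh by resolving every reported hostname.
                foreach (var host in hostnames)
                {
                    try { resolve(host); }
                    catch (SocketException) { /* the next attempt will tell */ }
                }
            }
        }
    }
}
```

In real use, resolve could be Dns.GetHostAddresses.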

Debugging a backend

During development, the tool Duplicati.CommandLine.BackendTool can be used to issue one of the supported LIST/GET/PUT/DELETE operations. The tool wraps most of the setup logic, which makes it easy to test a specific URL and operation without dealing with the main project code.

Testing a backend for compatibility

Once development is completed, the tool Duplicati.CommandLine.BackendTester can be used to test the backend. It works by creating a few random files, reading them back, and listing the files. Beware that it deletes the entire target folder before starting, to allow simple restarts during testing. The tool uses a larger default character set for filenames than Duplicati does, so you may need to reduce the character set if the storage does not support all characters.

State, threads and cache

The backend instance is generally created once and reused for multiple operations. This makes it possible to cache data such as filename/id maps. On errors, the backend manager may decide to recreate the backend instance, so the backend must be able to operate with multiple instances (i.e., no static shared state).

The backend manager ensures that the instance is only accessed by a single thread at a time so the backend does not have to be thread-safe in any way.
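A per-instance cache might be sketched as follows for a hypothetical storage that addresses files by opaque ids. The name-to-id map lives in an instance field (never static), so a fresh instance created by the backend manager simply rebuilds it. No locking is used because access is serialized by the backend manager. IdMappedBackend, the listRemote delegate, and RemoteListCalls are illustrative names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical backend for a storage addressed by opaque ids.
public class IdMappedBackend
{
    private readonly Func<IEnumerable<KeyValuePair<string, string>>> _listRemote;
    private Dictionary<string, string>? _idCache; // per instance, not static

    public IdMappedBackend(Func<IEnumerable<KeyValuePair<string, string>>> listRemote) =>
        _listRemote = listRemote;

    // Counts remote listings, to show the cache is reused.
    public int RemoteListCalls { get; private set; }

    public string GetId(string name)
    {
        if (_idCache == null)
        {
            _idCache = new Dictionary<string, string>();
            RemoteListCalls++;
            foreach (var entry in _listRemote())
                _idCache[entry.Key] = entry.Value;
        }
        // A real backend would map this to FileMissingException.
        return _idCache.TryGetValue(name, out var id)
            ? id
            : throw new KeyNotFoundException(name);
    }

    // After a PUT, record the new id without listing again.
    public void Remember(string name, string id)
    {
        if (_idCache != null) _idCache[name] = id;
    }
}
```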

Implementing a UI for the backend

By default, the UI shows a basic form where the user can type in the values. Any special settings can be entered as advanced options.

If a custom UI is desired, it can be implemented in the file EditUriBuiltins.js. The logic is somewhat hard to follow, but generally consists of an HTML template that defines the visual aspects, and a set of callback methods registered in EditUriBuiltins.js:

  • Loader: fetches additional data, such as allowed hostnames
  • Parser: converts url components into the scope
  • Builder: converts the user variables into a url
  • Validator: checks the user input when saving
  • Tester: performs checks, usually by calling the TEST method

OAuth support

To support OAuth-based APIs, Duplicati provides a service that can perform the application login and obtain a refresh token. Because of the way OAuth is designed, this server needs to guard some secrets, so it cannot be completely user controlled. A C# implementation of the OAuth server is provided and can be extended to support new services.

Inside Duplicati, the OAuthHelper and OAuthHttpClient classes assist with the authentication and return the access token needed to interact with the API.
