ShellBoost

File Systems Synchronization

Starting with version 1.3.0.0, ShellBoost provides a File Systems Synchronization feature. This feature is supported only on Windows 10 version 1709 (“Fall Creators Update”) or later.

This feature also includes Windows 10 Files On-Demand synchronization support, similar to what OneDrive provides, but available for any file, storage, or database system. Here is the architecture of this feature:

[Figure: File Systems Synchronization architecture]

Here are the main components and concepts involved in the synchronization process:

File Systems X, Y, Z, etc.: they represent implementations of ShellBoost-provided .NET interfaces (ISyncFileSystem, etc.). Implementing these interfaces is the only work required on your side to enable synchronization between your storage system and other storage systems. File system implementations can be readable, writable, or both, and they can also raise events to indicate changes to their files or directories (whatever that may mean for that specific implementation).

EndPoint Synchronizers: provided by ShellBoost, this component talks to a specific file system implementation. One EndPointSynchronizer class instance handles one and only one ISyncFileSystem implementation instance.

MultiPoint Synchronizer: provided by ShellBoost, this is the main component. It serves as an orchestrator for all EndPointSynchronizer class instances.

Entries: an entry is an abstract concept that covers both the well-known file and directory (or folder) concepts. It’s important to note that each entry has a unique identifier that is different from its name. This means you cannot provide a valid implementation of a file system if that file system doesn’t handle unique identifiers for entries.

Events: file system implementations can send events to their associated EndPointSynchronizer instance. Events represent changes that occur at the file system level (created, moved/renamed, changed, deleted) for all entries (files and directories).

Changes: a change represents a synthesized change that originates from file system events and must be applied to the other file systems (through their associated EndPointSynchronizer instances).

State: EndPoint Synchronizers and the MultiPoint Synchronizer must store state (entries, changes, etc.) on the local disk. ShellBoost provides a default implementation of this state provider that should suit most purposes.

 

Here is the flow of operations (numbers in red circles in the figure above):

1)      File system implementations provide events to their associated EndPointSynchronizer. These events serve as triggers to determine the difference between the last known file system entries (persisted in state) and the current file system entries.

2)      Each EndPointSynchronizer instance synthesizes events into changes. Note there is not necessarily a one-to-one relation between file system events and synchronization changes. These changes, if any, are then sent to the MultiPointSynchronizer instance, which persists them.

3)      The MultiPointSynchronizer instance ensures all EndPointSynchronizer instances apply changes to their associated File System implementation for events that did not originate from them. Once a change is applied, the EndPointSynchronizer instance state is updated.
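Step 2 notes that events and changes are not necessarily one-to-one. The following C# sketch illustrates that idea with a toy coalescing rule. It is not ShellBoost's algorithm: `ChangeKind` and `ChangeCoalescer` are hypothetical names. The sketch assumes, for illustration only, three rules. An entry that is created and then deleted produces no change. Later events for a newly created entry fold into its creation. A deletion supersedes earlier changes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical change kinds; Flags allows one entry to be both changed and moved.
[Flags]
public enum ChangeKind
{
    None = 0,
    Created = 1,
    Changed = 2,
    Moved = 4,
    Deleted = 8,
}

public static class ChangeCoalescer
{
    // Folds a sequence of (entry id, event kind) pairs into at most one pending change per entry.
    public static Dictionary<string, ChangeKind> Coalesce(IEnumerable<(string EntryId, ChangeKind Kind)> events)
    {
        var pending = new Dictionary<string, ChangeKind>();
        foreach (var (id, kind) in events)
        {
            pending.TryGetValue(id, out var current); // ChangeKind.None when the entry is not pending yet

            if (kind == ChangeKind.Deleted)
            {
                if ((current & ChangeKind.Created) != 0)
                    pending.Remove(id); // created then deleted: nothing to propagate
                else
                    pending[id] = ChangeKind.Deleted; // a deletion makes earlier changes irrelevant
            }
            else if ((current & ChangeKind.Created) != 0)
            {
                // The entry will be created with its latest state, so later events fold into the creation.
            }
            else if ((current & ChangeKind.Deleted) == 0)
            {
                pending[id] = current | kind; // accumulate Changed and/or Moved
            }
            // Events for an entry already pending deletion are ignored in this sketch.
        }
        return pending;
    }
}
```

Consider nine events: Created and Changed for "a", Changed, Moved, and Changed for "b", Created and Deleted for "c", and Changed and Deleted for "d". They produce three changes: "a" created, "b" changed and moved, and "d" deleted.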

 

ShellBoost currently provides the following implementations of the ISyncFileSystem interface:

The LocalFileSystem class: provides Windows 10 file system support;

The OnDemandLocalFileSystem class: provides Windows 10 file system support like LocalFileSystem, plus Files On-Demand support;

We provide a Google Drive Folder sample implementation.

We also provide a Cloud Folder Sync sample implementation.

 

Here is sample C# code that creates a synchronization between a local directory path and a custom file system:

// create the MPS. We need a stable and globally unique identifier (could be a guid)
using (var mps = new MultiPointSynchronizer("MyCompany.MyProgram"))
{
    // add one endpoint with the ShellBoost-provided local file system
    mps.AddEndPoint("local", new LocalFileSystem(@"c:\myPath"));

    // if you want a Files On-Demand local file system, use this class instead
    // (don't forget to register the On-Demand root path, check the Files On-Demand chapter)
    // mps.AddEndPoint("onDemand", new OnDemandLocalFileSystem(@"c:\myPath"));

    // add one endpoint with a custom file system
    // (use a distinct name for each endpoint, "local" is already taken above)
    mps.AddEndPoint("custom", new CustomFileSystem());

    Console.WriteLine("Press ESC to quit...");

    // start the MPS
    mps.Start();

    // keep the program running until ESC is pressed
    do
    {
        var k = Console.ReadKey(true);
        if (k.Key == ConsoleKey.Escape)
            break;
    }
    while (true);
}

Here is what a custom file system must implement to be readable and writable, and to raise events:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// plus the ShellBoost synchronization namespace(s)

public class CustomFileSystem : ISyncFileSystem, ISyncFileSystemRead, ISyncFileSystemWrite, ISyncFileSystemEvents
{
    // ISyncFileSystem
    public string RootId => throw new NotImplementedException(); // TODO: return the root folder's identifier
    public EndPointSynchronizer EndPointSynchronizer { get; set; }
    public bool HasCapability(SyncFileSystemCapability capability) => throw new NotImplementedException(); // TODO

    // ISyncFileSystemRead
    public IEnumerable<StateSyncEntry> EnumerateEntries(SyncContext context, StateSyncEntry parentEntry, SyncEnumerateEntriesOptions options = null) => throw new NotImplementedException(); // TODO
    public Task GetEntryContentAsync(SyncContext context, StateSyncEntry entry, Stream output, SyncGetEntryContentOptions options = null) => throw new NotImplementedException(); // TODO

    // ISyncFileSystemWrite
    public void DeleteEntry(SyncContext context, StateSyncEntry entry, SyncDeleteEntryOptions options = null) => throw new NotImplementedException(); // TODO
    public void GetOrCreateEntry(SyncContext context, StateSyncEntry entry, SyncGetEntryOptions options = null) => throw new NotImplementedException(); // TODO
    public Task SetEntryContentAsync(SyncContext context, StateSyncEntry entry, Stream input, SyncSetEntryContentOptions options = null) => throw new NotImplementedException(); // TODO
    public void UpdateEntry(SyncContext context, StateSyncEntry entry, SyncUpdateEntryOptions options = null) => throw new NotImplementedException(); // TODO

    // ISyncFileSystemWriteAsync (ShellBoost 1.6.0.2 or higher). Implement that *or* ISyncFileSystemWrite, not both.
    // To use it, declare ISyncFileSystemWriteAsync instead of ISyncFileSystemWrite and replace the three synchronous
    // methods above with these (SetEntryContentAsync has the same signature in both interfaces):
    // public Task DeleteEntryAsync(SyncContext context, StateSyncEntry entry, SyncDeleteEntryOptions options = null) { /* TODO */ }
    // public Task GetOrCreateEntryAsync(SyncContext context, StateSyncEntry entry, SyncGetEntryOptions options = null) { /* TODO */ }
    // public Task UpdateEntryAsync(SyncContext context, StateSyncEntry entry, SyncUpdateEntryOptions options = null) { /* TODO */ }

    // ISyncFileSystemEvents
    public event EventHandler<SyncFileSystemEventArgs> Event;
    public void StartEventMonitoring() => throw new NotImplementedException(); // TODO
    public void StopEventMonitoring() => throw new NotImplementedException(); // TODO
}

Requirements for a file system implementation

A file system implementation must support some features to be successfully integrated with ShellBoost synchronization:

It must have the “item” concept (aka document, file, etc.). An item has content (bytes). Its size (number of bytes) can be 0. An item always has a parent, which is a folder.

It must have the “folder” (aka directory, container, etc.) concept. A folder contains items and has no content. Its size is 0 (or non-existent).

Both concepts have the following properties/metadata:

a.       Identifier (string)

b.       Parent identifier (string). Points to the containing folder’s identifier.

c.       Name (string)

d.       Creation date (DateTimeOffset)

e.       Last write date (DateTimeOffset)

f.        Size (long).

g.       Attributes (SyncEntryAttributes). Tells if the item is a folder or not.

h.       Content version (string). Optional

i.         Extended data (dictionary of string/string pairs). Optional. Used to store custom data.

The identifier must be globally unique within the file system.

The identifier must be different from the name (for example, a Guid expressed as a string is valid). Renaming an item must not change its identifier.

There is a root folder that contains everything: all children and grandchildren, recursively. Its identifier is exposed by the ISyncFileSystem.RootId property.

A folder cannot contain two child items or folders with the same name.

Getting and setting an item’s content is based on streams. Streams are not required to be seekable. It’s recommended, but not mandatory, that the file system implementation support getting or setting an item’s content in chunks.

A file system implementation can, and often must, throw errors or add errors to the current SyncContext instance when an error occurs. For example, when asked to create a file, it must not silently ignore the failure if the file was not actually created. The ShellBoost classes will retry the failing operation in most cases.
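The requirements above can be checked mechanically against your own back end during development. This is a minimal sketch. It uses a hypothetical `EntryInfo` class that mirrors properties a. to i. and is not ShellBoost's `StateSyncEntry`, plus a hypothetical `EntryRules.Validate` helper. The sketch checks identifier uniqueness, that identifiers differ from names, that the root folder exists, that every other entry has a parent folder, and that folders have zero size. Duplicate names and folder cycles are not checked here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entry model mirroring properties a. to i. above (not ShellBoost's StateSyncEntry).
public sealed class EntryInfo
{
    public string Id { get; set; }
    public string ParentId { get; set; }            // null only for the root folder
    public string Name { get; set; }
    public DateTimeOffset CreationTime { get; set; }
    public DateTimeOffset LastWriteTime { get; set; }
    public long Size { get; set; }                  // 0 for folders
    public bool IsFolder { get; set; }              // stands in for SyncEntryAttributes
    public string ContentVersion { get; set; }      // optional
    public Dictionary<string, string> ExtendedData { get; } = new Dictionary<string, string>(); // optional
}

public static class EntryRules
{
    // Returns human-readable violations of the identifier, root and parent rules.
    public static List<string> Validate(IReadOnlyCollection<EntryInfo> entries, string rootId)
    {
        var errors = new List<string>();
        var byId = new Dictionary<string, EntryInfo>();

        foreach (var e in entries)
        {
            if (byId.ContainsKey(e.Id))
                errors.Add($"identifier '{e.Id}' is not unique");
            else
                byId[e.Id] = e;

            if (string.Equals(e.Id, e.Name, StringComparison.Ordinal))
                errors.Add($"identifier '{e.Id}' is the same as the name");
        }

        if (!byId.TryGetValue(rootId, out var root) || !root.IsFolder)
            errors.Add("the root folder is missing or is not a folder");

        foreach (var e in byId.Values)
        {
            if (e.Id == rootId)
                continue;
            if (e.ParentId == null || !byId.TryGetValue(e.ParentId, out var parent) || !parent.IsFolder)
                errors.Add($"entry '{e.Id}' has no valid parent folder");
            if (e.IsFolder && e.Size != 0)
                errors.Add($"folder '{e.Id}' has a non-zero size");
        }

        return errors;
    }
}
```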

 

Here is a list of the functions a file system must support to fully integrate with synchronization. The functions are described using pseudo signatures; the actual ones depend on your file system’s internal API, but implementing ShellBoost’s ISyncFileSystemXXX interfaces requires a feature set equivalent to what’s described here:

Enumerate(folderId): enumerates items and folders for a given folderId. The enumeration can be deep or at folder level.

GetItemMetadata(itemId, out name, out dates, out size, out attributes): gets an item’s properties/metadata, such as its name, dates, and size.

GetFolderMetadata(folderId, out name, out dates, out attributes): gets a folder’s properties/metadata, such as its name and dates. A folder’s size is considered to be zero.

SetItemMetadata(itemId, name, dates, attributes): sets an item’s properties/metadata. The size is generally set by SetItemContent.

SetFolderMetadata(folderId, name, dates, attributes): sets a folder’s properties/metadata.

GetItemContent(itemId, out stream): gets an item’s content (bytes, data, etc.) as a stream. For optimal performance and memory consumption, supporting range requests (offset, count) is strongly recommended. This corresponds to the GetPartialContent file system capability.

SetItemContent(itemId, stream): sets an item’s content (bytes, data, etc.) from a stream. For optimal performance and memory consumption, supporting range requests (offset, count) is strongly recommended. This corresponds to the SetPartialContent file system capability.

CreateItem(folderId, name, out itemId): creates an item. This can be replaced by SetItemContent if that method supports setting the name.

CreateFolder(parentFolderId, name, out folderId): creates a folder.

RenameFolder(folderId, newName): renames a folder. A rename doesn’t semantically change an entry’s Identifier property, only its Name property.

RenameItem(itemId, newName): renames an item. A rename doesn’t semantically change an entry’s Identifier property, only its Name property.

MoveFolder(folderId, newParentFolderId): moves a folder. A move doesn’t semantically change an entry’s Identifier property, only its Parent identifier property.

MoveItem(itemId, newParentFolderId): moves an item. A move doesn’t semantically change an entry’s Identifier property, only its Parent identifier property.

DeleteItem(itemId): deletes an item.

DeleteFolder(folderId): deletes a folder.

CopyContent(sourceItemId, targetItemId): copies an item’s content (bytes, data, etc.) to another item. Check the File System “upload” mechanism chapter for more on this. This doesn’t semantically change the target entry’s Identifier property or Parent identifier property.
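The following hypothetical in-memory back end makes the identifier semantics of these functions concrete. The `InMemoryBackend` class and its members are illustrative only and are not part of ShellBoost. Rename changes only the name, move changes only the parent identifier, and CopyContent changes only the target's bytes. Identifiers are Guids that never depend on names. The sketch assumes that name comparison is case-insensitive, as on Windows, and it omits cycle checks for folder moves.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory back end illustrating the pseudo-signatures above.
public sealed class InMemoryBackend
{
    private sealed class Node
    {
        public string ParentId;
        public string Name;
        public bool IsFolder;
        public byte[] Content = Array.Empty<byte>();
    }

    private readonly Dictionary<string, Node> _nodes = new Dictionary<string, Node>();

    public InMemoryBackend() => _nodes[RootId] = new Node { Name = "", IsFolder = true };

    public string RootId { get; } = "root";

    public string CreateFolder(string parentFolderId, string name) => Add(parentFolderId, name, true);
    public string CreateItem(string folderId, string name) => Add(folderId, name, false);

    public string GetName(string id) => Get(id).Name;
    public string GetParentId(string id) => Get(id).ParentId;
    public byte[] GetItemContent(string itemId) => (byte[])Get(itemId).Content.Clone();
    public void SetItemContent(string itemId, byte[] content) => Get(itemId).Content = (byte[])content.Clone();

    public void Rename(string id, string newName)
    {
        var node = Get(id);
        EnsureNameIsFree(node.ParentId, newName, node);
        node.Name = newName; // the identifier is unchanged
    }

    public void Move(string id, string newParentFolderId)
    {
        var node = Get(id);
        if (!Get(newParentFolderId).IsFolder) throw new InvalidOperationException("Target is not a folder.");
        EnsureNameIsFree(newParentFolderId, node.Name, node);
        node.ParentId = newParentFolderId; // the identifier is unchanged
    }

    // The target keeps its identifier and parent; only its bytes change.
    public void CopyContent(string sourceItemId, string targetItemId) =>
        Get(targetItemId).Content = (byte[])Get(sourceItemId).Content.Clone();

    public void Delete(string id)
    {
        Get(id); // throws if the entry doesn't exist
        if (id == RootId) throw new InvalidOperationException("The root folder cannot be deleted.");
        var children = new List<string>();
        foreach (var pair in _nodes)
            if (pair.Value.ParentId == id) children.Add(pair.Key);
        foreach (var childId in children)
            Delete(childId); // deleting a folder deletes its descendants too
        _nodes.Remove(id);
    }

    private string Add(string parentId, string name, bool isFolder)
    {
        if (!Get(parentId).IsFolder) throw new InvalidOperationException("Parent is not a folder.");
        EnsureNameIsFree(parentId, name, null);
        var id = Guid.NewGuid().ToString("N"); // identifier independent from the name
        _nodes[id] = new Node { ParentId = parentId, Name = name, IsFolder = isFolder };
        return id;
    }

    private Node Get(string id) =>
        _nodes.TryGetValue(id, out var node) ? node : throw new KeyNotFoundException($"No entry '{id}'.");

    // A folder cannot contain two children with the same name (case-insensitive here, as on Windows).
    private void EnsureNameIsFree(string parentId, string name, Node except)
    {
        foreach (var n in _nodes.Values)
            if (!ReferenceEquals(n, except) && n.ParentId == parentId &&
                string.Equals(n.Name, name, StringComparison.OrdinalIgnoreCase))
                throw new InvalidOperationException($"'{name}' already exists in this folder.");
    }
}
```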

 

Here is a list of the events a file system must support to fully integrate with synchronization:

Item Created: a new item has been created.

Folder Created: a new folder has been created.

Item Changed: an item’s metadata (size, dates, attributes) has changed.

Folder Changed: a folder’s metadata (dates, attributes) has changed.

Item Content Changed: an item’s content (data, bytes, etc.) has changed.

Item Deleted: an item has been deleted.

Folder Deleted: a folder has been deleted. It implies children and grandchildren items have been deleted too.

Item Moved: an item has been moved (into a different parent folder) and/or its name has changed (same parent folder). The event must provide the old location/path.

Folder Moved: a folder has been moved (into a different parent folder) and/or its name has changed (same parent folder). The event must provide the old location/path.

Your file system’s internal event publishing can be based on any polling or real-time mechanism. The Google Drive Folder sample uses polling, while the Cloud Folder Sync sample uses ASP.NET Core’s SignalR (WebSockets).
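A polling mechanism can be as simple as comparing two listings keyed by identifier. The sketch below is hypothetical: `Snapshot`, `PollEventKind`, and `PollingEventSource` are not ShellBoost types, and this is not how the Google Drive Folder sample is implemented. It raises Created, Moved, ContentChanged, and Deleted events. A Moved event carries the previous snapshot, which gives the old location. Metadata-only changes such as dates are left out for brevity.

```csharp
using System;
using System.Collections.Generic;

public enum PollEventKind { Created, Moved, ContentChanged, Deleted }

// Hypothetical per-entry data returned by one poll of the back end.
public sealed class Snapshot
{
    public Snapshot(string parentId, string name, string contentVersion)
    {
        ParentId = parentId;
        Name = name;
        ContentVersion = contentVersion;
    }

    public string ParentId { get; }
    public string Name { get; }
    public string ContentVersion { get; }
}

public sealed class PollingEventSource
{
    private Dictionary<string, Snapshot> _previous = new Dictionary<string, Snapshot>();

    // Handler arguments: event kind, entry id, previous snapshot (null for Created).
    public event Action<PollEventKind, string, Snapshot> Event;

    // Compares the current listing (id -> snapshot) with the previous one and raises events.
    public void Poll(IReadOnlyDictionary<string, Snapshot> current)
    {
        foreach (var pair in current)
        {
            if (!_previous.TryGetValue(pair.Key, out var old))
            {
                Event?.Invoke(PollEventKind.Created, pair.Key, null);
                continue;
            }

            // Same identifier, different parent or name: a move and/or a rename.
            if (old.ParentId != pair.Value.ParentId || old.Name != pair.Value.Name)
                Event?.Invoke(PollEventKind.Moved, pair.Key, old);

            if (old.ContentVersion != pair.Value.ContentVersion)
                Event?.Invoke(PollEventKind.ContentChanged, pair.Key, old);
        }

        foreach (var pair in _previous)
        {
            if (!current.ContainsKey(pair.Key))
                Event?.Invoke(PollEventKind.Deleted, pair.Key, pair.Value);
        }

        _previous = new Dictionary<string, Snapshot>();
        foreach (var pair in current)
            _previous[pair.Key] = pair.Value;
    }
}
```

On the very first poll, every entry is reported as created. A real implementation would start from a baseline instead, for example the listing persisted by the previous run.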

File System “download” mechanism

Note: the term “download” here just means copying items (not folders) from your file system to other file systems.

The download (and upload) mechanism copies .NET streams from one file system to another. Internally, ShellBoost-provided file system implementations, such as OnDemandLocalFileSystem, use “partial” or “chunked” downloads. This means they don’t always request the full content of an item in one shot but can request only chunks of it. This is the case, for example, when the end user stops downloading a large file and restarts the download later.

So, you have two possibilities when implementing your file system:

1.       Implement partial download support. This is the recommended option. In this case you must return true when asked for the GetPartialContent capability. For example, if your file system is implemented by a web server, this typically means mapping the chunk requested by ShellBoost to an HTTP Range header sent to this web server.

2.       Don’t implement partial download support. In this case you must return false when asked for the GetPartialContent capability. ShellBoost will then provide the requesting file system with an internal range-enabled Stream, backed by your stream, so operations can work transparently. This isn’t optimal, because an item that was already partially downloaded will be completely downloaded again. It can also cause some UIs, such as the Windows Shell progress dialog box, to stall while waiting for the beginning of the file to be downloaded again, because they expect the download to resume where it stopped.
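Option 2 is slower because a stream that can only be read from its beginning must be consumed up to the requested offset before the requested chunk becomes available. The sketch below shows that cost in isolation. `RangeFallback.ReadRange` is a hypothetical helper and is not ShellBoost's internal range-enabled stream.

```csharp
using System;
using System.IO;

public static class RangeFallback
{
    // Reads bytes [offset, offset + count) from a stream that can only be read from byte 0.
    // The skipped bytes are still transferred, which is why this path is slower than a real range request.
    public static byte[] ReadRange(Stream fullContent, long offset, int count)
    {
        if (offset < 0) throw new ArgumentOutOfRangeException(nameof(offset));
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));

        var skipBuffer = new byte[81920];
        long toSkip = offset;
        while (toSkip > 0)
        {
            int read = fullContent.Read(skipBuffer, 0, (int)Math.Min(skipBuffer.Length, toSkip));
            if (read == 0) throw new EndOfStreamException("Offset is beyond the end of the content.");
            toSkip -= read;
        }

        var result = new byte[count];
        int filled = 0;
        while (filled < count)
        {
            int read = fullContent.Read(result, filled, count - filled);
            if (read == 0) break; // content ends before offset + count
            filled += read;
        }

        if (filled < count) Array.Resize(ref result, filled);
        return result;
    }
}
```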

What you must not do is return true when asked for the GetPartialContent capability and then ignore the offset and count values that ShellBoost provides with a given stream.
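For option 1 with an HTTP back end, honoring the requested chunk typically comes down to sending a Range header. This is a minimal sketch using `System.Net.Http`. The content URL is hypothetical, and the sketch does not show how your GetEntryContentAsync implementation obtains the offset and count from ShellBoost. HTTP byte ranges are inclusive, so the last byte position is `offset + count - 1`.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class RangeRequests
{
    // Builds a GET request for bytes [offset, offset + count) of a hypothetical content URL.
    public static HttpRequestMessage CreateRangeRequest(Uri contentUri, long offset, long count)
    {
        if (offset < 0) throw new ArgumentOutOfRangeException(nameof(offset));
        if (count <= 0) throw new ArgumentOutOfRangeException(nameof(count));

        var request = new HttpRequestMessage(HttpMethod.Get, contentUri);
        // Inclusive range: "bytes=offset-(offset + count - 1)".
        request.Headers.Range = new RangeHeaderValue(offset, offset + count - 1);
        return request;
    }
}
```

A server that honors the header answers 206 Partial Content. A 200 OK response means the header was ignored and the body starts at byte 0. In that case, the implementation must either skip to the offset itself or not claim the GetPartialContent capability.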

For ShellBoost versions lower than 1.8.0.0, in the case of downloads (not uploads), a custom file system implementation must call the Progress method of the ProgressSink exposed by the SyncContext instance passed to the GetEntryContentAsync method. This notifies the Windows file system and the Windows Shell (Explorer views) of download progress. It is particularly important when using OnDemandLocalFileSystem with large files being hydrated (downloaded). Progress reporting is demonstrated in the Google Drive Folder sample and in the Cloud Folder Sync sample. For ShellBoost versions 1.8.0.0 and higher, progress reporting is automatic by default.
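For versions earlier than 1.8.0.0, one way to report progress is to count bytes as they flow through the content stream. This sketch wraps a source stream and invokes a callback after each read. The `(bytesSoFar, totalSize)` callback is a hypothetical hook that you would forward to the SyncContext.ProgressSink Progress method, whose exact signature is not shown in this document.

```csharp
using System;
using System.IO;

// Read-only pass-through stream that reports the cumulative number of bytes read.
public sealed class ProgressReadStream : Stream
{
    private readonly Stream _inner;
    private readonly long _totalSize;
    private readonly Action<long, long> _onProgress; // (bytesSoFar, totalSize), hypothetical hook
    private long _bytesRead;

    public ProgressReadStream(Stream inner, long totalSize, Action<long, long> onProgress)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _totalSize = totalSize;
        _onProgress = onProgress ?? throw new ArgumentNullException(nameof(onProgress));
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        if (read > 0)
        {
            _bytesRead += read;
            _onProgress(_bytesRead, _totalSize);
        }
        return read;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => _bytesRead;
        set => throw new NotSupportedException();
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

Stream's default asynchronous read methods fall back to Read, so this wrapper also reports progress with ReadAsync and CopyToAsync. Overriding ReadAsync directly would be more efficient.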

File System “upload” mechanism

Note: the term “upload” here just means copying items (not folders) from other file systems to your file system.

When an entry’s content changes (as opposed to its metadata, such as Name or LastWriteTime), the MultiPointSynchronizer doesn’t necessarily upload the content directly to the target endpoints. It uses an intermediary class named ContentMover that handles content downloads and uploads. There are two cases:

1.       The entry content is uploaded directly into the target folder. This target folder must already exist, possibly along with all its ancestor folders, recursively. You must set EndPointSynchronizerOptions.UploadsWaitForParents to true. In this case, it’s recommended that the temporary files be invisible to your file system’s end users. This is demonstrated in the Cloud Folder Sync sample.

2.       The entry content is uploaded into another folder and then moved to the target folder. This is demonstrated in the Google Drive Folder sample, which uses a folder named “ShellBoostTemp” for all temporary uploads. This folder is visible to end users.

In both cases, ContentMover uses temporary items when starting the upload operation. Therefore, the GetOrCreateEntry method implementation must handle the temporary entry case. When doing “uploads”, ContentMover is also responsible for handling possible errors (network disconnection, etc.) and retries. Here is a typical ContentMover upload workflow:

It uploads items as temporary items to your file system through your file system implementation, like any other items. Temporary items are recognizable by their names, which are supposed to be globally unique and start with “sbcmtemp”, for example “sbcmtemp.201.73410ec4b9814c689c1193e3684c0824.CloudFolder.d29f6e3677e72ae22b7cbf9c421e0704--MyImage.jpg”. This is true for brand new files as well as for file content changes (updates). These items should be hidden from your end users. This doesn’t necessarily mean your back-end system requires a hidden attribute with the same meaning as in the Windows file system. It means they should not be visible or actionable by an end user through any other UI your back-end system provides. For example, you can choose to filter these names out even if you don’t support an equivalent hidden attribute.

Once an item has been uploaded, ContentMover calls your ISyncFileSystemWrite.UpdateEntry or ISyncFileSystemWriteAsync.UpdateEntryAsync method with an implicit rename or move instruction.

The file system implementation must detect that this is a move, rename, or content-change operation, and is then supposed to “move” (whatever that means for your file system) the temporary item to its final location. The upload itself can take a long time and suffer retries or all sorts of problems. Once the file is completely present on your file system, this final move operation is supposed to be almost immediate, which avoids many potential issues with end-user UIs.

That could end up with a new item (in the case of a creation scenario), or the update of an existing item (in the case of a content or metadata update/change scenario).

In the case of an update, it should not result in a new item being created (with a new id, etc.). That’s why your file system should implement a “CopyContent”-like method, as described in the chapter on file system implementation requirements.
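The temporary upload workflow can be sketched against a hypothetical in-memory back end. `BackendItem` and `UploadFinalizer` are illustrative names and are not ShellBoost APIs. The sketch uses an assumed rule to tell an update from a creation: if an item with the target name already exists in the target folder, the upload is an update. It does not show how your UpdateEntry implementation receives the target name and folder. The sketch also filters temporary names out of an end-user listing, as suggested above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical back-end item; not a ShellBoost type.
public sealed class BackendItem
{
    public string Id { get; set; }
    public string ParentId { get; set; }
    public string Name { get; set; }
    public byte[] Content { get; set; } = Array.Empty<byte>();
}

public static class UploadFinalizer
{
    // ContentMover temporary item names start with this prefix.
    public const string TemporaryPrefix = "sbcmtemp";

    public static bool IsTemporaryName(string name) =>
        name != null && name.StartsWith(TemporaryPrefix, StringComparison.Ordinal);

    // What an end-user-facing listing of a folder could show: temporary items are filtered out.
    public static IEnumerable<BackendItem> VisibleChildren(IEnumerable<BackendItem> items, string folderId) =>
        items.Where(i => i.ParentId == folderId && !IsTemporaryName(i.Name));

    // Turns a completed temporary upload into its final item and returns the final identifier.
    // Only meant for temporary entries; regular renames and moves take another path.
    public static string Finalize(Dictionary<string, BackendItem> items, string tempId, string targetParentId, string targetName)
    {
        var temp = items[tempId];
        if (!IsTemporaryName(temp.Name))
            throw new InvalidOperationException($"'{temp.Name}' is not a temporary item.");

        // Assumption: an existing item with the target name in the target folder means "update".
        var existing = items.Values.FirstOrDefault(i =>
            i.Id != tempId && i.ParentId == targetParentId &&
            string.Equals(i.Name, targetName, StringComparison.OrdinalIgnoreCase));

        if (existing != null)
        {
            // Update scenario, CopyContent-like: the existing item keeps its identifier.
            existing.Content = temp.Content;
            items.Remove(tempId);
            return existing.Id;
        }

        // Creation scenario: the temporary item itself is renamed/moved into place.
        temp.ParentId = targetParentId;
        temp.Name = targetName;
        return temp.Id;
    }
}
```

In a real back end, the final step should be a single transactional operation, so end users never observe a half-finalized item.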

The rationale behind this temporary upload + move workflow is to avoid exposing inconsistent items to end users on your file system. This is a common scenario for large items, but also for items that were only partially copied because of some issue (synchronization not running, network down, file system issue, etc.). Upload speeds are also often much slower than download speeds. Your file system’s final move operation is supposed to make the final change (rename or move) as atomic as possible and unnoticeable to the end user, beyond visual metadata updates (dates, size, etc.).

Remarks

File system implementations are provided with the Google Drive Folder sample and the Cloud Folder Sync sample.

A file system implementation is required to provide StateSyncEntry instances (through the EnumerateEntries method) with unique identifiers that are different from names. This is a strong requirement for successfully supporting rename-type events. It means, for example, that a custom file system that doesn’t define a unique identifier per entry (file and folder) cannot be supported.

The EndPointSynchronizer implementation expects only one entry with a given name for a given parent entry. In other words, there cannot be two files with the same name in a folder. While this is a given on Windows local disks, some file systems (Google Drive, for example) do support multiple entries with the same name. If a file system implementation provides multiple entries with the same Name + ParentId property values but different Id values, the EndPointSynchronizer implementation will consider all these duplicate-name entries invalid and will discard them all.
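Because duplicate-name entries are discarded, it can help to detect them yourself, for example to log them or to rename them in the back end. This sketch groups a listing by parent identifier and name. `ListedEntry` and `DuplicateNames` are hypothetical. The sketch compares names case-insensitively, as Windows does; this is an assumption and not documented EndPointSynchronizer behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical listing entry; not a ShellBoost type.
public sealed class ListedEntry
{
    public ListedEntry(string id, string parentId, string name)
    {
        Id = id;
        ParentId = parentId;
        Name = name;
    }

    public string Id { get; }
    public string ParentId { get; }
    public string Name { get; }
}

public static class DuplicateNames
{
    // Splits a listing into entries that can be synchronized and entries that share
    // ParentId + Name with at least one other entry (all of which are rejected).
    public static (List<ListedEntry> Kept, List<ListedEntry> Rejected) Split(IEnumerable<ListedEntry> entries)
    {
        // Assumption: names are compared case-insensitively, as on Windows.
        var groups = entries.GroupBy(e => (e.ParentId, Name: e.Name.ToUpperInvariant())).ToList();
        var kept = groups.Where(g => g.Count() == 1).Select(g => g.First()).ToList();
        var rejected = groups.Where(g => g.Count() > 1).SelectMany(g => g).ToList();
        return (kept, rejected);
    }
}
```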

The EndPointSynchronizer implementation doesn’t respond immediately to file system events, to avoid flooding the system with events. This means that, in the normal course of operations, there can be delays between a given file system event and the corresponding changes being applied to other file systems. These delays are usually in the range of a few seconds for events unrelated to content, and longer for content-related ones (depending on the content size).
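Think of this delay as a quiet-period batching rule: events keep accumulating until none arrives for a while, and the whole batch is then processed once. The sketch below only illustrates that general technique. `EventBatching.Batch` is hypothetical, and ShellBoost's actual timing policy is not described here.

```csharp
using System;
using System.Collections.Generic;

public static class EventBatching
{
    // Groups event timestamps (sorted ascending) into batches: a new batch starts when the gap
    // since the previous event exceeds quietPeriod. Each batch would be processed once.
    public static List<List<DateTime>> Batch(IReadOnlyList<DateTime> sortedTimes, TimeSpan quietPeriod)
    {
        var batches = new List<List<DateTime>>();
        List<DateTime> current = null;
        DateTime? previous = null;

        foreach (var t in sortedTimes)
        {
            if (current == null || t - previous.Value > quietPeriod)
            {
                current = new List<DateTime>();
                batches.Add(current);
            }
            current.Add(t);
            previous = t;
        }

        return batches;
    }
}
```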

The EndPointSynchronizer implementation gracefully catches all exceptions thrown by file system implementations. As a result, a running synchronization system may appear to work without raising any error while in fact nothing works. To check for errors, you must set the Logger property of the MultiPointSynchronizer class and check what all components log. The samples demonstrate this.

Your file system implementation must keep a reported StateSyncEntry’s Size consistent with the final size of the same item’s content when it’s copied. You must not return content whose size differs from the reported size.
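One way to enforce this rule, and to fail loudly instead of silently, is to count bytes while copying and compare the total with the reported size. `SizeCheckedCopy.CopyExactlyAsync` is a hypothetical helper, and throwing `InvalidDataException` is a choice made for this sketch.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class SizeCheckedCopy
{
    // Copies content and throws if it doesn't match the size announced in metadata
    // (for example, the Size reported for the entry during enumeration).
    public static async Task CopyExactlyAsync(Stream input, Stream output, long expectedSize)
    {
        var buffer = new byte[81920];
        long total = 0;
        int read;
        while ((read = await input.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false)) > 0)
        {
            total += read;
            if (total > expectedSize)
                throw new InvalidDataException($"Content is larger than the expected {expectedSize} bytes.");
            await output.WriteAsync(buffer, 0, read).ConfigureAwait(false);
        }

        if (total != expectedSize)
            throw new InvalidDataException($"Content has {total} bytes, expected {expectedSize}.");
    }
}
```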