Deployment
The user-mode library comes in two pieces, both of which must be deployed with the application:
- A Python module named cbfsvault.py.
- A native dynamic library (unmanaged), named as follows:
  - Windows: pycbfsvault24.dll (available for x64 and x86 processor architectures)
  - Linux: libpycbfsvault.so.24.0 (available for x64 and x86 processor architectures)
  - macOS: libpycbfsvault24.0.dylib (available for x64 and ARM64 processor architectures)
Both the module and the native library are included in the product's Python package, <install_dir>\cbfsvault-24.0.xxxx.tar.gz, which should be installed using pip:
cd C:\path\to\install_dir
python -m pip install cbfsvault-24.0.xxxx.tar.gz
Once the product's Python package has been installed, the module can be imported and used: from cbfsvault import *. Nothing else is required to deploy the application.
As an alternative to installing the module using pip, you can use setuptools (via the package's setup.py script) to package the module for deployment or to install it on the machine:
python setup.py build --build-lib=<app_dir>
The above setup command packages the module and native library for deployment. A folder is created in the app_dir directory with the module and the native library packaged inside.
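Before shipping, a deployment script might verify that both pieces are present. The helper below is a hypothetical sketch, not part of the product: it maps the `platform.system()` names Python reports ("Darwin" on macOS) to the native library names listed above, and searches recursively because the name of the folder that setup.py creates inside <app_dir> is not specified here. It does not check processor architecture.

```python
import platform
from pathlib import Path
from typing import List, Optional

# Native library file names from the list above, keyed by platform.system().
NATIVE_LIBRARY_NAMES = {
    "Windows": "pycbfsvault24.dll",
    "Linux": "libpycbfsvault.so.24.0",
    "Darwin": "libpycbfsvault24.0.dylib",  # macOS
}


def missing_deployment_files(app_dir: str, system: Optional[str] = None) -> List[str]:
    """Return the required file names that cannot be found anywhere under app_dir."""
    system = system or platform.system()
    if system not in NATIVE_LIBRARY_NAMES:
        raise ValueError(f"no native library is listed for platform {system!r}")
    required = ["cbfsvault.py", NATIVE_LIBRARY_NAMES[system]]
    root = Path(app_dir)
    # rglob() also searches subfolders, since the build step places the
    # files in a folder created inside app_dir.
    return [name for name in required if not any(root.rglob(name))]
```

An empty list means both files were found; any names returned are missing from the deployment folder.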
General Information
The topics in this section provide general information about various aspects of the product's functionality.
Topics
- Buffer Parameters
- Callback Mode
- Encryption
- Error Handling
- File Features section
- Query Language section
- Vaults section
Buffer Parameters
Some events include one or more parameters intended for use as a binary data buffer. Depending on the event, these parameters may contain data when the event is fired, or the application may be expected to populate them with the desired amount of data during the event handler. Some events combine both paradigms: the buffer contains data when the event fires, and the application is expected to modify that data in place.
The documentation for such events describes which of these cases applies to each buffer parameter. In all cases, buffer parameters point to a preallocated block of unmanaged memory, the size of which is specified by the parameter immediately following the buffer parameter. When data are to be written, be sure to write them directly to the pointed-to memory; do not change the value of the buffer parameter itself. Buffer parameters are always of the ctypes.c_void_p type; use the ctypes.memmove() function to read and write data from and to the unmanaged memory region. To obtain a pointer to a Python buffer suitable for use with ctypes.memmove(), wrap a bytearray (or another writable object that implements Python's buffer interface) in a ctypes array, e.g., (ctypes.c_char * len(data)).from_buffer(data), and pass that array to ctypes.memmove().
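The sketch below shows the read and write patterns with standard ctypes calls only. The helper names are hypothetical, and the real event handler parameters are not shown; `ctypes.create_string_buffer()` merely stands in for the preallocated unmanaged memory that an event would supply.

```python
import ctypes


def read_buffer(buffer: ctypes.c_void_p, size: int) -> bytes:
    """Copy `size` bytes out of an unmanaged buffer into Python memory."""
    data = bytearray(size)
    if size:
        # The ctypes array shares the bytearray's memory, so memmove()
        # writes straight into `data`.
        dest = (ctypes.c_char * size).from_buffer(data)
        ctypes.memmove(dest, buffer, size)
    return bytes(data)


def write_buffer(buffer: ctypes.c_void_p, size: int, payload: bytes) -> int:
    """Copy as much of `payload` as fits into the buffer; return the byte count.

    Only the memory the buffer parameter points to is modified; the
    parameter itself is never reassigned.
    """
    count = min(size, len(payload))
    if count:
        ctypes.memmove(buffer, payload, count)
    return count


# Stand-in for the preallocated unmanaged memory an event would supply.
backing = ctypes.create_string_buffer(16)
buffer_param = ctypes.c_void_p(ctypes.addressof(backing))
written = write_buffer(buffer_param, 16, b"page data")
print(written, read_buffer(buffer_param, written))  # prints: 9 b'page data'
```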
Callback Mode
As discussed in the Vaults topic, the CBVault class by default creates a vault as a real file on a local disk. However, the filesystem engine behind the class does not require a vault to be a local file: it can be a remote file, a memory region, or anything else to which the application can provide random read/write access.
Applications that wish to use something other than a local file to store a vault must enable callback mode using the class's callback_mode property. When callback mode is enabled, applications must handle the following events (which map closely to the Windows File API) for the class to interact with the vault. For brevity, vaults created and accessed using callback mode are typically referred to as "callback mode vaults".
- on_vault_close: Fires when the currently open vault should be closed.
- on_vault_delete: Fires when a callback mode vault (that is not open) should be deleted.
- on_vault_flush: Fires when any buffered vault data should be flushed out to storage.
- on_vault_get_parent_size: Fires when the class needs to know how much free space is available for the currently open vault to use for automatic growth.
- on_vault_get_size: Fires when the class needs to know the size of the currently open vault.
- on_vault_open: Fires when a callback mode vault should be opened (and, if necessary, created).
- on_vault_read: Fires when the class needs to read one or more pages of vault data.
- on_vault_set_size: Fires when the class needs to resize (i.e., shrink or grow) the currently open vault.
- on_vault_write: Fires when the class needs to write one or more pages of vault data.
Callback mode is an extremely powerful feature for applications that want to fine-tune performance. For example, consider the following scenario: a help-authoring tool that keeps a compound file in memory for fast operations. In this scenario, the vault would be a help project that the help-authoring tool loads into memory and uses the above events to access. When a user needs to save the project, the vault is flushed and the data in memory are copied back to the help project on disk.
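The help-authoring scenario can be sketched as an in-memory backing store whose methods mirror the events listed above. This is a hypothetical illustration under stated assumptions: the real event handler signatures (parameter objects, buffer pointers, result codes) are not shown, and a handler would translate its parameters into calls like these.

```python
class MemoryVaultStore:
    """Hypothetical in-memory backing store for a callback mode vault."""

    def __init__(self, initial: bytes = b"") -> None:
        self._data = bytearray(initial)   # e.g. the project file loaded from disk
        self.is_open = False
        self.dirty = False

    def open(self) -> None:                            # on_vault_open
        self.is_open = True

    def get_size(self) -> int:                         # on_vault_get_size
        return len(self._data)

    def set_size(self, new_size: int) -> None:         # on_vault_set_size
        if new_size < len(self._data):
            del self._data[new_size:]                  # shrink
        else:
            self._data.extend(bytes(new_size - len(self._data)))  # grow, zero-filled
        self.dirty = True

    def read(self, offset: int, count: int) -> bytes:  # on_vault_read
        if offset + count > len(self._data):
            raise ValueError("read past end of vault")
        return bytes(self._data[offset:offset + count])

    def write(self, offset: int, payload: bytes) -> None:  # on_vault_write
        # Assumes the class resizes the vault (on_vault_set_size) before
        # writing past its current end.
        end = offset + len(payload)
        if end > len(self._data):
            raise ValueError("write past end of vault")
        self._data[offset:end] = payload
        self.dirty = True

    def flush(self) -> bytes:                          # on_vault_flush
        """Return a snapshot that the application can copy back to disk."""
        self.dirty = False
        return bytes(self._data)

    def close(self) -> None:                           # on_vault_close
        self.is_open = False
```

When the user saves the project, the application would persist the bytes returned by flush() to the help project file on disk.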
Note: An application should not call the class's methods from handlers of the listed events; doing so is guaranteed to cause a deadlock.
Encryption
The CBVault class includes strong built-in data encryption support, which can be applied to individual files and alternate streams, entire vaults, or both. Each file, alternate stream, and vault can have its own encryption key.
Note: The API members discussed in this topic are available in all listed classes, unless otherwise noted.
Encrypting Vaults
To specify a default encryption mode and password to use when creating new vaults, applications can set the vault_encryption and vault_password properties. To change the encryption mode or password of an existing vault, use the update_vault_encryption method.
When opening an existing vault, vault_encryption is updated to reflect the vault's encryption mode; and if the vault is encrypted, the password specified by vault_password is used to access it.
Encrypting Files and Alternate Streams
To specify a default encryption mode and password for files and alternate streams, applications can set the default_file_encryption and default_file_password properties. Additionally, the following methods allow applications to set a file or alternate stream's encryption mode or password explicitly:
- open_file (when creating a new file or alternate stream)
- open_file_ex (when creating a new file or alternate stream)
- set_file_encryption
- copy_to_vault
When a file or alternate stream is encrypted, its encryption password must be provided to access it; many methods in the class's API provide a Password parameter for this purpose. If the application does not explicitly specify a password when calling such a method, then the default_file_password will be used, if possible.
Using Custom Encryption
The class's built-in encryption implementation uses 256-bit AES encryption in XTS mode with PBKDF2 key derivation based on an HMAC-SHA256 key hash. However, applications can also provide their own custom encryption and key derivation implementations. This flexibility allows applications to support more sophisticated security techniques, such as PKI-based encryption or Digital Rights Management. To get started, do the following:
- Choose a custom encryption mode to implement (i.e., one of the VAULT_EM_CUSTOM* options from the table below). This choice will determine:
  - Whether the custom encryption implementation uses a 256-bit, 512-bit, or 1024-bit block size; and
  - Whether to use built-in key derivation, custom key derivation, or no key derivation.
- Implement the on_data_encrypt and on_data_decrypt events.
- If a VAULT_EM_CUSTOM*_CUSTOM_KEY_DERIVE mode was chosen, implement the on_key_derive event.
- If a VAULT_EM_CUSTOM*_DIRECT_KEY mode was chosen, implement the on_hash_calculate event.
Supported Encryption Modes
The class supports the following encryption modes:
Constant | Value | Description |
VAULT_EM_NONE | 0x0 | Do not use encryption. |
VAULT_EM_DEFAULT | 0x1 | Use default encryption (VAULT_EM_XTS_AES256_PBKDF2_HMAC_SHA256). |
VAULT_EM_XTS_AES256_PBKDF2_HMAC_SHA256 | 0x2 | Use AES-256 encryption in XTS mode with PBKDF2 key derivation based on an HMAC-SHA256 key hash. |
VAULT_EM_CUSTOM256_PBKDF2_HMAC_SHA256 | 0x3 | Use event-based custom 256-bit encryption with PBKDF2 key derivation based on an HMAC-SHA256 key hash. A 256-bit (32-byte) block size is used with this encryption mode. |
VAULT_EM_CUSTOM512_PBKDF2_HMAC_SHA256 | 0x4 | Use event-based custom 512-bit encryption with PBKDF2 key derivation based on an HMAC-SHA256 key hash. A 512-bit (64-byte) block size is used with this encryption mode. |
VAULT_EM_CUSTOM1024_PBKDF2_HMAC_SHA256 | 0x5 | Use event-based custom 1024-bit encryption with PBKDF2 key derivation based on an HMAC-SHA256 key hash. A 1024-bit (128-byte) block size is used with this encryption mode. |
VAULT_EM_CUSTOM256_CUSTOM_KEY_DERIVE | 0x23 | Use event-based custom 256-bit encryption with custom key derivation. A 256-bit (32-byte) block size is used with this encryption mode. |
VAULT_EM_CUSTOM512_CUSTOM_KEY_DERIVE | 0x24 | Use event-based custom 512-bit encryption with custom key derivation. A 512-bit (64-byte) block size is used with this encryption mode. |
VAULT_EM_CUSTOM1024_CUSTOM_KEY_DERIVE | 0x25 | Use event-based custom 1024-bit encryption with custom key derivation. A 1024-bit (128-byte) block size is used with this encryption mode. |
VAULT_EM_CUSTOM256_DIRECT_KEY | 0x43 | Use event-based custom 256-bit encryption with no key derivation. A 256-bit (32-byte) block size is used with this encryption mode. This mode is useful when the password is an identifier for an external key and should not be used for key derivation. |
VAULT_EM_CUSTOM512_DIRECT_KEY | 0x44 | Use event-based custom 512-bit encryption with no key derivation. A 512-bit (64-byte) block size is used with this encryption mode. This mode is useful when the password is an identifier for an external key and should not be used for key derivation. |
VAULT_EM_CUSTOM1024_DIRECT_KEY | 0x45 | Use event-based custom 1024-bit encryption with no key derivation. A 1024-bit (128-byte) block size is used with this encryption mode. This mode is useful when the password is an identifier for an external key and should not be used for key derivation. |
VAULT_EM_UNKNOWN | 0xFF | Unidentified or unknown encryption. |
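The steps for custom encryption can be derived mechanically from this table. The sketch below is a hypothetical helper that restates the custom rows as a local dictionary (the module may export its own constants; this does not rely on them) and reports the block size and the events an application must handle for a given mode value.

```python
from typing import Dict, List, Tuple

# Custom modes from the table: value -> (block size in bits, key derivation).
CUSTOM_ENCRYPTION_MODES: Dict[int, Tuple[int, str]] = {
    0x03: (256, "pbkdf2"),   # VAULT_EM_CUSTOM256_PBKDF2_HMAC_SHA256
    0x04: (512, "pbkdf2"),   # VAULT_EM_CUSTOM512_PBKDF2_HMAC_SHA256
    0x05: (1024, "pbkdf2"),  # VAULT_EM_CUSTOM1024_PBKDF2_HMAC_SHA256
    0x23: (256, "custom"),   # VAULT_EM_CUSTOM256_CUSTOM_KEY_DERIVE
    0x24: (512, "custom"),   # VAULT_EM_CUSTOM512_CUSTOM_KEY_DERIVE
    0x25: (1024, "custom"),  # VAULT_EM_CUSTOM1024_CUSTOM_KEY_DERIVE
    0x43: (256, "direct"),   # VAULT_EM_CUSTOM256_DIRECT_KEY
    0x44: (512, "direct"),   # VAULT_EM_CUSTOM512_DIRECT_KEY
    0x45: (1024, "direct"),  # VAULT_EM_CUSTOM1024_DIRECT_KEY
}


def block_size_bytes(mode: int) -> int:
    """Block size, in bytes, used by a custom encryption mode (per the table)."""
    bits, _ = CUSTOM_ENCRYPTION_MODES[mode]
    return bits // 8


def required_events(mode: int) -> List[str]:
    """Events an application must handle for the given encryption mode."""
    if mode not in CUSTOM_ENCRYPTION_MODES:
        return []  # non-custom modes need no encryption event handlers
    _, derivation = CUSTOM_ENCRYPTION_MODES[mode]
    events = ["on_data_encrypt", "on_data_decrypt"]
    if derivation == "custom":
        events.append("on_key_derive")
    elif derivation == "direct":
        events.append("on_hash_calculate")
    return events
```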
Error Handling
Error Codes
The CBFS Vault class APIs communicate errors using the error codes defined on the Error Codes page.
Reporting Errors to the Class from Event Handlers
If an event has a ResultCode parameter, an event handler can use it to return the result code of the operation to the class. The ResultCode parameter is set to 0 by default, which indicates the operation was successful.
If an unhandled exception occurs in an event handler, the class catches it and fires the on_error event.
For some events, no error code is expected, and the returned error code is ignored by either the class or the OS. Please refer to the description of a particular event for more information.
How to Handle Errors Reported by the Class
If an error occurs, the class raises an exception. The exception object's code property contains the error code, and its message property contains an error message (if available).
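The pattern below is a minimal sketch of reading those two properties. The name of the class's exception type is not given in this section, so the helper reads `code` and `message` with `getattr()`, and the demo raises a hypothetical `StandInError` instead of a real vault error.

```python
from typing import Any, Callable, Optional, Tuple


def call_and_report(operation: Callable[..., Any], *args: Any,
                    **kwargs: Any) -> Tuple[Any, Optional[str]]:
    """Call a class method; on failure, return (None, "error <code>: <message>")."""
    try:
        return operation(*args, **kwargs), None
    except Exception as ex:
        # Exceptions raised by the class carry `code` and `message` properties;
        # getattr() keeps this helper usable for other exception types too.
        code = getattr(ex, "code", None)
        message = getattr(ex, "message", "") or str(ex)
        return None, f"error {code}: {message}"


class StandInError(Exception):
    """Hypothetical stand-in for the class's exception type (demo only)."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code
        self.message = message


def failing_operation() -> None:
    raise StandInError(42, "example failure")


print(call_and_report(failing_operation))
```

Catching `Exception` broadly keeps the sketch short; production code would normally catch only the class's own exception type so that programming errors still surface.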
File Features
The topics in this section provide information about file-related features.
Alternate Streams
Every file stored in a CBFS Vault vault contains a primary stream of data that holds the file's contents. In addition to this primary stream, files in a vault may also contain one or more alternate streams of data whose contents are determined by the application. By taking advantage of the flexibility that alternate streams offer, applications can support a wide range of use cases, including the following:
- Storing a file's metadata or security information.
- Saving supplementary information associated with a file (e.g., song lyrics for music files).
- Maintaining a history of file content revisions.
- Providing multiple representations of the same file (e.g., HTML, RTF, and plain versions of text content).
Alternate streams are addressed using names like <FileName>:<StreamName>, so an alternate stream named "altstream" could be addressed as \path\to\filename.ext:altstream. Alternate streams can be created and accessed just like files using open_file, open_file_ex, and delete_file, and can even be compressed or encrypted individually if an application desires.
To enumerate a file's alternate streams, call the find_first method with a mask like <FileName>:<StreamNameMask>. The <StreamNameMask> part can be * to enumerate all streams in a file. A file's main stream, which is always nameless, can be accessed explicitly using the name <FileName>: (note the trailing colon).
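The naming rules above reduce to simple string handling. These helpers are hypothetical and only build or split names; they assume a colon appears in a vault path solely as the stream separator.

```python
from typing import Optional, Tuple


def stream_path(file_name: str, stream_name: str) -> str:
    """Address an alternate stream as <FileName>:<StreamName>."""
    return f"{file_name}:{stream_name}"


def stream_enum_mask(file_name: str, stream_mask: str = "*") -> str:
    """Mask for find_first that enumerates a file's alternate streams."""
    return f"{file_name}:{stream_mask}"


def main_stream_path(file_name: str) -> str:
    """The nameless main stream, addressed explicitly with a trailing colon."""
    return f"{file_name}:"


def split_stream_path(path: str) -> Tuple[str, Optional[str]]:
    """Split into (file name, stream name).

    The stream name is None when no colon is present and "" for the
    explicitly addressed main stream.
    """
    file_name, sep, stream = path.rpartition(":")
    if not sep:
        return path, None
    return file_name, stream
```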
Compression
The CBFS Vault filesystem stores data in a vault as a series of one or more pages. To reduce space usage, CBFS Vault can compress files and alternate streams with an application-selected compression algorithm, using the following mechanism, which is designed to perform well for both sequential and random data access:
1. A block of data composed of a specific number of pages is passed to the compression routine, which attempts to compress the data.
2. If the compressed data can be stored using fewer pages than the original data, the compressed data are written to the vault. Otherwise, the original (uncompressed) data are written instead.
3. Steps 1 and 2 are repeated until all of the data pages associated with the file or alternate stream have been processed.
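The per-block decision can be modeled with Python's zlib module. This is an illustration of the rule only, not the engine's implementation: the page size and pages-per-block values are assumptions, and a real engine would also need to record each block's compressed length.

```python
import random
import zlib
from typing import List, Tuple

PAGE_SIZE = 4096      # assumed page size, for illustration only
PAGES_PER_BLOCK = 4   # assumed number of pages per compression block


def pages_for(n_bytes: int) -> int:
    """Number of whole pages needed to hold n_bytes."""
    return -(-n_bytes // PAGE_SIZE)  # ceiling division


def compress_blocks(data: bytes, level: int = 6) -> List[Tuple[bool, bytes]]:
    """Apply the per-block rule; return (stored_compressed, payload) per block."""
    block_size = PAGE_SIZE * PAGES_PER_BLOCK
    result = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        packed = zlib.compress(block, level)
        if pages_for(len(packed)) < pages_for(len(block)):
            result.append((True, packed))   # saves at least one page
        else:
            result.append((False, block))   # no page saved: keep the original
    return result


rng = random.Random(0)
repetitive = b"a" * (PAGE_SIZE * PAGES_PER_BLOCK)
noise = bytes(rng.getrandbits(8) for _ in range(PAGE_SIZE * PAGES_PER_BLOCK))
print([flag for flag, _ in compress_blocks(repetitive + noise)])  # [True, False]
```

Because each block is compressed independently, a reader can decompress only the blocks that cover a requested range, which is what keeps random access cheap.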
Compressing Files and Alternate Streams
To specify a default compression mode for files and alternate streams, applications can set the default_file_compression property (and, if applicable, the DefaultFileCompressionLevel configuration setting). Additionally, the following methods allow applications to set a file or alternate stream's compression mode explicitly:
- open_file_ex (when creating a new file or alternate stream)
- set_file_compression
- copy_to_vault
Using Custom Compression
CBFS Vault includes built-in support for zlib and RLE data compression. However, applications can also choose to provide their own custom compression implementation using the on_data_compress and on_data_decompress events.
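Whatever codec a custom implementation uses, the decoder must exactly invert the encoder. As an illustration only, here is a naive byte-oriented run-length codec; it is not the format used by the built-in RLE mode, and the on_data_compress/on_data_decompress handler signatures are not shown, only the codec pair such handlers might call.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (run length, byte value) pairs; runs are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Inverse of rle_encode()."""
    if len(encoded) % 2:
        raise ValueError("truncated RLE data")
    out = bytearray()
    for j in range(0, len(encoded), 2):
        run, value = encoded[j], encoded[j + 1]
        out += bytes((value,)) * run
    return bytes(out)
```

RLE only helps with long runs of identical bytes; on typical text it doubles the size, in which case the page-count check described above would store the data uncompressed.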
Supported Compression Modes
CBFS Vault supports the following compression modes:
Constant | Value | Description |
VAULT_CM_NONE | 0 | Do not use compression. |
VAULT_CM_DEFAULT | 1 | Use default compression (zlib). |
VAULT_CM_CUSTOM | 2 | Use event-based custom compression. The compression level is not used. |
VAULT_CM_ZLIB | 3 | Use zlib compression. Valid compression levels are 1-9. |
VAULT_CM_RLE | 4 | Use RLE compression. The compression level is not used. |
File Tags
CBFS Vault allows applications to attach arbitrary metadata to any file, directory, or alternate stream using file tags. There are two kinds of file tags, both of which are stored as key-value pairs:
- Raw file tags use numeric Ids as keys and store raw binary data.
  - Valid Id values are those in the range 0x0001 to 0xCFFF (inclusive).
  - A tag should contain at least one (1) byte of data.
  - The maximum size of a raw file tag's binary data is 65531 bytes.
- Typed file tags use string keys and store typed values.
  - Names may be up to 4095 characters long (not including the null terminator), and are stored in UTF-16LE format internally.
  - The maximum size of a typed file tag's value is 65529 - (name_length * 2) bytes (where name_length is measured in characters, including the null terminator).
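The limits above can be checked before calling set_file_tag and related methods. This is a hypothetical validation sketch; it assumes "characters" means UTF-16 code units (how names are stored internally), which matters only for characters outside the Basic Multilingual Plane. For example, a tag named "Author" has name_length 7, so its value may be up to 65529 - 14 = 65515 bytes.

```python
RAW_TAG_ID_MIN, RAW_TAG_ID_MAX = 0x0001, 0xCFFF
RAW_TAG_MAX_DATA = 65531
TYPED_TAG_MAX_NAME = 4095


def utf16_length(text: str) -> int:
    """Length in UTF-16 code units (assumed to be what "characters" means)."""
    return len(text.encode("utf-16-le")) // 2


def check_raw_tag(tag_id: int, data: bytes) -> None:
    """Raise ValueError if a raw tag violates the documented limits."""
    if not RAW_TAG_ID_MIN <= tag_id <= RAW_TAG_ID_MAX:
        raise ValueError(f"raw tag Id {tag_id:#06x} is outside 0x0001-0xCFFF")
    if not 1 <= len(data) <= RAW_TAG_MAX_DATA:
        raise ValueError("raw tag data must be 1 to 65531 bytes long")


def max_typed_tag_value_size(name: str) -> int:
    """Largest value size, in bytes, allowed for a typed tag with this name."""
    if utf16_length(name) > TYPED_TAG_MAX_NAME:
        raise ValueError("typed tag names may be at most 4095 characters long")
    name_length = utf16_length(name) + 1  # includes the null terminator
    return 65529 - name_length * 2
```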
Each file, directory, and alternate stream can have up to 1024 typed file tags and 53247 raw file tags attached to it at once. The following methods are used to manage and interact with file tags:
- delete_file_tag
- file_tag_exists
- get_file_tag, get_file_tag_as_ansi_string, get_file_tag_as_boolean, get_file_tag_as_date_time, get_file_tag_as_number, get_file_tag_as_string
- get_file_tag_data_type
- get_file_tag_size
- set_file_tag, set_file_tag_as_ansi_string, set_file_tag_as_boolean, set_file_tag_as_date_time, set_file_tag_as_number, set_file_tag_as_string
Applications can also use the find_first_by_query method to search for files and directories whose file tags match a specified query; please refer to that method's documentation, as well as the Query Language topic, for more information.
Note: The query language works only with typed file tags; it does not support raw file tags.
Query Language
In addition to searching by name, applications can search for files and directories based on their File Tags (metadata) using the CBFS Vault query language.
The query language includes a wide variety of Language Elements and supports all common Data Types. Search queries are interpreted as UTF-16LE strings, and they may contain any valid arrangement of language elements, typed file tag names, and constants. For example:
- From = 'John Smith': Selects all files received from John Smith.
  - From is the name of a file tag.
  - = is the equality operator (== is also supported).
  - 'John Smith' is a string constant.
- Today - SendData > 3: Selects all files that were sent more than three days ago.
  - Today is an intrinsic constant that returns the current system date.
  - - is the subtraction operator.
  - SendData is the name of a file tag.
  - > is the greater than operator.
  - 3 is a numeric constant.
When parsing an expression from a search query, the query engine converts all of its operands to the same data type using a specific set of rules; please see the Type Conversion topic for more information.
To find the first match for a query, call the find_first_by_query method, passing the desired search query for the Query parameter; and then call find_next to find other matches, if necessary. Be sure to call find_close when finished so that the class can release the resources allocated for the search operation.
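To guarantee that find_close runs even when the caller stops early or an error occurs, the search loop can be wrapped in a generator with try/finally. This is a sketch under explicit assumptions: the four callables are hypothetical adapters around find_first_by_query, find_next, find_close, and however the application reads the current match, and their real signatures are documented with those methods.

```python
from typing import Callable, Iterator, Tuple, TypeVar

Handle = TypeVar("Handle")


def iterate_query(
    query: str,
    find_first: Callable[[str], Tuple[Handle, bool]],
    find_next: Callable[[Handle], bool],
    find_close: Callable[[Handle], None],
    current_name: Callable[[Handle], str],
) -> Iterator[str]:
    """Yield the name of each match, always releasing the search afterwards."""
    handle, found = find_first(query)
    try:
        while found:
            yield current_name(handle)
            found = find_next(handle)
    finally:
        # Assumes find_close should be called once find_first has returned.
        # Runs on normal exhaustion, on exceptions, and when the generator is
        # closed early (explicitly or when it is garbage-collected).
        find_close(handle)
```

A caller can then write `for name in iterate_query("IsFile AND Size > 0", ...)` and break out of the loop at any point without leaking the search.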
Please refer to the File Tags topic for more information about how to manage file tags, and see the other topics in this section for more information about the query language.
Language Elements
The CBFS Vault query language supports a wide variety of language elements, all of which are described below.
Logical Operators
Operator | Operand Type(s) | Description |
NOT, !, ~ | Boolean | Logical negation (NOT) |
NOT, !, ~ | Number | Bitwise NOT |
AND, & | Boolean | Logical AND |
AND, & | Number | Bitwise AND |
OR, | | Boolean | Logical OR |
OR, | | Number | Bitwise OR |
Arithmetic Operators
Operator | Operand Type(s) | Description |
+ | Number, DateTime | Addition |
+ | String | String concatenation |
- | Number | Negation |
- | Number, DateTime | Subtraction |
* | Number | Multiplication |
/ | Number | Division (attempting to divide by zero will cause an exception) |
Addition and subtraction operations involving DateTime operands behave as follows:
- When adding a Number (n) and a DateTime, the result is a DateTime whose value has increased by n whole days.
- When subtracting a Number (n) from a DateTime, the result is a DateTime whose value has decreased by n whole days.
- When subtracting a DateTime from another DateTime, the result is a Number that reflects the difference as a number of whole days. The query evaluator converts both operands to whole days before performing the subtraction; "leftover" time is truncated as part of the conversion.
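These rules can be mirrored with Python's datetime module, as a sketch of the described semantics rather than the query engine's own code; the helper names are hypothetical, and keeping the time of day when adding or subtracting whole days is an assumption. Note the effect of truncation: 00:01 on one day minus 23:59 on the previous day is 1 day, and an expression such as Today - X > 3 holds only when X falls on a calendar day at least four days before today.

```python
from datetime import datetime, timedelta


def add_days(value: datetime, n: int) -> datetime:
    """DateTime + Number: n whole days later (time of day assumed kept)."""
    return value + timedelta(days=n)


def subtract_days(value: datetime, n: int) -> datetime:
    """DateTime - Number: n whole days earlier."""
    return value - timedelta(days=n)


def days_between(left: datetime, right: datetime) -> int:
    """DateTime - DateTime: truncate both operands to whole days, then subtract."""
    return (left.date() - right.date()).days
```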
Relational Operators
Operator | Operand Type(s) | Description |
=, == | All types | Equal to |
<>, != | All types | Not equal to |
< | All types | Less than |
> | All types | Greater than |
<= | All types | Less than or equal to |
>= | All types | Greater than or equal to |
Conditions
Condition | Operand Type(s) | Description |
IS [NOT] NULL | All types | Returns True if the value is/is not NULL, and False otherwise. |
IS [NOT] True | Boolean | Returns True if the value is/is not True, and False otherwise. |
IS [NOT] False | Boolean | Returns True if the value is/is not False, and False otherwise. |
[NOT] LIKE '...' [ESCAPE '...'] | String | Returns True if the value does/does not match the specified pattern; see notes below. |
Keep the following in mind when using the LIKE condition:
- Two kinds of wildcards are supported: %, which matches a string of any length; and _, which matches any single character. For example:
  - From LIKE '% Smith': Selects all files received from people with the last name "Smith".
  - From LIKE 'John Sm_th': Selects all files received from people with the first name "John" and a last name that is five characters long, begins with "Sm", and ends with "th" (e.g., Smith, Smyth, Smeth).
- To search for values that include wildcard characters, the optional ESCAPE parameter can be used to specify a wildcard escape character. For example:
  - From LIKE 'John!_Smith' ESCAPE '!': Selects all files received from "John_Smith".
  - From LIKE 'John!_%' ESCAPE '!': Selects all files received from someone whose name begins with "John_".
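One way to reason about LIKE patterns is to translate them into regular expressions. The sketch below does that with Python's re module; it is an illustration, not the query engine's matcher, and it assumes case-sensitive matching (case handling is not specified in this section).

```python
import re
from typing import Optional


def like_to_regex(pattern: str, escape: Optional[str] = None) -> "re.Pattern[str]":
    """Translate a LIKE pattern: % -> any string, _ -> any single character."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape is not None and ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str, escape: Optional[str] = None) -> bool:
    """True if the whole value matches the pattern (assumed case-sensitive)."""
    return like_to_regex(pattern, escape).fullmatch(value) is not None
```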
File Variables
File variables represent some piece of information about the current file the query is being evaluated against.
Variable | Type | Description |
FileName | String | The name of the current file. |
FullName | String | The fully qualified name of the current file, starting from the root directory /. |
Path | String | The full path to the current file, including the final path separator (not including the file name). |
IsFile | Boolean | True if the current file is not a directory, and False otherwise. |
IsDirectory | Boolean | True if the current file is a directory, and False otherwise. |
IsLink | Boolean | True if the current file is a symbolic link, and False otherwise. |
LinkDestination | String | If the current file is a symbolic link, the link's target; otherwise, it acts the same as FullName. |
CreationTime | DateTime | The current file's creation date and time. |
LastAccessTime | DateTime | The current file's last access date and time. |
ModificationTime | DateTime | The current file's last modification date and time. |
Size | Number | The size of the current file (always 0 for directories). |
Attributes | Number | The current file's attributes, encoded as a number. |
IsEncrypted | Boolean | True if the current file is encrypted, and False otherwise. |
IsCompressed | Boolean | True if the current file is compressed, and False otherwise. May be True for directories that contain files compressed by default. |
Intrinsics
"Intrinsics" are the functions and constants built into the query language.
Intrinsic | Operand Type(s) | Return Type | Description |
D(value) | String | DateTime | Converts a String to a DateTime; please refer to the Type Conversion topic for more information. |
IsNull(value) | All types | Boolean | Returns True if the value is NULL, and False otherwise. |
IsNotNull(value) | All types | Boolean | Returns True if the value is not NULL, and False otherwise. |
Min(value1, value2) | All types | All types | Returns the smaller of the two values. |
Max(value1, value2) | All types | All types | Returns the larger of the two values. |
Now | (none) | DateTime | Returns the current system date and time. |
Today | (none) | DateTime | Returns the current system date. |
True | (none) | Boolean | Boolean True. |
False | (none) | Boolean | Boolean False. |
Precedence
The following table lists the query language's elements in order of descending precedence. Any legal expression within a query string may be surrounded with parentheses () to override precedence or increase readability.
Precedence | Language Elements |
1 | All File Variables; all Intrinsics (except D(value); see note) |
2 | -: Arithmetic negation; NOT, !, ~: Logical/bitwise negation; D(value): Explicit String to DateTime conversion |
3 | *: Multiplication; /: Division |
4 | +: Addition/string concatenation; -: Subtraction |
5 | =, ==: Equal to; <>, !=: Not equal to; <: Less than; >: Greater than; <=: Less than or equal to; >=: Greater than or equal to; IS [NOT], [NOT] LIKE: All Conditions |
6 | AND, &: Logical/bitwise AND |
7 | OR, |: Logical/bitwise OR |
Note: The query engine treats the D(value) function as an operator, so its precedence is lower than the other intrinsics.
Data Types
The CBFS Vault query language supports the following operand data types:
Type | Description |
NULL | Empty value. Operations with NULL operand(s) always result in NULL. |
Boolean | Boolean; either False or True (and False < True). |
String | String of UTF-16LE (2-byte Unicode) characters. |
DateTime | Describes the date and time. |
Number | Signed 64-bit integer. |
Type Conversion
When CBFS Vault parses a query, it will attempt to convert operands to the same type before evaluating an expression. The right-hand operand is converted to match the type of the left-hand operand if possible; otherwise, if the right-hand operand is a String type, the left-hand operand is converted to String.
Supported Data Type Conversions
Convert To | Convert From | Notes |
String | Number, Boolean, NULL | Typical conversion rules apply; Boolean values become "True" or "False". |
String | DateTime | The format string used by the conversion is YYYY-MM-DD hh:mm:ss.fff. |
Boolean | String | "True" and "False" are the recognized string values. |
DateTime | String | The parsing pattern used by the conversion is YYYY[-]MM[-]DD[[tT ]hh[[:]mm[[:]ss[.fff]]]]; see notes below. |
Number | String | The conversion recognizes numbers formatted as signed base-10 integers. |
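The operand-unification rule can be sketched in Python, mapping String to str, Number to int, Boolean to bool, and DateTime to datetime. This is a hypothetical model, not the engine's code: NULL is not modeled, conversions to DateTime (which use the parsing pattern described next) are deliberately left out, and exact matching of "True"/"False" is assumed.

```python
import re
from datetime import datetime
from typing import Any, Tuple

_INT_RE = re.compile(r"[+-]?[0-9]+")  # signed base-10 integer


def to_string(value: Any) -> str:
    """String conversions from the table (NULL is not modeled here)."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "True" if value else "False"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, datetime):  # YYYY-MM-DD hh:mm:ss.fff
        return value.strftime("%Y-%m-%d %H:%M:%S.") + f"{value.microsecond // 1000:03d}"
    if isinstance(value, str):
        return value
    raise TypeError(f"unsupported operand type: {type(value).__name__}")


def convert(value: Any, target: type) -> Any:
    """Convert value to target; raise ValueError if no table rule applies."""
    if type(value) is target:
        return value
    if target is str:
        return to_string(value)
    if target is datetime:
        raise NotImplementedError("conversions to DateTime are not covered by this sketch")
    if isinstance(value, str):
        if target is bool and value in ("True", "False"):
            return value == "True"
        if target is int and _INT_RE.fullmatch(value):
            return int(value)
    raise ValueError(f"cannot convert {value!r} to {target.__name__}")


def unify(left: Any, right: Any) -> Tuple[Any, Any]:
    """Convert right to left's type, else (if right is a String) left to String."""
    try:
        return left, convert(right, type(left))
    except ValueError:
        if isinstance(right, str):
            return to_string(left), right
        raise
```

For example, comparing the Number 5 with the String '7' compares 5 with 7, while comparing 5 with 'abc' falls back to comparing the strings '5' and 'abc'.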
In addition to the implicit conversion mentioned above, a String can be converted to a DateTime explicitly using the intrinsic function D(value). The implicit and explicit conversions both use the parsing pattern shown above, which has a number of optional parts:
- The date separators - may be omitted if the month and day are both two-digit values. They must both be present if the month and/or day is a single-digit value.
- The time portion may be omitted; if present, it must be specified as one of the following: hours only; hours and minutes; hours, minutes, and seconds; or hours, minutes, seconds, and milliseconds.
- The time separators : may be omitted if all included time elements are two-digit values. They must be present if any time elements are single-digit values.
- Milliseconds, if present, must always be separated by a . character, and must always be a three-digit value.
- When the time portion is present, it may immediately follow the date portion (i.e., with no separator), or it may be separated from the date portion using a T, a t, or a single space.
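A parser for the pattern above can be sketched with regular expressions. This hypothetical helper encodes one reading of the rules: the two date separators are either both present or both omitted, and likewise for the time separators; field widths follow the notes above.

```python
import re
from datetime import datetime

_DATE_SEP = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")   # YYYY-M-D .. YYYY-MM-DD
_DATE_COMPACT = re.compile(r"(\d{4})(\d{2})(\d{2})")     # YYYYMMDD
_TIME_SEP = re.compile(r"(\d{1,2})(?::(\d{1,2})(?::(\d{1,2})(?:\.(\d{3}))?)?)?")
_TIME_COMPACT = re.compile(r"(\d{2})(?:(\d{2})(?:(\d{2})(?:\.(\d{3}))?)?)?")


def parse_query_datetime(text: str) -> datetime:
    """Parse YYYY[-]MM[-]DD[[tT ]hh[[:]mm[[:]ss[.fff]]]] into a datetime."""
    for date_re in (_DATE_SEP, _DATE_COMPACT):
        date_match = date_re.match(text)
        if date_match:
            break
    else:
        raise ValueError(f"invalid date: {text!r}")
    year, month, day = (int(g) for g in date_match.groups())
    rest = text[date_match.end():]
    hour = minute = second = millis = 0
    if rest:
        if rest[0] in "tT ":  # optional single separator before the time
            rest = rest[1:]
        for time_re in (_TIME_SEP, _TIME_COMPACT):
            time_match = time_re.fullmatch(rest)
            if time_match:
                break
        else:
            raise ValueError(f"invalid time: {rest!r}")
        hour, minute, second, millis = (int(g) if g else 0 for g in time_match.groups())
    # datetime() also rejects out-of-range values such as month 13.
    return datetime(year, month, day, hour, minute, second, millis * 1000)
```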
Vaults
What Is a Vault?
The key functionality CBFS Vault provides is the ability to create and store an entire filesystem (complete with files, directories, metadata, and much more) in a standalone container called a vault.
A vault is typically stored as a real file on a local disk (similar to, e.g., an SQLite database file), but applications can store it in any other location by using Callback Mode.
Internally, a vault's storage space is divided into chunks of equal size called pages. A vault's page size is specified at creation time, and it cannot be changed later. Applications do not have direct access to vault pages, but awareness of their existence is helpful for understanding certain class APIs.
Please refer to the other topics in this section for more information about vaults.
Multipart Vaults
CBFS Vault is capable of storing a single vault across multiple files on disk; this is known as a multipart vault. To create a multipart vault, set the PartSize configuration setting to a nonzero value before the vault is created. CBFS Vault will automatically create, resize, and delete individual part files as necessary over time (please refer to the Vault Size topic for more information).
Multipart vaults are typically used by applications that operate in environments with file size constraints. For example, if an application needed to store a 16 GB vault on a FAT32 filesystem, whose maximum file size is 4 GB minus 1 byte, it could use a multipart vault with a part size below that limit (e.g., 2 GB).
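The arithmetic behind this example is simple ceiling division. The sketch assumes PartSize is expressed in bytes, reads "GB" as 2^30 bytes, and ignores any per-part overhead the engine may add; the helper names are hypothetical.

```python
FAT32_MAX_FILE_SIZE = 2**32 - 1  # FAT32 per-file limit: 4 GiB minus 1 byte
GIB = 2**30


def parts_needed(vault_size: int, part_size: int) -> int:
    """Number of part files needed to hold vault_size bytes (ceiling division)."""
    if part_size <= 0:
        raise ValueError("a multipart vault needs a positive part size")
    return -(-vault_size // part_size)


def fits_on_fat32(part_size: int) -> bool:
    """True if a single part file of this size can exist on FAT32."""
    return 0 < part_size <= FAT32_MAX_FILE_SIZE


for size in (2 * GIB, 4 * GIB - 1, 4 * GIB):
    print(size, fits_on_fat32(size), parts_needed(16 * GIB, size))
```

A part size of exactly 4 GiB would need only four parts but cannot be stored on FAT32; the largest legal size needs five parts, and 2 GiB needs eight.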
Existing vaults cannot be converted between multipart and non-multipart, and a multipart vault's part size cannot be changed after creation. Also, multipart vaults are not supported in Callback Mode (because callback mode already gives applications full control over how and where a vault is stored); in callback mode, the PartSize configuration setting is simply ignored.
Using RootData
All CBFS Vault vaults contain a special data stream called RootData that can be used for application-defined purposes. A vault's RootData stream can be accessed only using the open_root_data method, because the stream is not part of the vault's filesystem hierarchy. The standalone nature of the RootData stream means the following:
- It does not have a path in the filesystem, which prevents it from being found using find_first or find_first_by_query.
- It cannot have any file tags, attributes, times, or alternate streams associated with it.
- It cannot be compressed using the set_file_compression method.
- It cannot be encrypted using the set_file_encryption method.
The RootData stream is also exempt from whole-vault encryption. This exemption is intentional; it allows applications that utilize whole-vault encryption to store information about that encryption (e.g., encrypted session keys, certificates, access control lists [ACLs]) within the vault, thus simplifying application design.
Vault Corruption
CBFS Vault vaults have a complex internal structure that may become corrupted if a vault is not closed properly or if some operation is interrupted. Typically, such things are caused by an application crash or a system crash, or (when operating in Callback Mode) as a result of an error in an event handler. Corruption can also occur if a vault's raw data are modified externally, either intentionally or because of storage failure.
When a vault is open, the is_corrupted property can be queried to determine whether it has been corrupted. If a vault is corrupted, any operation may fail with the VAULT_ERR_VAULT_CORRUPTED error code.
Applications can attempt to fix a corrupted vault by calling the check_and_repair method. Always create a vault backup before calling check_and_repair, because in cases of severe corruption, it is possible for data to be lost during the repair process.
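A backup-then-repair step might look like the sketch below. It assumes the vault is a single local file that is not open while it is being copied (a multipart vault would need every part file copied), and `repair` is a hypothetical callable that the application would implement around check_and_repair.

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def backup_then_repair(vault_path: str, repair: Callable[[], None]) -> Path:
    """Copy the vault file to a timestamped backup, then run the repair step."""
    source = Path(vault_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = source.with_name(f"{source.name}.{stamp}.bak")
    shutil.copy2(source, backup)  # full copy first: repair may lose data
    repair()                      # e.g. a wrapper that calls check_and_repair
    return backup
```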
Journaling
To reduce the chances of vault corruption in the event of a crash, CBFS Vault can make use of journaling. Journaling works by wrapping vault modification operations in transactions, as follows:
1. A new transaction is opened by writing information about a change to a journal located within the vault.
2. The changes themselves are written to the vault.
3. The transaction is committed by writing another entry in the journal.
If a crash occurs, any interrupted modification operations will appear in a vault's journal as pending transactions. The next time CBFS Vault opens the vault, it will discover any pending transactions and automatically try to recover them. During the transaction recovery process, each transaction is either committed or rolled back, depending on its last known state.
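The three steps and the recovery pass can be modeled with a toy key-value store. This is an illustration of the general technique under simplifying assumptions, not CBFS Vault's journal format: each begin record stores the old and new values, committed transactions are rolled forward, and pending ones are rolled back.

```python
_MISSING = object()  # marks "key did not exist before the transaction"


class JournaledStore:
    """Toy key-value model of journal-wrapped writes (not the vault's format)."""

    def __init__(self) -> None:
        self.data = {}
        self.journal = []  # ("begin", txid, key, old, new) or ("commit", txid)
        self._next_txid = 1

    def write(self, key, value, crash_before_commit: bool = False) -> None:
        txid = self._next_txid
        self._next_txid += 1
        old = self.data.get(key, _MISSING)
        self.journal.append(("begin", txid, key, old, value))  # 1. open transaction
        self.data[key] = value                                  # 2. apply the change
        if crash_before_commit:
            return                                              # simulated crash
        self.journal.append(("commit", txid))                   # 3. commit

    def recover(self) -> None:
        """Roll committed transactions forward and pending ones back."""
        committed = {entry[1] for entry in self.journal if entry[0] == "commit"}
        begins = [entry for entry in self.journal if entry[0] == "begin"]
        for _, txid, key, _, new in begins:            # redo committed, in order
            if txid in committed:
                self.data[key] = new
        for _, txid, key, old, _ in reversed(begins):  # undo pending, newest first
            if txid not in committed:
                if old is _MISSING:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old
        self.journal.clear()
```

The extra journal appends in write() are the additional write operations that make journaling slower than unjournaled writes.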
Overall, journaling is an effective technique for maintaining data integrity. However, keep the following considerations in mind:
- When journaling is enabled, all file data changes incur additional write operations; this has a significant impact on overall write performance.
- Journaling does not provide data redundancy or error detection; it cannot protect against corruption caused by bit rot, storage failures, or external modification of a vault's physical data.
CBFS Vault implements journaling as an operational mode rather than a vault attribute, and there is no such thing as a "journaled vault" or a "nonjournaled vault". Applications control whether the journaling mode is used by setting the JournalingMode parameter of the open_vault method when opening a vault. Therefore, the same vault may be opened with journaling enabled at one time and without it at another.
The filesystem engine will always perform transaction recovery when a vault is opened (if transactions are pending in its journal), even if journaling is disabled.
Vault Size
By default, a vault grows automatically as more data are written to it, and it shrinks automatically when its free space percentage reaches the threshold defined by the auto_compact_at property.
Applications can use the following properties to both control, and obtain information about, a vault's size. Please refer to each one's documentation for more information.
- vault_size_max: Specifies the maximum size a vault can be; 0 (unlimited) by default.
- vault_size_min: Specifies the minimum size a vault can be; 0 by default.
- vault_size: Reflects a vault's actual size, and can also be used to explicitly resize a vault, keeping in mind the following:
  - A vault cannot shrink by more than its available free space allows (i.e., not by more than vault_free_space bytes).
  - A vault cannot shrink below vault_size_min bytes.
  - If vault_size_max is not 0 (unlimited), a vault cannot grow beyond vault_size_max bytes.
  - If a vault grows enough that its free space percentage reaches or exceeds the auto_compact_at threshold, it will automatically shrink again when the next automatic compaction occurs.
- vault_free_space: Reflects the actual amount of free space a vault has available.
- possible_size: Reflects the maximum size a vault could possibly be.
- possible_free_space: Reflects the maximum amount of free space a vault could possibly have available.
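Before assigning vault_size, an application could check the rules above itself. The sketch below is hypothetical and restates those rules; it assumes auto_compact_at is a percentage and that "reaches" means greater than or equal to. For example, growing a 1000-byte vault with 100 free bytes to 2000 bytes leaves 1100 free bytes (55%), which exceeds a 25% threshold, so the growth would be undone by the next compaction.

```python
from typing import List


def resize_problems(current: int, requested: int, free_space: int,
                    size_min: int = 0, size_max: int = 0) -> List[str]:
    """Return the documented rules that a vault_size assignment would break."""
    problems = []
    if requested < current:
        if current - requested > free_space:
            problems.append("cannot shrink by more than vault_free_space bytes")
        if requested < size_min:
            problems.append("cannot shrink below vault_size_min")
    if size_max != 0 and requested > size_max:  # 0 means unlimited
        problems.append("cannot grow beyond vault_size_max")
    return problems


def will_auto_compact(size: int, free_space: int, auto_compact_at: int) -> bool:
    """True if the free space percentage has reached the auto_compact_at threshold."""
    return free_space * 100 >= auto_compact_at * size
```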