Architecture

There are three important structural elements in the framework:
  • Finisher – A wrapper around a general-purpose compression algorithm. It defines convenience methods that accept lists of intrinsic data types and write the data to a byte stream for the compressor.
  • Transform – A preprocessing algorithm that steps in to alter the data in some way before it is passed to the Finisher.
  • Codec – A combination of Transform and Finisher that handles header information and acts as the vehicle for processing more complex data structures.
Finishers
Finishers are defined in the namespace Stability.Data.Compression.Finishers:
  • IFinisher – The interface that all finishers must implement.
  • Finisher – An abstract base class that implements methods to convert lists to a byte stream.
    • EncodeToStream – The single abstract encoding method that concrete classes implement.
    • DecodeToStream – The single abstract decoding method that concrete classes implement.
  • DeflateFinisher – A concrete class that implements EncodeToStream and DecodeToStream using DeflateStream.
Because concrete finishers only need to override a single pair of abstract methods from the base class, wrapping any general-purpose compression algorithm takes very little code.
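To make the division of labor concrete, here is a minimal conceptual sketch in Python. It is not the library's .NET API: the method names, the single int32 convenience pair, and the byte layout are assumptions chosen for illustration. It only shows the shape described above, where the base class owns the list-to-bytes conversion and a concrete finisher overrides just two methods.

```python
import struct
import zlib
from abc import ABC, abstractmethod
from typing import List


class Finisher(ABC):
    """Hypothetical base class: list <-> byte stream conversion lives here."""

    @abstractmethod
    def encode_to_stream(self, raw: bytes) -> bytes:
        """Compress a raw byte stream."""

    @abstractmethod
    def decode_to_stream(self, packed: bytes) -> bytes:
        """Restore the raw byte stream."""

    # Convenience methods: a real base class would offer one pair per
    # intrinsic type. Only 32-bit integers are shown here.
    def encode_int32(self, values: List[int]) -> bytes:
        raw = struct.pack(f"<{len(values)}i", *values)
        return self.encode_to_stream(raw)

    def decode_int32(self, packed: bytes) -> List[int]:
        raw = self.decode_to_stream(packed)
        return list(struct.unpack(f"<{len(raw) // 4}i", raw))


class DeflateFinisher(Finisher):
    """Concrete finisher: only the two abstract methods are overridden."""

    def encode_to_stream(self, raw: bytes) -> bytes:
        # wbits=-15 selects a raw deflate stream (no zlib header or checksum).
        compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
        return compressor.compress(raw) + compressor.flush()

    def decode_to_stream(self, packed: bytes) -> bytes:
        return zlib.decompress(packed, -15)
```

Swapping in a different compressor would mean writing another small subclass; the convenience methods are inherited unchanged.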
Transforms
Transforms are defined in the namespace Stability.Data.Compression.Transforms:
  • IDeltaTransform – The interface that all transforms must implement.
  • NullTransform – A concrete class that does nothing but pass data to the finisher.
  • DeltaTransform – A concrete class that simply differences data before passing it to the finisher.
A concrete transform knows nothing about concrete finishers. It gets passed an IFinisher that is called when preprocessing is complete.
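The decoupling can be sketched as follows. This is a Python illustration under my own assumptions, not the library's code: FinisherLike, ZlibFinisher, and the method names are hypothetical, and the differencing assumes every delta fits in a signed 64-bit integer.

```python
import struct
import zlib
from typing import List, Protocol


class FinisherLike(Protocol):
    """The only thing the transform knows about a finisher (hypothetical)."""

    def encode(self, raw: bytes) -> bytes: ...
    def decode(self, packed: bytes) -> bytes: ...


class ZlibFinisher:
    """Any object with matching encode/decode methods would work here."""

    def encode(self, raw: bytes) -> bytes:
        return zlib.compress(raw)

    def decode(self, packed: bytes) -> bytes:
        return zlib.decompress(packed)


class DeltaTransform:
    """Differences the data, then hands the result to the finisher."""

    def encode_int64(self, values: List[int], finisher: FinisherLike) -> bytes:
        deltas, prev = [], 0
        for v in values:
            deltas.append(v - prev)  # first delta is the value itself
            prev = v
        raw = struct.pack(f"<{len(deltas)}q", *deltas)
        return finisher.encode(raw)

    def decode_int64(self, packed: bytes, finisher: FinisherLike) -> List[int]:
        raw = finisher.decode(packed)
        deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
        values, total = [], 0
        for d in deltas:
            total += d  # running sum undoes the differencing
            values.append(total)
        return values
```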

In many cases, custom transforms will inherit from one of the simple ones provided and override one or more of the virtual methods. For example, a codec built to process a specific data structure might need specialized handling of only two data types, say DateTime and Double. The transform overrides the methods for those two types and inherits the rest; the other virtual methods are there if you need to override any of them later.
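As a rough illustration of that inheritance pattern, the Python sketch below overrides a single method and inherits the other. Everything in it is an assumption for the sake of the example: TimestampTransform is a made-up name, timestamps are modeled as 64-bit integers, and the finisher is reduced to a plain callable.

```python
import struct
import zlib
from typing import Callable, List

Finish = Callable[[bytes], bytes]  # stand-in for a finisher


class NullTransform:
    """Passes data straight to the finisher, one method per data type."""

    def encode_int64(self, values: List[int], finish: Finish) -> bytes:
        return finish(struct.pack(f"<{len(values)}q", *values))

    def encode_float64(self, values: List[float], finish: Finish) -> bytes:
        return finish(struct.pack(f"<{len(values)}d", *values))


class TimestampTransform(NullTransform):
    """Overrides only the int64 path; float64 handling is inherited."""

    def encode_int64(self, values: List[int], finish: Finish) -> bytes:
        # Difference each value against its predecessor (0 for the first).
        deltas = [b - a for a, b in zip([0] + list(values), values)]
        return super().encode_int64(deltas, finish)
```

Calling TimestampTransform().encode_float64(...) runs the inherited pass-through code, while encode_int64 differences the values first.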
Codecs
Codecs are defined in the default namespace Stability.Data.Compression:
  • IDeltaCodec – The interface that defines signatures for handling lists of intrinsic data types.
  • DeltaCodec – The abstract base class that implements IDeltaCodec methods.
  • DeflateCodec – A concrete class that combines a NullTransform with a DeflateFinisher.
  • DeflateDeltaCodec – A concrete class that combines a DeltaTransform with a DeflateFinisher.
Concrete implementations in the core library require no external references. Additional codecs are included in a separate assembly: Stability.Data.Compression.ThirdParty. Naturally, you can create your own library of codecs to combine whatever mix of transform and finisher implementations you require.
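The way a codec pairs a transform with a finisher can be sketched in a few lines of Python. This is a simplified model, not the library's implementation: the class and method names are hypothetical, and the four-byte element-count header is an assumption standing in for whatever header information the real codecs write.

```python
import struct
import zlib
from typing import List, Sequence


class NullTransform:
    def forward(self, values: Sequence[int]) -> List[int]:
        return list(values)

    def inverse(self, values: Sequence[int]) -> List[int]:
        return list(values)


class DeltaTransform:
    def forward(self, values: Sequence[int]) -> List[int]:
        out, prev = [], 0
        for v in values:
            out.append(v - prev)
            prev = v
        return out

    def inverse(self, deltas: Sequence[int]) -> List[int]:
        out, total = [], 0
        for d in deltas:
            total += d
            out.append(total)
        return out


class DeflateFinisher:
    def encode(self, raw: bytes) -> bytes:
        return zlib.compress(raw)

    def decode(self, packed: bytes) -> bytes:
        return zlib.decompress(packed)


class Codec:
    """Combines one transform with one finisher and owns the header."""

    HEADER = struct.Struct("<I")  # hypothetical header: element count only

    def __init__(self, transform, finisher):
        self.transform = transform
        self.finisher = finisher

    def encode(self, values: Sequence[int]) -> bytes:
        transformed = self.transform.forward(values)
        body = struct.pack(f"<{len(transformed)}q", *transformed)
        return self.HEADER.pack(len(values)) + self.finisher.encode(body)

    def decode(self, blob: bytes) -> List[int]:
        (count,) = self.HEADER.unpack_from(blob)
        body = self.finisher.decode(blob[self.HEADER.size:])
        return self.transform.inverse(struct.unpack(f"<{count}q", body))


# The two pairings named in the list above, rebuilt from the toy parts.
deflate_codec = Codec(NullTransform(), DeflateFinisher())
deflate_delta_codec = Codec(DeltaTransform(), DeflateFinisher())
```

Because the codec only holds references to its two parts, a new combination is a one-line construction.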
DataStructure
New types were added in v1.2 to show how multi-dimensional data can be handled:
  • Struple – This can best be described as a "structure tuple". It allows up to 15 fields of data.
  • StrupleEncodingArgs – This is a way of passing arguments when using Struples (CompressionLevel, Granularity, Monotonicity). Each vectorized field can have individual settings.
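The idea of vectorizing fields and giving each one its own settings might look like the Python sketch below. It is only an analogy: FieldArgs, encode_rows, and decode_rows are hypothetical names, every field is assumed to be a 64-bit float, and only a compression level is modeled because this page does not describe how Granularity and Monotonicity are applied.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FieldArgs:
    """Hypothetical per-field settings; only a compression level is modeled."""

    level: int = 6


def encode_rows(rows: Sequence[Tuple[float, ...]],
                args: Sequence[FieldArgs]) -> List[bytes]:
    """Turn rows into one column per field and compress each separately."""
    columns = list(zip(*rows))  # vectorize: one sequence per field
    if len(columns) != len(args):
        raise ValueError("need exactly one FieldArgs per field")
    blobs = []
    for column, field_args in zip(columns, args):
        raw = struct.pack(f"<{len(column)}d", *column)
        blobs.append(zlib.compress(raw, field_args.level))
    return blobs


def decode_rows(blobs: Sequence[bytes]) -> List[Tuple[float, ...]]:
    """Decompress each column and zip the columns back into rows."""
    columns = []
    for blob in blobs:
        raw = zlib.decompress(blob)
        columns.append(struct.unpack(f"<{len(raw) // 8}d", raw))
    return list(zip(*columns))
```

For example, encode_rows(rows, [FieldArgs(level=9), FieldArgs(level=1)]) compresses the first field harder than the second.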
Utility Types
A number of utility types are defined in the namespace Stability.Data.Compression.Utility.

You do not need to understand the details of these types for basic usage of the library, but I’ll briefly describe a few of the more interesting ones here:
  • DeltaBlockState – This makes it easier to manage information about the state of a single block of data during the encoding and decoding process. A block can be the entire original vector (list or array) or it can be some subset of this data, such as an ArraySegment. It includes some logic to encode and decode some of the header information as bit fields.
  • DeltaBlockSerializer – This static class defines the methods used by codecs to serialize or deserialize single blocks of header information and encoded data.
  • OrderedRangeFactory – This provides static methods that help partition data when block-parallel encoding is used. It has some advantages over TPL range partitioning. See the code comments for more information.
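For readers unfamiliar with block-parallel encoding, the general technique can be sketched in Python as below. This does not reproduce OrderedRangeFactory or any other type from the library: the function names are hypothetical, blocks carry no header, and values are assumed to be 32-bit integers. The point is only that the data is cut into contiguous ranges whose order is preserved, so the encoded blocks can be concatenated back in sequence.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def ordered_ranges(length: int, num_blocks: int) -> List[Tuple[int, int]]:
    """Split [0, length) into contiguous (start, stop) ranges, in order."""
    num_blocks = max(1, min(num_blocks, length)) if length else 1
    base, extra = divmod(length, num_blocks)
    ranges, start = [], 0
    for i in range(num_blocks):
        # The first `extra` blocks take one additional element each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def encode_block(values: Sequence[int]) -> bytes:
    return zlib.compress(struct.pack(f"<{len(values)}i", *values))


def encode_parallel(values: Sequence[int], num_blocks: int) -> List[bytes]:
    ranges = ordered_ranges(len(values), num_blocks)
    with ThreadPoolExecutor() as pool:
        # map() returns results in input order, so blocks stay ordered
        # even if they finish out of order.
        return list(pool.map(lambda r: encode_block(values[r[0]:r[1]]), ranges))


def decode_blocks(blobs: Sequence[bytes]) -> List[int]:
    out: List[int] = []
    for blob in blobs:
        raw = zlib.decompress(blob)
        out.extend(struct.unpack(f"<{len(raw) // 4}i", raw))
    return out
```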
A number of other types might be interesting to those who want to create custom transforms and codecs. Browse the source code to see if they might be useful. Future versions will include several additional utility types for planned enhancements.


Last edited Jul 23, 2015 at 2:50 PM by bstabile, version 22