
Introduction

Compressing time series data usually involves writing a lot of boilerplate code that is seldom reused efficiently across project or enterprise boundaries. To make matters worse, general-purpose compression algorithms are not designed to exploit the useful characteristics of this type of data, such as the tendency of consecutive values to differ only slightly. As a result, performance in terms of both space and time is usually poor.

A typical way to improve the situation is to preprocess the data before feeding it to a general-purpose compressor. In essence, we create a “pipeline” of transformations that make the data more compressible. This is a well-known and widely used pattern.
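As a rough illustration of the pipeline pattern, here is a minimal Python sketch that applies a delta transform before handing the bytes to zlib. It is not this framework's API: the function names and the sample data are assumptions made up for the example, and a real codec would handle more types and edge cases.

```python
import struct
import zlib


def delta_encode(values):
    """Replace each value with its difference from the previous one."""
    previous = 0
    deltas = []
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    total = 0
    values = []
    for delta in deltas:
        total += delta
        values.append(total)
    return values


def pack_int64(values):
    """Serialize integers as little-endian signed 64-bit values."""
    return struct.pack("<%dq" % len(values), *values)


def unpack_int64(payload):
    """Inverse of pack_int64."""
    return list(struct.unpack("<%dq" % (len(payload) // 8), payload))


def compress_series(values):
    """Pipeline: delta transform first, general-purpose compressor second."""
    return zlib.compress(pack_int64(delta_encode(values)))


def decompress_series(blob):
    """Run the pipeline stages in reverse order."""
    return delta_decode(unpack_int64(zlib.decompress(blob)))


# Hypothetical data: millisecond timestamps at a fixed one-second interval.
timestamps = [1_434_479_520_000 + 1000 * i for i in range(10_000)]

raw_size = len(zlib.compress(pack_int64(timestamps)))
pipeline_size = len(compress_series(timestamps))
print("zlib only:", raw_size, "bytes; delta + zlib:", pipeline_size, "bytes")
```

On regularly spaced data like this, the delta stage turns almost every value into the same small number, which is exactly the kind of redundancy a general-purpose compressor handles well.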

The framework described here provides a simple but solid foundation on which to build arbitrarily sophisticated time series compression pipelines. It also provides simple codecs that can be used effectively straight out of the box.

Without knowing anything about how things work internally, a new user can get started very easily:

[Image: DeltaCodec_SimpleUsage.png]

The example in the image is pseudo-code. It shows that the codec automatically infers the data type of your list when encoding. When decoding, you must call the generic method with an explicit type.

Intrinsic data types are encoded in bit fields of the block header. Even so, explicit typing is preferred in method signatures, because custom codecs are intended to handle arbitrarily complex data structures. The codecs included in the core library only know how to handle intrinsic data types, but you can derive a codec that handles any data structure you want.
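To make the idea of a type code in a header bit field concrete, here is a small Python sketch. The layout is purely hypothetical (a one-byte header with the type code in the low four bits and spare flags in the high four bits) and is not this library's actual block format. The type codes, function names, and `flags` parameter are all assumptions for illustration.

```python
import struct

# Hypothetical type codes; the real header layout is not described here.
TYPE_CODES = {"int32": 0x1, "int64": 0x2, "float64": 0x3}
STRUCT_FORMATS = {"int32": "i", "int64": "q", "float64": "d"}
CODE_TO_NAME = {code: name for name, code in TYPE_CODES.items()}

TYPE_MASK = 0x0F  # low 4 bits hold the data type (assumed layout)
FLAG_SHIFT = 4    # high 4 bits are left for other flags (assumed layout)


def encode_block(type_name, values, flags=0):
    """Write a header byte, a 32-bit element count, then the raw values."""
    if not 0 <= flags <= 0x0F:
        raise ValueError("flags must fit in 4 bits")
    header = (flags << FLAG_SHIFT) | TYPE_CODES[type_name]
    body = struct.pack("<%d%s" % (len(values), STRUCT_FORMATS[type_name]), *values)
    return bytes([header]) + struct.pack("<I", len(values)) + body


def decode_block(block, expected_type):
    """Read the type from the header and check it against the caller's type."""
    type_name = CODE_TO_NAME[block[0] & TYPE_MASK]
    if type_name != expected_type:
        raise TypeError("block holds %s, caller asked for %s" % (type_name, expected_type))
    (count,) = struct.unpack_from("<I", block, 1)
    fmt = "<%d%s" % (count, STRUCT_FORMATS[type_name])
    return list(struct.unpack_from(fmt, block, 5))


block = encode_block("int64", [1, 2, 3])
print(decode_block(block, "int64"))
```

The explicit `expected_type` argument plays the role of the explicit generic type on decode: the header can confirm an intrinsic type, but it cannot describe an arbitrary user-defined structure, so the caller still states what it expects.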

For example, you could create a structure or class that has several intrinsic fields. Your codec would implement a method that handles a list of that type by creating an individual vector for each field and passing those vectors to the low-level inherited methods. Alternatively, you can create a method that accepts multiple lists of different data types as arguments. Either way, you simply need to serialize the encoded parts with appropriate header information.
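The field-by-field approach might look roughly like the following Python sketch. Everything in it is an assumption for illustration: the `Tick` record, the helper names, and the header layout (a part count followed by the byte length of each part) are invented here and do not reflect the library's own classes or format.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass
class Tick:
    """Hypothetical record type with several intrinsic fields."""
    timestamp: int
    price: float
    volume: int


def encode_ints(values):
    """Delta-encode a vector of integers, then compress it."""
    deltas = [b - a for a, b in zip([0] + values[:-1], values)]
    return zlib.compress(struct.pack("<%dq" % len(deltas), *deltas))


def decode_ints(blob):
    """Decompress, then undo the delta transform with a running sum."""
    raw = zlib.decompress(blob)
    values, total = [], 0
    for delta in struct.unpack("<%dq" % (len(raw) // 8), raw):
        total += delta
        values.append(total)
    return values


def encode_floats(values):
    """Compress a vector of doubles without any transform."""
    return zlib.compress(struct.pack("<%dd" % len(values), *values))


def decode_floats(blob):
    raw = zlib.decompress(blob)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


def encode_ticks(ticks):
    """Split the records into one vector per field and encode each vector."""
    parts = [
        encode_ints([t.timestamp for t in ticks]),
        encode_floats([t.price for t in ticks]),
        encode_ints([t.volume for t in ticks]),
    ]
    # Header: number of parts, then the byte length of each part.
    header = struct.pack("<B%dI" % len(parts), len(parts), *(len(p) for p in parts))
    return header + b"".join(parts)


def decode_ticks(blob):
    """Use the header to slice out each part, then rebuild the records."""
    (part_count,) = struct.unpack_from("<B", blob, 0)
    lengths = struct.unpack_from("<%dI" % part_count, blob, 1)
    offset = 1 + 4 * part_count
    parts = []
    for length in lengths:
        parts.append(blob[offset:offset + length])
        offset += length
    timestamps = decode_ints(parts[0])
    prices = decode_floats(parts[1])
    volumes = decode_ints(parts[2])
    return [Tick(t, p, v) for t, p, v in zip(timestamps, prices, volumes)]
```

The alternative mentioned above, a method that accepts several lists of different types, would simply skip the splitting step in `encode_ticks` and take the three vectors as arguments. The header and the per-part encoding would stay the same.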

In the future, sample codecs will be added to show how easy it is to handle more complex data scenarios. You may then use the provided generic versions directly (if suitable), or you may derive your own strongly typed codecs (or extension methods) by copying the example that most closely matches your requirements and adjusting it as needed.

NOTE: As of 1.2, additional methods have been added to process multiple fields of complex data structures ("Struples"). It is recommended that you use these only as guidance for writing your own strongly typed implementations. There are, however, tests that demonstrate their usage in ad hoc situations.



Last edited Jun 16, 2015 at 6:32 PM by bstabile, version 22