Investigate different tools for minimizing footprint of common data types #349

@cjnolet

Currently, for the event store, the expiration and timestamp values are placed in the VALUE portion of each shard key/value. We've experimented a little with different formats that lend themselves to varying levels of overall compressibility, especially considering that Accumulo gzips each r-file at the block level (with configured block sizes between 100k and 500k on average).

Sqrrl has mentioned Apache Gora to me before, and I know they use it for some of their encodings. Thrift and Protobuf also have some good algorithms for minimizing encoded information.

An encoding that stays away from non-text Unicode bytes will probably respond best to being gzipped, but we really need to investigate all of our options and figure out which method offers the best performance, lowest complexity, and smallest footprint.
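As a starting point for that investigation, here is a rough sketch of how the candidate encodings could be compared. It is not the event store's actual layout: the three encoders, the synthetic timestamps and expirations, and the `compare` helper are all hypothetical, and concatenating bare values ignores the keys that sit between them in a real r-file block. It only shows the shape of the measurement (raw size versus gzipped size per encoding), with a Protobuf-style base-128 varint standing in for the Thrift/Protobuf family.

```python
import gzip
import random
import struct


def encode_text(ts, exp):
    # Hypothetical text layout: decimal strings separated by a comma.
    return f"{ts},{exp}".encode("ascii")


def encode_fixed(ts, exp):
    # Two unsigned 64-bit big-endian integers, always 16 bytes.
    return struct.pack(">QQ", ts, exp)


def encode_varint(n):
    # Protobuf-style base-128 varint for a non-negative integer.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos=0):
    # Returns (value, position just past the varint).
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7


def encode_varints(ts, exp):
    return encode_varint(ts) + encode_varint(exp)


def compare(n=10000, seed=0):
    # Synthetic data (an assumption): slowly increasing millisecond
    # timestamps and a handful of distinct expiration values.
    rng = random.Random(seed)
    ts = 1_400_000_000_000
    records = []
    for _ in range(n):
        ts += rng.randint(0, 1000)
        records.append((ts, rng.choice([0, 86_400_000, 604_800_000])))

    sizes = {}
    encoders = [("text", encode_text), ("fixed", encode_fixed), ("varint", encode_varints)]
    for name, enc in encoders:
        raw = b"".join(enc(t, e) for t, e in records)
        sizes[name] = (len(raw), len(gzip.compress(raw)))
    return sizes


if __name__ == "__main__":
    for name, (raw, packed) in compare().items():
        print(f"{name:>6}: raw={raw} bytes, gzipped={packed} bytes")
```

Running the same comparison against real shard values, with the block size matched to the configured r-file block size, would give numbers we can actually act on. Encode and decode time would need to be measured separately.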
