The library for those cabbage lovers out there who want to send data over the wire.
A revitalization of Pickling in the Scala 3 world.
When defining over-the-wire messages, do this:

    import sauerkraut.core.{Buildable,Writer,given}
    case class MyMessage(field: String, data: Int)
        derives Buildable, Writer

Then, when you need to serialize, pick a format and go:

    import java.io.StringWriter
    import sauerkraut.format.json.{Json,given}
    import sauerkraut.{pickle,read,write}
    val out = StringWriter()
    pickle(Json).to(out).write(MyMessage("test", 1))
    println(out.toString())
    val msg = pickle(Json).from(out.toString()).read[MyMessage]

Here's a feature matrix for each format:

| Format | Reader | Writer | All Types | Evolution Friendly | Notes |
|---|---|---|---|---|---|
| Json | Yes | Yes | Yes | Yes | Uses Jawn for parsing |
| Protos | Yes | Yes | Yes | Yes | Evolution-friendly binary format |
| NBT | Yes | Yes | Yes | | For the kids. |
| XML | Yes | Yes | Yes | | Inefficient prototype. |
| Pretty | No | Yes | No | | For pretty-printing strings |
See Compliance for more details on what this means.
Everyone's favorite non-YAML web data transfer format! This uses Jawn under the covers for parsing, but can write Json without any dependencies.
Example:

    import sauerkraut.{pickle,read,write}
    import sauerkraut.core.{Buildable,Writer, given}
    import sauerkraut.format.json.Json
    case class MyWebData(value: Int, someStuff: Array[String])
        derives Buildable, Writer
    def read(in: java.io.InputStream): MyWebData =
      pickle(Json).from(in).read[MyWebData]
    def write(out: java.io.OutputStream): Unit =
      pickle(Json).to(out).write(MyWebData(1214, Array("this", "is", "a", "test")))

sbt build:

    libraryDependencies += "com.jsuereth.sauerkraut" %% "json" % "<version>"

See json project for more information.

A new encoding for protocol buffers within Scala! This supports a subset of all possible protocol buffer messages but allows full definition of the message format within your Scala code.
Example:

    import sauerkraut.{pickle,write,read, Field}
    import sauerkraut.core.{Writer, Buildable, given}
    import sauerkraut.format.pb.{Proto, given}
    case class MyMessageData(value: Int @Field(3), someStuff: Array[String] @Field(2))
        derives Writer, Buildable
    def write(out: java.io.OutputStream): Unit =
      pickle(Proto).to(out).write(MyMessageData(1214, Array("this", "is", "a", "test")))

This example serializes to the equivalent of the following protocol buffer message:

    message MyMessageData {
      int32 value = 3;
      repeated string someStuff = 2;
    }

sbt build:

    libraryDependencies += "com.jsuereth.sauerkraut" %% "pb" % "<version>"

See pb project for more information.

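
For readers unfamiliar with the protocol buffer wire format, the sketch below shows how single fields of such a message are laid out as bytes, following the publicly documented encoding: a tag of `(field number << 3) | wire type`, followed by a varint or a length-prefixed payload. The helper names (`varint`, `tag`, `int32Field`, `stringField`) are hypothetical and exist only for this illustration. This is not sauerkraut's implementation, and it makes no claim about the order in which the library writes fields.

```scala
// Hypothetical helpers illustrating the protocol buffer wire format.
// Bytes are modelled as Ints in the range 0..255 for readability.

/** Wire types, as numbered in the protocol buffer encoding documentation. */
val WireVarint = 0
val WireLengthDelimited = 2

/** Encodes a non-negative value as a base-128 varint, least significant group first. */
def varint(value: Long): Vector[Int] =
  require(value >= 0, "this sketch only handles non-negative values")
  if value < 0x80 then Vector(value.toInt)
  else ((value & 0x7f) | 0x80).toInt +: varint(value >>> 7)

/** A field tag is the field number shifted left by three bits, OR-ed with the wire type. */
def tag(fieldNumber: Int, wireType: Int): Vector[Int] =
  varint((fieldNumber.toLong << 3) | wireType)

/** An int32 field: tag, then the value as a varint (non-negative values only here). */
def int32Field(fieldNumber: Int, value: Int): Vector[Int] =
  tag(fieldNumber, WireVarint) ++ varint(value.toLong)

/** A string field: tag, then the UTF-8 byte length as a varint, then the bytes. */
def stringField(fieldNumber: Int, value: String): Vector[Int] =
  val bytes = value.getBytes("UTF-8").toVector.map(_ & 0xff)
  tag(fieldNumber, WireLengthDelimited) ++ varint(bytes.length.toLong) ++ bytes
```

Under these assumptions, `int32Field(3, 1214)` yields the bytes `0x18 0xBE 0x09`, and each element of the repeated `someStuff` field is written as its own `stringField(2, ...)` entry. This is also why the `@Field` numbers, not the Scala field names, determine what ends up on the wire.
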
Named-Binary-Tags, a format popularized by Minecraft.
Example:

    import sauerkraut.{pickle,read,write}
    import sauerkraut.core.{Buildable,Writer, given}
    import sauerkraut.format.nbt.Nbt
    case class MyGameData(value: Int, someStuff: Array[String])
        derives Buildable, Writer
    def read(in: java.io.InputStream): MyGameData =
      pickle(Nbt).from(in).read[MyGameData]
    def write(out: java.io.OutputStream): Unit =
      pickle(Nbt).to(out).write(MyGameData(1214, Array("this", "is", "a", "test")))

sbt build:

    libraryDependencies += "com.jsuereth.sauerkraut" %% "nbt" % "<version>"

See nbt project for more information.

Everyone's favorite markup language for data transfer!
Example:

    import sauerkraut.{pickle,read,write}
    import sauerkraut.core.{Buildable,Writer, given}
    import sauerkraut.format.xml.{Xml, given}
    case class MySlowWebData(value: Int, someStuff: Array[String])
        derives Buildable, Writer
    def read(in: java.io.InputStream): MySlowWebData =
      pickle(Xml).from(in).read[MySlowWebData]
    def write(out: java.io.Writer): Unit =
      pickle(Xml).to(out).write(MySlowWebData(1214, Array("this", "is", "a", "test")))

sbt build:

    libraryDependencies += "com.jsuereth.sauerkraut" %% "xml" % "<version>"

See xml project for more information.

A format used solely to pretty-print object contents to strings. It has no `PickleReader`, only a `PickleWriter`.
Example:

    import sauerkraut._, sauerkraut.core.{Writer,given}
    case class MyAwesomeData(theBest: Int, theCoolest: String) derives Writer
    scala> MyAwesomeData(1, "The Greatest").prettyPrint
    val res0: String = Struct(rs$line$2.MyAwesomeData) {
      theBest: 1
      theCoolest: The Greatest
    }

We split Serialization into three layers:

- The `source` layer. Sources are expected to be some kind of stream.
- The `Format` layer. This is responsible for reading a raw source and converting it into the component types used in the `Shape` layer. See `PickleReader` and `PickleWriter`.
- The `Shape` layer. This is responsible for turning Primitives, Structs, Choices and Collections into component types.

It's the circle of data:

    Source       => format       => shape      => memory => shape     => format       => Destination
    [PickleData] => PickleReader => Builder[T] => T      => Writer[T] => PickleWriter => [PickleData]

This, hopefully, means we can reuse a lot of logic between the various formats with little loss of efficiency.
Note: This library is not measuring performance yet.
The Shape layer is responsible for extracting Scala types into known shapes that can be used for
serialization. Currently, these shapes are Collection, Structure and Primitive. Custom
shapes can be created in terms of these three shapes.
The Shape layer defines these three classes:

- `sauerkraut.core.Writer[T]`: Can translate a value into `write*` calls of Primitive, Structure or Collection.
- `sauerkraut.core.Builder[T]`: Can accept an incoming stream of collections/structures/primitives and build a value of `T` from them.
- `sauerkraut.core.Buildable[T]`: Can provide a `Builder[T]` when asked.

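
To make these three roles concrete, here is a minimal sketch of how a writer, a builder and a buildable can cooperate. Every name in it (`ShapeSink`, `SimpleWriter`, `SimpleBuilder`, `SimpleBuildable`, `Point`, `roundTrip`) is a hypothetical simplification written for this illustration, restricted to flat structures of primitives. These are not the actual `sauerkraut.core` signatures.

```scala
// Hypothetical, simplified stand-ins for the shape-layer roles.
// These are NOT the real sauerkraut.core signatures.

/** Receives shape events; in a real library a format's writer would play this role. */
trait ShapeSink:
  def field(name: String, value: Int | String): Unit

/** Writer role: turns a value into calls on a sink. */
trait SimpleWriter[T]:
  def write(value: T, sink: ShapeSink): Unit

/** Builder role: accepts pushed fields and assembles a T from them. */
trait SimpleBuilder[T]:
  def putField(name: String, value: Int | String): Unit
  def result: T

/** Buildable role: hands out a fresh builder when asked. */
trait SimpleBuildable[T]:
  def newBuilder: SimpleBuilder[T]

final case class Point(x: Int, label: String)

object Point:
  given SimpleWriter[Point] with
    def write(value: Point, sink: ShapeSink): Unit =
      sink.field("x", value.x)
      sink.field("label", value.label)

  given SimpleBuildable[Point] with
    def newBuilder: SimpleBuilder[Point] = new SimpleBuilder[Point]:
      private var x = 0
      private var label = ""
      def putField(name: String, value: Int | String): Unit =
        (name, value) match
          case ("x", i: Int)        => x = i
          case ("label", s: String) => label = s
          case _                    => () // unknown or mistyped fields are ignored
      def result: Point = Point(x, label)

/** Pushes a value through its writer straight into a builder, with no format in between. */
def roundTrip[T](value: T)(using writer: SimpleWriter[T], buildable: SimpleBuildable[T]): T =
  val builder = buildable.newBuilder
  val sink = new ShapeSink:
    def field(name: String, fieldValue: Int | String): Unit =
      builder.putField(name, fieldValue)
  writer.write(value, sink)
  builder.result
```

In this sketch the builder is pushed to rather than asked: the producer decides which fields arrive and in what order, and the builder only has to cope with whatever it receives. In the library itself such instances are expected to come from `derives` clauses rather than being written by hand.
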
The format layer is responsible for mapping sauerkraut shapes (Collection, Structure, Primitive, Choice) into
the underlying format. Not all shapes in sauerkraut will map exactly to underlying formats, and so each
format may need to adjust/tweak incoming data as appropriate.
The format layer has these primary classes:

- `sauerkraut.format.PickleReader`: Can load data and push it into a `Builder` of type `T`.
- `sauerkraut.format.PickleWriter`: Accepts pushed structures/collections/primitives and places them into a Pickle.

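
As an illustration of what "accepting pushed shapes" can look like, the following sketch renders pushed primitives, structures and collections as compact JSON text. `MiniPickleWriter`, `MiniJsonWriter` and `renderExample` are hypothetical names invented for this example, and the method set is an assumption; the real `PickleWriter` contract is split across several types and will differ.

```scala
// Hypothetical mini format writer; NOT the real sauerkraut.format.PickleWriter contract.
trait MiniPickleWriter:
  def putPrimitive(value: Int | Boolean | String): Unit
  def putStructure(fields: (String, MiniPickleWriter => Unit)*): Unit
  def putCollection(items: (MiniPickleWriter => Unit)*): Unit

/** Renders pushed shapes as compact JSON text into `out`. */
final class MiniJsonWriter(out: StringBuilder) extends MiniPickleWriter:
  // Minimal escaping: only backslashes and double quotes are handled in this sketch.
  private def quoted(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  def putPrimitive(value: Int | Boolean | String): Unit =
    value match
      case s: String => out.append(quoted(s))
      case other     => out.append(other.toString)

  def putStructure(fields: (String, MiniPickleWriter => Unit)*): Unit =
    out.append('{')
    for ((name, writeValue), index) <- fields.zipWithIndex do
      if index > 0 then out.append(',')
      out.append(quoted(name)).append(':')
      writeValue(this) // the caller pushes the field's value into this writer
    out.append('}')

  def putCollection(items: (MiniPickleWriter => Unit)*): Unit =
    out.append('[')
    for (writeItem, index) <- items.zipWithIndex do
      if index > 0 then out.append(',')
      writeItem(this)
    out.append(']')

/** Pushes a structure with one primitive field and one collection field. */
def renderExample(): String =
  val out = new StringBuilder
  val writer = MiniJsonWriter(out)
  val writeItems: MiniPickleWriter => Unit =
    w => w.putCollection(e => e.putPrimitive("this"), e => e.putPrimitive("test"))
  writer.putStructure(
    ("value", (w: MiniPickleWriter) => w.putPrimitive(1214)),
    ("someStuff", writeItems)
  )
  out.toString
```

The lambdas are what let the caller stay ignorant of the target format: the same sequence of pushes could be handed to a writer that produces XML or a binary encoding instead.
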
The source layer is allowed to be any type that a format wishes to support. Inputs and outputs are
provided to the API via these two classes:

- `sauerkraut.format.PickleReaderSupport[Input, Format]`: A given instance of this type allows a `PickleReader` to be constructed from a type of input.
- `sauerkraut.format.PickleWriterSupport[Output, Format]`: A given instance of this type allows a `PickleWriter` to be constructed from a type of output.

This layer is designed to support any type of input and output, not just an in-memory store (like a Json Ast) or a streaming input. Formats can define what types of input/output (or execution environment) they allow.
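
The idea of selecting a writer from the type of the output can be sketched with an ordinary Scala 3 type class. All names below (`LineWriter`, `PlainText`, `WriterSupport`, `charSupport`, `byteSupport`, `plainTextTo`) are hypothetical and chosen for this example; the actual `PickleWriterSupport` signature may look different.

```scala
import java.io.{ByteArrayOutputStream, OutputStream, StringWriter, Writer}
import java.nio.charset.StandardCharsets

// All names here are hypothetical; this is not the real PickleWriterSupport API.

/** A toy "format writer" that only knows how to emit lines of text. */
trait LineWriter:
  def putLine(text: String): Unit

/** Phantom type standing in for a format such as Json or Xml. */
sealed trait PlainText

/** Says how to construct a writer of format `Format` on top of an `Output`.
  * Contravariant in `Output`, so support for `Writer` also covers `StringWriter`.
  */
trait WriterSupport[-Output, Format]:
  def writerFor(output: Output): LineWriter

/** Support for character-based outputs. */
given charSupport: WriterSupport[Writer, PlainText] with
  def writerFor(output: Writer): LineWriter =
    text => output.write(text + "\n")

/** Support for byte-based outputs, encoding text as UTF-8. */
given byteSupport: WriterSupport[OutputStream, PlainText] with
  def writerFor(output: OutputStream): LineWriter =
    text => output.write((text + "\n").getBytes(StandardCharsets.UTF_8))

/** Entry point: the compiler picks the support instance from the output's static type. */
def plainTextTo[O](output: O)(using support: WriterSupport[O, PlainText]): LineWriter =
  support.writerFor(output)
```

With these definitions, `plainTextTo(new StringWriter)` and `plainTextTo(new ByteArrayOutputStream)` both compile and resolve to different instances, while an output type with no given instance is rejected at compile time. A format that wants to support a new kind of output only has to add one more given.
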
New formats are expected to provide the "format" + "source" layer implementations they require.
TODO - a bit more here.
There are a few major differences from the old scala pickling project.
- The core library is built for 100% static code generation. While we think that dynamic (i.e. runtime-reflection-based)
pickling could be built using this library, it is a non-goal.
- Users are expected to rely on typeclass derivation to generate Readers/Writers, rather than using macros.
- The supported types that can be pickled are limited to those supported by typeclass derivation or that
  can have hand-written `Writer[_]`/`Builder[_]` instances.
- Readers are no longer driven by the Scala type. Instead we use a new `Buildable[A]`/`Builder[A]` design to allow each `PickleReader` to push values into a `Builder[A]` that will then construct the Scala class.
- There have been no runtime performance optimisations around codegen. Those will come as we test the limits of Scala 3 / Dotty.
- Format implementations are separate libraries.
- The `PickleWriter` contract has been split into several types to avoid misuse. This puts more lambdas in play, but the cost may be offset by optimisations in modern versions of Scala/JVM.
- The name is more German.

Benchmarking is still being built out, and is pending the final design of Choice/Sum-Types within the Format/Shape layer.
You can see benchmark results via `benchmarks/jmh:run -rf csv`.
Latest status/analysis can be found in the benchmarks directory.
- Basic comparison of all formats
- Size-of-Pickle measurement
- Well-thought out dataset for reading/writing
- Isolated read vs. write testing
- Comparison against other frameworks.
  - Protos vs. protocol buffer java implementation
  - Json Reading vs. raw JAWN to AST (measure overhead)
  - Jackson
  - Kryo
  - Thrift
  - Circe
  - uPickle
- Automatic well-formatted graph dump in Markdown of results.
Thanks to everyone who contributed to the original pickling library for inspiration, with a few callouts.
- Heather Miller + Philipp Haller for the original idea, innovation and motivation for Scala.
- Havoc Pennington + Eugene Yokota for helping define what's important when pickling a protocol and evolving that protocol.