[Feature] Create integrations with different storage engines with the Da Vinci Record Transformer (DVRT) #1563

@kvargha

Description

Willingness to contribute

No. I cannot contribute at this time.

Feature Request Proposal

The Da Vinci Record Transformer is an API that allows users to hook into the lifecycle of the Da Vinci Client.

This feature was built by the open-source community to give more leverage to our users. To showcase the power of DVRT, we would like to build integrations with it that can be used by others or serve as examples of what can be done with it.

One existing integration built with DVRT is with DuckDB. It allows users to run SQL OLAP queries against Venice datasets, which was not previously possible due to Venice's key-value access pattern.

Motivation

Since this is a brand new API that is directly exposed to users, we would like to showcase what is possible with it to encourage our users to adopt it. This exercise also helps identify any gaps in the DVRT abstract class.

Details

Integrations will be built on top of the DaVinciRecordTransformer abstract class. Please use the DuckDB integration as a frame of reference when developing.

Since Venice is a key-value database and DuckDB is a SQL OLAP database, we would like to have integrations with other types of databases, for example graph databases or search engines. Please keep performance and community usage in mind when selecting a database to integrate with.
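
To make the shape of such an integration concrete, here is a minimal, self-contained sketch of a toy "search engine" integration that maintains an in-memory inverted index. It is only an illustration: `RecordTransformerSketch`, `InvertedIndexTransformer`, and the `processPut` / `processDelete` / `search` signatures are hypothetical stand-ins written for this example, not the real DaVinciRecordTransformer API. Check the actual abstract class and the DuckDB integration for the real hooks and types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for the lifecycle hooks an integration would override.
// This is NOT the real DaVinciRecordTransformer signature.
abstract class RecordTransformerSketch<K, V> {
  abstract void processPut(K key, V value);

  abstract void processDelete(K key);
}

// Toy integration: mirrors every record into an inverted index (token -> keys).
class InvertedIndexTransformer extends RecordTransformerSketch<String, String> {
  private final Map<String, String> documents = new TreeMap<>();
  private final Map<String, Set<String>> index = new TreeMap<>();

  @Override
  void processPut(String key, String value) {
    processDelete(key); // remove tokens of any previous value for this key
    documents.put(key, value);
    for (String token : tokenize(value)) {
      index.computeIfAbsent(token, t -> new TreeSet<>()).add(key);
    }
  }

  @Override
  void processDelete(String key) {
    String old = documents.remove(key);
    if (old == null) {
      return; // nothing was indexed for this key
    }
    for (String token : tokenize(old)) {
      Set<String> keys = index.get(token);
      if (keys != null) {
        keys.remove(key);
        if (keys.isEmpty()) {
          index.remove(token);
        }
      }
    }
  }

  // Returns the keys whose value contains the token, in sorted order.
  List<String> search(String token) {
    Set<String> keys = index.get(token.toLowerCase());
    return keys == null ? new ArrayList<>() : new ArrayList<>(keys);
  }

  // Lowercases the text and splits on non-word characters, skipping empty tokens.
  private static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    for (String token : text.toLowerCase().split("\\W+")) {
      if (!token.isEmpty()) {
        tokens.add(token);
      }
    }
    return tokens;
  }
}
```

A real integration would forward these put and delete events to an external engine instead of in-memory maps, and would also need to handle whatever startup, version-swap, and schema concerns the actual abstract class exposes.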

What component(s) does this affect?

  • Controller: This is the control-plane for Venice. Used to create/update/query stores and their metadata.
  • Router: This is the stateless query-routing layer for serving read requests.
  • Server: This is the component that persists all the store data.
  • VenicePushJob: This is the component that pushes derived data from Hadoop to Venice backend.
  • VenicePulsarSink: This is a Sink connector for Apache Pulsar that pushes data from Pulsar into Venice.
  • Thin Client: This is a stateless client users use to query Venice Router for reading store data.
  • Fast Client: This is a stateful client users use to query Venice Server for reading store data.
  • Da Vinci Client: This is an embedded, stateful client that materializes store data locally.
  • Samza: This is the library users use to make nearline updates to store data.
  • Admin Tool: This is the stand-alone client used for ad-hoc operations on Venice.

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests