[Feature] Include D2Server as part of Venice set up #2150

@xunyin8

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the Venice community.

Feature Request Proposal

The proposal is to add an option to start and configure D2Server as part of Venice setup, similar to the existing option to spin up and run our own Helix service as part of the Venice controller.

Motivation

Many Venice features, such as cluster discovery and request-based metadata for the Fast Client (FC) and Da Vinci Client (DVC), rely on D2. We have our own D2 setup for testing in D2TestUtils. In production, however, users need to configure and start their own D2Server and services. The intricacies are not well documented, which makes this a major barrier to using Venice.

Details

We can use the existing D2TestUtils as a starting point. Many of the default/hard-coded configs are reasonable, but we can refactor the class and introduce new configs to allow customization:

    long startupTimeoutMillis = 5000;
    boolean continueIfStartupFails = false;
    long shutdownTimeoutMillis = 5000;
    boolean continueIfShutdownFails = true;
    boolean doNotStart = false;
    boolean delayStart = true;
    boolean initMarkUp = true;
    boolean healthCheckEnabled = false;
    long healthCheckInterval = 1000;
    int healthCheckRetries = 3;
    String healthCheckUrl = "";
    int d2HealthCheckerTimeoutMs = 500;
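
As a rough sketch of what "introduce new configs" could look like, the class below reads a few of the values above from `java.util.Properties` and falls back to the current hard-coded defaults. The class name `D2AnnouncerSettings` and every property key are hypothetical placeholders for illustration; they are not existing Venice configs.

```java
import java.util.Properties;

/**
 * Hypothetical holder for some of the D2 announcer settings listed above.
 * The property keys below are illustrative assumptions, not existing Venice configs.
 */
final class D2AnnouncerSettings {
  final long startupTimeoutMillis;
  final long shutdownTimeoutMillis;
  final boolean healthCheckEnabled;
  final long healthCheckInterval;
  final int healthCheckRetries;

  private D2AnnouncerSettings(Properties props) {
    // Each default mirrors the hard-coded value shown in the list above.
    startupTimeoutMillis = Long.parseLong(props.getProperty("d2.startup.timeout.ms", "5000"));
    shutdownTimeoutMillis = Long.parseLong(props.getProperty("d2.shutdown.timeout.ms", "5000"));
    healthCheckEnabled = Boolean.parseBoolean(props.getProperty("d2.health.check.enabled", "false"));
    healthCheckInterval = Long.parseLong(props.getProperty("d2.health.check.interval.ms", "1000"));
    healthCheckRetries = Integer.parseInt(props.getProperty("d2.health.check.retries", "3"));
  }

  /** Builds settings from the given properties; missing keys keep their defaults. */
  static D2AnnouncerSettings fromProperties(Properties props) {
    return new D2AnnouncerSettings(props);
  }
}
```

With this shape, an operator who sets nothing gets the same behavior as the test setup, and only the values that need tuning have to be overridden.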

Similar to how we configure D2 in our test environment, we need to add an option (controlled via a new config) to configure a ServiceDiscoveryAnnouncer in each component that will announce to D2.

The following D2 clusters/services need to be configured, and different Venice components need to announce to them. Venice does not use D2's cluster feature and simply sets the D2 cluster name to the same value as the D2 service name. For example, the D2 service name for discovery is venice-discovery, and the D2 cluster name is also venice-discovery.

  1. venice-discovery: all Venice routers, regardless of which Venice cluster they belong to, should announce to this service.
  2. Router D2 services, based on the Venice cluster name. See usages of D2TestUtils.setupD2Config in VeniceRouterWrapper. The routers in each Venice cluster announce themselves to the corresponding service.
  3. Server D2 services, based on the Venice cluster name. Same as above, but for the servers in each Venice cluster.
  4. Optional: venice-controller, since we already support discovering the controller via controller URLs.
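
The list above can be sketched as a small lookup from component role to the D2 services that component would announce to. This is only an illustration: the class `D2AnnounceTargets`, the `Role` enum, and the `router-`/`server-` naming convention (borrowed from the example setup in this issue) are assumptions, not existing Venice code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the D2 services a Venice component announces to. */
final class D2AnnounceTargets {
  static final String DISCOVERY_SERVICE = "venice-discovery";
  static final String CONTROLLER_SERVICE = "venice-controller";

  enum Role { ROUTER, SERVER, CONTROLLER }

  /**
   * Returns the D2 service names for a component of the given role in the given
   * Venice cluster. The "router-"/"server-" prefixes are an assumed convention.
   */
  static List<String> servicesFor(Role role, String veniceCluster) {
    List<String> services = new ArrayList<>();
    switch (role) {
      case ROUTER:
        // Every router announces to the shared discovery service
        // and to its own cluster's router service.
        services.add(DISCOVERY_SERVICE);
        services.add("router-" + veniceCluster);
        break;
      case SERVER:
        services.add("server-" + veniceCluster);
        break;
      case CONTROLLER:
        // Optional: controllers can also be discovered via controller URLs.
        services.add(CONTROLLER_SERVICE);
        break;
    }
    return services;
  }
}
```

Because the D2 cluster name equals the D2 service name in Venice, each returned string would serve as both when registering the announcer.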

For cluster discovery and request-based metadata to function, we also need to set two router configs so that routers can return the corresponding mapping when discovering the cluster for a store.

  1. cluster.to.d2: maps each Venice cluster to its router D2 cluster/service (used for TC)
  2. cluster.to.server.d2: maps each Venice cluster to its server D2 cluster/service (used for request-based metadata for FC and DVC)

Here is an example setup with two Venice clusters, venice-0 and venice-1. If the corresponding router and server D2 services are named by prefixing the cluster name with router- and server-, the above configs should be set as:
cluster.to.d2 = venice-0:router-venice-0,venice-1:router-venice-1
cluster.to.server.d2 = venice-0:server-venice-0,venice-1:server-venice-1
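
To make the value format concrete, here is a minimal standalone sketch that parses a `cluster:d2Service` comma-separated string into a map. It is not Venice's actual parsing code; the class name `ClusterToD2Parser` is hypothetical, and the strict error handling is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for values such as "venice-0:router-venice-0,venice-1:router-venice-1". */
final class ClusterToD2Parser {
  /** Returns a Venice cluster name to D2 service name mapping, in declaration order. */
  static Map<String, String> parse(String value) {
    Map<String, String> mapping = new LinkedHashMap<>();
    if (value == null || value.trim().isEmpty()) {
      return mapping;
    }
    for (String entry : value.split(",")) {
      // Split on the first ':' only, into at most two parts.
      String[] parts = entry.trim().split(":", 2);
      if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
        throw new IllegalArgumentException("Malformed entry: '" + entry + "'");
      }
      mapping.put(parts[0].trim(), parts[1].trim());
    }
    return mapping;
  }
}
```

For the example above, `ClusterToD2Parser.parse("venice-0:router-venice-0,venice-1:router-venice-1").get("venice-1")` yields `router-venice-1`. Failing fast on malformed entries is a design choice here, so that a typo in the router config surfaces at startup rather than as a failed discovery request.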

What component(s) does this bug affect?

  • Controller: This is the control-plane for Venice. Used to create/update/query stores and their metadata.
  • Router: This is the stateless query-routing layer for serving read requests.
  • Server: This is the component that persists all the store data.
  • VenicePushJob: This is the component that pushes derived data from Hadoop to Venice backend.
  • VenicePulsarSink: This is a Sink connector for Apache Pulsar that pushes data from Pulsar into Venice.
  • Thin Client: This is a stateless client users use to query Venice Router for reading store data.
  • Fast Client: This is a stateful client users use to query Venice Server for reading store data.
  • Da Vinci Client: This is an embedded, stateful client that materializes store data locally.
  • Samza: This is the library users use to make nearline updates to store data.
  • Admin Tool: This is the stand-alone client used for ad-hoc operations on Venice.

Labels: enhancement (New feature or request)
