This lists some possible improvements to Vespa which have been considered or requested, which can be developed relatively independently of other work, and which are not yet under development. For more information on the code structure in Vespa, see Code-map.md.

## Support query profiles for document processors

Effort: Low
Difficulty: Low
Skills: Java
Query profiles make it simple to support multiple buckets, behavior profiles for different use cases, etc. by providing bundles of parameters accessible to Searchers processing queries. Writes go through a similar chain of processors (document processors), but these have no equivalent support for parametrization. This task is to enable configurable document processor profiles by reusing the query profile support for document processors as well. A minimal sketch of the contrast follows.
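In the sketch below, the query side works today, while the document processor side shows the proposed equivalent; `processing.properties()` is hypothetical and does not exist, and `myapp.bucket` is an invented parameter name:

```java
import com.yahoo.docproc.DocumentProcessor;
import com.yahoo.docproc.Processing;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

// Today: a Searcher can read bundled parameters resolved from the query profile
// selected for this request.
public class BucketAwareSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        // "myapp.bucket" is an invented parameter name, supplied by a query profile.
        String bucket = query.properties().getString("myapp.bucket");
        // ... adjust query behavior based on the bucket ...
        return execution.search(query);
    }
}

// The proposal: an equivalent mechanism on the write path. The profile lookup
// sketched in the comment below does not exist today and is purely illustrative.
class BucketAwareProcessor extends DocumentProcessor {

    @Override
    public Progress process(Processing processing) {
        // Hypothetical: resolve a "document processor profile" the same way
        // query profiles are resolved on the query side, e.g.:
        // String bucket = processing.properties().getString("myapp.bucket");
        return Progress.DONE;
    }
}
```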
Code pointers:

## Java implementation of the content layer for testing

Effort: Medium
Difficulty: Low
Skills: Java
There is currently support for creating Application instances programmatically in Java to unit-test application package functionality (see `com.yahoo.application.Application`). However, only Java component functionality can be tested this way, as the content layer is implemented in C++ and therefore unavailable. A Java implementation of some or all of that functionality would enable developers to do more testing locally, within their IDE. This is medium effort because performance is not a concern, and some components, such as ranking expressions and features, are already available as libraries (see the searchlib module).
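For illustration, a component test against such an Application instance might look roughly like the sketch below; the application package path and the chain id "default" are placeholders:

```java
import com.yahoo.application.Application;
import com.yahoo.application.Networking;
import com.yahoo.application.container.Search;
import com.yahoo.component.ComponentSpecification;
import com.yahoo.search.Query;
import com.yahoo.search.Result;

import java.nio.file.FileSystems;

public class MySearcherTest {

    public void testSearchChain() {
        // "src/test/app-package" is a placeholder path to a test application package.
        try (Application app = Application.fromApplicationPackage(
                FileSystems.getDefault().getPath("src/test/app-package"),
                Networking.disable)) {
            Search search = app.getJDisc("default").search();
            Result result = search.process(ComponentSpecification.fromString("default"),
                                           new Query("?query=test"));
            // Assertions on `result` go here. Anything requiring the content layer
            // (actual indexing, recall, visiting) cannot be exercised, since that
            // layer is implemented in C++ and unavailable in this harness.
        }
    }
}
```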
Code pointers:
- Content cluster mock in Java (currently empty): ContentCluster
- The search definition model the implementation must consume config from: Search

## Indexed search in maps

Effort: Medium
Difficulty: Medium
Skills: C++, multithreading, performance, indexing, data structures
Vespa supports maps, and supports making them searchable in memory by declaring them as an attribute. However, maps cannot be indexed as text-search disk indexes.
Code pointers:

## Global writes

Effort: High
Difficulty: High
Skills: C++, Java, distributed systems, performance, multithreading, network, distributed consistency
Vespa instances distribute data automatically within clusters, but these clusters are meant to consist of co-located machines: the distribution algorithm is not suitable for global distribution across data centers because it cannot seamlessly tolerate data center-wide outages and does not attempt to minimize bandwidth usage between data centers. Applications usually achieve global presence instead by setting up multiple independent instances in different data centers and writing to all of them in parallel. This is robust and works well on average, but it puts an additional burden on applications to achieve cross-datacenter data consistency after datacenter failures, and it does not enable automatic data recovery across data centers, so data redundancy is effectively required within each data center. This is fine in most cases, but not when storage space drives cost and intermittent loss of data coverage (completeness as seen from queries) is tolerable.
A solution should sustain current write rates (tens of thousands of writes per node per second), sustain write and read rates on loss of connectivity to any one data center, re-establish global data consistency when a lost data center is recovered, and support some degree of tradeoff between consistency and operation latency (although the exact modes to be supported are part of the design and analysis needed).
Code pointers:

## Global dynamic tensors

Effort: High
Difficulty: High
Skills: Java, C++, distributed systems, performance, networking, distributed consistency
Tensors in ranking models may either be passed with the query, be part of the document, or be configured as part of the application package (global tensors). This is fine for many kinds of models, but does not support the case of really large tensors (which barely fit in memory) and/or dynamically changing tensors (online learning of global models). These use cases require support for global tensors (tensors available locally on all content nodes during execution, but neither sent with the query nor residing in documents) which are not configured as part of the application package, but which are written independently and are dynamically updatable at a high write rate. To support this at large scale with a high write rate, we need a small cluster of nodes storing the source of truth of the global tensor, with perfect consistency. This cluster must in turn push updates to all content nodes in a best-effort fashion given a fixed bandwidth budget, such that query execution and document write traffic are prioritized over perfect consistency of global model updates.
Code pointers:

## Feed clients in other languages

Effort: Low
Difficulty: Low
Skills: Knowledge of a decent HTTP/2 library in some language
`/document/v1` is a RESTified HTTP API which exposes the Vespa Document API to the outside of the application's Java containers. The design of this API is simple: each operation is modeled as a single HTTP request, and its result as a single HTTP response. It was previously not possible to achieve throughput comparable to the undocumented, custom-protocol `/feedapi` using this API, but this changed with HTTP/2 support in Vespa. The clean design of `/document/v1` makes it easy to interface with from any language and runtime that supports HTTP/2. An implementation currently exists only for Java, requiring a JDK 8+ runtime; implementations in other languages are very welcome. The pseudocode below could be a starting point for an asynchronous implementation built on futures and promises.
Let `http` be an asynchronous HTTP/2 client which returns a `future` for each request. A `future` will complete some time in the future, at which point dependent computations will trigger, depending on the result of the operation. A `future` is obtained from a `promise`, and completed when the `promise` is completed. An efficient feed client is then:
```
inflight = map<document_id, future>()

func dispatch(operation: request, result: promise, attempt: int): void
    http.send(operation).when_complete(response => handle(operation, response, result, attempt))

func handle(operation: request, response: response, result: promise, attempt: int): void
    if retry(response, attempt):
        dispatch(operation, result, attempt + 1)
    else:
        result.complete(response)

func enqueue(operation: request): future
    result_promise = promise()
    result = result_promise.get_future()
    previous = inflight.put(operation.id, result)  # store `result` under the document id and obtain the previous mapping
    if previous == NIL:
        while inflight.size >= max_inflight(): wait()  # apply backpressure: block until there is room
        dispatch(operation, result_promise, 1)
    else:
        previous.when_complete(ignored => dispatch(operation, result_promise, 1))
    result.when_complete(ignored => inflight.remove_value(result))  # remove mapping unless it has been replaced
    return result
```
Apply synchronization as necessary. The `inflight` map is used to serialize multiple operations to the same document id: the mapped entry for each id is the tail of a linked queue where new dependents may be added, while the queue is emptied from the head one entry at a time, whenever a dependency (`previous`) completes computation. `enqueue` blocks until there is room in the client.
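As a concrete, untested illustration, the same pattern could look roughly like the following in Java, using the JDK's built-in HTTP/2 client and `CompletableFuture`. The in-flight limit, the retry policy, and all names here are invented for the sketch rather than taken from any existing client:

```java
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Sketch of the pseudocode above, using JDK 11+ java.net.http over HTTP/2.
class FeedClientSketch {

    private final HttpClient http = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_2)
            .build();
    private final ConcurrentHashMap<String, CompletableFuture<HttpResponse<String>>> inflight =
            new ConcurrentHashMap<>();
    private final Semaphore slots = new Semaphore(256); // max distinct document ids in flight (assumed limit)

    CompletableFuture<HttpResponse<String>> enqueue(String documentId, HttpRequest operation) {
        CompletableFuture<HttpResponse<String>> result = new CompletableFuture<>();
        CompletableFuture<HttpResponse<String>> previous = inflight.put(documentId, result);
        if (previous == null) {                              // no operation in flight for this id:
            slots.acquireUninterruptibly();                  // block until there is room in the client
            dispatch(operation, result, 1);
        } else {                                             // serialize operations to the same id
            previous.whenComplete((r, t) -> dispatch(operation, result, 1));
        }
        result.whenComplete((r, t) -> {
            if (inflight.remove(documentId, result))         // remove mapping unless it has been replaced
                slots.release();                             // ... and only then release the slot
        });
        return result;
    }

    private void dispatch(HttpRequest operation, CompletableFuture<HttpResponse<String>> result, int attempt) {
        http.sendAsync(operation, HttpResponse.BodyHandlers.ofString())
            .whenComplete((response, error) -> {
                if (error == null && response.statusCode() / 100 == 2)
                    result.complete(response);
                else if (attempt < 5)                        // naive bounded retry, for illustration only
                    dispatch(operation, result, attempt + 1);
                else if (error != null)
                    result.completeExceptionally(error);
                else
                    result.complete(response);
            });
    }
}
```

Note how a slot is acquired only when a document id first enters the map, and released only by the operation still mapped at completion, mirroring the queue-per-id semantics of the pseudocode.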
Code pointers: