Add YugabyteDB as a Connector in Unstructured
YugabyteDB would be a valuable addition to the list of supported connectors for unstructured-ingest. It is a distributed SQL database designed for high performance and global-scale workloads, while maintaining compatibility with PostgreSQL tooling and drivers.
Why YugabyteDB?
- PostgreSQL compatibility: YugabyteDB supports the PostgreSQL wire protocol and ecosystem, so most PostgreSQL tools and drivers work with little or no modification.
- Distributed & scalable: Built to scale horizontally with fault tolerance and low-latency reads/writes.
- Vector capabilities: YugabyteDB can store vector data and, through its PostgreSQL compatibility (e.g., extensions like pgvector) or native support where available, can be used for similarity search and other vector-based ML workflows. Combined with appropriate indexing and query patterns, it enables scalable vector workloads on a distributed SQL foundation.
- Native Python drivers: YugabyteDB provides Python drivers and client libraries adapted for its distributed environment, offering cluster-aware features, such as connection load balancing, beyond what a vanilla PostgreSQL client provides.
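As an illustration of the vector bullet above, here is a minimal sketch, assuming the pgvector extension is available on the YugabyteDB cluster, of the SQL an uploader could generate to store element embeddings and run a cosine-distance search. The table name `elements`, its columns, and all helper functions are hypothetical and not part of unstructured-ingest; vector index creation is left out because index support depends on the YugabyteDB version.

```python
import re
from typing import Sequence

# Hypothetical helpers for illustration only; not an unstructured-ingest API.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_identifier(name: str) -> str:
    # Table names are interpolated into SQL text, so allow only plain identifiers.
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid SQL identifier: {name!r}")
    return name


def vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding in pgvector's text input form, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def create_table_sql(table: str, dim: int) -> str:
    """DDL for a table holding element text plus a fixed-size embedding."""
    table = _check_identifier(table)
    return (
        "CREATE EXTENSION IF NOT EXISTS vector;\n"
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "    element_id TEXT PRIMARY KEY,\n"
        "    text TEXT,\n"
        f"    embedding vector({int(dim)})\n"
        ");"
    )


def insert_sql(table: str) -> str:
    """Parameterized insert; %s is psycopg2's placeholder style."""
    table = _check_identifier(table)
    return f"INSERT INTO {table} (element_id, text, embedding) VALUES (%s, %s, %s::vector)"


def nearest_sql(table: str) -> str:
    """Top-k query ordered by pgvector's cosine distance operator <=>."""
    table = _check_identifier(table)
    return f"SELECT element_id, text FROM {table} ORDER BY embedding <=> %s::vector LIMIT %s"


if __name__ == "__main__":
    print(create_table_sql("elements", 3))
    print(nearest_sql("elements"), (vector_literal([0.5, 1, -2]), 5))
```

With psycopg2, the search would be run as `cur.execute(nearest_sql("elements"), (vector_literal(query_embedding), 5))`; the `::vector` cast lets the embedding travel as an ordinary string parameter instead of being spliced into the SQL.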
Proposal
- Add YugabyteDB as a first-class connector in unstructured-ingest.
- Ensure ingestion, transformation, and document parsing pipelines can natively read from and write to YugabyteDB.
- Leverage its PostgreSQL compatibility to reuse existing patterns where possible, while accommodating YugabyteDB-specific optimizations through its dedicated Python driver.
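To show how the PostgreSQL patterns could carry over, here is a minimal sketch of connection settings that mirror a PostgreSQL config with YugabyteDB's defaults swapped in. The `YugabyteDBConnectionConfig` class and its field names are hypothetical, not an existing unstructured-ingest API. The defaults reflect YugabyteDB's YSQL conventions: port 5433 (vanilla PostgreSQL uses 5432) and `yugabyte` as the default user and database.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class YugabyteDBConnectionConfig:
    """Hypothetical connector config; field names are illustrative."""

    host: str = "localhost"
    port: int = 5433  # YugabyteDB's default YSQL port
    user: str = "yugabyte"  # default YSQL user
    password: str = ""
    database: str = "yugabyte"  # default YSQL database

    def to_uri(self) -> str:
        """Build a libpq-style postgresql:// URI, percent-encoding credentials and db name."""
        auth = quote(self.user, safe="")
        if self.password:
            auth += ":" + quote(self.password, safe="")
        db = quote(self.database, safe="")
        return f"postgresql://{auth}@{self.host}:{self.port}/{db}"


if __name__ == "__main__":
    print(YugabyteDBConnectionConfig().to_uri())
```

Because the output is a standard libpq connection URI, it can be passed unchanged to a PostgreSQL driver such as `psycopg2.connect(...)`. Driver-specific options from YugabyteDB's own Python driver would be layered on top and are not shown here.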
We (the Yugabyte team) are willing to contribute this connector or collaborate closely on its development. Please let us know how we can best support this effort.
ashetkar