[FEATURE] Support more connectors by leveraging available open source #604

@stevereiner

What is the use case?
More sources and targets are needed sooner. Reusing existing open source connectors would save development time.
Describe the solution you'd like
Use connectors from:
Query-in-place projects (these have open source Python connector code)
Swirl: content-focused sources
MindsDB: data-focused sources (probably less important)
Many sources and targets also have separate Python APIs/SDKs
Sources
Box, SharePoint, Dropbox, Nuxeo, Alfresco (on my to-do list), CMIS, Adobe AEM
AWS (boto3, including S3), Microsoft Azure (azure-storage-blob), Google Cloud Client Libraries
MongoDB (pymongo), as a document content source rather than a database
PostgreSQL (psycopg2), SQLAlchemy (database sources are probably less needed)
Targets
OpenSearch, Elasticsearch (these are important)
Neo4j (neo4j, graphdatascience, neo4j-graphrag)
Weaviate, Qdrant, Pinecone (pinecone package)
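
To make the idea concrete, here is a minimal sketch of a thin adapter layer that existing Python SDKs could be wrapped in. Every name in it (`Document`, `SourceConnector`, `TargetConnector`, `InMemorySource`, `InMemoryTarget`, `sync`) is hypothetical and not part of this project's API. The in-memory classes only stand in for wrappers around real clients, so the example runs with the standard library alone.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Protocol


@dataclass
class Document:
    """A unit of content moved from a source to a target (hypothetical type)."""
    id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class SourceConnector(Protocol):
    """Anything that can yield documents, e.g. a wrapper around a storage SDK."""
    def read(self) -> Iterator[Document]: ...


class TargetConnector(Protocol):
    """Anything that can store documents, e.g. a wrapper around a search client."""
    def write(self, docs: Iterable[Document]) -> int: ...


class InMemorySource:
    """Stand-in for a real source wrapper; it reads from a plain dict."""
    def __init__(self, items: Dict[str, str]) -> None:
        self._items = items

    def read(self) -> Iterator[Document]:
        for key, text in self._items.items():
            yield Document(id=key, text=text, metadata={"source": "memory"})


class InMemoryTarget:
    """Stand-in for a real target wrapper; it stores documents in a dict."""
    def __init__(self) -> None:
        self.stored: Dict[str, Document] = {}

    def write(self, docs: Iterable[Document]) -> int:
        count = 0
        for doc in docs:
            self.stored[doc.id] = doc  # upsert keyed by document id
            count += 1
        return count


def sync(source: SourceConnector, target: TargetConnector) -> int:
    """Copy every document from the source to the target; return the count written."""
    return target.write(source.read())
```

With an interface of this shape, supporting a new source or target would mostly mean writing one small class around an SDK that already exists, which is where the development time would be saved.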

Additional context
I was thinking that unstructured.io, which has a ton of source and target connectors, would have some of them in the open source tier. But these are only available for the enterprise UI and API.
Enterprise
https://unstructured.io/enterprise has a picture of all of them under "World Class Transformation and Orchestration ETL"
https://docs.unstructured.io/ingestion/source-connectors/overview
https://docs.unstructured.io/ingestion/destination-connectors/overview
Open Source
Apache 2.0
The open source tier supports many file formats:
https://docs.unstructured.io/open-source/introduction/supported-file-types
The open source tier also covers data workflows for LLMs:
https://docs.unstructured.io/open-source/introduction/overview
Python APIs for partitioning, cleaning, extracting, staging, chunking, and embedding
Unstructured GitHub


❤️ Contributors, please refer to 📙Contributing Guide.
Unless the PR can be sent immediately (e.g. just a few lines of code), we recommend leaving a comment on the issue like "I'm working on it" or "Can I work on this issue?" to avoid duplicating work. Our Discord server is always open and friendly.