-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Natural Language Querying (NLQ) using genAI #1659
Comments
DesignUser ExperienceText To SQL
Document Analysis (Unstructured Use Case)
There will be a limit of Generate Metadata API calls performed per day or per day/team. If the number is surpassed, a comprehensive error message will appear in the top banner. Data analysisFor this use-case it is relevant to describe the different types of data and metadata that would serve as input to the generation of metadata. Depending on the data there will be different genAI workflows. Data.all S3 Datasets: (S3 Bucket + Glue database)
Data.all Redshift Datasets [v.2.7.0] : We need to keep it in mind for the design, but the feature won’t be implementing metadata in Redshift in its first release.
Data scenarios For column metadata generation (column name and column description):
For Table and Folder metadata generation:
For Dataset metadata generation
High Level DesignText To SQL
Document Analyzer
|
Problem statement
Data.all currently requires users to have technical knowledge of data.all datasets, glue tables, schemas, S3 buckets, folders and SQL querying in order to access and derive insights from the diverse structured and unstructured data sets available across the organization. This creates a significant barrier for non-technical business users who need to quickly and easily query data to make informed decisions. The problem is that there is a lack of intuitive, natural language-based interfaces that allow these users to ask questions in plain English and receive relevant, contextual data responses without requiring SQL expertise or specialized data extraction abilities.
Generative AI models offer a promising solution to bridge this gap by enabling natural language querying capabilities that understand user intent, extract data from both structured and unstructured sources, and generate dynamic responses tailored to the user's needs. This feature aims to empower non-technical users such as business analysts and executive decision makers to query and analyze structured and unstructured data using natural language querying by leveraging Generative AI (GenAI) capabilities to improve data accessibility and data-driven decision-making within data.all.
User Stories
Describe the solution you'd like
US1.
US1. As a Data Consumer, whether a non-technical business user, business analyst, or executive decision maker, I want to be able to query structured data in data.all using natural language, so that I can quickly find and retrieve the insights I need for my applications and decision-making processes.
Acceptance Criteria:
US2.
US2. As a Data Consumer, whether a non-technical business user, business analyst, or executive decision maker, I want to be able to query unstructured data sources in data.all using natural language, so that I can quickly find and retrieve the insights I need for my applications and decision-making processes.
Acceptance Criteria:
US3.
As a data.all developer and maintainer, I want the natural language query feature to be secure and respect data governance access permissions.
Acceptance Criteria:
US4.
As a data.all developer and maintainer, I want the natural language query feature to be configurable, scalable, reliable, and seamlessly integrated into the data.all platform, so that I can ensure a smooth and efficient user experience for all data.all users.
Acceptance Criteria:
US5.
As a data.all developer and maintainer, I want to be able to configure rate limits for the natural language query feature so that I can prevent overuse and ensure responsible access to the feature.
Acceptance Criteria:
###US6.
As a data.all developer and maintainer, I want the natural language query feature to clearly display a disclaimer about the limitations and confidentiality of the responses, so that I understand the context and boundaries of the AI-generated information.
Acceptance Criteria:
US7. (Future Scope)
As a data.all developer and maintainer, I want the natural language query feature to provide feedback functionality so that users can easily indicate if the response was helpful or not, which can then be used to improve the quality of future responses.
Acceptance Criteria:
Scope
Structured Data Query (SQL Generation & Execution)
Unstructured Data Query
Out Of Scope
Guardrails
The text was updated successfully, but these errors were encountered: