Train SQL AI
Train SQL AI lets you improve SQL AI's performance by providing custom training data specific to your datasets and use cases. By configuring and syncing knowledge bases, you can improve the accuracy and relevance of SQL AI's responses and adapt it to your data environment.
With Train SQL AI, you can:
- Associate training data with specific datasets
- Provide examples of successful SQL queries
- Add question-SQL pairs for better natural language understanding
- Include domain-specific documentation
- Keep your knowledge base updated through automated and manual syncs
This guide walks you through everything you need to know about training SQL AI.
Understanding Training Data
As the number of resources in your system grows, SQL AI may face challenges in delivering precise answers for specific resources. Training data helps mitigate these issues by supplying targeted context, ensuring more reliable and accurate responses tailored to your specific datasets.
Why Training Data Matters
Training data significantly improves SQL AI by:
- Providing context specific to your datasets
- Reducing inaccuracies or hallucinations in responses
- Helping the system understand your unique query patterns
- Aligning generated SQL with your business intent
- Incorporating domain-specific knowledge
Working with Training Data
Training Data Types
SQL AI supports three types of training data, each serving a different purpose in enhancing the system's capabilities:
Training Data Type | Purpose | Benefits |
---|---|---|
SQL | Add successfully executed or commonly used queries | Improves understanding of query patterns and syntax preferences |
QnA | Provide question-SQL pairs | Directly maps natural language questions to appropriate SQL queries |
Documentation | Include relevant business or technical documentation | Enhances contextual understanding of your data domain |
SQL Training Data
By adding examples of successful queries, you help SQL AI understand the typical query patterns used with your datasets. This is particularly useful for:
- Complex query structures specific to your data
- Commonly used filtering patterns
- Preferred aggregation methods
- Custom SQL functions you frequently use
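As a minimal sketch, examples like these could be assembled into a single plain-text document, with a comment describing each query. The source does not prescribe a layout for SQL training files, so the `sales` table, the queries, and the `build_sql_document` helper below are all hypothetical.

```python
# Each entry pairs a short description with a query that is known to work.
# The `sales` table and its columns are invented for illustration only.
SQL_EXAMPLES = [
    (
        "Monthly revenue, using the preferred aggregation",
        "SELECT DATE_TRUNC('month', sold_at) AS month, SUM(amount) AS revenue\n"
        "FROM sales\n"
        "GROUP BY 1;",
    ),
    (
        "Common filter: completed sales only",
        "SELECT * FROM sales WHERE status = 'completed';",
    ),
]


def build_sql_document(examples):
    """Join examples into one text, prefixing each query with a SQL comment."""
    blocks = [f"-- {description}\n{query}" for description, query in examples]
    return "\n\n".join(blocks) + "\n"


document = build_sql_document(SQL_EXAMPLES)
```

Writing `document` to a `.txt` file would give one candidate upload; whether SQL AI makes use of the comments is not stated in this guide.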
QnA Training Data
Question-SQL pairs provide the most straightforward method for improving query accuracy. This approach:
- Helps the system understand the context of questions
- Creates direct mappings between natural language and SQL
- Is especially valuable when queries could be ambiguous
- Aligns SQL generation with user intent
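As a rough illustration, question-SQL pairs could be collected in a CSV file before upload. The sketch below assumes a two-column layout with the headers `question` and `sql`; this guide does not specify the layout SQL AI expects, so the column names, the `orders` table, and the `qna_to_csv` helper are hypothetical.

```python
import csv
import io

# Hypothetical pairs. Two differently phrased questions map to the same query
# to show how varied phrasings can be covered.
QNA_PAIRS = [
    ("What is the total revenue per region?",
     "SELECT region, SUM(amount) AS revenue FROM orders GROUP BY region;"),
    ("Show sales by region",
     "SELECT region, SUM(amount) AS revenue FROM orders GROUP BY region;"),
    ("How many orders are still open?",
     "SELECT COUNT(*) FROM orders WHERE status = 'open';"),
]


def qna_to_csv(pairs):
    """Serialize (question, sql) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["question", "sql"])  # assumed header names
    writer.writerows(pairs)
    return buffer.getvalue()


csv_text = qna_to_csv(QNA_PAIRS)
```

The `csv` module quotes fields that contain commas, so queries with column lists survive a round trip through the file.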
Documentation Training Data
Including relevant documentation about your database, business, or industry helps SQL AI better understand the context of queries. This additional information:
- Improves the accuracy and relevance of responses
- Provides domain-specific knowledge
- Helps interpret industry-specific terminology
- Offers context about data relationships and business rules
Creating Training Data
Follow these steps to create new training data:
- Select Create Training Data from the Training Data section
- Fill in the following details:
Attribute | Description |
---|---|
Document Name | A descriptive name for your training data resource |
Document Type | Select the type: SQL, QnA, or Documentation |
Associated Resource Type | The resource type to which the training data will be linked |
Associated Resource ID | The specific ID of the resource from the selected resource type |
- Upload the document containing your training data
Supported File Formats
SQL AI supports the following file formats for training data:
Format | Extension |
---|---|
Plain Text | .txt |
Markdown | .md |
HyperText Markup Language | .html |
Microsoft Word Document | .doc / .docx |
Comma-Separated Values | .csv |
Microsoft Excel Spreadsheet | .xls / .xlsx |
Portable Document Format | .pdf |
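A quick pre-upload check against the table above might look like the following sketch. The `is_supported` helper is hypothetical, and matching extensions case-insensitively is an assumption; the guide does not say how SQL AI treats upper-case extensions.

```python
from pathlib import Path

# Extensions copied from the supported-formats table above.
SUPPORTED_EXTENSIONS = {
    ".txt", ".md", ".html", ".doc", ".docx", ".csv", ".xls", ".xlsx", ".pdf",
}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is in the supported list."""
    # Assumption: extensions are compared case-insensitively.
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

Note that `.sql` is not in the table, so under this check SQL examples would need to be saved in one of the listed formats, such as `.txt`.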
When preparing training documents:
- Keep training documents concise and focused on a specific topic
- For QnA pairs, include a diverse range of question phrasings
- For SQL examples, include comments explaining the purpose of each query
- Ensure documentation clearly explains domain-specific terminology
Managing Training Data
Downloading Training Data
Once a training document is created, you can download the attached document from the details page for review or updates.
Limitations
- Only one file can be attached per resource for each document type (SQL, QnA, or Documentation)
- After adding or modifying training data, a sync job must be run to update the context
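The two limitations above can be pictured with a toy in-memory model. This is only a sketch of the rules as described, not how SQL AI stores training data; the `TrainingDataRegistry` class and its method names are hypothetical.

```python
DOCUMENT_TYPES = {"SQL", "QnA", "Documentation"}


class TrainingDataRegistry:
    """Toy model: one file per resource per document type, sync after changes."""

    def __init__(self):
        # Maps (resource_type, resource_id, doc_type) to the attached filename.
        self._files = {}
        # True whenever training data changed since the last sync job.
        self.needs_sync = False

    def attach(self, resource_type, resource_id, doc_type, filename):
        if doc_type not in DOCUMENT_TYPES:
            raise ValueError(f"unknown document type: {doc_type}")
        key = (resource_type, resource_id, doc_type)
        if key in self._files:
            raise ValueError(f"a {doc_type} file is already attached to this resource")
        self._files[key] = filename
        self.needs_sync = True

    def run_sync(self):
        """Stand-in for a sync job: marks the context as up to date."""
        self.needs_sync = False


registry = TrainingDataRegistry()
registry.attach("dataset", "sales-01", "SQL", "queries.txt")
registry.attach("dataset", "sales-01", "QnA", "pairs.csv")  # different type: allowed
```

A second `SQL` file for the same resource would raise `ValueError` here, mirroring the one-file-per-type limit, and `needs_sync` stays `True` until `run_sync` is called.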
Deleting Training Data
If you find that adding training data negatively impacts the model's responses:
- Navigate to the training data details page
- Select the delete option
- Confirm the deletion
- Run a sync job to update the context
All users with access to the training data's associated resource can utilize the enhanced SQL AI after the sync job has run.
Sync Jobs
Sync jobs are critical for keeping SQL AI's knowledge base updated with your latest data and training materials.
Understanding Sync Jobs
Sync jobs update the model's context with:
- Newly added or removed datasets
- Updates to training data
- Modified schema information
- New or deleted resources
Monitoring Sync Jobs
In the Sync Jobs section, you can:
- View all executed sync jobs
- Check their current status (completed, running, failed)
- See statistics on indexed, modified, or deleted documents
- Track when the last sync was performed
Running Manual Sync Jobs
To ensure your knowledge base is always up-to-date:
- Navigate to the Sync Jobs section
- Select the tenant for which you want to update the knowledge base
- Click "Run Sync Job"
- Monitor the job status until completion
Consider running a manual sync after:
- Adding new training data
- Modifying existing training documents
- Creating new datasets that should be available to SQL AI
- Making significant schema changes to existing datasets
Automated Sync Jobs
For convenience, SQL AI automatically runs sync jobs every 12 hours. However, manual syncs are recommended when immediate updates are needed.
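To make the trade-off concrete, the sketch below estimates when the next automatic sync would run and whether waiting for it is acceptable. It assumes the next automatic run falls exactly 12 hours after the last completed sync, which this guide does not confirm; the function names and the `max_wait` parameter are hypothetical.

```python
from datetime import datetime, timedelta

AUTO_SYNC_INTERVAL = timedelta(hours=12)  # interval stated in the text above


def next_auto_sync(last_sync: datetime) -> datetime:
    """Estimate the next automatic run (assumes a fixed 12-hour cadence)."""
    return last_sync + AUTO_SYNC_INTERVAL


def should_run_manual_sync(last_sync: datetime, now: datetime,
                           max_wait: timedelta) -> bool:
    """Suggest a manual sync if the next automatic run is further off than max_wait."""
    return next_auto_sync(last_sync) - now > max_wait


last = datetime(2024, 1, 1, 6, 0)
soon = should_run_manual_sync(last, datetime(2024, 1, 1, 17, 30), timedelta(hours=1))
far = should_run_manual_sync(last, datetime(2024, 1, 1, 8, 0), timedelta(hours=1))
```

With these sample values, `far` is `True` (the next automatic run is ten hours away) and `soon` is `False` (it is thirty minutes away).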
When generating a query on resources across different tenants, training data from only one tenant will be considered in the context.
Best Practices for Training SQL AI
To get the best results from your trained SQL AI:
- Start small - Begin with a few high-quality training examples before expanding
- Focus on common queries - Prioritize training for frequently asked questions
- Update regularly - Review and refresh training data as your data and queries evolve
- Test thoroughly - After training, test SQL AI with various questions to validate improvements
- Be specific - Provide clear, unambiguous examples in your training data
- Include edge cases - Help SQL AI handle unusual or complex query scenarios