Version: v3.0

Train SQL AI

Train SQL AI lets you improve SQL AI's performance by providing custom training data specific to your datasets and use cases. By configuring and syncing knowledge bases, you can improve the accuracy and relevance of SQL AI's responses and adapt it to your own data environment.

With Train SQL AI, you can:

Associate training data with specific datasets
Provide examples of successful SQL queries
Add question-SQL pairs for better natural language understanding
Include domain-specific documentation
Keep your knowledge base updated through automated and manual syncs

This guide walks you through what you need to know to train SQL AI effectively.

Understanding Training Data

As the number of resources in your system grows, SQL AI may face challenges in delivering precise answers for specific resources. Training data helps mitigate these issues by supplying targeted context, ensuring more reliable and accurate responses tailored to your specific datasets.

Why Training Data Matters

Training data significantly improves SQL AI by:

Providing context specific to your datasets
Reducing inaccuracies or hallucinations in responses
Helping the system understand your unique query patterns
Aligning generated SQL with your business intent
Incorporating domain-specific knowledge

Working with Training Data

Training Data Types

SQL AI supports three types of training data, each serving a different purpose in enhancing the system's capabilities:

Training Data Type	Purpose	Benefits
SQL	Add successfully executed or commonly used queries	Improves understanding of query patterns and syntax preferences
QnA	Provide question-SQL pairs	Directly maps natural language questions to appropriate SQL queries
Documentation	Include relevant business or technical documentation	Enhances contextual understanding of your data domain

SQL Training Data

By adding examples of successful queries, you help SQL AI understand the typical query patterns used with your datasets. This is particularly useful for:

Complex query structures specific to your data
Commonly used filtering patterns
Preferred aggregation methods
Custom SQL functions you frequently use
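As an illustration of what SQL training data might contain, the following minimal Python sketch assembles a few example queries into one text, each preceded by a comment stating its purpose. The table names, column names, and the function `render_sql_training_text` are hypothetical, and the layout SQL AI works best with is not specified in this guide, so treat the structure as an assumption.

```python
# Hypothetical example queries; table and column names are illustrative only.
EXAMPLES = [
    (
        "Total revenue per region, counting completed orders only",
        "SELECT region, SUM(amount) AS revenue\n"
        "FROM orders\n"
        "WHERE status = 'completed'\n"
        "GROUP BY region;",
    ),
    (
        "Ten customers with the most orders",
        "SELECT customer_id, COUNT(*) AS order_count\n"
        "FROM orders\n"
        "GROUP BY customer_id\n"
        "ORDER BY order_count DESC\n"
        "LIMIT 10;",
    ),
]


def render_sql_training_text(examples):
    """Join (purpose, query) pairs into one text, each query led by a SQL comment."""
    blocks = []
    for purpose, query in examples:
        blocks.append(f"-- {purpose}\n{query.strip()}")
    # Blank line between examples, single trailing newline at the end.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_sql_training_text(EXAMPLES))
```

The resulting text could be saved as a `.txt` file, which is one of the supported upload formats.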

QnA Training Data

Question-SQL pairs provide the most straightforward method for improving query accuracy. This approach:

Helps the system understand the context of questions
Creates direct mappings between natural language and SQL
Is especially valuable when queries could be ambiguous
Aligns SQL generation with user intent
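One possible way to prepare question-SQL pairs is as a two-column CSV, sketched below in Python. The column names `question` and `sql`, the sample pairs, and the function `pairs_to_csv` are assumptions made for illustration; this guide does not prescribe a layout for QnA files. The sample includes two phrasings of the same question mapped to the same query.

```python
import csv
import io

# Hypothetical pairs; the orders table and its columns are illustrative only.
PAIRS = [
    ("How many orders were completed?",
     "SELECT COUNT(*) FROM orders WHERE status = 'completed';"),
    ("What is the number of completed orders?",
     "SELECT COUNT(*) FROM orders WHERE status = 'completed';"),
]


def pairs_to_csv(pairs):
    """Serialize (question, sql) pairs to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["question", "sql"])  # assumed column names
    writer.writerows(pairs)
    return buf.getvalue()


if __name__ == "__main__":
    print(pairs_to_csv(PAIRS))
```

Using the `csv` module rather than string concatenation keeps commas and quotes inside the SQL text correctly escaped.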

Documentation Training Data

Including relevant documentation about your database, business, or industry helps SQL AI better understand the context of queries. This additional information:

Improves the accuracy and relevance of responses
Provides domain-specific knowledge
Helps interpret industry-specific terminology
Offers context about data relationships and business rules

Creating Training Data

Follow these steps to create new training data:

Select Create Training Data from the Training Data section
Fill in the following details:

Attribute	Description
Document Name	A descriptive name for your training data resource
Document Type	Select the type: SQL, QnA, or Documentation
Associated Resource Type	The resource type to which the training data will be linked
Associated Resource ID	The specific ID of the resource from the selected resource type

Upload the document containing your training data

Supported File Formats

SQL AI accepts training data in the following file formats:

Format	Extension
Plain Text	`.txt`
Markdown	`.md`
HyperText Markup Language	`.html`
Microsoft Word Document	`.doc` / `.docx`
Comma-Separated Values	`.csv`
Microsoft Excel Spreadsheet	`.xls` / `.xlsx`
Portable Document	`.pdf`
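Before uploading, a file's extension can be checked against the table above. The sketch below is a local pre-check only; the function name `is_supported_training_file` is hypothetical, and treating extensions as case-insensitive is an assumption rather than documented behavior.

```python
from pathlib import Path

# Extensions taken from the supported formats table.
SUPPORTED_EXTENSIONS = {
    ".txt", ".md", ".html", ".doc", ".docx", ".csv", ".xls", ".xlsx", ".pdf",
}


def is_supported_training_file(filename):
    """Return True if the file's extension appears in the supported formats table.

    Comparison is case-insensitive, which is an assumption of this sketch.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["glossary.md", "queries.sql", "pairs.csv"]:
        print(name, is_supported_training_file(name))
```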

Best Practices

Keep training documents concise and focused on a specific topic
For QnA pairs, include a diverse range of question phrasings
For SQL examples, include comments explaining the purpose of each query
Ensure documentation clearly explains domain-specific terminology

Managing Training Data

Downloading Training Data

Once a training document is created, you can download the attached document from the details page for review or updates.

Limitations

Only one file can be attached per resource for each document type (SQL, QnA, or Documentation)
After adding or modifying training data, a sync job must be run to update the context
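When planning several uploads, the one-file-per-type rule can be checked ahead of time. The sketch below reads the rule as "each resource may have at most one file of each document type"; the resource IDs and the function `find_conflicts` are hypothetical.

```python
from collections import Counter

# Document types listed in this guide.
DOCUMENT_TYPES = {"SQL", "QnA", "Documentation"}


def find_conflicts(planned):
    """Return (resource_id, document_type) keys that appear more than once.

    `planned` is an iterable of (resource_id, document_type) tuples.
    Raises ValueError for a document type this guide does not list.
    """
    planned = list(planned)
    for _, doc_type in planned:
        if doc_type not in DOCUMENT_TYPES:
            raise ValueError(f"Unknown document type: {doc_type}")
    counts = Counter(planned)
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    plan = [("ds-1", "SQL"), ("ds-1", "QnA"), ("ds-1", "SQL"), ("ds-2", "SQL")]
    print(find_conflicts(plan))
```

Any key the function returns would need its files merged into a single document before upload.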

Deleting Training Data

If you find that adding training data negatively impacts the model's responses, delete it as follows:

Navigate to the training data details page
Select the delete option
Confirm the deletion
Run a sync job to update the context

Note

All users with access to the training data's associated resource can use the enhanced SQL AI after the sync job has run.

Sync Jobs

Sync jobs are critical for keeping SQL AI's knowledge base updated with your latest data and training materials.

Understanding Sync Jobs

Sync jobs update the model's context with:

Newly added or removed datasets
Updates to training data
Modified schema information
New or deleted resources

Monitoring Sync Jobs

In the Sync Jobs section, you can:

View all executed sync jobs
Check their current status (completed, running, failed)
See statistics on indexed, modified, or deleted documents
Track when the last sync was performed

Running Manual Sync Jobs

To bring your knowledge base up to date manually:

Navigate to the Sync Jobs section
Select the tenant for which you want to update the knowledge base
Click "Run Sync Job"
Monitor the job status until completion

When to Run Manual Syncs

Consider running a manual sync after:

Adding new training data
Modifying existing training documents
Creating new datasets that should be available to SQL AI
Making significant schema changes to existing datasets

Automated Sync Jobs

For convenience, SQL AI automatically runs sync jobs every 12 hours. However, manual syncs are recommended when immediate updates are needed.
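The trade-off between waiting for the automated sync and running a manual one can be sketched as a small calculation. The code below assumes the 12-hour interval is counted from the last completed sync, which this guide does not confirm; the real schedule may follow fixed clock times. The function names and the `max_wait` threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Interval stated in this guide.
AUTO_SYNC_INTERVAL = timedelta(hours=12)


def next_automated_sync(last_sync):
    """Estimate the next automated sync, assuming it runs 12 hours after the last one."""
    return last_sync + AUTO_SYNC_INTERVAL


def should_run_manual_sync(last_sync, last_change, now, max_wait):
    """Return True if there are unsynced changes and the estimated wait exceeds max_wait."""
    if last_change <= last_sync:
        return False  # nothing has changed since the last sync
    return next_automated_sync(last_sync) - now > max_wait


if __name__ == "__main__":
    last_sync = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    last_change = datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc)
    now = datetime(2024, 1, 1, 1, 30, tzinfo=timezone.utc)
    print(should_run_manual_sync(last_sync, last_change, now, timedelta(hours=1)))
```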

Note

When generating a query on resources across different tenants, training data from only one tenant will be considered in the context.

Best Practices for Training SQL AI

To get the best results from your trained SQL AI:

Start small - Begin with a few high-quality training examples before expanding
Focus on common queries - Prioritize training for frequently asked questions
Update regularly - Review and refresh training data as your data and queries evolve
Test thoroughly - After training, test SQL AI with various questions to validate improvements
Be specific - Provide clear, unambiguous examples in your training data
Include edge cases - Help SQL AI handle unusual or complex query scenarios
