Version: v3.0

Train SQL AI

This feature lets you improve SQL AI's performance by providing custom training data specific to your datasets and use cases. By configuring and syncing knowledge bases, you can improve the accuracy and relevance of SQL AI's responses and adapt it to your data environment.

With Train SQL AI, you can:

  • Associate training data with specific datasets
  • Provide examples of successful SQL queries
  • Add question-SQL pairs for better natural language understanding
  • Include domain-specific documentation
  • Keep your knowledge base updated through automated and manual syncs

This guide walks you through everything you need to know about training SQL AI to achieve optimal performance.

Understanding Training Data

As the number of resources in your system grows, SQL AI may face challenges in delivering precise answers for specific resources. Training data helps mitigate these issues by supplying targeted context, ensuring more reliable and accurate responses tailored to your specific datasets.

Why Training Data Matters

Training data significantly improves SQL AI by:

  • Providing context specific to your datasets
  • Reducing inaccuracies or hallucinations in responses
  • Helping the system understand your unique query patterns
  • Aligning generated SQL with your business intent
  • Incorporating domain-specific knowledge

Working with Training Data

Training Data Types

SQL AI supports three types of training data, each serving a different purpose in enhancing the system's capabilities:

| Training Data Type | Purpose | Benefits |
|---|---|---|
| SQL | Add successfully executed or commonly used queries | Improves understanding of query patterns and syntax preferences |
| QnA | Provide question-SQL pairs | Directly maps natural language questions to appropriate SQL queries |
| Documentation | Include relevant business or technical documentation | Enhances contextual understanding of your data domain |

SQL Training Data

By adding examples of successful queries, you help SQL AI understand the typical query patterns used with your datasets. This is particularly useful for:

  • Complex query structures specific to your data
  • Commonly used filtering patterns
  • Preferred aggregation methods
  • Custom SQL functions you frequently use
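As a rough illustration, SQL examples could be collected into a single plain-text document, with each query preceded by a comment stating its purpose. This is only a sketch: the layout SQL AI expects inside an SQL training file is not specified here, and the table names, column names, and the `render_sql_examples` helper are all hypothetical.

```python
# Hypothetical example queries; the tables and columns are illustrative only.
SQL_EXAMPLES = [
    (
        "Total revenue per region (preferred aggregation pattern)",
        "SELECT region, SUM(amount) AS revenue\nFROM orders\nGROUP BY region;",
    ),
    (
        "Orders that are still open (common filtering pattern)",
        "SELECT order_id, customer_id\nFROM orders\nWHERE status = 'OPEN';",
    ),
]


def render_sql_examples(examples):
    """Join (purpose, query) pairs into one text document.

    Each query is preceded by a SQL line comment describing its purpose.
    The layout is an assumption, not a documented SQL AI format.
    """
    blocks = [f"-- {purpose}\n{query.strip()}" for purpose, query in examples]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Write the result to a .txt file and upload it as SQL training data.
    print(render_sql_examples(SQL_EXAMPLES))
```

The returned string can be saved with a `.txt` extension, which is one of the supported formats.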

QnA Training Data

Question-SQL pairs provide the most straightforward method for improving query accuracy. This approach:

  • Helps the system understand the context of questions
  • Creates direct mappings between natural language and SQL
  • Is especially valuable when queries could be ambiguous
  • Aligns SQL generation with user intent
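One possible way to prepare question-SQL pairs is a two-column CSV file. The sketch below assumes column names `question` and `sql` and uses made-up questions and tables; the column layout that SQL AI actually expects is not documented in this guide, so treat this as a starting point.

```python
import csv
import io

# Hypothetical pairs; note the two phrasings that map to the same query.
QNA_PAIRS = [
    ("How many orders are open?", "SELECT COUNT(*) FROM orders WHERE status = 'OPEN'"),
    ("Count of open orders", "SELECT COUNT(*) FROM orders WHERE status = 'OPEN'"),
    ("What is the revenue per region?", "SELECT region, SUM(amount) FROM orders GROUP BY region"),
]


def qna_to_csv(pairs):
    """Serialize (question, sql) pairs to CSV text with a header row.

    The header names are an assumption, not a documented requirement.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["question", "sql"])
    writer.writerows(pairs)
    return buffer.getvalue()


if __name__ == "__main__":
    print(qna_to_csv(QNA_PAIRS))
```

Using the `csv` module instead of manual string joins keeps queries that contain commas or quotes correctly escaped.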

Documentation Training Data

Including relevant documentation about your database, business, or industry helps SQL AI better understand the context of queries. This additional information:

  • Improves the accuracy and relevance of responses
  • Provides domain-specific knowledge
  • Helps interpret industry-specific terminology
  • Offers context about data relationships and business rules

Creating Training Data

Follow these steps to create new training data:

  1. Select Create Training Data from the Training Data section
  2. Fill in the following details:

| Attribute | Description |
|---|---|
| Document Name | A descriptive name for your training data resource |
| Document Type | Select the type: SQL, QnA, or Documentation |
| Associated Resource Type | The resource type to which the training data will be linked |
| Associated Resource ID | The specific ID of the resource from the selected resource type |

  3. Upload the document containing your training data
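Before filling in the form, it can help to check the values you plan to enter. The sketch below models the four attributes as a small record and reports obvious problems. The class name, field names, and example values are hypothetical; it does not call any SQL AI API.

```python
from dataclasses import asdict, dataclass

# Document types named in this guide.
DOCUMENT_TYPES = {"SQL", "QnA", "Documentation"}


@dataclass(frozen=True)
class TrainingDataForm:
    """Hypothetical local model of the Create Training Data form."""

    document_name: str
    document_type: str
    associated_resource_type: str
    associated_resource_id: str

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the form looks complete."""
        issues = [
            f"{name} is empty"
            for name, value in asdict(self).items()
            if not value.strip()
        ]
        if self.document_type not in DOCUMENT_TYPES:
            allowed = ", ".join(sorted(DOCUMENT_TYPES))
            issues.append(f"document_type must be one of {allowed}")
        return issues


if __name__ == "__main__":
    form = TrainingDataForm("Sales QnA pairs", "QnA", "dataset", "sales-orders")
    print(form.problems())
```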

Supported File Formats

SQL AI supports a wide range of file formats for training data:

| Format | Extension |
|---|---|
| Plain Text | .txt |
| Markdown | .md |
| HyperText Markup Language | .html |
| Microsoft Word Document | .doc / .docx |
| Comma-Separated Values | .csv |
| Microsoft Excel Spreadsheet | .xls / .xlsx |
| Portable Document | .pdf |
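A quick local check against the extensions listed above can catch an unsupported file before you try to upload it. This is a minimal sketch; the `is_supported` helper is hypothetical, and treating extensions as case-insensitive is an assumption.

```python
from pathlib import Path

# Extensions taken from the supported formats listed above.
SUPPORTED_EXTENSIONS = {
    ".txt", ".md", ".html", ".doc", ".docx", ".csv", ".xls", ".xlsx", ".pdf",
}


def is_supported(filename):
    """Return True if the file's final extension is in the supported list.

    Comparison is case-insensitive, which is an assumption of this sketch.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["glossary.md", "REPORT.PDF", "queries.sql"]:
        print(name, is_supported(name))
```

Note that a `.sql` file is not in the list, so SQL examples would need to be saved in one of the supported formats such as `.txt`.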

Best Practices

  • Keep training documents concise and focused on a specific topic
  • For QnA pairs, include a diverse range of question phrasings
  • For SQL examples, include comments explaining the purpose of each query
  • Ensure documentation clearly explains domain-specific terminology

Managing Training Data

Downloading Training Data

Once a training document is created, you can download the attached document from the details page for review or updates.

Limitations

  • Only one file can be attached per resource for each document type (SQL, QnA, or Documentation)
  • After adding or modifying training data, a sync job must be run to update the context
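The two limitations above can be pictured with a small in-memory model: at most one file per combination of resource and document type, and any change leaves the context stale until a sync job runs. This is a conceptual sketch only. The `AttachmentRegistry` class is hypothetical, and rejecting a second file (rather than replacing the first) is an assumption about behavior this guide does not specify.

```python
class AttachmentRegistry:
    """Toy model of the one-file-per-resource-per-type limitation."""

    def __init__(self):
        self._files = {}         # (resource_id, document_type) -> filename
        self.needs_sync = False  # True after any change until a sync runs

    def attach(self, resource_id, document_type, filename):
        key = (resource_id, document_type)
        if key in self._files:
            # Assumption: a second file for the same slot is rejected.
            raise ValueError(
                f"{resource_id} already has a {document_type} file: {self._files[key]}"
            )
        self._files[key] = filename
        self.needs_sync = True

    def mark_synced(self):
        """Record that a sync job has refreshed the context."""
        self.needs_sync = False


if __name__ == "__main__":
    registry = AttachmentRegistry()
    registry.attach("sales-orders", "SQL", "examples.txt")
    registry.attach("sales-orders", "QnA", "pairs.csv")  # different type: allowed
    print(registry.needs_sync)
```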

Deleting Training Data

If you find that adding training data negatively impacts the model's responses:

  1. Navigate to the training data details page
  2. Select the delete option
  3. Confirm the deletion
  4. Run a sync job to update the context

Note

All users with access to the training data's associated resource can utilize the enhanced SQL AI after the sync job has run.

Sync Jobs

Sync jobs are critical for keeping SQL AI's knowledge base updated with your latest data and training materials.

Understanding Sync Jobs

Sync jobs update the model's context with:

  • Newly added or removed datasets
  • Updates to training data
  • Modified schema information
  • New or deleted resources

Monitoring Sync Jobs

In the Sync Jobs section, you can:

  • View all executed sync jobs
  • Check their current status (completed, running, failed)
  • See statistics on indexed, modified, or deleted documents
  • Track when the last sync was performed

Running Manual Sync Jobs

To ensure your knowledge base is always up-to-date:

  1. Navigate to the Sync Jobs section
  2. Select the tenant for which you want to update the knowledge base
  3. Click "Run Sync Job"
  4. Monitor the job status until completion
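If you have a programmatic way to read a job's status, the monitoring step can be expressed as a simple polling loop. The sketch below is generic: `get_status` is a hypothetical callable you would supply, since this guide only describes the status as shown in the Sync Jobs section and does not document an API. The status names come from the list of statuses above.

```python
import time

# Statuses after which a sync job will not change any further.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_sync(get_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the job reaches a terminal status and return that status.

    get_status is a caller-supplied function returning "running",
    "completed", or "failed"; how it obtains the status is up to you.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"sync job still running after {max_polls} polls")


if __name__ == "__main__":
    # Simulated job that finishes on the third check.
    simulated = iter(["running", "running", "completed"])
    print(wait_for_sync(lambda: next(simulated), poll_seconds=0.0))
```

Bounding the number of polls avoids waiting forever on a job that never leaves the running state.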

When to Run Manual Syncs

Consider running a manual sync after:

  • Adding new training data
  • Modifying existing training documents
  • Creating new datasets that should be available to SQL AI
  • Making significant schema changes to existing datasets

Automated Sync Jobs

For convenience, SQL AI automatically runs sync jobs every 12 hours. However, manual syncs are recommended when immediate updates are needed.
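To decide whether a manual sync is worth running, you can estimate how long a change would otherwise wait. The sketch below assumes the next automated sync happens exactly 12 hours after the last one; the actual scheduling rule may differ, and both function names are hypothetical.

```python
from datetime import datetime, timedelta

# Interval stated in this guide for automated sync jobs.
AUTO_SYNC_INTERVAL = timedelta(hours=12)


def next_automated_sync(last_sync):
    """Estimate the next automated sync, assuming a fixed 12-hour spacing."""
    return last_sync + AUTO_SYNC_INTERVAL


def should_run_manual_sync(last_sync, now, max_wait):
    """Return True if waiting for the next automated sync would exceed max_wait."""
    return next_automated_sync(last_sync) - now > max_wait


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 0, 0)
    now = datetime(2024, 1, 1, 2, 0)
    # Ten hours remain until the estimated next sync.
    print(should_run_manual_sync(last, now, max_wait=timedelta(hours=1)))
```

In the worst case, a change made just after a sync waits almost the full 12 hours, which is why a manual sync is recommended when updates are needed immediately.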

Note

When generating a query on resources across different tenants, training data from only one tenant will be considered in the context.

Best Practices for Training SQL AI

To get the best results from your trained SQL AI:

  1. Start small - Begin with a few high-quality training examples before expanding
  2. Focus on common queries - Prioritize training for frequently asked questions
  3. Update regularly - Review and refresh training data as your data and queries evolve
  4. Test thoroughly - After training, test SQL AI with various questions to validate improvements
  5. Be specific - Provide clear, unambiguous examples in your training data
  6. Include edge cases - Help SQL AI handle unusual or complex query scenarios
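The "test thoroughly" practice can be partly automated with a small regression check that replays known questions and compares the generated SQL with the query you expect. This is a sketch under assumptions: `generate_sql` is a hypothetical callable standing in for however you obtain SQL AI's output, and the text comparison is deliberately crude (it ignores whitespace, case, and a trailing semicolon, so it also lowercases string literals).

```python
def normalize(sql):
    """Collapse whitespace, drop a trailing semicolon, and lowercase the query."""
    return " ".join(sql.split()).rstrip("; ").lower()


def find_regressions(generate_sql, cases):
    """Return (question, expected, actual) for every case that does not match.

    generate_sql is a caller-supplied function: question text in, SQL text out.
    """
    failures = []
    for question, expected in cases:
        actual = generate_sql(question)
        if normalize(actual) != normalize(expected):
            failures.append((question, expected, actual))
    return failures


if __name__ == "__main__":
    # Stand-in generator with canned answers, for demonstration only.
    canned = {"How many orders are open?": "select count(*)  from orders where status = 'OPEN';"}
    cases = [("How many orders are open?", "SELECT COUNT(*) FROM orders WHERE status = 'OPEN'")]
    print(find_regressions(canned.get, cases))
```

Two queries can differ textually and still return the same rows, so a mismatch reported here is a prompt for manual review, not proof of an error.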