SQL AI - Your SQL Assistant
SQL AI, your intelligent SQL assistant in the Amorphic Cloud Platform! This powerful feature leverages advanced Large Language Models (LLMs) to simplify your data querying experience. Whether you're a SQL expert or new to database queries, SQL AI helps you interact with your datasets using natural language, making data exploration more intuitive and accessible.
With SQL AI, you can:
- Convert natural language questions into SQL queries
- Generate SQL code without writing complex syntax
- Get explanations for existing SQL queries
- Receive suggestions for exploring your datasets
- Interact with your data using plain English
This guide will walk you through everything you need to know to leverage SQL AI's capabilities effectively.
Getting Started
Overview
SQL AI is a generative AI add-on that bridges the gap between natural language and SQL. It enables you to:
- Ask questions about your data in plain English
- Generate ready-to-use SQL queries from conversational prompts
- Understand existing SQL code through clear explanations
The system supports domain-specific queries, ensuring datasets in different domains are easily distinguishable, even if they share the same name.
Supported Data Sources
SQL AI currently supports:
-
S3-Athena datasets
- Standard S3 datasets accessible through AWS Athena
- Supports querying structured data in various formats (Parquet, ORC, CSV)
-
Redshift datasets
- Data stored in Amazon Redshift clusters
- Supports both provisioned and Redshift Serverless workgroups
-
System datasets
- Datasets created using Amorphic's system features
- Includes sys_cost_analysis_table, api-sys_cloudtrail_logs, etc. tables to query upon (System Datasets)
All datasets must be properly registered in the Amorphic platform to be accessible via SQL AI. For optimal performance, ensure that your datasets have complete and accurate metadata configured.
Key Features
Natural Language to SQL Translation
SQL AI leverages powerful LLMs to interpret your inquiries in natural language, converting them into executable SQL queries. This reduces the need for technical expertise, allowing you to ask questions in plain language while receiving the corresponding SQL code or results.
Feature | Description |
---|---|
Natural language processing | Converts English questions into SQL queries |
Domain-specific queries | Distinguishes between datasets with the same name in different domains |
Case-sensitive querying | Matches resource names exactly as created |
- When querying datasets with the same name but in different domains, use
[domain-name].[dataset-name]
for accurate results. (Querying is case-sensitive; resource names must match their exact case as created.) - Alternatively, you can select the dataset from the dropdown list.
- It is recommended to use clear column names and descriptions in datasets to provide better context, improving query accuracy.
- Generated queries are based on the schema of the data source and not the actual data.
Interactive Query Categories
SQL AI supports three main categories of interaction:
Category | Description | Example |
---|---|---|
Natural Language Suggestions | Get insights and suggestions about your data | "Suggest some queries I can run on the sales dataset." |
Query Generation | Convert natural language into SQL code | "Generate a query to find the top 10 salespeople by revenue." |
Query Explanation | Understand what existing SQL code does | "Explain this query: SELECT * FROM orders WHERE order_date > '2023-01-01'" |
Once a query is generated, you can review it and execute it directly from the playground using the Run Query
button in the right corner of the query tab.
Model Customization
SQL AI offers flexibility in choosing between multiple LLM models for query generation and explanation:
Available Models | Characteristics |
---|---|
Claude 3 Haiku | Faster response time, good for simple queries |
Claude 3 Sonnet | More comprehensive responses, better for complex queries |
Select the model that best suits your needs based on query complexity and response time requirements.
Resource Selection
You can prompt SQL AI on resources in the following ways:
- Select the resource from the dropdown list
- Enter the dataset name in your prompt
- Enter the domain name followed by the dataset name in your prompt
- At least one resource must be selected for query suggestions or generation-related prompts to work
- For optimal results, select only the resources relevant to your question
- Selecting more than 20 resources is not recommended, as responses may become inaccurate due to context limitations
Use Cases
Natural Language Suggestions
When you need ideas for exploring your data, SQL AI can suggest useful queries.
Example Prompt: "Suggest some queries I can run on the customer_orders dataset."
SQL AI Response: Here are some queries you might find useful:
- "Show me the total number of orders by customer"
- "Find the average order value by month"
- "Identify customers who haven't placed an order in the last 6 months"
SQL Generation
When you need to query your data but prefer not to write SQL code manually:
Example Prompt: "Generate a query to retrieve top 10 salespeople sorted in ascending order by sales amount."
SQL AI Response:
SELECT
salesperson_name,
SUM(sales_amount) as total_sales
FROM
sales_domain.sales_data
GROUP BY
salesperson_name
ORDER BY
total_sales ASC
LIMIT 10;
SQL Explanation
When you need to understand an existing SQL query:
Example Prompt: "Explain this query: SELECT * FROM test_domain.test_dataset WHERE DATE_FORMAT(ModifiedDate, '%Y') = '2006';"
SQL AI Response: This query is retrieving all columns from the table
test_dataset
in thetest_domain
schema where the year part of theModifiedDate
column equals '2006'.Breaking it down:
SELECT *
retrieves all columnsFROM test_domain.test_dataset
specifies the data sourceWHERE DATE_FORMAT(ModifiedDate, '%Y') = '2006'
filters for records from the year 2006- The
DATE_FORMAT
function extracts just the year portion from the date
Occasionally, implementing the suggested responses verbatim may not produce optimal results. For best outcomes, please incorporate prefixes such as 'Generate a query or SQL' when formulating your questions.
Limitations and Considerations
- The chatbot is limited to SQL-related actions mentioned above and cannot answer questions outside this context
- Generated queries might not always be correct; precise prompts lead to more precise queries
- Follow-up prompting on query results is not optimal
- Error identification and correction in generated SQL queries is not currently supported
- For Redshift resources outside the default tenant, you need to manually add the tenant name as the generated queries use only the
<domain>.<table_name>
format instead of the required<tenant_name>.<domain_name>.<table_name>
syntax
Tips for Better Results
To get the most out of SQL AI, consider these best practices:
- Be specific in your prompts - Clearly state what data you're looking for
- Mention column names when possible to improve query accuracy
- Start simple and build complexity through follow-up prompts
- Review generated queries before execution to ensure they match your intent
- Provide context about your data and what you're trying to achieve
Using clear and descriptive column names in your datasets will significantly improve SQL AI's ability to generate accurate queries. Consider adding comments and descriptions to your data schema when possible.