Data Retrieval API
The Amorphic Data retrieval API provides direct, query-based access to the row-level data stored within an Amorphic dataset. It allows you to retrieve the actual data contained within the source files, making it a powerful tool for previews, and integrations.
Currently, this functionality is supported for datasets of type Lake Formation (LF), S3 Athena, and Redshift.
Overview
Dataset data retrieval API provides direct access to dataset contents, enabling you to programmatically query and retrieve data for integration with external applications. API Endpoint:
- Method:
GET - Path:
/datasets/{id}/data
Filtering Mechanism
At its core, the API operates by running a SQL query behind the scenes against your data source.
Path Parameters
| Parameter | Description |
|---|---|
DatasetID | The dataset ID of an Amorphic dataset from which you want to consume data. |
Query Parameters
Filtering Data
The data retrieval API allows filtering and sorting the data based on dataset-specific attributes. It accepts a query parameter filter_condition which holds value containing our filter conditions.
Basic Filter Example:
/datasets/{id}/data?filter_condition=field1:"value1"
This will filter out records from the dataset that have data attribute field1 value equal to value1.
Range Filter Example:
filter_condition=field1:"[value1 TO value2]"
The API accepts TO to specify a range from one value to another.
Wildcard Matching:
filter_condition=field1:"*xyz*"
You can provide regex operators *, % to match fields that match a certain value.
Multiple Filters:
filter_condition=field1:"value1" AND field2:"[value2 TO value3]" NOT field3:"value4"
Multiple filters can be combined using AND, OR, NOT operators.
- Field values inside
""(double quotes) must not contain leading and trailing white spaces. - Invalid:
field:" value " - Valid:
field:"value" - This rule applies to range-type filter conditions as well.
Data Arrangement
The API accepts query parameters:
sortField- Sort the data based on a dataset attributesortOrder- Specify the sort order (ASC/DESC)
Boolean Operators
Boolean operators allow terms to be combined through logic operators. The Portal API supports AND, OR, NOT as Boolean operators.
Boolean operators must be in all caps.
AND
The AND operator finds records where all specified conditions are met. It acts as an intersection.
Example:
filter_condition=field1:"value1" AND field2:"value2"
OR
The OR operator finds records where at least one of the specified conditions is met. It acts as a union.
Example:
filter_condition=field1:"value1" OR field2:"value2"
NOT
The NOT operator excludes records that contain the term specified after NOT. It acts as a difference.
Example:
filter_condition=field1:"value1" NOT field2:"value2"
Operator Usage Rules
- Do not provide operator at the start of filter condition (except NOT)
- Invalid:
filter_condition=AND field:"value" - Valid:
filter_condition=NOT field:"value"
- Invalid:
- Do not insert consecutive operators in URL
- Invalid:
filter_condition=field:"value" AND OR field:"value" - Valid:
filter_condition=field:"value" AND NOT field:"value" - Invalid:
filter_condition=field:"value" NOT NOT field:"value"
- Invalid:
- Only NOT operator can be used consecutively with AND/OR
- Valid:
filter_condition=field:"value" AND NOT field:"value" - Valid:
filter_condition=field:"value" OR NOT field:"value"
- Valid:
Range Searches
Range searches allow you to match values for a field that fall between a lower and upper bound. These queries can be inclusive or exclusive of the bounds.
| Type | Syntax | Description |
|---|---|---|
| Inclusive Range | [ ] | Use square brackets to include the lower and upper bounds in the search. |
| Exclusive Range | { } | Use curly braces to exclude the lower and upper bounds from the search. |
Range searches are not just for date or numeric fields; they can also be used for text fields, where they will match based on alphabetical order.
Examples
Inclusive Numeric Range:
filter_condition=age:"[20 TO 23]"
This finds all records with an age between 20 and 23 (inclusive).
Exclusive Text Range:
filter_condition=product_code:"{PROD-A TO PROD-M}"
This will match codes like PROD-B and PROD-L, but it will not match PROD-A or PROD-M.
Response Syntax
Response Properties
| Property | Details |
|---|---|
columns | A JSON array of strings. Each string is a column name, defining the structure of the objects in the data array. |
data | A JSON array of item objects, where each object represents a single row of data from the dataset. |
total | The total number of records in the dataset that match the applied filter_condition. |
count | The number of results included in the current response page. This value corresponds to the limit query parameter. |
nextStart | A JSON object that provides information for retrieving the next page of results. It contains nextAvailable (boolean) and nextStart (the next starting offset). |
JSON Response Structure
{
"columns": [
"<COLUMN_NAME_1>",
"<COLUMN_NAME_2>",
"..."
],
"data": [
{
"<COLUMN_NAME_1>": "<value1>",
"<COLUMN_NAME_2>": "<value2>",
"...": "..."
}
],
"total": <total_record_count>,
"count": <records_in_this_response>,
"nextStart": {
"nextAvailable": <true_or_false>,
"nextStart": <next_page_start_index>
}
}
Example Request and Response
Request URL
https://{amorphic api endpoint}/data?filter_condition=spending_score%3A%22%5B50%20TO%2090%5D%22%20AND%20NOT%20gender%3A%22Female%22&limit=10&sortField=spending_score&sortOrder=DESC
The filter condition is URL encoded. The decoded string is: spending_score:"[50 TO 90]" AND NOT gender:"Female"
Response
{
"columns": [
"CUSTOMERID",
"GENDER",
"AGE",
"ANNUAL_INCOME",
"SPENDING_SCORE"
],
"data": [
{
"CUSTOMERID": 147,
"GENDER": "Male",
"AGE": 48,
"ANNUAL_INCOME": 90104532,
"SPENDING_SCORE": 63
},
{
"CUSTOMERID": 201,
"GENDER": "Male",
"AGE": 48,
"ANNUAL_INCOME": 90148976,
"SPENDING_SCORE": 63
},
{
"CUSTOMERID": 255,
"GENDER": "Male",
"AGE": 48,
"ANNUAL_INCOME": 90182310,
"SPENDING_SCORE": 63
}
],
"total": 1288,
"count": 10,
"nextStart": {
"nextAvailable": true,
"nextStart": 10
}
}