External API Datasource
From version 2.2, encryption (in-flight and at-rest) is enabled for all jobs and the catalog. All existing jobs (both user-created and system-created) were updated with encryption-related settings, and all newly created jobs will have encryption enabled automatically.
External API datasources are used to import data from APIs into Amorphic Datasets. Only API authentication of type BASIC is currently supported.
Below are the ways to create an External API datasource.
BASIC
To create an External API datasource, the user has to enter the API Endpoint, HTTP Method, and Query String Parameters. The image below shows how to create an External API Datasource; a sketch of the corresponding API payload follows the attribute table.
Attribute | Description |
---|---|
Datasource Name | Name of the datasource in Amorphic |
Datasource Type | Type of datasource. In this case it is ExternalAPI |
Description | Datasource related information user wants to store |
Authorized Users | Amorphic users who should have access to this datasource |
API Endpoint | Endpoint URL from which data needs to be extracted |
API Authentication | As of version 1.1.3 only BASIC is supported |
Method | HTTP method; as of version 1.1.3 only GET and POST are allowed |
Query Params | Query string parameters which the API URL takes as input |
Version | Enables the user to select which version of the ingestion scripts to use (Amorphic-specific). Whenever a new feature or Glue version is added to the underlying ingestion script, a new version is added to Amorphic. |
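For API-based creation, the request body combines the attributes above with a DatasourceConfig block. The sketch below is illustrative only, not the authoritative schema: the top-level keys (`DatasourceName`, `DatasourceType`, `Description`, `AuthorizedUsers`) are assumptions about the API's field names, while the `DatasourceConfig` keys mirror the documented example further down.

```json
{
  "DatasourceName": "weather-feed",
  "DatasourceType": "ExternalAPI",
  "Description": "Hourly weather extracts",
  "AuthorizedUsers": ["analyst1"],
  "DatasourceConfig": {
    "url": "https://example.com/datafile.csv",
    "auth_mechanism": "basic",
    "query_parameters": {"region": "us-east"},
    "method": "GET"
  }
}
```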
Additionally, the timeout for the ingestion process can be set during datasource creation by adding a key IngestionTimeout to DatasourceConfig in the input payload. The value is expected in minutes and must be between 1 and 2880. If no value is provided, the default of 480 (8 hours) is used. Please note that this feature is available exclusively via the API.
```json
{
  "DatasourceConfig": {
    "url": "https://example.com/datafile.csv",
    "auth_mechanism": "basic",
    "query_parameters": {},
    "method": "GET",
    "IngestionTimeout": 222
  }
}
```
This timeout can be overridden during schedule creation and schedule runs by providing the argument MaxTimeOut.
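As a rough illustration, assuming the schedule payload accepts arguments as a key-value map (the `Arguments` wrapper key below is an assumption, and the value is presumably in minutes, matching IngestionTimeout), the override might look like:

```json
{
  "Arguments": {
    "MaxTimeOut": 720
  }
}
```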
External API details
In the details page, the Estimated Cost of the datasource is also displayed, showing the approximate cost incurred since creation.
Edit
There is an option to edit an External API Datasource. To edit one, click the edit button in the right corner.
The Description and Authorized Users of an External API Datasource can be changed.
Upgrade
Users have the option to upgrade a datasource when a newer version is available; the upgrade option appears among the available options only in that case, and is hidden otherwise.
Upgrading a datasource updates the underlying Glue version and the data ingestion script with new features.
Downgrade
Users can downgrade a datasource to a previous version if they find that an upgrade does not meet their requirements. Note that a datasource can only be downgraded if it has previously been upgraded; newly created datasources cannot be downgraded. If a datasource is eligible for downgrading, the downgrade option will be shown.
Deletion
In the upper right corner, there is a button featuring a trash can icon. Click it to delete the datasource.
Datasource Versions
1.1
In this version of External API datasources, we added the auto-reload feature for datasets of type reload.
From this version onwards, the data reload process is triggered automatically as soon as a file upload through the External API datasource finishes, so users no longer need to trigger the reload manually after the upload completes.
1.2
In this version we made code changes in the underlying Glue script to support dataset custom partitioning.
From this version onwards, data is loaded into the S3 LZ with a prefix containing the partition keys (if any are specified) for targets that support dataset partitioning.
E.g., for the partition keys `KeyA` and `KeyB` with the values `ValueA` and `ValueB` respectively, the S3 prefix will be in the format `Domain/DatasetName/KeyA=ValueA/KeyB=ValueB/upload_date=Unix_Timestamp/UserName/FileType/`.
To learn more about dataset custom partitioning, read the docs here.
1.3
In this version of External API datasources, we added support for the Skip LZ feature.
This feature enables users to upload data directly to the data lake zone, skipping data validation. Please refer to the Skip LZ docs for more details.
1.4
No major changes were made to the underlying Glue script or design, but the logging has been enhanced.
1.5
The update in this version is specifically to ensure FIPS compliance, with no changes made to the script.
1.6
This version brings full support for Amorphic 3.0 along with an upgrade to the latest Python Glue job version.