Version: v3.0

External API Datasource

info

From version 2.2, encryption (in-flight and at-rest) is enabled for all jobs and the catalog. All existing jobs (both user-created and system-created) were updated with encryption-related settings, and all newly created jobs have encryption enabled automatically.

External API datasources are used to import data from APIs into an Amorphic dataset. Only BASIC API authentication is currently supported.

The following describes how to create an External API datasource.

BASIC

To create an External API datasource, the user has to enter an API Endpoint, HTTP Method, and Query String Parameters. The image below shows how to create an External API datasource.

External API basic datasource

Attribute | Description
Datasource Name | Name of the datasource in Amorphic
Datasource Type | Type of the datasource; in this case it is ExternalAPI
Description | Datasource-related information the user wants to store
Authorized Users | Amorphic users who should have access to this datasource
API Endpoint | Endpoint URL from which data needs to be extracted
API Authentication | As of version 1.1.3, only BASIC is supported
Method | HTTP method; as of version 1.1.3, only GET and POST are allowed
Query Params | Query string parameters which the API URL takes as input
Version | Lets the user select which version of the (Amorphic-specific) ingestion scripts to use. Whenever a new feature or Glue version is added to the underlying ingestion script, a new version is added to Amorphic.
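
Conceptually, the ingestion job issues an HTTP request built from these attributes. Below is a minimal, hypothetical sketch using Python's requests library; the endpoint, credentials, and query parameters are placeholders, not Amorphic internals.

import requests

# Placeholder values; in Amorphic these come from the datasource attributes above.
endpoint = "https://example.com/datafile.csv"  # API Endpoint
params = {"from": "2024-01-01"}                # Query Params
user, password = "api-user", "api-password"    # BASIC authentication credentials

# The job fetches the data using HTTP BASIC authentication (GET or POST).
response = requests.get(endpoint, auth=(user, password), params=params, timeout=60)
response.raise_for_status()
data = response.content  # raw payload that is then ingested into the dataset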

Additionally, the timeout for the ingestion process can be set during datasource creation by adding an IngestionTimeout key to DatasourceConfig in the input payload. The value is in minutes and must be between 1 and 2880; if it is not provided, the default of 480 (8 hours) is used. Please note that this feature is available exclusively via the API.

{
  "DatasourceConfig": {
    "url": "https://example.com/datafile.csv",
    "auth_mechanism": "basic",
    "query_parameters": {},
    "method": "GET",
    "IngestionTimeout": 222
  }
}
info

This timeout can be overridden during schedule creation and schedule runs by providing a MaxTimeOut argument.
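
As a purely hypothetical illustration (the endpoint path, headers, and payload wrapper below are assumptions, not the documented Amorphic API; only the MaxTimeOut argument itself, in minutes, is documented above):

import requests

# Hypothetical request shape for triggering a schedule run with an override;
# consult the Amorphic API reference for the actual endpoint and payload.
response = requests.post(
    "https://<amorphic-host>/schedules/<schedule-id>/run",  # placeholder URL
    headers={"Authorization": "Bearer <token>"},            # placeholder auth
    json={"MaxTimeOut": 120},                               # overrides IngestionTimeout for this run
)
response.raise_for_status()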

External API details

External API Details

On the details page, the Estimated Cost of the datasource is also displayed, showing the approximate cost incurred since creation.

Edit

There is an option to edit an External API datasource. To edit an External API datasource, click the edit button in the right corner.

The Description and Authorized Users of an External API datasource can be changed.

Upgrade

Users have the option to upgrade a datasource when an upgrade is available. The upgrade option is displayed only when a new version is available; otherwise, it is not shown.

Upgrading a datasource updates the underlying Glue version and the data ingestion script with new features.

Downgrade

Users can downgrade a datasource to a previous version if they find that an upgrade does not meet their requirements. Note that a datasource can only be downgraded if it has previously been upgraded; newly created datasources cannot be downgraded. If a datasource is eligible for downgrading, the downgrade option is shown.

Deletion

In the upper right corner, there is a button with a trash can icon. Click it to delete the datasource.

Datasource Versions

1.1

In this version of external API datasources, we added an auto-reload feature for datasets of type reload.

From this version onwards, the data reload process triggers automatically as soon as a file upload through the external API datasource finishes, so users no longer need to trigger the reload manually after the upload completes.

1.2

In this version, we made changes to the underlying Glue script to support dataset custom partitioning.

From this version onwards, data is loaded into the S3 LZ with a prefix containing the partition key(s), if any were specified, for targets that support dataset partitioning.

E.g., for the partition keys KeyA and KeyB with the values ValueA and ValueB respectively, the S3 prefix will have the format Domain/DatasetName/KeyA=ValueA/KeyB=ValueB/upload_date=Unix_Timestamp/UserName/FileType/.
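
For illustration only, a minimal sketch (not Amorphic code; the helper and its inputs are assumptions) of how a prefix of this documented shape can be assembled:

from datetime import datetime, timezone

def lz_prefix(domain, dataset, partitions, user_name, file_type):
    # Builds Domain/DatasetName/KeyA=ValueA/.../upload_date=Unix_Timestamp/UserName/FileType/
    parts = [domain, dataset]
    parts += [f"{key}={value}" for key, value in partitions.items()]
    parts.append(f"upload_date={int(datetime.now(timezone.utc).timestamp())}")
    parts += [user_name, file_type]
    return "/".join(parts) + "/"

print(lz_prefix("Domain", "DatasetName", {"KeyA": "ValueA", "KeyB": "ValueB"}, "UserName", "csv"))
# e.g. Domain/DatasetName/KeyA=ValueA/KeyB=ValueB/upload_date=1700000000/UserName/csv/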

To understand more about custom data partitioning, read the docs about dataset custom partitioning here.

1.3

In this version of the external API connection, we added support for the Skip LZ feature.

This feature enables users to upload data directly to the data lake zone, skipping data validation. Please refer to the Skip LZ docs for more details.

1.4

No major changes were made to the underlying Glue script or design, but logging has been enhanced.

1.5

The update in this version is specifically to ensure FIPS compliance, with no changes made to the script.

1.6

This version brings full support for Amorphic 3.0 along with an upgrade to the latest Python Glue job version.