External API Datasource
From version 2.2, encryption (in-flight and at-rest) is enabled for all jobs and the catalog. All existing jobs (both user-created and system-created) were updated with encryption-related settings, and all newly created jobs have encryption enabled automatically.
External API datasources are used to import data from REST APIs to Amorphic datasets. These datasources support various authentication mechanisms and HTTP methods to connect with external APIs and ingest data into your Amorphic Datasets.
Currently supported authentication types:
- NoAuth - No authentication required
- BasicAuth - Username and password authentication
- OAuth1 - OAuth 1.0 authentication with consumer key/secret and access token/secret
- OAuth2 - OAuth 2.0 authentication with client credentials grant type
- BearerToken - Token-based authentication
- ApiKey - API key authentication (can be added to header or query parameters)
Supported HTTP methods: GET, POST
Supported pagination types: Page, Offset, Cursor
How to create an External API Datasource?
To create an External API datasource, enter the details shown in the tables below, or upload the JSON configuration directly.
Metadata
Name | Description |
---|---|
Datasource Name | Give the datasource a unique name |
Description | Add datasource description |
Keywords | Add keyword tags to connect it with other Amorphic components. |
Datasource Type | Type of datasource. In this case it is ExternalAPI |
Datasource Configuration
Configuration | Description |
---|---|
Version | Lets the user select which version of the ingestion scripts to use (Amorphic specific). When a new feature or Glue version is added to the underlying ingestion script, a new version becomes available in Amorphic. |
Request URL | The REST API endpoint URL from which data needs to be extracted. Must be a valid HTTP/HTTPS URL. |
Request Method | HTTP method to use for the API request. Supported methods: GET, POST |
Request Headers | Additional HTTP headers to include in the request (JSON format). Optional field. |
Request Query Parameters | Query string parameters for the API request (JSON format). Optional field. |
Request Body | Request body for POST requests (JSON format). Optional field. |
Authentication Type | Choose the authentication type: NoAuth, BasicAuth, OAuth1, OAuth2, BearerToken, ApiKey |
Authentication Configuration | Configuration specific to the selected authentication type (see details below) |
Pagination Type | Type of pagination to use: Page, Offset, Cursor. Optional field. |
Pagination Configuration | Configuration specific to the selected pagination type (see details below) |
Authentication Configuration
Based on the selected AuthType, the following configurations are required in AuthConfig:
NoAuth
No additional configuration required.
BasicAuth
Field | Description |
---|---|
Username | Username for basic authentication |
Password | Password for basic authentication |
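Under the hood, BasicAuth credentials are conventionally sent as an `Authorization: Basic` header carrying the base64-encoded `username:password` pair. A minimal sketch of that encoding (the helper name is hypothetical; the actual ingestion script's internals are not documented here):

```python
import base64

def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header used by HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```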
OAuth1
Field | Description |
---|---|
ConsumerKey | OAuth 1.0 consumer key |
ConsumerSecret | OAuth 1.0 consumer secret |
AccessToken | OAuth 1.0 access token |
TokenSecret | OAuth 1.0 token secret |
OAuth2
Field | Description |
---|---|
GrantType | OAuth 2.0 grant type (currently supports "ClientCredentials") |
TokenUrl | URL to obtain the OAuth 2.0 token |
ClientId | OAuth 2.0 client ID |
ClientSecret | OAuth 2.0 client secret |
Scope | OAuth 2.0 scope (optional, space-separated values) |
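With the ClientCredentials grant, the ingestion process first exchanges the client ID and secret for an access token by POSTing a form-encoded body to `TokenUrl`. A hedged sketch of how that token request body is assembled (field names follow RFC 6749; the helper is illustrative, not the script's actual code):

```python
from urllib.parse import urlencode

def client_credentials_body(client_id: str, client_secret: str, scope: str = "") -> str:
    """Build the x-www-form-urlencoded body POSTed to TokenUrl (RFC 6749, section 4.4)."""
    fields = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scope:  # optional, space-separated scope values
        fields["scope"] = scope
    return urlencode(fields)
```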
BearerToken
Field | Description |
---|---|
Token | Bearer token for authentication |
ApiKey
Field | Description |
---|---|
KeyName | Name of the API key parameter/header |
KeyValue | Value of the API key |
AddTo | Where to add the API key: "Header" or "Query" |
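The `AddTo` field decides whether the key rides in the request headers or the query string. A small sketch of that branching (the function name is hypothetical, shown only to make the two placements concrete):

```python
def apply_api_key(key_name, key_value, add_to, headers=None, params=None):
    """Attach an API key to either the request headers or the query parameters."""
    headers = dict(headers or {})
    params = dict(params or {})
    if add_to == "Header":
        headers[key_name] = key_value
    elif add_to == "Query":
        params[key_name] = key_value
    else:
        raise ValueError(f"Unsupported AddTo value: {add_to!r}")
    return headers, params
```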
Pagination Configuration
Based on the selected PaginationType, the following configurations are required in PaginationConfig:
Page
Field | Description |
---|---|
DataKey | JSON path to the data array in the response |
PageParam | Parameter name for the page number |
SizeParam | Parameter name for the page size |
SizeValue | Number of records per page |
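Conceptually, Page pagination requests page 1, 2, 3, ... until a short or empty page signals the end. A sketch of that loop with a stubbed `fetch` callable standing in for the real HTTP request (for simplicity `data_key` is treated as a top-level key here, whereas the real `DataKey` may be a JSON path):

```python
def paginate_by_page(fetch, data_key, page_param, size_param, size_value):
    """Request successive pages until a short page signals the end of the data."""
    page, records = 1, []
    while True:
        response = fetch({page_param: page, size_param: size_value})
        batch = response.get(data_key, [])
        records.extend(batch)
        if len(batch) < size_value:  # short page => no more data
            break
        page += 1
    return records
```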
Offset
Field | Description |
---|---|
DataKey | JSON path to the data array in the response |
LimitParam | Parameter name for the limit |
OffsetParam | Parameter name for the offset |
LimitValue | Number of records per request |
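Offset pagination is similar, but instead of a page number it advances a record offset by the size of each returned batch. A sketch under the same assumptions as above (stubbed `fetch`, top-level `data_key`):

```python
def paginate_by_offset(fetch, data_key, limit_param, offset_param, limit_value):
    """Advance the offset by each batch size until fewer than limit records return."""
    offset, records = 0, []
    while True:
        batch = fetch({limit_param: limit_value, offset_param: offset}).get(data_key, [])
        records.extend(batch)
        if len(batch) < limit_value:  # partial batch => end of data
            break
        offset += len(batch)
    return records
```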
Cursor
Field | Description |
---|---|
DataKey | JSON path to the data array in the response |
CursorParam | Parameter name for the cursor |
CursorPathInResponse | JSON path to the next cursor value in the response |
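Cursor pagination does not compute the next request itself; each response carries the next cursor at `CursorPathInResponse`, and requests continue until no cursor is returned. A sketch assuming a simple dotted path such as `meta.next` (stubbed `fetch` again; the real script's path handling may be richer):

```python
def paginate_by_cursor(fetch, data_key, cursor_param, cursor_path):
    """Follow the cursor returned in each response until it is exhausted."""
    cursor, records = None, []
    while True:
        params = {cursor_param: cursor} if cursor else {}
        response = fetch(params)
        records.extend(response.get(data_key, []))
        # Walk the dotted path (e.g. "meta.next") to find the next cursor value.
        node = response
        for part in cursor_path.split("."):
            node = node.get(part) if isinstance(node, dict) else None
        cursor = node
        if not cursor:
            break
    return records
```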
Ingestion Timeout Configuration (API Only)
You can configure the ingestion process timeout during datasource creation by adding the IngestionTimeout key to the DatasourceConfig payload.
- Value range: 1 to 2880 minutes
- Default value: 480 minutes (8 hours) if not specified
POST - /datasources
```json
{
  "DatasourceName": "ExternalAPI-NoAuth",
  "DatasourceType": "ext-api",
  "Description": "External API datasource with no authentication",
  "Keywords": [
    "Owner: Mark Liu"
  ],
  "DatasourceConfig": {
    "RequestUrl": "https://api.example.com/v1/data",
    "RequestMethod": "GET",
    "AuthType": "NoAuth",
    "IngestionTimeout": 720
  }
}
```
The IngestionTimeout value set in the datasource can be overridden during schedule creation or schedule run by providing a MaxTimeOut argument.
External API Datasource Details
The External API Datasource details page provides comprehensive information about your configured API connection, including authentication settings, request configuration, and connection status. This centralized view enables efficient monitoring and management of your external API integrations.
Test Datasource
This functionality allows users to quickly verify the connectivity to the specified API endpoint. By initiating this test, users can confirm if the API configuration is accurate and functional, ensuring seamless access to the external API.
Edit Datasource
External API Datasources can be modified after creation to accommodate changing requirements or update configuration settings. The edit functionality provides flexibility to adjust various aspects of your API datasource without needing to recreate the entire datasource.
To edit an External API Datasource, locate and click the edit button in the upper right corner of the datasource details page. This will open the configuration interface where you can modify supported fields and settings.
Changes to critical configuration like API endpoint URL may require creating a new datasource.
Upgrade Datasource
Users have the option to upgrade a datasource when a new version is available. The upgrade option appears among the available actions only after a newer version has been released; otherwise, it is not shown.
Datasource upgrade upgrades the underlying Glue version and the data ingestion script with new features, performance improvements, and bug fixes.
Downgrade Datasource
Users can downgrade a datasource to a previous version if an upgrade isn't meeting their requirements. Note that a datasource can only be downgraded if it has previously been upgraded; newly created datasources cannot be downgraded. If a datasource is eligible, the downgrade option will be shown.
Delete Datasource
To delete an External API Datasource, locate the delete button (trash can icon) in the upper right corner of the datasource details page and click on it. This action will permanently remove the datasource from your Amorphic environment.
Deleting a datasource is irreversible. Ensure that no active schedules or jobs are dependent on this datasource before proceeding with deletion.
Runtime Query Parameters Support
External API datasources support query parameters to customize API requests. Query parameters can be configured at the schedule level and support both static values and dynamic date placeholders for time-based queries.
Date Placeholder Support
Users can pass date placeholders within the QueryParameters for dynamic date-based queries. This requires a parameter called DATE_FORMAT within QueryParameters to specify the date format.
Example: Consider a financial API that supports querying by transaction date and you want to automatically fetch the previous day's transactions every time the schedule runs.
Configure the following Query Parameters on the External API Datasource:
Parameter | Value |
---|---|
DATE_FORMAT | %Y-%m-%d |
transaction_date | %%CURRENT_DATE-1D |
Valid Date Placeholders
The date placeholders should be in the format `%%CURRENT_DATE-<Number Of Days/Hours/Minutes>`. Valid placeholder examples:
- `%%CURRENT_DATE` - Current date
- `%%CURRENT_DATE-1D` - 1 day before current date
- `%%CURRENT_DATE-10H` - 10 hours before current time
- `%%CURRENT_DATE-5M` - 5 minutes before current time
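Conceptually, each placeholder resolves to the current date/time minus the given offset, formatted with `DATE_FORMAT`. A minimal sketch of such a resolver (the function is hypothetical, shown to illustrate the placeholder grammar, not the actual ingestion code):

```python
import re
from datetime import datetime, timedelta

def resolve_placeholder(value, date_format, now=None):
    """Resolve a %%CURRENT_DATE[-<n><D|H|M>] placeholder into a formatted date string."""
    now = now or datetime.now()
    match = re.fullmatch(r"%%CURRENT_DATE(?:-(\d+)([DHM]))?", value)
    if not match:
        return value  # not a placeholder; pass the static value through unchanged
    amount, unit = match.groups()
    if amount:
        field = {"D": "days", "H": "hours", "M": "minutes"}[unit]
        now -= timedelta(**{field: int(amount)})
    return now.strftime(date_format)
```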
This feature enables flexible scheduling of API data ingestion with dynamic query parameters, making it easier to handle time-based data retrieval scenarios.
Runtime Query Parameter Override
External API datasources support runtime override of query parameters through schedule arguments. This allows users to dynamically modify query parameters without editing the datasource configuration.
Query parameters should be provided as a single schedule argument called QueryParameters whose value is a dictionary with the required parameters. For example:
Schedule Argument Name | Schedule Argument Value |
---|---|
QueryParameters | {"limit": "5", "offset": "1"} |
Example with date placeholders:
```json
{
  "DATE_FORMAT": "%Y-%m-%d",
  "start_date": "%%CURRENT_DATE-2D",
  "end_date": "%%CURRENT_DATE-1D"
}
```
Query Parameter Priority Order
When query parameters are defined at multiple levels, Amorphic uses the following priority order (highest to lowest):
- Runtime Parameters - QueryParameters argument passed during manual schedule execution (overrides everything)
- Schedule Arguments - QueryParameters argument defined in the schedule configuration (overrides datasource settings)
- Datasource Configuration - Query Parameters configured on the External API Datasource
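The priority order above behaves like a layered dictionary merge, where higher-priority sources overwrite keys from lower-priority ones. A sketch of that merge (function name is illustrative):

```python
def effective_query_params(datasource_cfg, schedule_args, runtime_args):
    """Merge query parameters; later (higher-priority) sources win on key conflicts."""
    merged = {}
    for source in (datasource_cfg, schedule_args, runtime_args):  # lowest -> highest priority
        merged.update(source or {})
    return merged
```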
Datasource Versions
1.1
In this version of External API datasources, we added an auto-reload feature for datasets of type reload.
From this version onwards, the data reload process triggers automatically as soon as a file upload through the External API datasource finishes, so users no longer need to trigger the reload manually after the upload completes.
1.2
In this version we made code changes in the underlying Glue script for the support of dataset custom partitioning.
From this version onwards, the data will be loaded into S3 LZ with the prefix containing the partition key (if you specified any) for the targets which support dataset partitioning.
For example, for the partition keys `KeyA` and `KeyB` with the values `ValueA` and `ValueB` respectively, the S3 prefix will be in the format: `Domain/DatasetName/KeyA=ValueA/KeyB=ValueB/upload_date=Unix_Timestamp/UserName/FileType/`.
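The prefix layout above can be sketched as a simple path builder; the function and its parameters are hypothetical, shown only to make the ordering of the path segments concrete:

```python
def build_s3_prefix(domain, dataset, partitions, upload_ts, user, file_type):
    """Assemble the LZ prefix: Domain/Dataset/Key=Value.../upload_date=ts/User/FileType/."""
    parts = [domain, dataset]
    parts += [f"{key}={value}" for key, value in partitions]  # ordered (key, value) pairs
    parts += [f"upload_date={upload_ts}", user, file_type]
    return "/".join(parts) + "/"
```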
To understand more about custom data partitioning, read the docs about dataset custom partitioning here.
1.3
In this version of external API datasource, we added support for the Skip LZ feature.
This feature enables users to directly upload data to the data lake zone by skipping the data validation. Please refer to Skip LZ related docs for more details.
1.4
No major changes were made to the underlying Glue script or design, but the logging has been enhanced for better debugging and monitoring capabilities.
1.5
The update in this version is specifically to ensure FIPS compliance, with no functional changes made to the script.
1.6
This version brings full support for Amorphic 3.0 along with an upgrade to the latest Python Glue job version.
2.0
This major version introduces comprehensive enhancements to External API datasource capabilities, providing extensive flexibility and control over API integrations.
Key features introduced in this version:
- Multiple Authentication Types: Added support for NoAuth, BasicAuth, OAuth1, OAuth2, BearerToken, and ApiKey authentication mechanisms, allowing seamless integration with various API security models
- Advanced Pagination Support: Implemented Page, Offset, and Cursor pagination types to handle large datasets efficiently and accommodate different API pagination patterns
- Request Body Configuration: Introduced support for custom request bodies, enabling POST operations with structured JSON payloads for complex API interactions
- Custom Headers Management: Added the ability to configure custom HTTP headers, providing full control over request formatting and API-specific requirements
- Enhanced Request Configuration: Improved overall request handling with better parameter management and configuration options