Version: v3.1

External API Datasource

info

From version 2.2, encryption (in-flight and at-rest) is enabled for all jobs and the catalog. All existing jobs (both user-created and system-created) were updated with encryption-related settings, and all newly created jobs have encryption enabled automatically.

External API datasources import data from REST APIs into Amorphic datasets. They support several authentication mechanisms, HTTP methods, and pagination types for connecting to external APIs and ingesting their data.

Currently supported authentication types:

  • NoAuth - No authentication required
  • BasicAuth - Username and password authentication
  • OAuth1 - OAuth 1.0 authentication with consumer key/secret and access token/secret
  • OAuth2 - OAuth 2.0 authentication with client credentials grant type
  • BearerToken - Token-based authentication
  • ApiKey - API key authentication (can be added to header or query parameters)

Supported HTTP methods: GET, POST

Supported pagination types: Page, Offset, Cursor

How to create an External API Datasource?

External API datasource creation

To create an External API datasource, enter the details listed in the tables below, or upload them directly as JSON.

Metadata

Name | Description
Datasource Name | Give the datasource a unique name
Description | Add datasource description
Keywords | Add keyword tags to connect it with other Amorphic components.
Datasource Type | Type of datasource. In this case it is ExternalAPI

Datasource Configuration

Configuration | Description
Version | Selects which version of the ingestion scripts to use (Amorphic-specific). Whenever a new feature or Glue version is added to the underlying ingestion script, a new version is added to Amorphic.
Request URL | The REST API endpoint URL from which data needs to be extracted. Must be a valid HTTP/HTTPS URL.
Request Method | HTTP method to use for the API request. Supported methods: GET, POST
Request Headers | Additional HTTP headers to include in the request (JSON format). Optional field.
Request Query Parameters | Query string parameters for the API request (JSON format). Optional field.
Request Body | Request body for POST requests (JSON format). Optional field.
Authentication Type | Choose the authentication type: NoAuth, BasicAuth, OAuth1, OAuth2, BearerToken, ApiKey
Authentication Configuration | Configuration specific to the selected authentication type (see details below)
Pagination Type | Type of pagination to use: Page, Offset, Cursor. Optional field.
Pagination Configuration | Configuration specific to the selected pagination type (see details below)

Authentication Configuration

Based on the selected AuthType, the following configurations are required in AuthConfig:

NoAuth

No additional configuration required.

BasicAuth

Field | Description
Username | Username for basic authentication
Password | Password for basic authentication

OAuth1

Field | Description
ConsumerKey | OAuth 1.0 consumer key
ConsumerSecret | OAuth 1.0 consumer secret
AccessToken | OAuth 1.0 access token
TokenSecret | OAuth 1.0 token secret

OAuth2

Field | Description
GrantType | OAuth 2.0 grant type (currently supports "ClientCredentials")
TokenUrl | URL to obtain the OAuth 2.0 token
ClientId | OAuth 2.0 client ID
ClientSecret | OAuth 2.0 client secret
Scope | OAuth 2.0 scope (optional, space-separated values)
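
As an illustration of the ClientCredentials grant, the following Python sketch builds (without sending) the form-encoded token request that OAuth 2.0 defines for this grant type. The function name build_token_request and the example values are hypothetical, and placing the client credentials in the request body rather than in a Basic Authorization header is an assumption; it is not a description of how the Amorphic ingestion script performs this step.

```python
import urllib.parse
import urllib.request


def build_token_request(auth_config):
    """Build, but do not send, an OAuth 2.0 client-credentials token request."""
    if auth_config.get("GrantType") != "ClientCredentials":
        raise ValueError("Only the ClientCredentials grant is sketched here")
    form = {
        "grant_type": "client_credentials",
        # Assumption: credentials go in the form body, not a Basic auth header.
        "client_id": auth_config["ClientId"],
        "client_secret": auth_config["ClientSecret"],
    }
    if auth_config.get("Scope"):
        form["scope"] = auth_config["Scope"]  # space-separated scope values
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        auth_config["TokenUrl"],
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical example values.
example_request = build_token_request({
    "GrantType": "ClientCredentials",
    "TokenUrl": "https://auth.example.com/oauth/token",
    "ClientId": "my-client-id",
    "ClientSecret": "my-client-secret",
    "Scope": "read write",
})
```

The token endpoint would normally answer with a JSON body containing an access token, which is then sent as a bearer token on the data request.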

BearerToken

Field | Description
Token | Bearer token for authentication

ApiKey

Field | Description
KeyName | Name of the API key parameter/header
KeyValue | Value of the API key
AddTo | Where to add the API key: "Header" or "Query"
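
To make the effect of the simpler authentication types concrete, here is a minimal Python sketch of how an AuthConfig could translate into request headers or query parameters. The function apply_auth is hypothetical and only covers NoAuth, BasicAuth, BearerToken, and ApiKey; OAuth1 and OAuth2 need extra signing or token-exchange steps and are left out. It illustrates the standard HTTP conventions, not the actual ingestion script.

```python
import base64


def apply_auth(auth_type, auth_config, headers=None, params=None):
    """Return (headers, params) with credentials added for simple auth types."""
    headers = dict(headers or {})
    params = dict(params or {})

    if auth_type == "NoAuth":
        pass  # nothing to add
    elif auth_type == "BasicAuth":
        raw = f"{auth_config['Username']}:{auth_config['Password']}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    elif auth_type == "BearerToken":
        headers["Authorization"] = f"Bearer {auth_config['Token']}"
    elif auth_type == "ApiKey":
        if auth_config["AddTo"] == "Header":
            headers[auth_config["KeyName"]] = auth_config["KeyValue"]
        elif auth_config["AddTo"] == "Query":
            params[auth_config["KeyName"]] = auth_config["KeyValue"]
        else:
            raise ValueError('AddTo must be "Header" or "Query"')
    else:
        raise ValueError(f"AuthType not covered by this sketch: {auth_type}")
    return headers, params
```

For example, apply_auth("ApiKey", {"KeyName": "api_key", "KeyValue": "abc", "AddTo": "Query"}) leaves the headers empty and returns {"api_key": "abc"} as the query parameters.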

Pagination Configuration

Based on the selected PaginationType, the following configurations are required in PaginationConfig:

Page

Field | Description
DataKey | JSON path to the data array in the response
PageParam | Parameter name for the page number
SizeParam | Parameter name for the page size
SizeValue | Number of records per page
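
The following Python sketch shows one way the Page fields could drive a pagination loop. Several details are assumptions rather than documented behavior: DataKey is treated as a dot-separated path, page numbering starts at 1, and the loop stops at the first page that returns fewer than SizeValue records. The helper names and the in-memory fake_fetch function are hypothetical stand-ins for a real HTTP call.

```python
def extract(payload, data_key):
    """Follow a dot-separated DataKey (assumed syntax) into a parsed JSON body."""
    for part in data_key.split("."):
        payload = payload[part]
    return payload


def fetch_all_pages(fetch, config, base_params=None, first_page=1, max_pages=1000):
    """Collect records page by page until a short or empty page is returned."""
    records = []
    size = int(config["SizeValue"])
    for page in range(first_page, first_page + max_pages):
        params = dict(base_params or {})
        params[config["PageParam"]] = page
        params[config["SizeParam"]] = size
        batch = extract(fetch(params), config["DataKey"])
        records.extend(batch)
        if len(batch) < size:  # assumed stop condition
            break
    return records


# Hypothetical API that serves five records, two per page.
_DATA = [10, 20, 30, 40, 50]


def fake_fetch(params):
    start = (params["page"] - 1) * params["per_page"]
    return {"result": {"items": _DATA[start:start + params["per_page"]]}}


page_config = {
    "DataKey": "result.items",
    "PageParam": "page",
    "SizeParam": "per_page",
    "SizeValue": 2,
}
all_records = fetch_all_pages(fake_fetch, page_config)
```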

Offset

Field | Description
DataKey | JSON path to the data array in the response
LimitParam | Parameter name for the limit
OffsetParam | Parameter name for the offset
LimitValue | Number of records per request
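
Offset pagination can be sketched the same way. In this hypothetical Python example the offset starts at 0 and advances by the number of records received, and the loop ends when a response holds fewer than LimitValue records; both choices, along with the dot-separated DataKey syntax, are assumptions made for illustration.

```python
def extract(payload, data_key):
    """Follow a dot-separated DataKey (assumed syntax) into a parsed JSON body."""
    for part in data_key.split("."):
        payload = payload[part]
    return payload


def fetch_all_offsets(fetch, config, base_params=None, max_requests=1000):
    """Collect records by advancing the offset until a short batch is returned."""
    records = []
    limit = int(config["LimitValue"])
    offset = 0  # assumed starting offset
    for _ in range(max_requests):
        params = dict(base_params or {})
        params[config["LimitParam"]] = limit
        params[config["OffsetParam"]] = offset
        batch = extract(fetch(params), config["DataKey"])
        records.extend(batch)
        if len(batch) < limit:
            break
        offset += len(batch)
    return records


# Hypothetical API that serves seven records, three per request.
_DATA = list(range(7))


def fake_fetch(params):
    start = params["offset"]
    return {"data": _DATA[start:start + params["limit"]]}


offset_config = {
    "DataKey": "data",
    "LimitParam": "limit",
    "OffsetParam": "offset",
    "LimitValue": 3,
}
all_records = fetch_all_offsets(fake_fetch, offset_config)
```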

Cursor

Field | Description
DataKey | JSON path to the data array in the response
CursorParam | Parameter name for the cursor
CursorPathInResponse | JSON path to the next cursor value in the response
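
For cursor pagination, each response carries the value to send with the next request. The Python sketch below assumes that the first request omits the cursor parameter, that both paths use a dot-separated syntax, and that a missing or empty cursor marks the last page. The helper names and the fake responses are hypothetical.

```python
def dig(payload, path):
    """Follow a dot-separated path (assumed syntax); return None if it is absent."""
    for part in path.split("."):
        if not isinstance(payload, dict) or part not in payload:
            return None
        payload = payload[part]
    return payload


def fetch_all_cursor(fetch, config, base_params=None, max_requests=1000):
    """Collect records by following the cursor returned in each response."""
    records = []
    cursor = None
    for _ in range(max_requests):
        params = dict(base_params or {})
        if cursor is not None:
            params[config["CursorParam"]] = cursor
        body = fetch(params)
        records.extend(dig(body, config["DataKey"]) or [])
        cursor = dig(body, config["CursorPathInResponse"])
        if not cursor:  # assumed end-of-data signal
            break
    return records


# Hypothetical two-page API keyed by cursor value.
_PAGES = {
    None: {"data": [1, 2], "meta": {"next": "c2"}},
    "c2": {"data": [3], "meta": {"next": None}},
}


def fake_fetch(params):
    return _PAGES[params.get("cursor")]


cursor_config = {
    "DataKey": "data",
    "CursorParam": "cursor",
    "CursorPathInResponse": "meta.next",
}
all_records = fetch_all_cursor(fake_fetch, cursor_config)
```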

Ingestion Timeout Configuration (API Only)

You can configure the ingestion process timeout during datasource creation by adding the IngestionTimeout key to the DatasourceConfig payload.

  • Value range: 1 to 2880 minutes
  • Default value: 480 minutes (8 hours) if not specified
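
A client that assembles the payload might check the timeout before sending it. The Python sketch below applies the range and default listed above; the function name is hypothetical, and whether the service rejects or clamps out-of-range values is not stated here, so the sketch simply raises an error.

```python
DEFAULT_TIMEOUT_MINUTES = 480  # 8 hours
MIN_TIMEOUT_MINUTES = 1
MAX_TIMEOUT_MINUTES = 2880


def resolve_ingestion_timeout(datasource_config):
    """Return the IngestionTimeout in minutes, applying the documented default."""
    value = datasource_config.get("IngestionTimeout", DEFAULT_TIMEOUT_MINUTES)
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("IngestionTimeout must be an integer number of minutes")
    if not MIN_TIMEOUT_MINUTES <= value <= MAX_TIMEOUT_MINUTES:
        raise ValueError(
            f"IngestionTimeout must be between {MIN_TIMEOUT_MINUTES} "
            f"and {MAX_TIMEOUT_MINUTES} minutes"
        )
    return value
```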

POST - /datasources

Sample request payload - No Authentication
{
  "DatasourceName": "ExternalAPI-NoAuth",
  "DatasourceType": "ext-api",
  "Description": "External API datasource with no authentication",
  "Keywords": [
    "Owner: Mark Liu"
  ],
  "DatasourceConfig": {
    "RequestUrl": "https://api.example.com/v1/data",
    "RequestMethod": "GET",
    "AuthType": "NoAuth",
    "IngestionTimeout": 720
  }
}

info

The IngestionTimeout value set in the datasource can be overridden during schedule creation or schedule run by providing a MaxTimeOut argument.
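
The same kind of payload can be assembled programmatically. The Python sketch below extends the no-authentication sample with bearer-token authentication and offset pagination. Placing AuthConfig, PaginationType, and PaginationConfig inside DatasourceConfig next to AuthType is an assumption based on the field names used on this page, as is passing LimitValue as a number; the function name and all values are hypothetical. The sketch only builds the JSON body and does not call the API.

```python
import json


def build_datasource_payload(name, request_url, token, timeout_minutes=480):
    """Assemble a request body for POST /datasources (structure partly assumed)."""
    return {
        "DatasourceName": name,
        "DatasourceType": "ext-api",
        "Description": "External API datasource with bearer token and offset pagination",
        "Keywords": ["Owner: Mark Liu"],
        "DatasourceConfig": {
            "RequestUrl": request_url,
            "RequestMethod": "GET",
            "AuthType": "BearerToken",
            # Assumption: these keys sit alongside AuthType in DatasourceConfig.
            "AuthConfig": {"Token": token},
            "PaginationType": "Offset",
            "PaginationConfig": {
                "DataKey": "data",
                "LimitParam": "limit",
                "OffsetParam": "offset",
                "LimitValue": 100,  # assumption: numeric rather than string
            },
            "IngestionTimeout": timeout_minutes,
        },
    }


payload = build_datasource_payload(
    "ExternalAPI-Bearer", "https://api.example.com/v1/data", "example-token", 720
)
body = json.dumps(payload, indent=2)
print(body)
```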

External API Datasource Details

The External API Datasource details page provides comprehensive information about your configured API connection, including authentication settings, request configuration, and connection status. This centralized view enables efficient monitoring and management of your external API integrations.

External API Details

Test Datasource

This option lets users quickly verify connectivity to the configured API endpoint. Running the test confirms whether the API configuration is accurate and functional before any data is ingested.

Edit Datasource

External API Datasources can be modified after creation to accommodate changing requirements or update configuration settings. The edit functionality provides flexibility to adjust various aspects of your API datasource without needing to recreate the entire datasource.

To edit an External API Datasource, locate and click the edit button in the upper right corner of the datasource details page. This will open the configuration interface where you can modify supported fields and settings.

info

Changes to critical configuration, such as the API endpoint URL, may require creating a new datasource.

Upgrade Datasource

Users can upgrade a datasource when a newer version is available. The upgrade option appears among the available options only when a newer version has been released; otherwise, it is not shown.

Upgrading a datasource updates the underlying Glue version and the data ingestion script with new features, performance improvements, and bug fixes.

Downgrade Datasource

Users can downgrade a datasource to a previous version if the upgrade does not meet their requirements. A datasource can be downgraded only if it has previously been upgraded, so the option is not available for newly created datasources. When a datasource is eligible for a downgrade, the downgrade option is shown.

Delete Datasource

To delete an External API Datasource, locate the delete button (trash can icon) in the upper right corner of the datasource details page and click on it. This action will permanently remove the datasource from your Amorphic environment.

warning

Deleting a datasource is irreversible. Ensure that no active schedules or jobs are dependent on this datasource before proceeding with deletion.

Runtime Query Parameters Support

External API datasources support query parameters to customize API requests. Query parameters can be configured at the schedule level and support both static values and dynamic date placeholders for time-based queries.

Date Placeholder Support

Users can pass date placeholders within the QueryParameters for dynamic date-based queries. This requires a parameter called DATE_FORMAT within QueryParameters to specify the date format.

Example: Consider a financial API that supports querying by transaction date, where you want to fetch the previous day's transactions automatically every time the schedule runs.

Configure the following Query Parameters on the External API Datasource:

Parameter | Value
DATE_FORMAT | %Y-%m-%d
transaction_date | %%CURRENT_DATE-1D

Valid Date Placeholders

Date placeholders take the form %%CURRENT_DATE, optionally followed by a minus sign, a number, and a unit: D for days, H for hours, or M for minutes. Valid placeholder examples:

  • "%%CURRENT_DATE" - Current date
  • "%%CURRENT_DATE-1D" - 1 day before current date
  • "%%CURRENT_DATE-10H" - 10 hours before current time
  • "%%CURRENT_DATE-5M" - 5 minutes before current time

This feature enables flexible scheduling of API data ingestion with dynamic query parameters, making it easier to handle time-based data retrieval scenarios.
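
To show how such placeholders could be expanded, here is a minimal Python sketch. It assumes that DATE_FORMAT uses Python strftime directives (as the %Y-%m-%d example suggests), that DATE_FORMAT itself is not forwarded to the API, and that the caller supplies the reference time; the function name and these behaviors are illustrative assumptions, not the ingestion script's actual logic.

```python
import re
from datetime import datetime, timedelta

# Matches %%CURRENT_DATE with an optional -<number><unit> suffix.
_PLACEHOLDER = re.compile(r"%%CURRENT_DATE(?:-(\d+)([DHM]))?")
_UNITS = {"D": "days", "H": "hours", "M": "minutes"}


def resolve_date_placeholders(query_parameters, now):
    """Replace date placeholders with formatted dates relative to `now`."""
    params = dict(query_parameters)
    date_format = params.pop("DATE_FORMAT", None)  # assumed not sent to the API
    resolved = {}
    for name, value in params.items():
        match = _PLACEHOLDER.fullmatch(value) if isinstance(value, str) else None
        if match is None:
            resolved[name] = value  # static value, passed through unchanged
            continue
        if date_format is None:
            raise ValueError("DATE_FORMAT is required when date placeholders are used")
        amount, unit = match.groups()
        delta = timedelta(**{_UNITS[unit]: int(amount)}) if amount else timedelta()
        resolved[name] = (now - delta).strftime(date_format)
    return resolved


example = resolve_date_placeholders(
    {"DATE_FORMAT": "%Y-%m-%d", "transaction_date": "%%CURRENT_DATE-1D", "limit": "5"},
    now=datetime(2024, 3, 15, 9, 30),
)
```

With the reference time fixed at 2024-03-15 09:30, transaction_date resolves to 2024-03-14 and the static limit value is left as is.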

Runtime Query Parameter Override

External API datasources support runtime override of query parameters through schedule arguments. This allows users to dynamically modify query parameters without editing the datasource configuration.

Query parameters should be provided as a single schedule argument called QueryParameters whose value is a dictionary with the required parameters. For example:

Schedule Argument Name | Schedule Argument Value
QueryParameters | {"limit": "5", "offset": "1"}

Example with date placeholders:

{
  "DATE_FORMAT": "%Y-%m-%d",
  "start_date": "%%CURRENT_DATE-2D",
  "end_date": "%%CURRENT_DATE-1D"
}

info

Query Parameter Priority Order

When query parameters are defined at multiple levels, Amorphic uses the following priority order (highest to lowest):

  1. Runtime Parameters - QueryParameters argument passed during manual schedule execution (overrides everything)
  2. Schedule Arguments - QueryParameters argument defined in the schedule configuration (overrides datasource settings)
  3. Datasource Configuration - Query Parameters configured on the External API Datasource
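
The priority order can be pictured as a layered merge. The Python sketch below assumes that overrides apply per key, so a higher-priority level replaces only the parameters it defines; it is also possible that a higher level replaces the whole dictionary, which this page does not specify. The function name and sample values are hypothetical.

```python
def effective_query_parameters(datasource=None, schedule=None, runtime=None):
    """Merge query parameters from lowest to highest priority (per-key, assumed)."""
    merged = {}
    for layer in (datasource, schedule, runtime):  # later layers win
        merged.update(layer or {})
    return merged


effective = effective_query_parameters(
    datasource={"limit": "100", "region": "us"},
    schedule={"limit": "5", "offset": "1"},
    runtime={"offset": "10"},
)
```

Here limit comes from the schedule arguments, offset from the runtime parameters, and region from the datasource configuration.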

Datasource Versions

1.1

This version of External API datasources adds an auto-reload feature for datasets of type reload.

From this version onwards, the data reload process is triggered automatically as soon as the file upload through the External API datasource finishes, so users no longer need to trigger the reload manually after ingesting data.

1.2

In this version, the underlying Glue script was changed to support dataset custom partitioning.

From this version onwards, data is loaded into the S3 LZ under a prefix that contains the partition keys (if any are specified) for targets that support dataset partitioning.

For example, for the partition keys KeyA, KeyB with the values ValueA, ValueB respectively, the S3 prefix will be in the format: Domain/DatasetName/KeyA=ValueA/KeyB=ValueB/upload_date=Unix_Timestamp/UserName/FileType/.
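
The prefix format can be reproduced with a few lines of Python. The function and argument names below are hypothetical, and the sketch only mirrors the layout shown in the example above; it does not read or write anything in S3.

```python
def build_lz_prefix(domain, dataset_name, partitions, upload_timestamp, user_name, file_type):
    """Build an S3 prefix in the documented layout from ordered partition pairs."""
    parts = [domain, dataset_name]
    parts += [f"{key}={value}" for key, value in partitions]
    parts += [f"upload_date={int(upload_timestamp)}", user_name, file_type]
    return "/".join(parts) + "/"


prefix = build_lz_prefix(
    "Domain",
    "DatasetName",
    [("KeyA", "ValueA"), ("KeyB", "ValueB")],
    1700000000,  # example Unix timestamp
    "UserName",
    "csv",
)
```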

To learn more about custom data partitioning, see the documentation on dataset custom partitioning.

1.3

This version of External API datasources adds support for the Skip LZ feature.

This feature lets users upload data directly to the data lake zone by skipping data validation. Refer to the Skip LZ documentation for more details.

1.4

No major changes were made to the underlying Glue script or design, but the logging has been enhanced for better debugging and monitoring capabilities.

1.5

The update in this version is specifically to ensure FIPS compliance, with no functional changes made to the script.

1.6

This version brings full support for Amorphic 3.0 along with an upgrade to the latest Python Glue job version.

2.0

This major version introduces comprehensive enhancements to External API datasource capabilities, providing extensive flexibility and control over API integrations.

Key features introduced in this version:

  • Multiple Authentication Types: Added support for NoAuth, BasicAuth, OAuth1, OAuth2, BearerToken, and ApiKey authentication mechanisms, allowing seamless integration with various API security models
  • Advanced Pagination Support: Implemented Page, Offset, and Cursor pagination types to handle large datasets efficiently and accommodate different API pagination patterns
  • Request Body Configuration: Introduced support for custom request bodies, enabling POST operations with structured JSON payloads for complex API interactions
  • Custom Headers Management: Added the ability to configure custom HTTP headers, providing full control over request formatting and API-specific requirements
  • Enhanced Request Configuration: Improved overall request handling with better parameter management and configuration options