JDBC Datasources
From version 2.2, encryption (in-flight and at-rest) is enabled for all jobs and the catalog. All existing jobs (both user-created and system-created) were updated with encryption-related settings, and all newly created jobs will have encryption enabled automatically. Users should upgrade their datasources to the latest version.
JDBC (Java Database Connectivity) is a Java API designed to facilitate connections to tabular data in various databases, with a particular focus on relational databases, and to execute queries.
Setting up a JDBC datasource in Amorphic lets you migrate data from one or more JDBC sources into Amorphic. You can also use this datasource to write data directly into Amorphic datasets.
How to set up a JDBC Datasource?
To create a datasource, enter the details shown in the tables below, or upload the equivalent JSON directly.
Metadata
Name | Description |
---|---|
Datasource Name | Give the datasource a unique name. |
Description | Add a description for the datasource. |
Keywords | Add keyword tags to connect it with other Amorphic components. |
Datasource configuration
Datasource Configuration | Description |
---|---|
Ingestion Type | Select the data load type based on the data size. For example, for a single table or a smaller amount of data, select Normal Load. |
JDBC Datasource URL | The source database's JDBC URL. General format: `jdbc:protocol://host:port/dbname` or `jdbc:protocol://host:port;dbname=actual_name`. Examples: `jdbc:oracle:thin:@//HostName:Port/ServiceName`, `jdbc:oracle:thin:@HostName:Port/ServiceName`, `jdbc:oracle:thin:@HostName:Port:SID`, `jdbc:sqlserver://actual_servername\actual_instancename:port;databaseName=actual_name`, `jdbc:mysql://hostname:port/`, `jdbc:mysql:loadbalance://hostname:port/database_name`, `jdbc:mysql:replication://hostname:port/database_name`, `jdbc:as400://hostname`. See the filled-in examples below the table. |
Username | The username for the source database, starting with 'u_'. For example, "u_Johndoe". |
Password | The password for the database from which you are loading data. Note: You can reset the password from Profile & Settings > Reset DWH Password. |
Datasource Accessibility | Choose Private if your data warehouse is hosted in an Amazon Virtual Private Cloud (VPC). |
SSL | This field applies to all Normal data load datasources, and for Bulk data load datasources, it applies only to PostgreSQL and SQL Server databases. It is used when an additional level of security needs to be enforced. |
Review & Submit | Review the information and submit it to create the datasource. |
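For reference, here are some of the formats above filled in with illustrative placeholder values (the host names, ports, and database names are examples only, not values from your environment):

```text
jdbc:mysql://mysql.example.com:3306/salesdb
jdbc:postgresql://postgres.example.com:5432/salesdb
jdbc:oracle:thin:@//oracle.example.com:1521/ORCLPDB1
jdbc:oracle:thin:@oracle.example.com:1521:ORCL
jdbc:sqlserver://mssql.example.com\SQLEXPRESS:1433;databaseName=salesdb
jdbc:db2://db2.example.com:50000/SAMPLE
jdbc:as400://ibmi.example.com
```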
What ingestion types apply to your use case?
In Amorphic, JDBC datasources use one of four ingestion types: Normal, Bulk, Metadata Only, or Advanced data load. Let's learn about their differences and implementations.
1. JDBC Normal data load - Datasources
Normal load datasources are best suited to moving a single table or a small amount of data. The JDBC URL format changes depending on the database being used. This datasource type works with all databases supported by AWS Glue, which you can find here.
These include Amazon Redshift (both Provisioned and Serverless), Amazon RDS (Aurora, MS SQL Server, MySQL, Oracle, PostgreSQL), and publicly accessible databases (Aurora, MariaDB, MS SQL Server, MySQL, Oracle, PostgreSQL, IBM Db2 (LUW)).
Ingestion Timeout Configuration (API Only)
You can configure the ingestion process timeout during datasource creation by adding the IngestionTimeout key to the DatasourceConfig payload.
- Value range: 1 to 2880 minutes
- Default value: 480 minutes (8 hours) if not specified
POST - /datasources
{
"DatasourceName": "JDBC-Datasource",
"DatasourceType": "jdbc",
"IngestionType": "normaldataload",
"Description": "Test description",
"Keywords": [
"Owner: Mark Liu"
],
"DatasourceConfig": {
"JdbcURL": "jdbc:mysql://example.com:3306/mydatabase",
"Username": "markliu",
"Password": "StrongPassword#54321",
"PublicAccessibility": "yes",
"Ssl": "false",
"IngestionTimeout": 720
}
}
The IngestionTimeout value set in the datasource can be overridden during schedule creation or schedule run by providing a MaxTimeOut argument.
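As an illustration only, the override might be passed alongside other schedule fields like this; apart from `MaxTimeOut` (named above), the field names and payload shape below are assumptions, not the documented schedule API:

```json
{
  "ScheduleName": "nightly-ingest",
  "MaxTimeOut": 720
}
```

Here `720` would cap that run at 720 minutes (12 hours), overriding the datasource-level `IngestionTimeout`.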
2. JDBC Bulk data load - Datasources
Bulk load datasources are recommended to migrate a significant amount of data. The JDBC URL format varies from database to database. At present, the supported databases are:
Oracle, MySQL, PostgreSQL, Microsoft SQL Server, IBM Db2 (LUW), Aurora MySQL
- Users should use this type of JDBC datasource when they need to ingest multiple tables from multiple schemas in the JDBC source.
- Using this type, they can also apply mapping rules to the source tables or their data before migrating it to the target datasets.
What are Dataflows?
Dataflows automate the data ingestion process in Amorphic. A dataflow is a set of instructions that defines a data migration or replication process. It allows you to specify the name, migration type, target location, data format, and extra connection attributes for the source and target databases. Read How to create a dataflow
What are Entities (Instances)?
An entity refers to a virtual server or a specific copy of a service that is used to run dataflows. It can be used for various purposes, such as loading large amounts of data into a database. Shared instances are a feature that allows multiple dataflows to run on a single instance. These instances are specific to a datasource and independent of each other. Read How to create an entity
For more details on bulk data load ingestion, read: JDBC Bulk data load - Datasources
3. JDBC Metadata Only - Datasources
Metadata only datasources are used to fetch metadata from JDBC sources and sync it into the Amorphic catalog. This includes fetching all databases, tables, columns, and their metadata information (for example, schema-, column-, and row-related details) into the Amorphic catalog. Supported databases are:
Oracle, MySQL, PostgreSQL, Microsoft SQL Server, IBM Db2 (LUW), Aurora MySQL and Snowflake
To configure metadata only datasources, users can follow the video below:
To ingest this metadata into the Amorphic catalog, users need to trigger the sync job after datasource creation, either on-demand or through a scheduled sync job.
- Note that filters and schedules are optional. Users can leave them blank, in which case all tables and views across all databases will be ingested.
- If users want to ingest only specific tables or views in particular databases that match a regex, they can provide the regex in the filters section, as shown in the example below.
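For example, a few illustrative filter patterns (the table names here are hypothetical):

```text
sales_.*            # table names starting with sales_, e.g. sales_2024
^(orders|invoices)$ # exactly the orders and invoices tables
.*_staging$         # table names ending with _staging
```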
4. JDBC Advanced data load - Datasources
Advanced data load datasources are recommended to migrate a significant amount of data. The JDBC URL format varies from database to database. At present, the supported databases are:
MariaDB, MySQL, IBM Db2i (AS400)
- Users should use this type of JDBC datasource when they need to ingest multiple tables from multiple schemas in the JDBC source.
- Users can either create a new Amorphic dataset during the advanced data load setup or use existing datasets as the target for their advanced data load ingestion.
Cluster Configuration
When creating a datasource, users can optionally provide a custom cluster configuration. If not specified, default values will be applied. (An illustrative configuration sketch follows the Auto Scaling settings below.)
Cluster Size
: Defines the instance type and the overall processing power allocated for your advanced data load datasource.

Storage
: The amount of data storage available for each node handling your ingestion.

Number of Brokers
: The number of nodes working together to manage and balance the data load.

Data Retention
: The duration for which ingested data is kept before being cleaned up.

Kafka Version
: The underlying streaming engine version used to process and transfer your data.
Auto Scaling
Auto scaling can be enabled under cluster configuration to automatically adjust storage based on actual usage.
Min Storage Size
: Minimum storage capacity that the cluster will always maintain.

Max Storage Size
: Maximum storage capacity the cluster can scale up to.

Target Utilization
: Storage usage threshold (in %) at which scaling actions are triggered.

Scale Out Cooldown
: Waiting period (in minutes) before another scaling action can occur.
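As a rough, non-authoritative sketch of how these settings could be expressed in a creation payload (every key name and value below is an assumption based on the fields described above, not a documented schema):

```json
{
  "DatasourceConfig": {
    "ClusterSize": "kafka.m5.large",
    "Storage": 1000,
    "NumberOfBrokers": 2,
    "DataRetention": 24,
    "KafkaVersion": "3.6.0",
    "AutoScaling": {
      "MinStorageSize": 1000,
      "MaxStorageSize": 4000,
      "TargetUtilization": 60,
      "ScaleOutCooldown": 10
    }
  }
}
```

If these values are omitted, the defaults mentioned above apply.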
Prerequisites for IBM Db2i (AS400) Database
Step 1: Download and Configure Terminal Emulator
- Download TN5250 for Mac or any equivalent terminal emulator.
- Install the application.
- Configure the connection to your IBM i system:

Host
: Your IBM i server IP/hostname

Port
: 23 (default for telnet) or 992 (for SSL)

Device name
: Leave default or set a custom name

SSL
: Enable if your system requires a secure connection
Step 2: Connect and Run Database Setup Commands
- Open the terminal emulator (e.g., TN5250).
- Connect to your IBM i system
- Sign on with a user profile that has QSECOFR authority
Run The Following Commands
- Create journal receiver:
CALL QSYS2.QCMDEXC('CRTJRNRCV JRNRCV(<YOUR_LIBRARY_NAME>/<YOUR_JOURNAL_RECEIVER_NAME>) THRESHOLD(100000)');
- Create journal:
CALL QSYS2.QCMDEXC('CRTJRN JRN(<YOUR_LIBRARY_NAME>/<YOUR_JOURNAL_NAME>) JRNRCV(<YOUR_LIBRARY_NAME>/<YOUR_JOURNAL_RECEIVER_NAME>) MNGRCV(*SYSTEM)');
- Start journaling for your table:
CALL QSYS2.QCMDEXC('STRJRNPF FILE(<YOUR_LIBRARY_NAME>/<YOUR_TABLE_NAME>) JRN(<YOUR_LIBRARY_NAME>/<YOUR_JOURNAL_NAME>) IMAGES(*BOTH)');
Step 3: Grant Permissions to the User
- Grants the user the ability to access (execute into) the specified library.
GRTOBJAUT OBJ(<YOUR_LIBRARY_NAME>) OBJTYPE(*LIB) USER(<YOUR_USERNAME>) AUT(*EXECUTE)
- Grants the user permission to use all journal receivers in the library.
GRTOBJAUT OBJ(<YOUR_LIBRARY_NAME>/*ALL) OBJTYPE(*JRNRCV) USER(<YOUR_USERNAME>) AUT(*USE)
- Grants the user permission to use the specified journal.
GRTOBJAUT OBJ(<YOUR_LIBRARY_NAME>/<YOUR_JOURNAL_NAME>) OBJTYPE(*JRN) USER(<YOUR_USERNAME>) AUT(*USE)
- Grants the user permission to read all database tables/files in the library.
GRTOBJAUT OBJ(<YOUR_LIBRARY_NAME>/*ALL) OBJTYPE(*FILE) USER(<YOUR_USERNAME>) AUT(*USE)
Database Setup Complete
Your IBM Db2i database is now ready for advanced data load datasource integration and can be used in dataflows.
What are Dataflows?
Dataflows automate the data ingestion process in Amorphic. A dataflow is a set of instructions that defines a data migration or replication process. It allows you to specify the name, migration type, target location, data format, and extra connection attributes for the source and target databases. Read How to create a dataflow
For more details on advanced data load ingestion, read: JDBC Advanced data load - Datasources
Test datasource
This functionality allows users to quickly verify connectivity to the specified database. By initiating this test, users can confirm whether the datasource details provided are accurate and functional, ensuring seamless access to the database.
On the datasource details page, the Estimated Cost of the datasource is also displayed, showing the approximate cost incurred since creation.
Version
You can choose which version of the ingestion scripts to use (specific to Amorphic). Whenever a new feature or Glue version is added to the underlying ingestion script, a new version will be added to Amorphic.
Upgrade
You can upgrade your datasource if new versions are available. It will also update the underlying Glue version and the data ingestion script with new features.
Downgrade
You can downgrade a datasource to a previous version if you think the upgrade is not serving its purpose. A datasource can only be downgraded if it has been upgraded.
The upgrade option is only applicable to JDBC normal load datasources.
Normal data load datasource versions
2.1
In version 2.1, we made code changes to the underlying Glue script to support dataset custom partitioning. Data will be loaded into the S3 LZ with a prefix containing the partition keys (if specified) for the targets.
For example, if the partition keys are `KeyA` and `KeyB` with values `ValueA` and `ValueB` respectively, the S3 prefix will be in the format `Domain/DatasetName/KeyA=ValueA/KeyB=ValueB/upload_date=YYYY-MM-DD/UserName/FileType/`.
3.0
No changes were made to the underlying Glue script or design, but the version was updated from 2.1 to 3.0 to match the AWS Glue version.
3.1
In this version of the normal data load JDBC datasource, we added support for the Skip LZ feature. This feature allows users to directly upload data to the data lake zone without having to go through data validation. For more information, refer to the Skip LZ related documentation.
4.0
No changes were made to the underlying Glue script or design, but the version was updated from 3.1 to 4.0 to match the AWS Glue version.
4.1
In this version of the normal data load JDBC datasource, we made code changes to the Glue script in order to support Glue encryption.
4.2
No major changes were made to the underlying Glue script or design, but the logging has been enhanced.
4.3
The update in this version is specifically to ensure FIPS compliance, with no changes made to the script.
5.0
In version 5.0 of JDBC normal data load ingestion, we introduced query-type data ingestion. Using this functionality, users can pass a SQL query as an argument when creating the schedule, and the data will be loaded into the dataset based on that SQL statement.
This functionality is only supported for `SELECT`-type SQL statements. The schedule for the JDBC ingestion supports the arguments `query` and `prepareQuery`, and the values of these arguments should be valid `SELECT` SQL statements.
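For illustration, the arguments might be supplied like this when creating the schedule; the table and column names are hypothetical, and only the `query` and `prepareQuery` argument names come from this document:

```json
{
  "query": "SELECT id, name, updated_at FROM customers WHERE updated_at >= '2024-01-01'",
  "prepareQuery": "SELECT order_id, status FROM orders WHERE status = 'SHIPPED'"
}
```

The ingested dataset will then contain only the rows and columns returned by the `SELECT` statement.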
5.1
This version brings full support for Amorphic 3.0 along with an upgrade to the latest Python Glue job version.