JDBC Datasources
From version 2.2, encryption (in-flight and at-rest) is enabled for all jobs and the catalog. All existing jobs (user-created and system-created) were updated with encryption-related settings, and all newly created jobs have encryption enabled automatically. Users should upgrade their datasources to the latest version.
JDBC (Java Database Connectivity) is a Java API for connecting to tabular data, particularly in relational databases, and executing queries against it.
Setting up a JDBC datasource in Amorphic lets you migrate data from one or more JDBC sources into Amorphic. You can then write the data directly into Amorphic datasets using this datasource.
How to set up a JDBC Datasource?
To create a datasource, enter the details shown in the tables below, or upload the equivalent JSON directly.
Metadata
Name | Description |
---|---|
Datasource Name | A unique name for the datasource |
Description | A description of the datasource |
Keywords | Keyword tags that connect it with other Amorphic components |
Datasource configuration
Datasource Configuration | Description |
---|---|
Ingestion Type | Select the data load type based on the data size. For example, select Normal Load for a single table or a smaller data size, and Bulk Load for larger volumes |
JDBC Datasource URL | The source database's JDBC URL. Example formats: `jdbc:protocol://host:port/dbname`, `jdbc:protocol://host:port;dbname=actual_name`, `jdbc:oracle:thin:@//HostName:Port/ServiceName(or)SID`, `jdbc:oracle:thin:@HostName:Port:SID`, `jdbc:sqlserver://actual_servername\actual_instancename:port;databaseName=actual_name`, `jdbc:mysql:loadbalance://hostname:port/database_name`, `jdbc:mysql:replication://hostname:port/database_name` |
Username | The username of the source database, starting with 'u_'. For example, "u_Johndoe" |
Password | The password of the database from which you are loading data. Note: You can reset the password from Profile & Settings > Reset DWH Password |
Datasource Accessibility | Use a private datasource if your data warehouse is hosted in an Amazon Virtual Private Cloud (VPC). |
SSL | This field applies only to Normal data load datasources and is used when an additional level of security must be enforced. |
Review & Submit | Review the information and submit it to create the datasource. |
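To make the URL formats above concrete, here is a minimal sketch that checks whether a string resembles one of the example JDBC URL shapes listed in the table. The regexes are illustrative assumptions covering only a few of the formats; real JDBC drivers accept many more variations.

```python
import re

# Illustrative patterns for a few of the JDBC URL formats listed above.
# These are simplified; actual drivers accept more variations.
JDBC_URL_PATTERNS = [
    r"^jdbc:(mysql|postgresql)://[^:/]+:\d+/\w+$",   # jdbc:protocol://host:port/dbname
    r"^jdbc:sqlserver://[^;]+;databaseName=\w+$",    # SQL Server with databaseName
    r"^jdbc:oracle:thin:@//[^:/]+:\d+/\w+$",         # Oracle with service name
    r"^jdbc:oracle:thin:@[^:/]+:\d+:\w+$",           # Oracle with SID
]

def looks_like_jdbc_url(url: str) -> bool:
    """Return True if the URL matches one of the known example formats."""
    return any(re.match(p, url) for p in JDBC_URL_PATTERNS)

print(looks_like_jdbc_url("jdbc:mysql://example.com:3306/mydatabase"))  # True
print(looks_like_jdbc_url("mysql://example.com/mydatabase"))            # False
```

A quick check like this can catch malformed URLs before submitting the datasource for creation.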
What ingestion types apply to your use case?
In Amorphic, JDBC datasources support either Bulk or Normal data loads. Let's look at their differences and implementations.
1. JDBC Bulk data load - Datasources
Bulk load datasources are recommended for migrating a significant amount of data. The JDBC URL format varies from database to database. At present, the supported databases are:
Oracle, MySQL, PostgreSQL, Microsoft SQL Server, IBM Db2 (LUW), Aurora MySQL
- Use this type of JDBC datasource when you need to ingest multiple tables from multiple schemas in the JDBC source.
- You can also apply mapping rules to the source tables or their data before migrating it to the target datasets.
- You can either create a new Amorphic dataset during the bulk load setup or use existing datasets as the target for the bulk data load ingestion.
What are Dataflows?
Dataflows automate the data ingestion process in Amorphic. A dataflow is a set of instructions that defines a data migration or replication process. It allows you to specify the name, migration type, target location, data format, and extra connection attributes for the source and target databases. Read How to create a dataflow
What are Entities (Instances)?
An entity refers to a virtual server, or a specific copy of a service, that is used to run dataflows. It can be used for various purposes, such as loading large amounts of data into a database. Shared instances are a feature that allows multiple dataflows to run on a single instance. These instances are specific to a datasource and independent of each other. Read How to create an entity
For more details on bulk data load ingestion, read: JDBC Bulk data load - Datasources
2. JDBC Normal data load - Datasources
Normal load datasources are best suited for moving a single table or a small amount of data. The JDBC URL format changes depending on the database being used. This datasource works with all databases supported by AWS Glue, which you can find here.
Amazon Redshift (Both Provisioned and Serverless), Amazon RDS (Aurora, MS SQL Server, MySQL, Oracle, PostgreSQL), publicly accessible databases (Aurora, MariaDB, MS SQL Server, MySQL, Oracle, PostgreSQL, IBM Db2 (LUW)).
Additionally, the timeout for the ingestion process can be set during datasource creation by adding the key IngestionTimeout to DatasourceConfig in the input payload. The value is expected in minutes and must be between 1 and 2880. If no value is provided, the default of 480 (8 hours) is used. Please note that this feature is available exclusively via the API.
```json
{
  "DatasourceConfig": {
    "JdbcURL": "jdbc:mysql://example.com:3306/mydatabase",
    "Username": "dbuser",
    "Password": "StrongP@ss!",
    "IngestionTimeout": 222
  }
}
```
This timeout can be overridden during schedule creation and schedule runs by providing the argument MaxTimeOut.
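The payload above can be assembled programmatically before sending it to the datasource-creation API. The helper below is a sketch, not part of the Amorphic SDK; the function name is an assumption, but the range check mirrors the 1-2880 minute constraint described above.

```python
import json

def build_datasource_config(jdbc_url, username, password, ingestion_timeout=480):
    """Build the DatasourceConfig payload (hypothetical helper).

    IngestionTimeout is in minutes; the 1-2880 range check mirrors
    the API constraint described in the documentation above.
    """
    if not 1 <= ingestion_timeout <= 2880:
        raise ValueError("IngestionTimeout must be between 1 and 2880 minutes")
    return {
        "DatasourceConfig": {
            "JdbcURL": jdbc_url,
            "Username": username,
            "Password": password,
            "IngestionTimeout": ingestion_timeout,
        }
    }

payload = build_datasource_config(
    "jdbc:mysql://example.com:3306/mydatabase", "dbuser", "StrongP@ss!", 222
)
print(json.dumps(payload, indent=2))
```

Validating the timeout client-side gives a clearer error than a rejected API call.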
Test datasource
This functionality allows users to quickly verify the connectivity to the specified database. By initiating this test, users can confirm if the datasource details provided are accurate and functional, ensuring seamless access to the database.
On the datasource details page, the Estimated Cost of the datasource is also displayed, showing the approximate cost incurred since creation.
Version
You can choose which version of the ingestion scripts to use (specific to Amorphic). Whenever a new feature or Glue version is added to the underlying ingestion script, a new version will be added to Amorphic.
Upgrade
You can upgrade your datasource if new versions are available. It will also update the underlying Glue version and the data ingestion script with new features.
Downgrade
You can downgrade a datasource to a previous version if you think the upgrade is not serving its purpose. A datasource can only be downgraded if it has previously been upgraded.
The Upgrade option is only applicable to JDBC Normal load datasources.
Normal data load datasource versions
2.1
In version 2.1, we made code changes to the underlying Glue script to support dataset custom partitioning. Data will be loaded into the S3 LZ with a prefix containing the partition key (if specified) for the targets.
For example, if the partition keys are `KeyA` and `KeyB` with values `ValueA` and `ValueB` respectively, the S3 prefix will be in the format `Domain/DatasetName/KeyA=ValueA/KeyB=ValueB/upload_date=YYYY-MM-DD/UserName/FileType/`.
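The prefix layout above can be sketched as a small helper. The function name and arguments are illustrative assumptions; only the segment order (domain, dataset, partition key-value pairs, upload date, user, file type) comes from the format described above.

```python
from datetime import date

def s3_lz_prefix(domain, dataset, partitions, user, file_type):
    """Build the S3 LZ prefix described above (illustrative helper).

    `partitions` is a dict of partition key -> value; insertion order
    determines segment order. The upload_date segment uses today's
    date in YYYY-MM-DD format.
    """
    parts = [domain, dataset]
    parts += [f"{key}={value}" for key, value in partitions.items()]
    parts.append(f"upload_date={date.today():%Y-%m-%d}")
    parts += [user, file_type]
    return "/".join(parts) + "/"

print(s3_lz_prefix("Sales", "Orders", {"KeyA": "ValueA", "KeyB": "ValueB"}, "jdoe", "csv"))
# e.g. Sales/Orders/KeyA=ValueA/KeyB=ValueB/upload_date=2024-06-01/jdoe/csv/
```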
3.0
No changes were made to the underlying Glue script or design, but the version was updated from 2.1 to 3.0 to match the AWS Glue version.
3.1
In this version of the normal data load JDBC datasource, we added support for the Skip LZ feature. This feature allows users to directly upload data to the data lake zone without having to go through data validation. For more information, refer to the Skip LZ related documentation.
4.0
No changes were made to the underlying Glue script or design, but the version was updated from 3.1 to 4.0 to match the AWS Glue version.
4.1
In this version of the normal data load JDBC datasource, we made code changes to the Glue script in order to support Glue encryption.
4.2
No major changes were made to the underlying Glue script or design, but the logging has been enhanced.
4.3
The update in this version is specifically to ensure FIPS compliance, with no changes made to the script.
5.0
In version 5.0 of JDBC normal data load ingestion, we've introduced query-type data ingestion. Using this functionality, users can pass a SQL query as an argument during creation of the schedule, and the data will be loaded into the dataset based on that SQL statement.
This functionality is supported only for `SELECT` statements. The schedule for the JDBC ingestion supports the arguments `query` and `prepareQuery`, and the values of these arguments should be valid `SELECT` SQL statements.
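Since only `SELECT` statements are accepted, it can help to check the query client-side before attaching it to a schedule. The helper below is a sketch under that assumption; the function name is hypothetical, and the check is a simple prefix test, not a full SQL parser.

```python
def validate_ingestion_query(query: str) -> str:
    """Reject anything other than a SELECT statement.

    Mirrors the SELECT-only restriction on the `query` and
    `prepareQuery` schedule arguments. A naive prefix check only;
    a real validator would parse the statement.
    """
    stripped = query.strip().rstrip(";").strip()
    if not stripped.lower().startswith("select"):
        raise ValueError("Only SELECT statements are supported")
    return stripped

# Hypothetical schedule arguments using the validated query.
schedule_args = {
    "query": validate_ingestion_query("SELECT id, name FROM customers WHERE active = 1"),
}
print(schedule_args)
```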
5.1
This version brings full support for Amorphic 3.0 along with an upgrade to the latest Python Glue job version.