Skip to main content
 print this page

JDBC Advanced data load

To perform Advanced data migration, users need to create dataflow within the JDBC datasource. A advanced data load dataflow in Amorphic can be set up using the "Add Dataflow" option, which is available on the Datasource Details page under the "Dataflow" tab.

Note

As part of Amorphic advanced data load, datasets are not required to be created already as a pre-requisite. When user creates a dataflow then datasets will be created as part of the dataflow and will be used for data ingestion. The user needs to toggle the 'CreateDataset' option during dataflow creation. These datasets are tightly coupled with the dataflow and cannot be used for normal data upload either from UI or any other process.

The following picture depicts the dataflows page in Amorphic

JDBC datasource flows

Dataflows

Dataflow refers to a unit of work that migrates data from a source database to a target database.

The steps for creating the dataflows are outlined below.

How to create a Dataflow

Below are the steps that are required to create a advanced data load dataflow in Amorphic.

Dataflow Specifications

  • Dataflow Name : Name of the dataflow to be used, an identifier of the dataflow.

  • Process Type : Full Load, Change Data Capture (CDC) & Full Load and CDC

    • Full Load : This option simply migrates the data from your source database to your target database.
    • Full load and CDC (Migrate existing data and replicate ongoing changes) : This option performs a full data load while capturing changes on the source. After the full load is complete, captured changes are applied to the target and henceforth.
    • CDC only (Replicate data changes only) : In this option, only ongoing data changes are captured.
  • Target Location : Select the target where the data has to be migrated.

    Note
    • For S3Athena target types, the file will include headers by default when create dataset is set to True.
    • For S3Athena target types with Parquet format, if using an existing dataset the user needs to add the following columns to the dataset.
      • __ts_ms (type: timestamp)
      • __op (type: string)
      • target_location (type: string)
  • Data Format: Data Format is used to determine the target data format of a dataflow. For S3 and S3Athena target dataflows files will be created for the datasets in the format that is specified here.

Dataset Specs

  • Create Datasets for each table : Set this option to True to create new datasets as the targets. Disable this option to select existing Amorphic datasets as the targets for each source table. This setting applies on a per-dataflow basis and affects all selected source tables. For more details, refer to the Ingestion Configuration section below.

  • Target Table Preparation Mode : To indicate how to handle loading the target at load startup. Available options are Truncate, Append & Cdc, "Truncate" will delete the data in the dataset before load while "Append & Cdc" will add the data to existing.

    Note

    This is an optional field. If the user opts to select existing Amorphic datasets instead of creating new ones during dataflow creation and no option is selected for this attribute, it will default to Append in the backend.

Table(s) selection

  • Select all the tables that are needed to be loaded into Amorphic.
  • Schemas and tables can be filtered if necessary with the filter on the top.
  • After selecting the tables, click on Next which is at the end of the page to proceed with the metadata edit/update.

Ingestion Configuration

The user can configure data ingestion in the dataflow with two options: 'Create Datasets' set to true or false.

1. Dataflows Involving the Creation of New Datasets

  • Informations in this page will be used as metadata to register the datasets in Amorphic.

    • Dataset Name : This option is used to edit the Dataset Name's in bulk by adding prefix/suffix to the generated names.
    • Dataset Description : Edit the generated description by using this option.
    • Domain : Edit domain for the datasets.
    • Keywords : Edit Keywords for the datasets.

2. Dataflows Using Existing Amorphic Datasets

  • Dataset Name : This option is used to select the dataset in Amorphic where the data needs to be ingested. User can select the datasets based on selected target location. For example, If user selects S3 as target location then only S3 datasets are applicable. For S3 and S3Athena datasets, only append-type datasets can be selected, while for Redshift datasets, both append and update (latest record) types can be selected.
Note
  • Users should ensure they select the correct dataset when using CreateDataset as False for the Dataflows.
  • For S3Athena dataflows with Parquet format, only Iceberg datasets should be selected.
  • For S3Athena dataflows with JSON format, only datasets with SerDe = OpenXJSONSerDe should be selected.
  • For S3Athena dataflow, users should select only CDC datasets for CDC type of dataflow and Append for Append type of dataflows
  • In case of S3 target location whole schema can also be selected instead of individual selection.

  • Enable Filters, Transforms : This option will enable additional config that can be applied either on the source tables or on the target data while running the dataflow and during the data ingestion.

    Source Table

    idnamedescriptionmanager_idbudgetcreated_at
    123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}M78920002025-05-20 07:53:39
    456Alice Ray{"address": {"city": "New York", "zip": "10001"}}M23410002025-05-19 04:23:30
    125Bob Lee{"address": {"city": "Chicago", "zip": "60601"}}M32115002025-05-21 02:53:13
Note

The table shown above represents the source table. All the following filter and transformation rules will be applied to this source table, and the resulting output will be referred to as the target table

Filter & Transformation Rules

  • Filter rules are used to filter the data while ingesting the data to Amorphic. Below are the rules that are supported during ingestion.

    • Filter Records by Expression
    JDBC Filter Records By Expression

    If a user applies FilterRecordsByExpression with Expression = value.__op == 'u' && value.budget == '2000', then only update (u) events where the budget field equals 2000 will be captured and passed downstream.

    idnamedescriptionmanager_idbudgetcreated_at
    123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}M78920002025-05-20 07:53:39
  • Transformation rules are used to modify the column names or datatype of the table while ingesting the data to target, below is the list of transformation rules that are available.

    • Add New Column
    JDBC Add New Column

    If a user performs Add New Column with Column Name = gender and Column Value = male, then the target table will contain an additional column named gender, where every row will have the value male.

    Target Table:

    idnamedescriptionmanager_idbudgetgendercreated_at
    123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}M7892000male2025-05-20 07:53:39
    456Alice Ray{"address": {"city": "New York", "zip": "10001"}}M2341000male2025-05-19 04:23:30
    125Bob Lee{"address": {"city": "Chicago", "zip": "60601"}}M3211500male2025-05-21 02:53:13
    • Rename Column
    JDBC Rename Column

    If a user performs Rename Column with Column Name = name and Rename To = full_name, then the column name in the target table will be renamed to full_name.

    Target Table:

    idfull_namedescriptionmanager_idbudgetcreated_at
    123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}M78920002025-05-20 07:53:39
    456Alice Ray{"address": {"city": "New York", "zip": "10001"}}M23410002025-05-19 04:23:30
    125Bob Lee{"address": {"city": "Chicago", "zip": "60601"}}M32115002025-05-21 02:53:13
    • Include Column
    JDBC Include Column

    If a user selects Include Column with Column Name = name, then only the name column will be retained in the target table, and all other columns will be excluded.

    Target Table:

    name
    John Doe
    Alice Ray
    Bob Lee
    • Mask Sensitive Data
    JDBC Mask Sensitive Data

    If a user applies Mask Sensitive Data on Column Name = manager_id, then the manager_id values in the target table will be replaced with masked values (e.g., ####).

    Target Table:

    idfull_namedescriptionmanager_idbudgetcreated_at
    123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}####20002025-05-20 07:53:39
    456Alice Ray{"address": {"city": "New York", "zip": "10001"}}####10002025-05-19 04:23:30
    125Bob Lee{"address": {"city": "Chicago", "zip": "60601"}}####15002025-05-21 02:53:13
    • Flatten Nested Fields
    JDBC Flatten Nested Fields

    If a user applies Flatten Nested Fields with Delimeter = _ then all nested fields will be flattened into a single level with field names separated by _

    Target Table:

    idfull_namedescription_address_citydescription_address_zipmanager_idbudgetcreated_at
    123John DoeLos Angeles90001M78920002025-05-20 07:53:39
    456Alice RayNew York10001M23410002025-05-19 04:23:30
    125Bob LeeChicago60601M32115002025-05-21 02:53:13
    • Convert Time Stamp
    JDBC Convert Timestamp

    If a user applies Convert Time Stamp on created_at with Timestamp Format= yyyy-MM-dd HH:mm:ss, Target Type = Timestamp and Precision = Microseconds, then that column in the target table will be explicitly a Timestamp type which keeps precision upto Microseconds (e.g., 2025-08-21 14:23:45.123456)

    Target Table:

    idfull_namedescriptionmanager_idbudgetcreated_at
    123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}M78920002025-05-20 07:53:39.234
    456Alice Ray{"address": {"city": "New York", "zip": "10001"}}M23410002025-05-19 04:23:30.345
    125Bob Lee{"address": {"city": "Chicago", "zip": "60601"}}M32115002025-05-21 02:53:13.542
    • Change Column Case
    JDBC Change Column Case

    If a user applies Change Column Case with Convert From = Lower Underscore and Convert To = Upper Underscore, then all column names in the table will be converted from lower case with underscores to upper case with underscores.

    IDNAMEDESCRIPTIONMANAGER_IDBUDGETCREATED_AT
    123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}M78920002025-05-20 07:53:39
    456Alice Ray{"address": {"city": "New York", "zip": "10001"}}M23410002025-05-19 04:23:30
    125Bob Lee{"address": {"city": "Chicago", "zip": "60601"}}M32115002025-05-21 02:53:13
    • Extract Nested Fields
    JDBC Extract Nested Field

    If a user applies Extract Nested Field on Column Name = description with From Field Location = address and Output Field Location = location, then the address field inside the nested description column will be extracted into a new column named location.

    idnamedescriptionmanager_idbudgetcreated_atlocation
    123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}M78920002025-05-20 07:53:39{"city": "Los Angeles", "zip": "90001"}
    456Alice Ray{"address": {"city": "New York", "zip": "10001"}}M23410002025-05-19 04:23:30{"city": "New York", "zip": "10001"}
    125Bob Lee{"address": {"city": "Chicago", "zip": "60601"}}M32115002025-05-21 02:53:13{"city": "Chicago", "zip": "60601"}
    • Rename Columns by Pattern
    JDBC Rename Columns by Pattern

    If a user applies Rename Columns by Pattern with Pattern = ^Bob and Replacement = Fog then all column names that start with abc will be renamed by replacing abc with xyz

    idnamedescriptionmanager_idbudgetcreated_at
    123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}M78920002025-05-20 07:53:39
    456Alice Ray{"address": {"city": "New York", "zip": "10001"}}M23410002025-05-19 04:23:30
    125Fog Lee{"address": {"city": "Chicago", "zip": "60601"}}M32115002025-05-21 02:53:13
    • Add Current Timestamp
    JDBC Add Current Timestamp

    If a user applies Add Current Timestamp with Output Field Name = Date, then the target table will have an extra column Date with the current timestamp added to every row.

    idnamedescriptionmanager_idbudgetcreated_atDate
    123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}M78920002025-05-20 07:53:392025-07-12 06:55:23
    456Alice Ray{"address": {"city": "New York", "zip": "10001"}}M23410002025-05-19 04:23:302025-07-12 06:57:35
    125Bob Lee{"address": {"city": "Chicago", "zip": "60601"}}M32115002025-05-21 02:53:132025-07-12 06:56:39
    • Hash Value
    JDBC Hash Value

    If a user applies Hash Value on Column Name = manager_id, then the manager_id values will be replaced with their hashed values (e.g., SHA-256) in the target table. Skip Missing or Null as Yes will skip the empty values and Skipping Missing or Null as No will perform hash on Null Values

idnamedescriptionmanager_idbudgetcreated_at
123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}3c1e…a92f20002025-05-20 07:53:39
456Alice Ray{"address": {"city": "New York", "zip": "10001"}}dfe8…f12b10002025-05-19 04:23:30
125Bob Lee{"address": {"city": "Chicago", "zip": "60601"}}7b91…d4f015002025-05-21 02:53:13
  • Concatenate Field
JDBC Concatenate Field

If a user applies Concatenate Fields on Column Names = name and manager_id with Output Field Name = name_mid and Delimiter = "_" and Replace Missing = "N/A", then a new column name_mid will be created by combining both fields separated by an underscore, replacing any missing value with N/A.

idnamedescriptionmanager_idbudgetcreated_atname_mid
123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}M78920002025-05-20 07:53:39John Doe_M789
456Alice Ray{"address": {"city": "New York", "zip": "10001"}}M23410002025-05-19 04:23:30Alice Ray_M234
125Bob Lee{"address": {"city": "Chicago", "zip": "60601"}}M32115002025-05-21 02:53:13Bob Lee_M321
  • Convert to LowerCase
JDBC Convert to Lowercase

If a user applies Convert to LowerCase on Column Name = name, then all values in the name column will be converted to lowercase in the target table.

idnamedescriptionmanager_idbudgetcreated_at
123john doe{"address": {"city": "Los Angeles", "zip": "90001"}}M78920002025-05-20 07:53:39
456alice ray{"address": {"city": "New York", "zip": "10001"}}M23410002025-05-19 04:23:30
125bob lee{"address": {"city": "Chicago", "zip": "60601"}}M32115002025-05-21 02:53:13
  • Convert to UpperCase
JDBC Convert to Uppercase

If a user applies Convert to UpperCase on Column Name = name, then all values in the name column will be converted to uppercase in the target table.

idnamedescriptionmanager_idbudgetcreated_at
123JOHN DOE{"address": {"city": "Los Angeles", "zip": "90001"}}M78920002025-05-20 07:53:39
456ALICE RAY{"address": {"city": "New York", "zip": "10001"}}M23410002025-05-19 04:23:30
125BOB LEE{"address": {"city": "Chicago", "zip": "60601"}}M32115002025-05-21 02:53:13
  • Custom Scripted Transform
JDBC Custom Scripted Transform

If a user applies Custom Scripted Transform on the Column Name = budget with the given Expression = def transform(value) { if (value == null) return null; return value*0.90; }, then each value will be transformed by multiplying it by 0.90, effectively reducing a 10% budget, while keeping null values unchanged.

idnamedescriptionmanager_idbudgetcreated_at
123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}M78918002025-05-20 07:53:39
456Alice Ray{"address": {"city": "New York", "zip": "10001"}}M2349002025-05-19 04:23:30
125Bob Lee{"address": {"city": "Chicago", "zip": "60601"}}M32113502025-05-21 02:53:13
  • Change Data Type (Only Integer, Float, Boolean & String data types are available for conversion now)
JDBC Change Data Type

If a user applies Change Data Type on Column Name = budget with Data Type = Float, then the budget column in the target table will be converted from integer to float values.

idnamedescriptionmanager_idbudgetcreated_at
123John Doe{"address": {"city": "Los Angeles", "zip": "90001"}}M7892000.02025-05-20 07:53:39
456Alice Ray{"address": {"city": "New York", "zip": "10001"}}M2341000.02025-05-19 04:23:30
125Bob Lee{"address": {"city": "Chicago", "zip": "60601"}}M3211500.02025-05-21 02:53:13
Note

When multiple transformation rules are applied on a single column then AWS will consider only the first rule that is applied and rest others are ignored. Incase if multiple transformation rules needed to applied on a column then consider using Rename Column Transformation rule

Please follow the below animation as a reference to create a dataflow. JDBC Dataflow create

After successful creation, the dataflow can be started with the 'Start' option

JDBC Dataflow start

View Dataflow Details

Once the dataflow is started, the status of the dataflow will be changed to starting and once dataflow is created successfully the status will be changed to running. The latest status can be fetched with the page refresh page.

JDBC Dataflow run details

Dataflow Logs

To view data migration statistics, simply click on the dataflow name in the dataflow details section. Users can access logs by clicking the three dots in the top right corner. Please refer to the video below for a demonstration.

JDBC Dataflow log details