Version: v3.3

Datasets

Amorphic supports unstructured, semi-structured, and structured Datasets while also providing comprehensive data lake visibility.

Below is a sample resource definition file for a Dataset:

{
  "rCicdDataset": {
    "Type": "Dataset",
    "Properties": {
      "DatasetName": "cicd_dataset",
      "DatasetDescription": "Dataset created from CICD",
      "Domain": {
        "!DependsOn": "rDomain.DomainName"
      },
      "Keywords": ["Owner: johndoe"],
      "DatasourceType": "api",
      "IsDataValidationEnabled": true,
      "SerDe": "OpenCSVSerde",
      "FileDelimiter": ",",
      "FileType": "csv",
      "IsDataCleanupEnabled": false,
      "IsDataProfilingEnabled": true,
      "LifeCyclePolicyStatus": "Disabled",
      "TargetLocation": "s3athena",
      "SkipFileHeader": true,
      "SkipRowCount": { "header": 1, "footer": 0 },
      "SkipLZProcess": false,
      "TableUpdate": "append",
      "DataMetricsCollectionOptions": { "IsMetricsCollectionEnabled": false },
      "DatasetType": "internal",
      "DatasetSchema": [
        {
          "name": "FirstName",
          "description": "",
          "type": "varchar(256)",
          "is_not_null": false
        },
        {
          "name": "LastName",
          "description": "",
          "type": "varchar(256)",
          "is_not_null": false
        }
      ]
    }
  }
}

When creating a dataset through CICD, you must provide the schema under the DatasetSchema key.
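Because the schema has to be present at creation time, it can help to check the definition file before the pipeline runs. The following is a minimal sketch of such a pre-flight check, not part of Amorphic or its CICD tooling: the function name `missing_schema` is hypothetical, and the rule that every column entry needs a non-empty `name` and `type` is an assumption based on the sample above.

```python
import json


def missing_schema(definition: dict) -> list[str]:
    """Return logical IDs of Dataset resources without a usable DatasetSchema.

    Hypothetical helper: the name/type requirement per column is an
    assumption drawn from the sample definition, not a documented rule.
    """
    problems = []
    for logical_id, resource in definition.items():
        if resource.get("Type") != "Dataset":
            continue  # only Dataset resources carry a DatasetSchema
        schema = resource.get("Properties", {}).get("DatasetSchema")
        usable = (
            isinstance(schema, list)
            and len(schema) > 0
            and all(
                isinstance(col, dict) and col.get("name") and col.get("type")
                for col in schema
            )
        )
        if not usable:
            problems.append(logical_id)
    return problems


# Example input: one dataset with a schema, one without.
definition = json.loads("""
{
  "rWithSchema": {
    "Type": "Dataset",
    "Properties": {
      "DatasetName": "cicd_dataset",
      "DatasetSchema": [{"name": "FirstName", "type": "varchar(256)"}]
    }
  },
  "rNoSchema": {
    "Type": "Dataset",
    "Properties": {"DatasetName": "other_dataset"}
  }
}
""")

print(missing_schema(definition))  # prints ['rNoSchema']
```

Running a check like this as an early pipeline step fails fast on a missing schema instead of discovering the problem during deployment.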

info

CICD has limitations when managing dataset schemas.

Once a dataset has been created, its schema cannot be modified via CICD.

If you change the schema in your configuration and re-run CICD, the pipeline logs may report a successful update, but the schema in Amorphic remains unchanged.

If schema changes are required after creation, they must be performed directly in the Amorphic console.
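Since the pipeline logs cannot be relied on to reveal a schema change that was not applied, one option is to compare the new definition file against the previously deployed one before running CICD. The sketch below assumes both versions are available as parsed JSON (for example, from version control); `schema_drift` is a hypothetical helper, not an Amorphic API.

```python
def schema_drift(old: dict, new: dict) -> list[str]:
    """Return logical IDs of existing Dataset resources whose DatasetSchema changed.

    Hypothetical helper: `old` is the previously deployed definition and
    `new` is the one about to be deployed, both as parsed JSON.
    """
    drifted = []
    for logical_id, resource in new.items():
        if resource.get("Type") != "Dataset" or logical_id not in old:
            # Datasets not in the old definition are new, so their schema
            # is supplied at creation and there is nothing to flag.
            continue
        old_schema = old[logical_id].get("Properties", {}).get("DatasetSchema")
        new_schema = resource.get("Properties", {}).get("DatasetSchema")
        if old_schema != new_schema:
            drifted.append(logical_id)
    return drifted


old_definition = {
    "rCicdDataset": {
        "Type": "Dataset",
        "Properties": {
            "DatasetSchema": [{"name": "FirstName", "type": "varchar(256)"}]
        },
    }
}
new_definition = {
    "rCicdDataset": {
        "Type": "Dataset",
        "Properties": {
            "DatasetSchema": [
                {"name": "FirstName", "type": "varchar(256)"},
                {"name": "LastName", "type": "varchar(256)"},
            ]
        },
    }
}

print(schema_drift(old_definition, new_definition))  # prints ['rCicdDataset']
```

Any dataset this reports needs its schema change made in the Amorphic console, because re-running CICD will not apply it.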