Creating a Parquet dataset

Amorphic supports dataset creation for Parquet on the following Target types,

Redshift
S3Athena
S3
LakeFormation

Note

AuroraMySQL isn't currently available for Parquet type dataset
The Sample file to be uploaded during dataset creation should be of CSV format

Following are the steps, as shown below, to create a Parquet dataset,

Go to create dataset page, select any one of creation method out of Use Template, Import JSON and Create From Scratch.
Creating a parquet dataset through 'Use Template'
- Inside Use Template we have already have iceberg_s3athena_dataset_system_template with file type parquet.
- After choosing required template, we need to fill domain, dataset name, ingestion type while other metadata will be pre-filled.
- In the second page, Upload Schema, you'd be asked to upload a sample file to extract schema for the dataset. The sample file, should be of CSV file type even for Parquet type datasets.
- After uploading the sample file, click view details to review to the extracted schema and if the schema looks as per requirement, click Publish Dataset
Creating parquet dataset through 'Import JSON'
- Inside Import JSON we upload a json file that contains required metadata for parquet dataset creation.
- After choosing required template, we need fill domain, dataset name, ingestion type while other metadata will be pre-filled.
- In the second page, Upload Schema, you'd be asked to upload a sample file to extract schema for the dataset. The sample file, should be of CSV file type even for Parquet type datasets.
- After uploading the sample file, click view details to review to the extracted schema and if the schema looks as per requirement, click Publish Dataset
Creating parquet dataset from scratch
- Go to create from scratch, select File Type as Parquet and fill out the appropriate values to remaining fields as per your requirement
- In the second page, Upload Schema, you'd be asked to upload a sample file to extract schema for the dataset. The sample file, should be of CSV file type even for Parquet type datasets.
- After uploading the sample file, click next to review to the extracted schema and if the schema looks as per requirement, click Publish Dataset

Creating a parquet dataset through 'Use Template'​

Creating parquet dataset through 'Import JSON'​

Creating parquet dataset from scratch​

Creating a parquet dataset through 'Use Template'

Creating parquet dataset through 'Import JSON'

Creating parquet dataset from scratch