Creating a Parquet dataset

Amorphic supports Parquet dataset creation for the following Target types:

  • Redshift
  • S3Athena
  • S3
  • LakeFormation
Note
  • AuroraMySQL is not currently supported for Parquet type datasets
  • The sample file uploaded during dataset creation must be in CSV format (see the sketch below for deriving a CSV sample from existing Parquet data)
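
Because the sample file must be CSV even though the dataset itself stores Parquet, you may need to derive a CSV sample from existing Parquet data. Below is a minimal sketch of one way to do this with pandas; the file names are hypothetical placeholders.

```python
# Minimal sketch: derive a CSV sample (for schema extraction) from an existing
# Parquet file. Assumes pandas with a Parquet engine (e.g. pyarrow) is installed;
# the file names are hypothetical placeholders.
import pandas as pd

# Read the Parquet data you eventually plan to load into the dataset.
df = pd.read_parquet("my_data.parquet")

# A handful of rows is enough for schema extraction -- keep the header plus
# a few representative values and write them out as CSV.
df.head(10).to_csv("sample_for_schema.csv", index=False)
```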

Follow the steps below to create a Parquet dataset:

  • Go to the Create Dataset page and select one of the creation methods: Use Template, Import JSON, or Create From Scratch.

  • Creating a Parquet dataset through 'Use Template'

    • Use Template already includes the iceberg_s3athena_dataset_system_template with file type Parquet.

    • After choosing the required template, fill in the domain, dataset name, and ingestion type; the other metadata will be pre-filled.

    • On the second page, Upload Schema, you will be asked to upload a sample file to extract the schema for the dataset. The sample file must be a CSV file, even for Parquet type datasets.

    • After uploading the sample file, click View Details to review the extracted schema. If the schema meets your requirements, click Publish Dataset (a sketch for cross-checking the extracted schema against your source Parquet files follows this subsection).

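Before publishing, it can help to confirm that the schema extracted from the CSV sample matches the schema of the Parquet files you intend to load. A minimal sketch using pyarrow is shown below; the file name is a hypothetical placeholder.

```python
# Minimal sketch: print the schema of a source Parquet file so it can be
# compared with the schema extracted from the CSV sample. Assumes pyarrow is
# installed; the file name is a hypothetical placeholder.
import pyarrow.parquet as pq

schema = pq.read_schema("my_data.parquet")
for field in schema:
    print(field.name, field.type)
```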

  • Creating a Parquet dataset through 'Import JSON'

    • With Import JSON, you upload a JSON file that contains the metadata required for Parquet dataset creation (see the sketch after this subsection for one way to generate such a file).

    • After importing the JSON file, fill in the domain, dataset name, and ingestion type; the other metadata will be pre-filled.

    • On the second page, Upload Schema, you will be asked to upload a sample file to extract the schema for the dataset. The sample file must be a CSV file, even for Parquet type datasets.

    • After uploading the sample file, click View Details to review the extracted schema. If the schema meets your requirements, click Publish Dataset.

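The exact set of keys Amorphic expects in the import file is not listed on this page, so the field names in the sketch below (DatasetName, Domain, FileType, TargetLocation, IngestionType) are purely illustrative placeholders; consult your Amorphic environment for the authoritative structure. The sketch simply assembles the metadata in Python and writes it to a JSON file ready for upload.

```python
# Minimal sketch: assemble dataset metadata and write it to a JSON file for the
# 'Import JSON' flow. All keys and values below are illustrative placeholders --
# check your Amorphic environment for the exact field names it expects.
import json

dataset_metadata = {
    "DatasetName": "sales_parquet",   # hypothetical dataset name
    "Domain": "analytics",            # hypothetical domain
    "FileType": "parquet",
    "TargetLocation": "s3athena",     # one of the supported Target types
    "IngestionType": "file",          # hypothetical ingestion type
}

with open("parquet_dataset.json", "w") as f:
    json.dump(dataset_metadata, f, indent=2)
```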

  • Creating a Parquet dataset from scratch

    • Go to Create From Scratch, select Parquet as the File Type, and fill in the remaining fields as required.

    • On the second page, Upload Schema, you will be asked to upload a sample file to extract the schema for the dataset. The sample file must be a CSV file, even for Parquet type datasets.

    • After uploading the sample file, click Next to review the extracted schema. If the schema meets your requirements, click Publish Dataset.
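
Once the dataset is published, the data files loaded into it are expected to be Parquet, even though the schema was extracted from a CSV sample. A minimal sketch for producing a matching Parquet file from that same CSV sample with pandas is shown below; the file names are hypothetical placeholders.

```python
# Minimal sketch: convert the CSV sample used for schema extraction into a
# Parquet data file. Assumes pandas with a Parquet engine (e.g. pyarrow);
# the file names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("sample_for_schema.csv")
df.to_parquet("sample_data.parquet", index=False)
```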