Intro
The Amorphic Dataset portal enables the creation of unstructured, semi-structured, and structured datasets while providing comprehensive data lake visibility.
Edit Schema
The Edit Schema functionality allows users to modify the schema of existing datasets, providing flexibility in managing dataset structures.
Data Profiling
Data Profiling is the process of analyzing an existing data source to gather statistics and generate summaries about the data. It helps identify anomalies, assess data quality, and gain insights into the dataset's structure and characteristics.
Files
The Files section provides a centralized interface for managing files that contain actual data for datasets. It allows users to view, track, and manage files associated with their datasets, along with their respective statuses.
Dataset Lifecycle Policy
The Dataset Lifecycle Policy is a feature that helps manage objects in Amazon S3 to optimize storage costs over time. It enables users to control object transition and expiration using predefined rules that dictate how Amazon S3 handles stored objects.
Athena Datasets
Athena Datasets allow users to store structured data in Glue tables, with the underlying files in Amazon S3, and to run SQL queries on that data. Data validation can be enabled to check for corrupt data, and the playground can be used to run queries on the dataset.
Delta Lake Datasets
In Amorphic, users can create Delta Lake datasets with Lakeformation as the target location, which creates a Delta Lake table in the backend to store the data.
DynamoDB Datasets
Amorphic DynamoDB Datasets refer to structured data stored as key-value pairs, acting as a single, reliable source of truth across all organizational departments.
External Datasets
External datasets in Amorphic allow users to directly consume their existing data stored in S3 buckets, without the need to ingest it into a new Amorphic dataset.
View Type Datasets
View Type Datasets in Amorphic are a specialized type of dataset that allows users to create structured representations of data. These view-type datasets can be shared with other authorized users and tags within the organization, providing a flexible way to interact with data.
Hudi Datasets
Introduction
Iceberg Datasets
In Amorphic, users can create Iceberg datasets with an S3Athena or Lake Formation target location, which creates an Iceberg table in the backend to store the data.
Lakeformation Datasets
Lakeformation datasets extend S3-Athena datasets with added security and support CSV, TSV, XLSX, JSON, and Parquet files. They also check data integrity and offer ACID transactions, data compaction, and time-travel queries.
Redshift Datasets
Redshift datasets allow users to create datasets in Amorphic backed by Amazon Redshift. Amorphic enables users to store CSV, TSV, XLSX, and Parquet files in Amazon S3 with Redshift as the target location. This feature includes optional partial data validation, which is enabled by default. The validation process helps detect and correct corrupt or invalid data files while supporting a variety of data types, including:
Data Quality Checks
Amorphic provides data quality checks to help detect errors in data before it is utilized by other systems or machine learning algorithms. Users can create rules for columns in structured datasets and run checks to identify rule violations.
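As a rough illustration of the idea behind column-level rules, the sketch below applies one predicate per column to a handful of rows and collects the violations. It is not Amorphic's API: the `run_checks` function, the rule format, and the sample rows are all hypothetical and exist only to show what "creating rules for columns and running checks" can look like in plain Python.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical rule format (an assumption, not Amorphic's): a column name
# mapped to a predicate that returns True when the value is valid.
Rule = Callable[[Any], bool]
Violation = Tuple[int, str, Any]  # (row index, column name, offending value)


def run_checks(rows: List[Dict[str, Any]], rules: Dict[str, Rule]) -> List[Violation]:
    """Apply every column rule to every row and collect the violations."""
    violations: List[Violation] = []
    for index, row in enumerate(rows):
        for column, rule in rules.items():
            value = row.get(column)  # a missing column is treated as None
            if not rule(value):
                violations.append((index, column, value))
    return violations


# Sample data, invented for this sketch.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},
    {"id": None, "age": 41},
]

# One rule per column: "id" must be present, "age" must be an int in 0..120.
rules = {
    "id": lambda v: v is not None,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}

violations = run_checks(rows, rules)
for row_index, column, value in violations:
    print(f"row {row_index}: column '{column}' failed its rule (value={value!r})")
```

Expressing each rule as a plain predicate keeps the check loop independent of the rule definitions, so new rules can be added without touching `run_checks`.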