Intro
The Amorphic Dataset portal enables the creation of unstructured, semi-structured, and structured datasets while providing comprehensive data lake visibility.
Edit Schema
The Edit Schema functionality allows users to modify the schema of existing datasets, providing flexibility in managing dataset structures.
Data Profiling
Data profiling is the process of analyzing an existing data source to gather statistics and generate summaries about the data. It helps identify anomalies, assess data quality, and gain insights into the dataset's structure and characteristics.
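To make the idea of profiling concrete, here is a minimal Python sketch of column-level profiling over in-memory rows. It is not Amorphic's implementation. The functions `profile_column` and `profile_rows` and the chosen metrics (row count, null count, distinct count, plus min, max, and mean for numeric columns) are illustrative assumptions.

```python
import statistics
from typing import Any, Dict, List


def profile_column(values: List[Any]) -> Dict[str, Any]:
    """Summarize one column: row count, nulls, distinct values, numeric range and mean."""
    non_null = [v for v in values if v is not None]
    summary: Dict[str, Any] = {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
    }
    # Only report numeric statistics when every non-null value is a number.
    numeric = [v for v in non_null if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric and len(numeric) == len(non_null):
        summary["min"] = min(numeric)
        summary["max"] = max(numeric)
        summary["mean"] = statistics.fmean(numeric)
    return summary


def profile_rows(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Profile every column that appears in any row; missing keys count as nulls."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 20.0},
]
report = profile_rows(rows)
print(report["amount"])
```

A high `null_count` or an unexpected `min`/`max` is the kind of anomaly a profile is meant to surface before the data is used downstream.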
Files
The Files section provides a centralized interface for managing files that contain actual data for datasets. It allows users to view, track, and manage files associated with their datasets, along with their respective statuses.
Dataset Lifecycle Policy
The Dataset Lifecycle Policy is a feature that helps manage objects in Amazon S3 to optimize storage costs over time. It enables users to control object transition and expiration using predefined rules that dictate how Amazon S3 handles stored objects.
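Amorphic configures these rules through its own interface. To illustrate what an S3 lifecycle rule contains, the Python sketch below builds a configuration dictionary in the shape accepted by the S3 `PutBucketLifecycleConfiguration` API (exposed in boto3 as `put_bucket_lifecycle_configuration`). The helper `build_lifecycle_configuration`, the rule ID, prefix, storage classes, and day counts are hypothetical examples, not Amorphic defaults.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_lifecycle_configuration(
    rule_id: str,
    prefix: str,
    transitions: List[Tuple[int, str]],
    expiration_days: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a single-rule lifecycle configuration in the shape S3 expects (hypothetical helper)."""
    if not transitions and expiration_days is None:
        raise ValueError("a rule needs at least one transition or an expiration")
    days = [d for d, _ in transitions]
    # Sanity checks added for this sketch: transitions must be in increasing
    # day order, and expiration (if any) must come after the last transition.
    if any(later <= earlier for earlier, later in zip(days, days[1:])):
        raise ValueError("transition days must be strictly increasing")
    if expiration_days is not None and days and expiration_days <= days[-1]:
        raise ValueError("expiration must come after the last transition")

    rule: Dict[str, Any] = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
    }
    if transitions:
        rule["Transitions"] = [{"Days": d, "StorageClass": cls} for d, cls in transitions]
    if expiration_days is not None:
        rule["Expiration"] = {"Days": expiration_days}
    return {"Rules": [rule]}


config = build_lifecycle_configuration(
    rule_id="archive-old-files",   # hypothetical rule name
    prefix="example-dataset/",     # hypothetical object prefix
    transitions=[(30, "STANDARD_IA"), (90, "GLACIER")],
    expiration_days=365,
)
# With AWS credentials, the same dictionary could be applied via boto3:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Note that `put_bucket_lifecycle_configuration` replaces a bucket's entire lifecycle configuration, so existing rules would need to be merged in rather than overwritten.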
Athena Datasets
Athena datasets let users store structured data as files in Amazon S3, catalogued in AWS Glue tables, and run SQL queries on that data. Data validation can be enabled to check for corrupt data, and the playground can be used to run queries on the dataset.
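Type-based validation can be sketched in a few lines of Python. This is not Amorphic's validator. The function `validate_csv`, the `PARSERS` table, and the type names (`int`, `double`, `string`, `date`, chosen to resemble Glue/Athena types) are assumptions made for illustration. Each value is parsed according to its declared column type, and rows that fail are reported.

```python
import csv
import io
from datetime import date
from typing import Callable, Dict, List, Tuple

# Hypothetical mapping from a few Glue/Athena-style type names to parsers.
# The actual set of types Amorphic validates is not defined by this sketch.
PARSERS: Dict[str, Callable[[str], object]] = {
    "int": int,
    "double": float,
    "string": str,
    "date": date.fromisoformat,
}


def validate_csv(text: str, schema: List[Tuple[str, str]]) -> List[str]:
    """Return human-readable errors for rows whose values do not match the schema."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    expected = [name for name, _ in schema]
    if header != expected:
        return [f"header mismatch: expected {expected}, got {header}"]

    errors: List[str] = []
    for row in reader:
        line = reader.line_num  # source line on which this row ended
        if len(row) != len(schema):
            errors.append(f"line {line}: expected {len(schema)} fields, got {len(row)}")
            continue
        for value, (name, col_type) in zip(row, schema):
            try:
                PARSERS[col_type](value)
            except ValueError:
                errors.append(f"line {line}: column {name!r} value {value!r} is not {col_type}")
    return errors


schema = [("id", "int"), ("amount", "double"), ("created", "date")]
text = "id,amount,created\n1,9.5,2024-01-31\nx,3.0,2024-02-30\n"
for error in validate_csv(text, schema):
    print(error)
```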
Delta Lake Datasets
In Amorphic, users can create Delta Lake datasets with a Lake Formation target location, which creates a Delta Lake table in the backend to store the data.
DynamoDB Datasets
Amorphic DynamoDB Datasets refer to structured data stored as key-value pairs, acting as a single, reliable source of truth across all organizational departments.
External Datasets
External datasets in Amorphic allow users to directly consume their existing data stored in S3 buckets, without the need to ingest it into a new Amorphic dataset.
View Type Datasets
View Type Datasets in Amorphic are a specialized type of dataset that allows users to create structured representations of data. These view-type datasets can be shared with other authorized users and tags within the organization, providing a flexible way to interact with data.
Hudi Datasets
Iceberg Datasets
In Amorphic, users can create Iceberg datasets with an S3-Athena or Lake Formation target location, which creates an Iceberg table in the backend to store the data.
Lakeformation Datasets
Lake Formation datasets extend S3-Athena datasets with added security and support CSV, TSV, XLSX, JSON, NDJSON, JSONL, and Parquet files. They also check data integrity and offer ACID transactions, data compaction, and time-travel queries.
Redshift Datasets
Redshift datasets allow users to create datasets in Amorphic backed by Amazon Redshift. Amorphic enables users to store CSV, TSV, XLSX, JSON, NDJSON, JSONL, and Parquet files in Amazon S3 with Redshift as the target location. This feature includes optional partial data validation, which is enabled by default. The validation process helps detect and correct corrupt or invalid data files while supporting a variety of data types.
Data Quality Checks
Amorphic provides data quality checks to help detect errors in data before it is utilized by other systems or machine learning algorithms. Users can create rules for columns in structured datasets and run checks to identify rule violations.
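The rule-and-check workflow can be sketched as named predicates applied per column. This Python example is a hypothetical model, not Amorphic's rule engine. `ColumnRule`, `run_checks`, and the sample rules are assumptions chosen to show how each row is tested against each rule and how violations are collected.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ColumnRule:
    """A named predicate applied to one column (hypothetical rule model)."""
    column: str
    description: str
    check: Callable[[Any], bool]


def run_checks(rows: List[Dict[str, Any]], rules: List[ColumnRule]) -> List[Tuple[int, str, str]]:
    """Return (row_index, column, rule_description) for every rule a row violates."""
    violations = []
    for index, row in enumerate(rows):
        for rule in rules:
            if not rule.check(row.get(rule.column)):
                violations.append((index, rule.column, rule.description))
    return violations


rules = [
    ColumnRule("age", "is not null", lambda v: v is not None),
    ColumnRule("age", "between 0 and 120", lambda v: v is None or 0 <= v <= 120),
    ColumnRule("email", "contains '@'", lambda v: isinstance(v, str) and "@" in v),
]
rows = [
    {"age": 34, "email": "a@example.com"},
    {"age": None, "email": "b@example.com"},
    {"age": 150, "email": "no-at-sign"},
]
for violation in run_checks(rows, rules):
    print(violation)
```

Keeping the null check separate from the range check means a missing value is reported once, under the rule that actually describes the problem.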
Data Retrieval API
The Amorphic Data Retrieval API provides direct, query-based access to the row-level data stored in an Amorphic dataset. It lets you retrieve the actual data contained in the source files, making it a powerful tool for previews and integrations.
Spatial Datasets
Spatial datasets in Amorphic enable users to store, manage, and analyze geospatial data with built-in support for various spatial data formats and coordinate reference systems. These datasets provide specialized functionality for handling geographic information, including support for Well-Known Text (WKT), Well-Known Binary (WKB), GeoJSON, and coordinate-based data.
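As an illustration of how two of the formats above relate, the Python sketch below converts a simple 2D WKT point into a GeoJSON Point geometry. It is not part of Amorphic. The function `wkt_point_to_geojson` is hypothetical and handles only plain decimal coordinates, with no exponents, Z/M values, `EMPTY`, or other geometry types. It also assumes the WKT x value is longitude, because GeoJSON (RFC 7946) lists longitude before latitude.

```python
import re
from typing import Any, Dict

# Matches a 2D WKT point such as "POINT (-122.33 47.61)".
# Simplified: plain decimal numbers only (no exponents, Z/M values, or EMPTY).
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*([-+]?\d+(?:\.\d+)?)\s+([-+]?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_geojson(wkt: str) -> Dict[str, Any]:
    """Convert a simple 2D WKT POINT into a GeoJSON Point geometry dictionary."""
    match = _WKT_POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a simple 2D WKT POINT: {wkt!r}")
    x, y = (float(group) for group in match.groups())
    # Coordinates keep their WKT order; this assumes x is longitude and y is latitude.
    return {"type": "Point", "coordinates": [x, y]}


print(wkt_point_to_geojson("POINT (-122.33 47.61)"))
```

For production use, a dedicated geometry library is a better fit than a regular expression, since full WKT also covers lines, polygons, and multi-part geometries.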