Version: v3.0

Datasets

S3 Athena Datasets

1. Empty Data for Numeric or Non-String Columns

Error message: Data parsing failed, empty field data found for non-string column

Issue description: Loading empty or null values into non-string columns fails with a data validation error.

Explanation: This issue arises because the file parser does not currently support null or empty values for non-string fields. As per the documentation, one workaround is to import the data as string columns and create dataset views on top of the dataset that cast those columns to the required data types.
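
The casting workaround can be sketched as a small helper that builds the SELECT statement for such a dataset view. This is an illustrative sketch only: the function name, table name, and column names are hypothetical, and it assumes the view runs on an engine that supports `NULLIF` and `TRY_CAST`, as Athena does.

```python
def build_casting_view_sql(source_table, column_types):
    """Build a SELECT that casts string columns to their target types.

    column_types maps a column name to its target SQL type, for example
    {"quantity": "integer"}. Columns mapped to "varchar" are passed through.
    """
    select_parts = []
    for column, sql_type in column_types.items():
        if sql_type.lower() == "varchar":
            select_parts.append(column)
        else:
            # NULLIF turns empty strings into NULL before the cast;
            # TRY_CAST yields NULL instead of failing on bad values.
            select_parts.append(
                f"TRY_CAST(NULLIF(TRIM({column}), '') AS {sql_type}) AS {column}"
            )
    return "SELECT " + ", ".join(select_parts) + f" FROM {source_table}"


# Hypothetical table and columns, for illustration only.
print(build_casting_view_sql("sales_raw", {"order_id": "varchar", "quantity": "integer"}))
```

The generated statement can then be used as the SQL of a dataset view over the string-typed dataset.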

2. File parsing:

Error message: Data validation failed with message, new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

Issue description: This data validation error occurs while loading data into a dataset when the file contains embedded line breaks.

Explanation: The file parser raises this error because it does not currently support embedded line breaks in csv/tsv/xlsx files. Please follow the documentation. A possible solution is to run a regex replace on the stray newline or carriage return characters in the file.
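
One way to apply the regex replace before upload is sketched below. It assumes the stray line breaks sit inside quoted fields of a csv/tsv file, so that Python's `csv` module can tell them apart from real row endings; the function name is hypothetical, and xlsx files are not covered.

```python
import csv
import io
import re

LINE_BREAKS = re.compile(r"[\r\n]+")


def strip_embedded_line_breaks(csv_text, delimiter=","):
    """Return csv_text with line breaks inside fields replaced by a space."""
    # newline="" keeps the original line endings so the csv module
    # can recognise line breaks that belong to a quoted field.
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    output = io.StringIO(newline="")
    writer = csv.writer(output, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow([LINE_BREAKS.sub(" ", field) for field in row])
    return output.getvalue()


raw = 'id,comment\n1,"line one\nline two"\n2,ok\n'
print(strip_embedded_line_breaks(raw))
```

Line breaks in unquoted fields cannot be detected this way, because they are indistinguishable from row endings; those files need to be fixed at the source.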

3. Field validations:

Error message: N/A

Issue description: Validations are not available for all data types.

Explanation: Data type validations are currently limited to the primitive types String/Varchar, Integer, Double, Boolean, Date, and Timestamp. Support for complex structures is yet to be added. Moreover, for data types like Date and Timestamp, value formats are not strictly validated because these types accept multiple formats.
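
Because date formats are not strictly validated at load time, a strict check can be run on the values before upload. The sketch below is not part of the product; the function name and the `%Y-%m-%d` default are assumptions chosen for illustration.

```python
from datetime import datetime


def find_invalid_dates(values, date_format="%Y-%m-%d"):
    """Return (index, value) pairs for values that do not match date_format."""
    invalid = []
    for index, value in enumerate(values):
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            # Raised for both a wrong layout and an impossible calendar date.
            invalid.append((index, value))
    return invalid


print(find_invalid_dates(["2024-01-31", "31/01/2024", "2024-02-30"]))
```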

4. Data Profiling:

Error message: Data Profiling Failing with Failed to create any executor tasks Error

Issue description: This error occurs when there are insufficient IP addresses available in the subnet for the Data Profiling Glue job execution.

Explanation: The job fails to start if the number of required IP addresses exceeds those available in the subnet. Re-triggering the Data Profiling Job for that dataset after some time can resolve this issue, as IP addresses used by other job executions will become available again shortly after completion.
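
Re-triggering after a delay can be automated with a simple retry loop. The sketch below assumes a hypothetical `trigger_job` callable that starts the profiling job and raises `RuntimeError` when the job cannot start; the delay values are arbitrary and should be tuned to how quickly IP addresses free up in the subnet.

```python
import time


def retry_with_backoff(trigger_job, max_attempts=5, base_delay_seconds=60, sleep=time.sleep):
    """Call trigger_job until it succeeds or max_attempts is reached.

    trigger_job is a hypothetical callable, not a product API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return trigger_job()
        except RuntimeError:
            if attempt == max_attempts:
                raise
            # Wait longer after each failure so other jobs can release IP addresses.
            sleep(base_delay_seconds * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting.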


Dataset Views

  1. Issue Description: Unable to create a dataset view using another dataset view as its source when dataset type is set to 'view'.

    Affected Versions: 2.6, 2.6.1

    Explanation: When attempting to create a dataset view with an existing dataset view as its source, the operation results in a 'GE-1008 - Could not complete the request. Please try again.' error.

    Workaround: First create the dataset view with a SQL statement that references an invalid or non-existent table or dataset view. The dataset view is created but placed in a failed or invalid state because the source is missing. Once the dataset view metadata is in place, update the SQL statement to reference the correct source dataset view. The dataset view then updates successfully and functions as expected. For materialized dataset views the process is similar, but the update must be made through the API, because updating a materialized dataset view is not supported from the UI.

  2. Issue Description: Users losing access to dataset views after making a dataset views listing call.

    Affected Versions: 2.6, 2.6.1

    Explanation: The dataset view listing call runs a metadata consistency check: user-accessible dataset views are retrieved from the groups-dataset-views table and validated against the dataset views table. If a dataset view is present in the groups-dataset-views table but not found in the dataset views table, it is flagged and removed. Because of DynamoDB limitations, such as the 16 MB data return limit for a single BatchGetItem request and the 1 MB per-partition read limit, a retrieval may be truncated and return an incomplete item set. Dataset views absent from the truncated result are then incorrectly identified as missing and removed from the groups-dataset-views table, so users lose access to them. This situation is more likely if users have a large number of dataset views with very long SQL statements (around 400 KB each).

    Workaround: Contact Amorphic support to get a fix for this issue.

  3. Issue Description: Dataset View listing call fails when trying to retrieve a large number of dataset views on one page.

    Affected Versions: 2.6, 2.6.1

    Explanation: The dataset view listing call fails when attempting to retrieve more items than the default limit in a single call. For example, with a default page limit of 12 items, users attempting to fetch 24, 50, or 100 items per page, especially with dataset views that contain large SQL statements, may experience a failure. This occurs because the system cannot handle payloads that exceed 6 MB.

    Workaround: Reset the application and log in again. Then, try listing fewer items per page to avoid exceeding the payload size limit.
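
For scripted access, the same idea of listing fewer items per page can be sketched as a loop that requests small pages and accumulates the results. `fetch_page` is a hypothetical callable standing in for the listing call; its `offset` and `limit` parameters are assumptions, not the actual API signature.

```python
def list_all_views(fetch_page, page_size=12):
    """Collect every dataset view by requesting small pages.

    fetch_page(offset, limit) is a hypothetical callable that returns a
    list of at most `limit` dataset views starting at `offset`.
    """
    views = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        views.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return views
        offset += page_size
```

Keeping `page_size` at the default of 12 keeps each response small, at the cost of more round trips.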

  4. Issue Description: Unable to update dataset views when the dataset view has a large SQL statement (greater than 256 KB).

    Affected Versions: 2.6, 2.6.1, 2.7

    Explanation: Updating a dataset view with a large SQL statement fails with the error message GE-1008 - Could not complete the request. Please try again. This happens because the backend processes the update via an asynchronous Lambda invocation, which supports a maximum payload size of 256 KB. If the SQL statement exceeds this limit, the update fails.

    Workaround: Create a new dataset view with the updated SQL statement.
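
A quick size check can tell in advance whether an update is likely to hit this limit. The sketch below measures only the UTF-8 size of the SQL statement; the function and constant names are hypothetical, and since the full request payload carries other fields as well, the practical threshold is somewhat lower than 256 KB.

```python
ASYNC_PAYLOAD_LIMIT_BYTES = 256 * 1024  # limit stated in the explanation above


def sql_exceeds_update_limit(sql_statement, limit_bytes=ASYNC_PAYLOAD_LIMIT_BYTES):
    """Return True when the UTF-8 encoded SQL alone exceeds the limit."""
    return len(sql_statement.encode("utf-8")) > limit_bytes
```

When the check returns True, creating a new dataset view is the safer path than attempting the update.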