
Files are re-ingested during S3 ingestion to append-type datasets

Fix Available
This bug has been fixed

On consecutive S3 ingestion runs for append-type datasets, files that already exist in the dataset are erroneously ingested again.

Affected Versions: 2.3, 2.2, 2.1, 2.0

Fix Version: 2.4

Root cause(s)

The system compared the ETags of S3 files to determine whether a file had already been ingested. For larger files, however, the ETag is not the MD5 hash of the file contents, so the same file could produce a different ETag. The comparison then failed, and the file was ingested again as a duplicate.
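To illustrate why an ETag comparison can fail, the following is a minimal sketch of how S3 ETags are commonly observed to behave. It is not Amorphic's implementation. It assumes that larger files are uploaded in multiple parts and that, for such uploads, the ETag is the MD5 of the concatenated per-part MD5 digests followed by the part count. That scheme is widely observed for objects without KMS encryption, but it is an assumption here and is not stated in this report. The function names and the tiny payload are hypothetical.

```python
import hashlib


def single_part_etag(data: bytes) -> str:
    """ETag as commonly seen for a single-request upload: hex MD5 of the bytes."""
    return hashlib.md5(data).hexdigest()


def multipart_etag(data: bytes, part_size: int) -> str:
    """ETag as commonly seen for a multipart upload (assumed scheme):
    MD5 of the concatenated per-part MD5 digests, then "-<number of parts>"."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"


if __name__ == "__main__":
    payload = b"x" * 20  # tiny stand-in for a large file

    # Identical bytes, three different ETags depending on how they were uploaded.
    print(single_part_etag(payload))
    print(multipart_etag(payload, part_size=8))   # 3 parts
    print(multipart_etag(payload, part_size=16))  # 2 parts
```

Under these assumptions the three printed values all differ even though the bytes are identical, so a check that treats the ETag as a content hash would not recognize the file as already ingested.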

Impact

Previously ingested files are not recognized and are ingested again. The resulting duplicate files can compromise data integrity and reduce overall system efficiency.

Mitigation

Fix available

A fix is available in Amorphic v2.4. Please upgrade to v2.4 or later to resolve this issue.

Timeline

  • 2023-09-11: Bug reported/identified (CLOUD-3937)
  • 2023-09-11: Bug triaged
  • 2023-10-05: Bug fixed
  • 2023-10-06: Testing completed and fix is available