Data loss when uploading files with the same name from different folders
Files with identical names from different folders may not upload correctly, causing some of your data to be lost during S3 ingestion.
Affected Versions: 2.7, 3.0
Fix Version: 3.1
Root cause(s)
When you upload files with the same name from different folders in your S3 bucket, Amorphic doesn't always process all of them correctly.
For example, if your S3 bucket has these files:
reports/2025-01-01/daily_sales.csv
reports/2025-01-02/daily_sales.csv
reports/2025-01-03/daily_sales.csv
Amorphic sees them all as just daily_sales.csv and treats them as the same file. This means only the last file gets processed successfully - the others get overwritten and lost.
This happens because the system focuses on the filename only and ignores which folder each file came from. When multiple files are processed at the same time, they conflict with each other and some data gets lost.
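The collision described above can be illustrated with a minimal Python sketch. It is not Amorphic's actual code; the dictionaries `by_filename` and `by_full_key` are hypothetical stand-ins for whatever internal tracking the ingestion uses. The sketch only shows why keying on the bare filename collapses the three example objects into one entry, while keying on the full S3 object key keeps them apart.

```python
import posixpath

# The three example objects from the bucket listing above.
keys = [
    "reports/2025-01-01/daily_sales.csv",
    "reports/2025-01-02/daily_sales.csv",
    "reports/2025-01-03/daily_sales.csv",
]

# Hypothetical model of the faulty behavior: track each object by filename only.
by_filename = {}
for key in keys:
    # basename() drops the folder part, so every key maps to "daily_sales.csv"
    # and each assignment overwrites the previous one.
    by_filename[posixpath.basename(key)] = key

# Tracking by the full object key keeps every file distinct.
by_full_key = {key: key for key in keys}

print(len(by_filename), "entry when keyed by filename")
print(len(by_full_key), "entries when keyed by full key")
```

This loop runs sequentially, so the last key always wins. In the real issue the files are processed at the same time, which is why the surviving file can differ from run to run.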
Impact
- Your data goes missing: Files with the same name get overwritten, and you lose data without any warning
- Unpredictable results: Which files get processed depends on timing, so results vary each time you run ingestion
- Misleading error messages: The system reports the ingestion exception as a "file not found" error, which does not point to the real cause
- Affects all S3 connections: This happens whether you use access keys or bucket policies to connect to S3
- Historical data loss: You might lose important historical records stored in date-organized folders
Mitigation
Amorphic version 3.1 includes a fix that ensures all your files are processed correctly, even when they have the same name.
What you can do right now:
- Make filenames unique: Add dates, timestamps, or folder names to your filenames (like daily_sales_2025-01-01.csv)
- Check your data: Review your ingestion results to make sure all expected data is present
- Process folders one at a time: If possible, upload files from one folder at a time instead of all at once
- Monitor carefully: Keep a close eye on your data uploads until you have upgraded to version 3.1
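As a rough aid for the first two workarounds, the following Python sketch finds object keys that share a filename and suggests a unique name that embeds the parent folder. The helper names `find_name_collisions` and `unique_name` are hypothetical and not part of Amorphic. The sketch assumes you already have the list of object keys (for example, from a bucket listing) and that the parent folder name is enough to tell the files apart.

```python
import posixpath
from collections import defaultdict


def find_name_collisions(keys):
    """Group S3 object keys by bare filename and return only the
    filenames that are used by more than one key."""
    groups = defaultdict(list)
    for key in keys:
        groups[posixpath.basename(key)].append(key)
    return {name: paths for name, paths in groups.items() if len(paths) > 1}


def unique_name(key):
    """Suggest a filename that embeds the parent folder name,
    e.g. reports/2025-01-01/daily_sales.csv -> daily_sales_2025-01-01.csv."""
    folder = posixpath.basename(posixpath.dirname(key))
    stem, ext = posixpath.splitext(posixpath.basename(key))
    # Keys at the bucket root have no folder, so leave them unchanged.
    return f"{stem}_{folder}{ext}" if folder else f"{stem}{ext}"


keys = [
    "reports/2025-01-01/daily_sales.csv",
    "reports/2025-01-02/daily_sales.csv",
    "reports/summary.csv",
]

for name, paths in find_name_collisions(keys).items():
    print(f"{name} is used by {len(paths)} objects:")
    for path in paths:
        print(f"  {path} -> {unique_name(path)}")
```

Running the collision check before ingestion shows which files are at risk. Comparing the list of keys against your ingestion results afterwards helps confirm that nothing was dropped.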
What's available in version 3.1:
- All files will be processed correctly, even if they have the same name
- Better tracking to ensure no data is lost
Timeline
- 2025-08-05: Users reported missing data after S3 ingestion
- 2025-08-07: Problem identified and analyzed - files with same names overwriting each other
- 2025-08-14: Solution developed and completed to handle identical filenames properly
- 2025-08-15: Solution available in version 3.1