Incorrect handling of failed data profiling job retries

April 2, 2023

Fix In Progress

Bug identified and fix is in progress

Workaround Available

Temporary workaround available

When the data profiling job (a backend job) fails with an unhandled exception, it retries an inordinate number of times, causing additional cost to the customer.

This issue occurs if Amorphic is deployed with single tenancy and has datasets with data profiling enabled.

Affected Versions: 1.11, 1.12, 1.13, 1.14, 2.0, 2.1

Fix Version: 2.2

Root cause(s)

Incorrect retry configuration for failed jobs causes the data profiling job to retry an inordinate number of times.
Unhandled exceptions in the data profiling job (a scheduled backend job):
- When the Redshift cluster is paused, the data profiling job errors out with a timeout exception.
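As a rough illustration of what a bounded retry policy looks like (this is not Amorphic's actual fix or configuration), the sketch below caps the total number of attempts and backs off exponentially between them. A failure that repeats on every attempt, such as a timeout against a paused Redshift cluster, then stops after a few tries instead of accruing cost. The function name, `max_attempts=3`, and the 1-second base delay are hypothetical choices.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetriesExhausted(Exception):
    """Raised when a job still fails after the maximum number of attempts."""


def run_with_bounded_retries(
    job: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `job`, making at most `max_attempts` attempts in total (hypothetical helper).

    The delay doubles after each failure (exponential backoff). Once the
    attempt budget is spent, the last error is chained onto RetriesExhausted
    instead of retrying indefinitely.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:  # this sketch treats every failure as retryable
            last_error = exc
            if attempt < max_attempts:
                # Wait 1x, 2x, 4x, ... the base delay between attempts.
                sleep(base_delay_s * 2 ** (attempt - 1))
    raise RetriesExhausted(f"job failed after {max_attempts} attempts") from last_error
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.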

Impact

The account accrues additional cost from unnecessary job executions.

Mitigation

Workaround

Make sure the Redshift cluster is in an active state around the scheduled run time of the data profiling job (every day at 00:00 UTC), or disable the data profiling flag on the datasets.
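The first workaround can be automated with a pre-run check. The following is a minimal sketch, assuming Python with boto3 and AWS credentials allowed to call the Redshift `DescribeClusters` and `ResumeCluster` APIs. The helper names, the 30-minute lead time, and running the check on your own schedule shortly before 00:00 UTC are assumptions; none of this ships with Amorphic. It resumes a paused cluster only when the next profiling run is close.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: the profiling job runs once a day at 00:00 UTC, as stated above.
PROFILING_RUN_HOUR_UTC = 0


def next_profiling_run(now: datetime) -> datetime:
    """Return the next scheduled run strictly after `now` (a timezone-aware datetime)."""
    now_utc = now.astimezone(timezone.utc)
    run_today = now_utc.replace(hour=PROFILING_RUN_HOUR_UTC, minute=0, second=0, microsecond=0)
    return run_today if run_today > now_utc else run_today + timedelta(days=1)


def needs_resume(cluster_status: str, now: datetime,
                 lead: timedelta = timedelta(minutes=30)) -> bool:
    """True if the cluster is paused and the next run starts within `lead` (hypothetical lead time)."""
    time_to_run = next_profiling_run(now) - now.astimezone(timezone.utc)
    return cluster_status == "paused" and time_to_run <= lead


def ensure_cluster_available(cluster_id: str, now: Optional[datetime] = None) -> bool:
    """Resume `cluster_id` if needed; returns True if a resume was requested.

    `cluster_id` is a placeholder for your own Redshift cluster identifier.
    """
    import boto3  # imported lazily so the pure helpers above work without boto3

    client = boto3.client("redshift")
    cluster = client.describe_clusters(ClusterIdentifier=cluster_id)["Clusters"][0]
    if needs_resume(cluster["ClusterStatus"], now or datetime.now(timezone.utc)):
        client.resume_cluster(ClusterIdentifier=cluster_id)
        return True
    return False
```

Resuming a cluster can take several minutes, so the lead time should leave some margin. The sketch does not pause the cluster again after the profiling run; if you pause clusters to save cost, that step has to be scheduled separately.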

Timeline

2023-04-06: Bug reported/identified (CLOUD-3209)
2023-04-06: Bug triaged
