Dataset Lifecycle Policy
The Dataset Lifecycle Policy is a feature that helps manage objects in Amazon S3 to optimize storage costs over time. It enables users to control object transition and expiration using predefined rules that dictate how Amazon S3 handles stored objects.
Key Features:
- Cost Optimization: Objects start in the S3 Standard storage class by default, but they can be transitioned to lower-cost storage classes based on usage patterns.
- Automated Management: Users can set rules to automatically transition or delete objects, ensuring efficient storage management.
- Flexible Storage Options: For example, an object accessed once every three months can be moved to a more cost-effective storage class, even if retrieval takes longer.
For more details, refer to Amazon S3 Storage Classes.
Lifecycle Policy Rules
There are two types of rules that can be configured when a lifecycle policy is enabled:
- Transition Rules:
  - Define the number of days after which objects in the dataset transition to a specified storage class, based on their usage frequency.
  - The transition period is calculated from the moment objects are uploaded.
- Expiration Rules:
  - Specify the number of days after which objects expire and are permanently deleted from the dataset.
  - The expiration period is calculated from the moment objects are added.
By using lifecycle policies, users can efficiently manage datasets, reduce storage costs, and automate data retention based on their requirements.
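Under the hood, these rules map to standard Amazon S3 lifecycle configuration. The following is a minimal sketch of an equivalent configuration applied directly with boto3; the bucket name and prefix are hypothetical stand-ins for a dataset's backing S3 location, which the application normally manages on the user's behalf:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix standing in for a dataset's S3 location.
BUCKET = "example-dataset-bucket"
PREFIX = "datasets/customer-info/"

# One rule combining transitions (Standard-IA after 30 days, Glacier after 90)
# with an expiration (permanent deletion after 365 days). Both the transition
# and expiration periods are counted from each object's upload date.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "dataset-lifecycle-example",
                "Filter": {"Prefix": PREFIX},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```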
Important Considerations
- Dataset Limitations:
  - Users can have up to 1,000 datasets with lifecycle rules. Once this limit is reached, users must delete an existing lifecycle policy before enabling a new one.
- Effect of Restoring Temporarily Deleted Files:
  - Restored files are treated as new objects, and their Upload Date metadata is updated.
  - As a result, lifecycle rules are applied based on the restore date instead of the original upload date.
  - The storage class of the file does not remain the same after restoration.
- Storage Class Transition Rules:
  - One-way transition: files can only move forward in the storage class hierarchy and cannot revert to a previous class.
  - For details, refer to AWS Documentation.
- GLACIER & DEEP_ARCHIVE Restrictions:
  - Files stored in GLACIER or DEEP_ARCHIVE cannot be accessed from S3 via an ETL Job script or the Amorphic UI.
  - Users cannot temporarily delete files from these storage classes, but permanent deletion and truncation are allowed.
- Monitoring & Cost Considerations:
  - Objects smaller than 128 KB are not monitored and remain in the Frequent Access tier. For more details, see S3 Intelligent-Tiering.
  - S3 Standard-IA and S3 One Zone-IA are cost-efficient for objects larger than 128 KB stored for at least 30 days.
  - Objects smaller than 128 KB are billed as 128 KB.
  - Early deletions (before 30 days) are charged for the full 30 days.
  - For pricing, check Amazon S3 Pricing.
  - A rough sketch for checking how many of a dataset's objects fall under the 128 KB threshold follows this list.
- Bulkload v1 Connection Limitation:
  - Lifecycle policies cannot be enabled for bulkload v1 type connections, as these create a new dataset.
  - Users must manually edit the newly created dataset to enable lifecycle policies.
- Lifecycle Policy State & File Operations:
  - When a dataset's LifeCyclePolicyStatus is in the Enabling, Disabling, or Deleting state:
    - File deletion is not allowed.
    - The truncate feature will not work.
    - Users must wait for the lifecycle policy to be fully applied before deleting any files.
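Relating to the size thresholds above, the sketch below counts how many objects under a dataset's prefix fall below 128 KB and therefore gain little from Standard-IA or Intelligent-Tiering transitions. The bucket name and prefix are hypothetical, and this assumes direct read access to the backing S3 bucket:

```python
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

SMALL_OBJECT_THRESHOLD = 128 * 1024  # objects below this are billed as 128 KB in IA classes

small, large = 0, 0
# Hypothetical bucket/prefix for the dataset's backing S3 location.
for page in paginator.paginate(Bucket="example-dataset-bucket",
                               Prefix="datasets/customer-info/"):
    for obj in page.get("Contents", []):
        if obj["Size"] < SMALL_OBJECT_THRESHOLD:
            small += 1
        else:
            large += 1

print(f"{small} objects under 128 KB (little benefit from IA transitions), "
      f"{large} objects of 128 KB or more")
```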
Enable/Disable Dataset Lifecycle Policy
Users can enable or disable the lifecycle policy of a dataset at any time, as demonstrated in the GIF below.
Fields in a Dataset Lifecycle Policy
- Enable Life Cycle Policy:
  - Select Yes to activate the policy or No to disable it.
  - If enabled, the user must define Expiration Days, Transition Rules, or both.
- Expiration Days:
  - The number of days after which a file is automatically deleted from the dataset, counted from its upload date.
- Transition Rules:
  - Define rules to move files to a different storage class after a specified number of days, based on the upload date.
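For illustration only, the snippet below shows these fields combined in a single policy; the key names and structure are hypothetical and do not represent the actual Amorphic API payload:

```python
# Hypothetical representation of the fields described above.
lifecycle_policy = {
    "EnableLifeCyclePolicy": True,
    "ExpirationDays": 365,  # files are deleted 365 days after upload
    "TransitionRules": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},  # move 30 days after upload
        {"Days": 90, "StorageClass": "GLACIER"},      # archive 90 days after upload
    ],
}
```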
Bulk Management of Lifecycle Policies
Users can bulk update or delete lifecycle policies for multiple datasets using Bulk Management. For more details, refer to Bulk update/delete lifecycle policies.
Delete Dataset Lifecycle Policy
Users can delete a dataset lifecycle policy at any time by clicking the Delete Life-Cycle Policy button.
Notification Alerts & Error Handling
- If an error occurs while enabling, disabling, or deleting a lifecycle policy, an ErrorMessage will be displayed under the Life Cycle Policy Details section on the dataset details page. The policy will then be reverted to its previous state.
- Users subscribed to email alerts in the Amorphic application will receive an email notification after each lifecycle policy operation (enable, disable, delete).
Dataset Lifecycle Use Case
A company has a large dataset of customer information stored in Amazon S3. Since the data is frequently accessed and updated, it is initially stored in the S3 Standard storage class for fast retrieval.
Optimizing Storage Costs
- After a certain period, some data is no longer frequently accessed.
- Instead of keeping it in S3 Standard, the company transitions these objects to a cheaper storage class (e.g., S3 Standard-IA or Glacier).
- This reduces storage costs while ensuring data remains accessible when needed.
Seamless Data Retrieval
- If the archived data is required again, it can be restored and copied back to S3 Standard with minimal effort (see the sketch below).
- This approach helps the company balance cost-efficiency and accessibility.
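Because lifecycle transitions are one-way, bringing an archived object back to S3 Standard involves a temporary restore followed by a copy. A minimal sketch with boto3 is shown below; the bucket and key are hypothetical, and this assumes direct access to the backing S3 bucket rather than the Amorphic UI:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-dataset-bucket"             # hypothetical bucket
KEY = "datasets/customer-info/part-0001.csv"  # hypothetical key

# Step 1: request a temporary restore of the archived (Glacier) object.
s3.restore_object(
    Bucket=BUCKET,
    Key=KEY,
    RestoreRequest={"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}},
)

# Step 2: once the restore has completed (this can take hours), copy the
# object over itself with the STANDARD storage class to move it back for good.
s3.copy_object(
    Bucket=BUCKET,
    Key=KEY,
    CopySource={"Bucket": BUCKET, "Key": KEY},
    StorageClass="STANDARD",
)
```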
This use case highlights how the Dataset Lifecycle Policy helps optimize storage costs while maintaining data availability as per business needs.