Version: v3.3

ETL Jobs

Action      Limit
ETL Jobs    900*

Why the limit?

By default, the Amorphic system and AWS occupy some of the underlying resources (default ETL jobs and IAM roles), which together account for nearly 100.

The maximum limit is a consolidated count of all of the following Amorphic resources created in a given environment:

  • All connections (S3, Ext-API, JDBC Normal) except JDBC bulk load connections
  • ETL Jobs
  • All Workflow nodes
  • Forecast Jobs (Consumes IAM roles)
  • DeepSearch Indices (Consumes IAM roles)
  • Glue Endpoints with Dataset Access (Consumes IAM roles)
  • ML Notebooks with Dataset Access (Consumes IAM roles)
  • ETL Notebooks with Dataset Access (Consumes IAM roles)
  • Kinesis stream consumers (Consumes IAM roles)
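
As a rough illustration of how the consolidated count works, the sketch below sums a hypothetical inventory of the resource types listed above and reports how much headroom remains under the limit. The category names and counts are made up for the example, and the function is not part of any Amorphic API.

```python
# Documented default limit for the consolidated resource count.
ENVIRONMENT_LIMIT = 900


def remaining_capacity(counts: dict[str, int], limit: int = ENVIRONMENT_LIMIT) -> int:
    """Return how many more counted resources can still be created (never negative)."""
    used = sum(counts.values())
    return max(limit - used, 0)


# Hypothetical inventory; keys mirror the resource types in the list above.
usage = {
    "connections_excluding_jdbc_bulk_load": 40,
    "etl_jobs": 300,
    "workflow_nodes": 120,
    "forecast_jobs": 5,
    "deepsearch_indices": 3,
    "glue_endpoints_with_dataset_access": 2,
    "ml_notebooks_with_dataset_access": 10,
    "etl_notebooks_with_dataset_access": 10,
    "kinesis_stream_consumers": 10,
}

print(remaining_capacity(usage))
```

Because every listed resource type draws from the same pool, creating workflow nodes or notebooks reduces the number of ETL jobs that can still be created.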

For example, if this is a new Amorphic deployment and no other resources have been created, a user can create 900 ETL jobs. Even when 900 ETL jobs exist, job executions are restricted based on the type of network configuration:

  • Public - No restrictions
  • App-Public - Based on the Glue Public Subnet CIDR range specified during the Amorphic deployment. If it is /24, only 254 DPUs can run at a time (for example, approximately 25 ETL Spark jobs with 10 DPUs each can run at once)
  • App-Private - Based on the Glue Private Subnet CIDR range specified during the Amorphic deployment. If it is /24, only 254 DPUs can run at a time (for example, approximately 25 ETL Spark jobs with 10 DPUs each can run at once)
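
The arithmetic behind the /24 example can be sketched as follows. This assumes, as the text above implies, that each running DPU consumes one address from the subnet and that the usable count is the total number of addresses minus the network and broadcast addresses. The CIDR value and function name are hypothetical.

```python
import ipaddress


def max_concurrent_jobs(subnet_cidr: str, dpus_per_job: int) -> int:
    """Estimate how many jobs of a given DPU size fit in a subnet at once."""
    network = ipaddress.ip_network(subnet_cidr)
    # Total addresses minus network and broadcast: 256 - 2 = 254 for a /24.
    # Assumption: one address per DPU; real capacity may be lower if the
    # cloud provider reserves additional addresses in the subnet.
    usable_addresses = network.num_addresses - 2
    return usable_addresses // dpus_per_job


print(max_concurrent_jobs("10.0.0.0/24", 10))
```

A larger subnet raises the ceiling proportionally under the same assumptions; a /23 range, for instance, would allow roughly twice as many concurrent DPUs.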

Action                                                   Limit
Maximum concurrent executions of different ETL jobs      50**
Maximum concurrent executions of same ETL job            1000***

** Maximum is calculated based on the AWS default limit of 1000 ETL jobs and 1000 IAM roles in a new AWS account. If both the AWS Glue job limit and the IAM role limit are increased, the maximum is recalculated accordingly.
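
The exact formula Amorphic applies is not stated here. One plausible reading, sketched below purely as an assumption, is that the limit is the smaller of the two AWS quotas minus the roughly 100 resources reserved by default, which reproduces the 900 figure for a new account. The function name and the second set of quota values are hypothetical.

```python
# "Nearly 100" resources are reserved by default; treated as exactly 100 here.
RESERVED_BY_DEFAULT = 100


def derived_limit(
    glue_job_quota: int = 1000,
    iam_role_quota: int = 1000,
    reserved: int = RESERVED_BY_DEFAULT,
) -> int:
    """Assumed calculation: the tighter of the two quotas, minus reserved resources."""
    return max(min(glue_job_quota, iam_role_quota) - reserved, 0)


print(derived_limit())            # AWS defaults for a new account
print(derived_limit(2000, 1500))  # hypothetical raised quotas
```

Under this reading, raising only one of the two quotas would not change the result, which is consistent with the statement that both limits must be increased.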

*** Maximum is equivalent to the AWS default limit. It can be adjusted by requesting a service quota increase.

For more information on the AWS Service Quotas, visit the AWS documentation.

note

IAM role policy and shared domains

When a domain is shared with an ETL job, Amorphic prefers a single domain/* wildcard in the job's IAM role policy over listing every dataset path, to keep the policy compact.

Because Iceberg datasets require a different IAM statement shape than non-Iceberg datasets, domain/* is only used when every dataset in the domain is of the same type:

  • Domain with all non-Iceberg datasets → policy uses domain/*
  • Domain with all Iceberg datasets → policy uses domain/* (under the Iceberg-shaped statement)
  • Domain with a mix of Iceberg and non-Iceberg datasets → policy falls back to individual domain_name/dataset_name/* entries, since the two statement shapes cannot share a single wildcard

If you want the shorter domain/* form on a domain that already contains an Iceberg dataset, every other dataset in that domain must also be Iceberg type. Mixing types forces the per-dataset expansion.
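
The selection rule described above can be sketched as a small function. This is an illustration of the decision logic only, not Amorphic's implementation: the function name, the input shape (dataset name mapped to an Iceberg flag), and the domain and dataset names are all hypothetical, and the Iceberg-specific statement shape is not modeled.

```python
def policy_resources(domain: str, datasets: dict[str, bool]) -> list[str]:
    """Return the resource patterns for a shared domain.

    `datasets` maps dataset name -> True if the dataset is Iceberg type.
    """
    dataset_kinds = set(datasets.values())
    # All datasets share one type (or the domain is empty, an assumed case):
    # a single compact wildcard is enough.
    if len(dataset_kinds) <= 1:
        return [f"{domain}/*"]
    # Mixed Iceberg and non-Iceberg: fall back to one entry per dataset.
    return [f"{domain}/{name}/*" for name in sorted(datasets)]


print(policy_resources("sales", {"orders": False, "returns": False}))
print(policy_resources("sales", {"orders": False, "events": True}))
```

The first call yields the single wildcard entry, while the second expands to one entry per dataset, which is why policy size grows with the number of datasets in a mixed domain.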