
Schedules

Amorphic Schedules automate processes and jobs in Amorphic. Users can configure custom schedules based on specific requirements.

Creating a Schedule

To create a schedule:

  1. Navigate to the Resource Details page.
  2. Click on the Schedules tab.
  3. Select Add New Schedule and provide the required details.


Schedule configuration

Type | Description
Schedule Name | A unique name that identifies the schedule's specific purpose.
Job Type | Users can select a job type from the dropdown list (details provided in the Job Types table below).
Schedule Type | Two types of schedules are available:
  • Time-based – Executes the schedule at a specified time.
  • On-Demand – Runs the schedule as needed.
Schedule Expression | Required for time-based schedules. Supports rate or cron expressions (e.g., Every 15 minutes, Daily, cron(0 12 * * ? *)).
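
The cron example above uses the six-field, AWS-style syntax (minutes, hours, day-of-month, month, day-of-week, year, with "?" for an unspecified day field). The snippet below is a small illustrative sketch of a few expressions in that style; the presets offered in the UI (Every 15 minutes, Daily, etc.) may differ.

```python
# A few example schedule expressions in the rate/cron style shown above.
# Illustrative only; the presets available in the Amorphic UI may differ.
EXAMPLE_EXPRESSIONS = {
    "rate(15 minutes)":        "Every 15 minutes",
    "rate(1 day)":             "Once a day",
    "cron(0 12 * * ? *)":      "Every day at 12:00 UTC",
    "cron(0 8 ? * MON-FRI *)": "Weekdays at 08:00 UTC",
}

for expression, meaning in EXAMPLE_EXPRESSIONS.items():
    print(f"{expression:26} -> {meaning}")
```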

Job Types

Job Type | Description
ETL Job | Schedules an ETL job.
JDBC CDC | Synchronizes data between a data warehouse and S3 for Dataflows with the Change Data Capture (CDC) process type. Only tasks with "SyncToS3" set to "yes" are visible for scheduling.
Data Ingestion | Schedules a data ingestion job for JDBC, S3, and external API data sources.
JDBC FullLoad | Schedules a JDBC Bulk Data Load full-load task.
DataPipelines | Schedules a DataPipeline execution.
Data Quality Checks | Schedules a data quality check for a dataset.
HCLS-Store | Schedules an import job for HealthLake Store, Omics Storage (Sequence Store), Omics Analytics (Variant Store, Annotation Store), and HealthImaging Store.
Health Image Data Conversion | Schedules a job that converts DICOM files in a dataset to NDJSON format and stores them in a different dataset.
Export to S3 | Schedules an export-to-S3 operation for a DynamoDB dataset.

Supported Schedule Types by Resource

Resource Type | Schedule Job Type
Datasets | Data Ingestion, Data Quality Checks, Export to S3, Health Image Data Conversion
Datasources | JDBC CDC, JDBC FullLoad
DataPipelines | DataPipelines
HCLS omics analytics | HCLS-Store
HCLS omics storage | HCLS-Store
HCLS healthlake | HCLS-Store
HCLS health imaging | HCLS-Store
Jobs | ETL Job
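
For reference, the mapping in the table above can be expressed as a simple lookup, for example to check a job type before creating a schedule. This is only an illustrative sketch of the table, not an Amorphic API.

```python
# Sketch of the resource-to-job-type mapping from the table above,
# handy for checking a job type before creating a schedule.
SUPPORTED_JOB_TYPES = {
    "Datasets": {"Data Ingestion", "Data Quality Checks", "Export to S3",
                 "Health Image Data Conversion"},
    "Datasources": {"JDBC CDC", "JDBC FullLoad"},
    "DataPipelines": {"DataPipelines"},
    "HCLS omics analytics": {"HCLS-Store"},
    "HCLS omics storage": {"HCLS-Store"},
    "HCLS healthlake": {"HCLS-Store"},
    "HCLS health imaging": {"HCLS-Store"},
    "Jobs": {"ETL Job"},
}

def is_supported(resource_type: str, job_type: str) -> bool:
    """Return True if the job type can be scheduled on the given resource type."""
    return job_type in SUPPORTED_JOB_TYPES.get(resource_type, set())

print(is_supported("Datasets", "Data Ingestion"))  # True
print(is_supported("Jobs", "Data Ingestion"))      # False
```
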
info

If the schedule job type is 'Data Ingestion' and the dataset is of type 'reload', the execution will automatically reload the data.

  • Data Ingestion

    Used to schedule a data ingestion job for supported data sources.

    Supported Arguments

    • For JDBC Datasource Schedules

      • NumberOfWorkers: Specifies the number of worker nodes allocated for the Glue job (Valid range: 2–100).
      • WorkerType: Specifies the worker type (computing resources) to use for the job. The worker type determines the amount of memory, CPU, and overall processing power allocated to each worker. Allowed values are Standard, G.1X, and G.2X only.
      • query: Specifies a SQL SELECT query; the data returned by that query is ingested from the source database.
      • prepareQuery: Specifies a prefix that is combined with the query argument to form the final SQL statement, which makes it possible to run more complex queries. An example payload is sketched after this list.
    • For S3 and Ext-API Datasource Schedules

      • MaxTimeOut: Overrides the default timeout setting of the datasource for the specific schedule (Valid range: 1–2880).
      • MaxCapacity: Defines the number of AWS Glue data processing units (DPUs) that can be allocated when the job runs (Allowed values: 1, 0.0625).
      • FileConcurrency: Applicable to S3 data sources; determines the number of parallel file uploads.
  • Health Image Data Conversion

    This schedule type converts DICOM files in a dataset to NDJSON format so that they can be imported into a HealthLake store, which only supports NDJSON files when importing data.

    • The input dataset for these jobs must contain DICOM files.
    • Users must specify the output dataset ID in the schedule arguments using the key outputDatasetId; its value must be the ID of a valid dataset with Target Location set to S3 and file type set to others (see the sketch after this list).
    • Converted NDJSON files are stored in the specified output dataset.
    • An optional argument selectFiles can have the following values:
      • latest (default) – Selects only files uploaded after the last job run.
      • all – Selects all files for conversion.
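
As referenced above, the snippet below sketches what the schedule arguments might look like for a JDBC data-ingestion schedule and for a Health Image Data Conversion schedule. The argument names come from the lists above; the payload shape and the example values (query text, dataset ID placeholder) are assumptions for illustration, not the exact format Amorphic expects.

```python
# Illustrative argument payloads based on the argument names documented above.
# The dict structure and the example values are assumptions, not an exact
# Amorphic request format.

# JDBC datasource data-ingestion schedule
jdbc_ingestion_arguments = {
    "NumberOfWorkers": 10,  # 2-100 Glue workers
    "WorkerType": "G.1X",   # Standard, G.1X, or G.2X
    # prepareQuery is an optional prefix combined with "query" to build
    # the final SQL statement (hypothetical example shown here).
    "prepareQuery": "WITH recent_sales AS (SELECT * FROM sales WHERE updated_at >= CURRENT_DATE - 7)",
    "query": "SELECT * FROM recent_sales",
}

# Health Image Data Conversion schedule
dicom_conversion_arguments = {
    "outputDatasetId": "<output-dataset-id>",  # dataset with Target Location S3, file type "others"
    "selectFiles": "latest",                   # "latest" (default) or "all"
}

print(jdbc_ingestion_arguments)
print(dicom_conversion_arguments)
```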

Schedule Details


After creating a schedule, it will be listed on the Schedules page for the resource. Users can perform various actions, such as running, disabling, enabling, editing, cloning, or deleting the schedule.

Running a Schedule


Users can run a schedule by clicking the Run Schedule button on the schedule details page. To check execution details, users can open the schedule details, which show whether the job is running, completed successfully, or failed.


info
  • Schedule execution will fail if the related S3 datasource uses any of the Amorphic S3 buckets as its source, for example <projectshortname-region-accountid-env-dlz>.
  • For Data Ingestion schedules, the following arguments can be provided during schedule runs (see the sketch after this list):
    • MaxTimeOut: Overrides the timeout setting of the datasource for the specific run. It accepts values from 1 to 2880.
    • FileConcurrency: Configures the number of parallel file ingestions for an S3 datasource. It accepts values from 1 to 100 and defaults to 20.
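
As noted above, here is a minimal sketch of these run-time overrides for a Data Ingestion schedule run, assuming they are passed as simple key/value arguments; the exact way Amorphic accepts them during a run is not shown here.

```python
# Run-time overrides for a Data Ingestion schedule run (S3 datasource).
# Key names come from the list above; the dict form is illustrative only.
run_arguments = {
    "MaxTimeOut": 120,      # 1-2880; overrides the datasource's default timeout for this run
    "FileConcurrency": 50,  # 1-100 parallel file ingestions (default 20)
}
print(run_arguments)
```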

Schedule use case

When the schedule execution is completed, an email notification is sent out based on the notification settings and the schedule execution status. Users can also view the execution logs of each schedule run, which include Output Logs, Output Logs (Full), and Error Logs.

For example, if a user needs a schedule that runs an ETL job and sends out important emails every 4 hours, they can create a DataPipeline with an ETL Job Node followed by a Mail Node. This workflow can then be scheduled to run every 4 hours, every day, as sketched below.
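
A minimal sketch of the schedule described above, written against the configuration fields from the table earlier. The dict shape and the schedule name are assumptions for illustration, not an Amorphic API object; the cron expression uses the AWS-style syntax shown earlier.

```python
# Illustrative sketch of the "every 4 hours" schedule described above.
# The dict shape and the name are assumptions, not an Amorphic API object.
etl_mail_schedule = {
    "ScheduleName": "etl-and-mail-every-4-hours",  # hypothetical name
    "JobType": "DataPipelines",                    # DataPipeline with ETL Job Node + Mail Node
    "ScheduleType": "Time-based",
    "ScheduleExpression": "cron(0 0/4 * * ? *)",   # minute 0 of every 4th hour, UTC
}
print(etl_mail_schedule)
```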

info

To see how to create schedules on a resource, check How to create schedules.