# Data Pipeline Nodes
This document describes the node types available in a data pipeline. Each node type performs a specific task, and nodes can be combined to build complex data workflows. The fields required to create each node type are listed below.
## ETL Job Node
The ETL Job Node runs an ETL job, optionally passing arguments that the job can use. For example, this node can run an ETL job that identifies the highest-paying job and its salary, and then trigger subsequent jobs based on that job's output.
| Attribute | Description |
|---|---|
| Resource | Select an ETL job from the dropdown list of available jobs. |
| Node Name | A unique identifier for the node. |
| Input Configurations | Arguments passed to the job at run time. |
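The attributes above map naturally onto a small configuration object. The following sketch is hypothetical (this document does not specify the platform's actual API or field names) and shows how an ETL Job Node definition with input arguments might be assembled and minimally validated:

```python
def make_etl_job_node(resource, node_name, input_configurations=None):
    """Build a hypothetical ETL Job Node definition.

    `resource` is the ETL job selected from the dropdown, `node_name`
    must be a unique identifier for the node, and `input_configurations`
    holds the arguments made available to the job at run time.
    """
    if not resource:
        raise ValueError("Resource is required")
    if not node_name:
        raise ValueError("Node Name is required and must be unique")
    return {
        "type": "etl_job",
        "resource": resource,
        "node_name": node_name,
        "input_configurations": input_configurations or {},
    }

# Example: a job that finds the highest-paying job title and its salary.
node = make_etl_job_node(
    resource="find_highest_paying_job",   # hypothetical job name
    node_name="highest-salary-etl",
    input_configurations={"top_n": 1},
)
```

The downstream nodes described above would then consume this job's output.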
## ML Model Inference Node
The ML Model Inference Node runs a machine learning model on input data to make predictions or decisions. For example, this node can process customer data to predict churn probabilities and forward the results to the next node in the pipeline.
| Attribute | Description |
|---|---|
| Resource | Choose a machine learning model from the list of accessible models. |
| Node Name | A unique identifier for the node. |
| Input Dataset | The dataset containing files for ML model inference. |
| Select Latest File | Automatically selects the latest file for inference if set to 'Yes'. |
| File Name Execution Property Key | Required when 'Select Latest File' is set to 'No'. The execution property key whose value specifies the file name to read from the input dataset. |
| Target Dataset | The dataset where inference results are saved. |
Users can run inference on the entire input dataset (subject to a soft limit of 10,000 files) or on up to 100 individually selected files.
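The file-selection rules above can be captured in a small validator. This is a hypothetical sketch (the field names are illustrative, not the platform's actual schema); it enforces the constraint that a File Name Execution Property Key must be supplied whenever Select Latest File is 'No':

```python
def validate_ml_inference_node(node):
    """Validate a hypothetical ML Model Inference Node definition."""
    required = ["resource", "node_name", "input_dataset", "target_dataset"]
    for field in required:
        if not node.get(field):
            raise ValueError(f"Missing required field: {field}")
    if node.get("select_latest_file") == "No":
        # When the latest file is not auto-selected, the file name must
        # come from an execution property key.
        if not node.get("file_name_execution_property_key"):
            raise ValueError(
                "File Name Execution Property Key is required when "
                "Select Latest File is 'No'"
            )

# Example: churn prediction, reading the file name from an execution property.
node = {
    "resource": "churn_model_v2",                       # hypothetical model
    "node_name": "predict-churn",
    "input_dataset": "customer_features",               # hypothetical dataset
    "target_dataset": "churn_predictions",
    "select_latest_file": "No",
    "file_name_execution_property_key": "inference_file_name",
}
validate_ml_inference_node(node)  # passes without raising
```

Omitting `file_name_execution_property_key` from this definition would raise a `ValueError`, mirroring the requirement in the table above.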