Workflows
What is an ER Workflow?
An ER (Entity Resolution) Workflow is a configurable process that performs entity resolution on input datasets to identify and merge duplicate records, ultimately generating a unified master dataset. A workflow orchestrates the entire entity resolution pipeline, from matching records across different sources to applying survivorship rules that determine which values to keep when merging records.
For example, a supplier matching workflow might take supplier records from procurement systems, finance systems, and ERP systems, match them based on business rules, and produce a single Unified Master Record for each unique supplier with the best available data from all sources.
How does an ER Workflow work?
MDM Workflows are responsible for performing entity resolution on the input datasets and ultimately generating the unified master dataset. The workflow process consists of four main stages:
- Workflow Details: Define the workflow name, description, and entity mappings that link source datasets to their entity schemas.
- Matching Technique: Configure how records are matched using either rule-based or machine learning-based matching, define matching rules with match keys, and specify output field configurations.
- Survivorship Rules: Define strategies for selecting values when multiple source records are merged into a unified master record. This includes default governance strategies, source priorities, and attribute-specific rules.
- Output Data: Configure where the resolved records are written—either to a new dataset or an existing Amorphic Gold Zone dataset.
Once configured, workflows can be run manually or automatically (depending on the processing cadence), and they process input records to identify matches, apply survivorship logic, and produce unified master records for stewardship and downstream use.
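The matching and survivorship stages can be illustrated with a minimal Python sketch. The records, field names, and source priorities below are hypothetical, and the trivial same-ID match rule stands in for the workflow's configured matching technique; this is not the platform's actual API:

```python
# Minimal sketch of the ER pipeline: match records, then apply survivorship.
records = [
    {"source": "procurement", "supplier_id": "S-1", "name": "Acme Corp", "phone": ""},
    {"source": "finance",     "supplier_id": "S-1", "name": "ACME Corp", "phone": "555-0100"},
]

# Matching stage -- a trivial rule: same supplier_id means same entity.
groups = {}
for rec in records:
    groups.setdefault(rec["supplier_id"], []).append(rec)

# Survivorship stage -- prefer non-empty values, with "finance" as the
# higher-priority source (an assumed priority, for illustration only).
priority = {"finance": 0, "procurement": 1}

def merge(group):
    unified = {}
    for field in ("supplier_id", "name", "phone"):
        candidates = sorted(group, key=lambda r: priority[r["source"]])
        unified[field] = next((r[field] for r in candidates if r[field]), "")
    return unified

masters = [merge(g) for g in groups.values()]
print(masters)  # one unified master record per matched group
```

Here the unified record takes the finance system's name and phone because finance has higher priority and its values are non-empty; a real workflow applies the survivorship rules configured in Step 3.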
CSV input requirements
When using CSV files as workflow input:
- First row must be headers. The first row must contain the column names (schema headers). These headers are required for schema registration and attribute mapping.
- Use underscores, not spaces. In Amorphic Data Platform, column names that contain spaces are automatically converted to underscores during schema registration. Headers in your CSV should use underscores and match the schema defined in Amorphic. Using spaces (e.g., member id,full name) will not match the registered schema (e.g., member_id,full_name), causing schema mapping or ingestion errors and workflow run failures.
Valid example (recommended):
member_id,full_name,patient_ref_id,policy_number,insurance_plan,effective_date,group_id
MEM-5001,mIchAEl thOrnTon mD,PID-1000,POL-99123,Premium Gold,2022-01-01,GRP-A1
Invalid example (causes schema mismatch and workflow failure):
member id,full name,patient ref id,policy number,insurance plan,effective date,group id
MEM-5001,mIchAEl thOrnTon mD,PID-1000,POL-99123,Premium Gold,2022-01-01,GRP-A1
File formats and casing
Use supported input file formats (e.g., CSV) and consistent casing for column headers. Header names are case-sensitive for schema mapping—use the same casing as in your Amorphic entity schema to avoid mapping errors.
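Header problems can be caught before ingestion with a quick local pre-check. This is a sketch mirroring the documented behavior (spaces converted to underscores, case-sensitive matching); the expected schema headers are example values:

```python
import csv
import io

# Expected headers as registered in the entity schema (example values).
EXPECTED = ["member_id", "full_name", "policy_number"]

def check_headers(csv_text, expected):
    """Return (ok, problems) for the first row of a CSV file.

    Mirrors the documented behavior: spaces in column names are converted
    to underscores during schema registration; matching is case-sensitive.
    """
    headers = next(csv.reader(io.StringIO(csv_text)))
    normalized = [h.replace(" ", "_") for h in headers]
    problems = [h for h in normalized if h not in expected]
    return normalized == expected, problems

ok, problems = check_headers(
    "member id,full name,policy number\nMEM-1,Ann,POL-1\n", EXPECTED
)
print(ok, problems)  # the spaced headers match after underscore conversion
```

A header with different casing (e.g., Member_ID) would still fail, since the comparison is case-sensitive.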
How to create an ER Workflow
Rule-based workflow

ML matching-based workflow

Follow these steps to create an ER Workflow:
Step 1: Workflow Details
- Enter Workflow Name: Provide a unique, descriptive name for your workflow (e.g., supplier_matching_workflow, customer_data_reconciliation). Use a naming convention that indicates the purpose or domain so workflows are easy to identify and manage. The workflow name can only contain uppercase (A-Z) and lowercase (a-z) letters, numbers (0-9), hyphens (-), and underscores (_), and must be between 1-255 characters.
- Add Description (Optional): Optionally provide a description that explains the purpose or scope of this workflow. For example, "Primary workflow for matching supplier records from procurement and finance systems." The description can have 1-255 characters.
- Add Entity Mapping: Add entities that link a source dataset to its entity schema for matching. You can select multiple entities, and each entity represents a mapping between a dataset and the entity structure that will be used for matching.
- Select one or more entities from the available list
- Each entity mapping connects a source dataset to its corresponding entity schema
- You can add multiple entities to process data from different sources in a single workflow
Note: Required attribute types for ML matching workflows
When selecting a schema mapping for an ML-based matching workflow, the source dataset must contain at least one of the following columns mapped to its appropriate Attribute Type in the Schema Mapping:
- Any name-related column mapped to attribute type FULL_NAME, or
- Any phone number-related column mapped to attribute type FULL_PHONE, or
- Any email address-related column mapped to attribute type EMAIL, or
- Any address-related column mapped to attribute type FULL_ADDRESS, or
- Any date of birth-related column mapped to attribute type DATE with the match key Date of Birth.
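The requirement above can be sanity-checked before creating an ML-based workflow. A sketch with a hypothetical mapping shape (source column name to attribute type):

```python
# Attribute types, at least one of which must appear in the schema mapping
# of an ML-based matching workflow (per the note above).
REQUIRED_ANY = {"FULL_NAME", "FULL_PHONE", "EMAIL", "FULL_ADDRESS"}

def ml_mapping_is_valid(mapping, date_of_birth_match_key=False):
    """mapping: source column -> attribute type (hypothetical shape).

    Valid when any required attribute type is mapped, or when a DATE
    column is mapped together with the Date of Birth match key.
    """
    if set(mapping.values()) & REQUIRED_ANY:
        return True
    return "DATE" in mapping.values() and date_of_birth_match_key

print(ml_mapping_is_valid({"full_name": "FULL_NAME"}))  # True
print(ml_mapping_is_valid({"member_id": "CUSTOM"}))     # False
```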
Step 2: Matching Technique
Configure how your data will be matched and processed:
Matching Method
- Resolution Type: Choose how you want your data to be matched:
- Rule-based matching: Use customized rules to find exact matches. Suitable when you have clear business rules for matching records.
- Machine learning-based matching: Use machine learning models to help find a broader range of matches. Suitable when you need to identify matches that may not be exact but are likely the same entity.
- Rule Type (Required for Rule-based matching): Choose the complexity level of your matching rules.
- Simple: Suitable for exact matching and schema mappings with multiple data columns mapped to the same input types. Use this when you need straightforward matching logic.
- Advanced: Suitable for fuzzy matching, exact matching, and schema mappings with data columns mapped one-to-one with input types. Use this when you need complex rule conditions with operators like Exact(), Fuzzy(), and logical operators (AND, OR).
The current version of the application supports only the Simple rule type.
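For intuition, the semantics of the Exact() and Fuzzy() operators can be approximated in plain Python. This is a sketch, not the platform's rule engine; the rule combination and the 0.85 similarity threshold are assumptions for illustration:

```python
import difflib

def exact(a, b):
    """Exact(): values must be identical."""
    return a == b

def fuzzy(a, b, threshold=0.85):
    """Fuzzy(): values need only be similar; 0.85 is an assumed threshold."""
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold

# A hypothetical advanced rule: Exact(policy) AND (Exact(name) OR Fuzzy(name))
left  = {"name": "Michael Thornton", "policy": "POL-99123"}
right = {"name": "Micheal Thornton", "policy": "POL-99123"}

is_match = exact(left["policy"], right["policy"]) and (
    exact(left["name"], right["name"]) or fuzzy(left["name"], right["name"])
)
print(is_match)  # True: policy matches exactly, name matches fuzzily
```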
- Processing Cadence: Determine how often to run your matching workflow job:
- Manual: Your matching workflow job is run on demand. Useful for bulk processing when you want full control over when the workflow executes.
- Incremental Processing: Your matching workflow job is run on incremental demand. Useful for bulk processing of new or updated records.
- Automatic/Full Processing: Your matching workflow job is run automatically when you add or update your data inputs. Useful for incremental updates. This option is available only for rule-based matching.
- Normalize Data (Optional): Check this option to normalize data values before matching. This is recommended for better accuracy, as it standardizes formats, removes extra spaces, and handles case variations.
Normalization is supported only for NAME, ADDRESS, PHONE, and EMAIL_ADDRESS. To normalize related attributes, assign them to the corresponding groupName in your entity schema:
- NAME: Assign NAME_FIRST, NAME_MIDDLE, and NAME_LAST to the NAME groupName.
- ADDRESS: Assign ADDRESS_STREET1, ADDRESS_STREET2, ADDRESS_STREET3, ADDRESS_CITY, ADDRESS_STATE, ADDRESS_COUNTRY, and ADDRESS_POSTALCODE to the ADDRESS groupName.
- PHONE: Assign PHONE_NUMBER and PHONE_COUNTRYCODE to the PHONE groupName.
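Normalization is performed by the platform, but its effect can be illustrated with a small sketch. The specific transformations below (whitespace collapsing, case folding, digit extraction for phones) are assumptions chosen to show why normalization improves match accuracy:

```python
import re

def normalize_name(value):
    # Collapse extra whitespace and standardize casing.
    return " ".join(value.split()).title()

def normalize_phone(value):
    # Keep digits only, so formatting differences do not block a match.
    return re.sub(r"\D", "", value)

def normalize_email(value):
    # Trim whitespace and lowercase for case-insensitive comparison.
    return value.strip().lower()

print(normalize_name("  mIchAEl   thOrnTon "))  # "Michael Thornton"
print(normalize_phone("(555) 010-0000"))        # "5550100000"
print(normalize_email(" Ann@Example.COM "))     # "ann@example.com"
```

After normalization, the messy casing in the CSV example above ("mIchAEl thOrnTon mD") would compare equal to a cleanly cased copy of the same name.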
- Attribute Matching: Choose how records can be matched:
- One-to-One: Each record can only match with one other record. Use this when you expect a single match per record.
- Many-to-Many: Records can match with multiple other records. Use this when a single record might legitimately match with multiple records.
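The difference between the two modes can be sketched with candidate match pairs. The pairs, scores, and greedy selection strategy below are hypothetical, for illustration only:

```python
# Candidate pairs (record_a, record_b, score) -- hypothetical data.
pairs = [("A1", "B1", 0.98), ("A1", "B2", 0.95), ("A2", "B1", 0.90)]

def one_to_one(pairs):
    """Greedy: each record may appear in at most one accepted match."""
    used, accepted = set(), []
    for a, b, score in sorted(pairs, key=lambda p: -p[2]):
        if a not in used and b not in used:
            accepted.append((a, b))
            used.update({a, b})
    return accepted

def many_to_many(pairs, threshold=0.85):
    """Every pair above the (assumed) threshold is accepted."""
    return [(a, b) for a, b, score in pairs if score >= threshold]

print(one_to_one(pairs))    # only the best pair survives per record
print(many_to_many(pairs))  # all three pairs are kept
```

One-to-One keeps only ("A1", "B1") here, because A1 and B1 are then consumed; Many-to-Many keeps all three candidate pairs.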
- Comparison Type (For Simple Rule Type): Choose how to compare similar data stored in different input fields:
- Single Input Field: Limit comparison within a single input field, when similar data stored across multiple input fields should not be matched.
- Multiple Input Fields: Find any combination of matches across data stored in multiple input fields, regardless of whether the data is in the same or different input field.
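The two comparison types can be sketched as follows, using hypothetical records that each carry two phone fields:

```python
# Two records that store the same phone number in different input fields --
# hypothetical data for illustration.
rec1 = {"phone_home": "555-0100", "phone_work": "555-0199"}
rec2 = {"phone_home": "555-0199", "phone_work": ""}

FIELDS = ["phone_home", "phone_work"]

def single_input_field(a, b):
    """Compare values only within the same input field."""
    return any(a[f] and a[f] == b[f] for f in FIELDS)

def multiple_input_fields(a, b):
    """Compare any combination of values across the input fields."""
    return any(a[f1] and a[f1] == b[f2] for f1 in FIELDS for f2 in FIELDS)

print(single_input_field(rec1, rec2))     # False: values sit in different fields
print(multiple_input_fields(rec1, rec2))  # True: work phone matches home phone
```

Single Input Field misses this pair because the shared number sits in phone_work on one record and phone_home on the other; Multiple Input Fields finds it.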