Workflows
What is a Workflow?
A Workflow in YuzeData is a data integration process that orchestrates the flow of data between systems. Workflows connect datapoints, connectors, and master data to automate your data operations.
Each workflow is built from workflow steps: reusable steps that each perform a specific operation, such as fetching data from an API, transforming values, or producing datapoints.
Workflows and Workflow Steps
Workflows and workflow steps work together:
| Concept | Description |
|---|---|
| Workflow | The container that defines the overall data integration process. Created from scratch for your specific needs. |
| Workflow step | A single step within a workflow, created from a template that defines what the step can do. |
Workflow Step Templates and Instances
Workflow steps follow a template/instance pattern:
- workflow step template: A blueprint defining a type of operation
- workflow step instance: A configured step in your workflow, created from a template with your specific settings
When you add a step to a workflow, you select a workflow step template and configure it to create an instance. This allows you to reuse the same template multiple times with different configurations.
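The template/instance relationship can be modeled as a small data structure. The Python sketch below is illustrative only: `StepTemplate`, `StepInstance`, `instantiate`, and the `condition` setting are hypothetical names, not part of any YuzeData API. It shows one template reused to create two differently configured instances.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StepInstance:
    """Hypothetical configured step: the template it came from plus concrete settings."""
    template: "StepTemplate"
    settings: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class StepTemplate:
    """Hypothetical workflow step template: a blueprint naming an operation
    and the settings every instance must supply."""
    name: str
    required_settings: frozenset

    def instantiate(self, **settings: Any) -> StepInstance:
        # An instance is valid only if it sets everything the template requires.
        missing = self.required_settings - set(settings)
        if missing:
            raise ValueError(f"{self.name}: missing settings {sorted(missing)}")
        return StepInstance(template=self, settings=dict(settings))


# One template, reused for two instances with different configurations.
filter_template = StepTemplate("Filter", frozenset({"condition"}))
high_cpu = filter_template.instantiate(condition="cpu > 90")
low_disk = filter_template.instantiate(condition="disk_free < 10")
```

Both instances share the same template object while carrying their own settings, which is what makes a template reusable.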
Example Templates
Common workflow step templates include:
| Template | Description |
|---|---|
| Pull data feed | Pulls data from an external system via a configured connector and stores it on the platform as a data feed |
| Push data feed upstream | Pushes a data feed to an upstream system via a configured connector |
| Pull and push data | Pulls data from a source system and immediately pushes it to a target system without storing it on the platform |
| Filter | Filters datapoints and creates a new stream from the ones that match |
| Datapoint aggregation | Consumes datapoints and aggregates them |
| Deduplication | Takes data from a source bucket and adds it to a target bucket while ensuring no duplicate records |
| Lookup | Retrieves data from an external system using a configured connector |
| YuzeScript | Executes a YuzeScript expression for custom transformations |
| Execute Connector | Executes a connector operation as fire-and-forget (useful for one-time queries or scheduled operations) |
| Import master data | Imports data from a connector and stores results as master data |
| Export master data | Exports master data to a connector |
| Map Master data items | Maps master data items between different systems by adding mapping identifiers from external systems |
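The behavior described for the Deduplication template can be sketched as follows. This is an illustration under assumptions, not the platform's implementation: buckets are modeled as plain Python lists of dicts, and `key_fields` stands in for the deduplication fields a template instance would be configured with. `deduplicate` is a hypothetical helper.

```python
from typing import Any, Iterable


def deduplicate(source: Iterable[dict[str, Any]],
                target: list[dict[str, Any]],
                key_fields: tuple[str, ...]) -> int:
    """Hypothetical sketch: copy records from source into target, skipping any
    record whose key-field values already appear in target. Returns the count added."""
    def key(record: dict[str, Any]) -> tuple:
        # Missing fields count as None, so they still take part in the key.
        return tuple(record.get(f) for f in key_fields)

    seen = {key(record) for record in target}
    added = 0
    for record in source:
        k = key(record)
        if k not in seen:
            seen.add(k)  # also catches duplicates within the source itself
            target.append(record)
            added += 1
    return added


target = [{"id": 1, "value": "a"}]
source = [{"id": 1, "value": "b"}, {"id": 2, "value": "c"}, {"id": 2, "value": "d"}]
added = deduplicate(source, target, ("id",))
```

Here only the record with `id` 2 and `value` "c" is added: `id` 1 already exists in the target, and the second `id` 2 record is a duplicate of the first.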
Triggers
Workflow steps can be triggered in three ways:
| Trigger | Description |
|---|---|
| Schedule | Runs automatically at specified intervals (e.g., every hour, daily at midnight) |
| Data Feed | Triggered when new datapoints arrive in a consumed feed |
| Parent workflow step | Triggered by another workflow step, enabling workflow chaining |
Schedule Trigger
Schedule-based triggers run a workflow step at regular intervals. You can configure schedules like:
- Every 15 minutes
- Every hour
- Daily at a specific time
- Weekly on specific days
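How such schedules translate into concrete run times can be sketched with Python's standard `datetime` module. The function names are hypothetical and the sketch ignores time zones; it only illustrates the interval, daily, and weekly cases listed above, not how the platform's scheduler works internally.

```python
from datetime import datetime, time, timedelta


def next_interval_run(last_run: datetime, every: timedelta) -> datetime:
    """Hypothetical interval schedule, e.g. every 15 minutes or every hour."""
    return last_run + every


def next_daily_run(now: datetime, at: time) -> datetime:
    """Daily at a specific time: today if the slot is still ahead, else tomorrow."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def next_weekly_run(now: datetime, weekdays: set[int], at: time) -> datetime:
    """Weekly on specific days (Monday=0 ... Sunday=6, as in date.weekday())."""
    # Checking eight days covers the same weekday next week if today's slot has passed.
    for offset in range(8):
        day = now.date() + timedelta(days=offset)
        candidate = datetime.combine(day, at)
        if day.weekday() in weekdays and candidate > now:
            return candidate
    raise ValueError("weekdays must contain at least one value in 0..6")
```

For example, with `now = datetime(2024, 1, 1, 10, 0)` (a Monday), `next_weekly_run(now, {0}, time(9, 0))` returns Monday 8 January 2024 at 09:00, because this Monday's slot has already passed.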
Data Feed Trigger
When a workflow step is configured to consume a datapoint feed, it can be set to trigger automatically when new datapoints arrive. This creates a reactive pipeline where data flows through workflow steps as it becomes available.
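A minimal Python sketch of this reactive behavior, assuming an in-memory feed object. `DatapointFeed`, `subscribe`, and `append` are hypothetical names used for illustration, not the platform's API: every step subscribed to the feed is invoked with each new batch of datapoints as it arrives.

```python
from typing import Any, Callable

Datapoint = dict[str, Any]
StepFn = Callable[[list[Datapoint]], None]


class DatapointFeed:
    """Hypothetical in-memory feed that triggers subscribed steps on arrival."""

    def __init__(self) -> None:
        self._datapoints: list[Datapoint] = []
        self._subscribers: list[StepFn] = []

    def subscribe(self, step: StepFn) -> None:
        # Registering a step here plays the role of a data-feed trigger.
        self._subscribers.append(step)

    def append(self, datapoints: list[Datapoint]) -> None:
        if not datapoints:
            return  # nothing arrived, so no step is triggered
        self._datapoints.extend(datapoints)
        for step in self._subscribers:
            step(list(datapoints))


received: list[Datapoint] = []
feed = DatapointFeed()
feed.subscribe(received.extend)  # a trivial "step" that records what it receives
feed.append([{"sensor": "s1", "value": 21.5}])
feed.append([])
```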
Parent Workflow Step Trigger
A workflow step can be configured to run after another workflow step completes. This enables chaining multiple steps together, where the output of one step feeds into the next.
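The chaining behavior can be sketched as a small tree of steps in which each step, on completion, triggers its child steps with its output. The `Step` class and its fields are hypothetical and for illustration only; they are not the YuzeData API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    """Hypothetical workflow step whose children use a parent-step trigger."""
    name: str
    run: Callable[[Any], Any]
    children: list["Step"] = field(default_factory=list)

    def execute(self, data: Any, log: list[str]) -> None:
        output = self.run(data)
        log.append(f"{self.name} -> {output!r}")
        # Completion of this step triggers each child with this step's output.
        for child in self.children:
            child.execute(output, log)


double = Step("double", lambda xs: [x * 2 for x in xs])
total = Step("total", sum)
double.children.append(total)

log: list[str] = []
double.execute([1, 2, 3], log)
```

Running `double` logs `double -> [2, 4, 6]` and then, through the parent-step trigger, `total -> 12`.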
Consuming and Producing Data
Workflow steps process data by consuming input and producing output. This is how data flows through your workflows.
Consuming Data
A workflow step can consume data from several sources:
| Source | Description |
|---|---|
| Datapoint Feed | Reads datapoints from a bucket with a specific schema |
| Connector | Fetches data directly from an external system via a connector operation |
| Master Data | Reads reference data items for enrichment or processing |
| Nothing | The workflow step does not consume any data |
When consuming from a datapoint feed, you configure:
- Schema: Which schema's datapoints to subscribe to
- Bucket: Which datapoint bucket to read from
- Batch Size: How many datapoints to process per run
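These three settings can be illustrated with a consumer that keeps a read position in a bucket and, on each run, returns at most `batch_size` datapoints of the subscribed schema. Modeling a bucket as a list of dicts tagged with a `schema` key, and keeping a cursor between runs, are assumptions made for illustration; `FeedConsumer` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FeedConsumer:
    """Hypothetical datapoint-feed consumer configured with schema, bucket, and batch size."""
    bucket: list[dict[str, Any]]
    schema: str
    batch_size: int
    cursor: int = 0  # position of the next unread datapoint in the bucket

    def next_batch(self) -> list[dict[str, Any]]:
        """Return up to batch_size unread datapoints that match the schema."""
        batch: list[dict[str, Any]] = []
        while self.cursor < len(self.bucket) and len(batch) < self.batch_size:
            datapoint = self.bucket[self.cursor]
            self.cursor += 1
            if datapoint["schema"] == self.schema:
                batch.append(datapoint)
        return batch


bucket = [{"schema": "metric", "v": 1}, {"schema": "log", "v": 2},
          {"schema": "metric", "v": 3}, {"schema": "metric", "v": 4},
          {"schema": "metric", "v": 5}]
consumer = FeedConsumer(bucket, schema="metric", batch_size=3)
```

With a batch size of 3, the first run returns the datapoints with `v` equal to 1, 3, and 4, the second returns 5, and a third returns an empty list until new datapoints arrive.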
When consuming from a connector, you can apply schema mappings to transform the connector's input and output data to match your workflow's schemas.
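A schema mapping can be pictured as a field-by-field translation between the connector's record shape and the workflow schema. The sketch below assumes the simplest form, a rename table in which unmapped fields are dropped; the platform's mappings may work differently, and `apply_mapping` and the field names are hypothetical.

```python
from typing import Any


def apply_mapping(record: dict[str, Any], mapping: dict[str, str]) -> dict[str, Any]:
    """Hypothetical sketch: rename connector fields to workflow-schema fields.

    `mapping` maps connector field names to schema field names. Fields absent
    from the mapping are dropped, and mapped fields missing from the record
    are skipped rather than raising an error.
    """
    return {target: record[source]
            for source, target in mapping.items()
            if source in record}


connector_record = {"SensorID": "s1", "Temp_C": 21.5, "FirmwareRev": "2.1"}
mapping = {"SensorID": "sensor_id", "Temp_C": "temperature"}
datapoint = apply_mapping(connector_record, mapping)
```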
Producing Data
A workflow step can either produce datapoints or produce nothing:
| Strategy | Description |
|---|---|
| Datapoint Feed | Writes datapoints to a bucket with a specific schema |
| Nothing | The workflow step does not produce output (e.g., when writing directly to an external system) |
When producing to a datapoint feed, you configure:
- Schema: The structure of the output data
- Bucket: Which datapoint bucket to write to
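One way to picture producing to a feed is to check each output record against the configured schema before writing it to the bucket. In this sketch a schema is assumed to be a simple mapping of field names to Python types; the helper name `produce` and the bucket representation are hypothetical.

```python
from typing import Any


def produce(bucket: list[dict[str, Any]], schema_name: str,
            schema: dict[str, type], record: dict[str, Any]) -> None:
    """Hypothetical sketch: validate record against schema, then append it to bucket."""
    missing = schema.keys() - record.keys()
    if missing:
        raise ValueError(f"missing fields for {schema_name}: {sorted(missing)}")
    for name, expected in schema.items():
        if not isinstance(record[name], expected):
            raise TypeError(f"{schema_name}.{name} must be {expected.__name__}")
    # Tag the stored datapoint with its schema, keeping only the schema's fields.
    bucket.append({"schema": schema_name, **{n: record[n] for n in schema}})


alerts: list[dict[str, Any]] = []
alert_schema = {"host": str, "cpu": float}
produce(alerts, "alert", alert_schema, {"host": "web-2", "cpu": 97.0})
```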
Chaining Workflow Steps
Consuming and producing together enable chaining: connecting workflow steps so that data flows through multiple processing steps, with one step's output bucket serving as the next step's input. For example:
- Workflow step A produces datapoints to bucket "metrics"
- Workflow step B consumes from bucket "metrics"
- Workflow step B processes the data and produces to bucket "alerts"
- Workflow step C consumes from bucket "alerts" and sends notifications
This pattern allows you to build complex data pipelines from simple, focused steps.
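The chain in the list above can be sketched in Python with buckets modeled as named in-memory lists. The datapoint fields (`host`, `cpu`), the 90 percent alert threshold, and the notification format are made up for illustration; step C returns its messages instead of actually sending them.

```python
from collections import defaultdict
from typing import Any

# Hypothetical in-memory buckets, keyed by bucket name.
buckets: dict[str, list[dict[str, Any]]] = defaultdict(list)


def step_a() -> None:
    """Produces datapoints to bucket "metrics"."""
    buckets["metrics"].extend([{"host": "web-1", "cpu": 42},
                               {"host": "web-2", "cpu": 97}])


def step_b() -> None:
    """Consumes "metrics" and produces to "alerts" for high CPU readings."""
    for dp in buckets["metrics"]:
        if dp["cpu"] > 90:
            buckets["alerts"].append({"host": dp["host"],
                                      "message": f"CPU at {dp['cpu']}%"})


def step_c() -> list[str]:
    """Consumes "alerts" and builds one notification per alert."""
    return [f"notify {a['host']}: {a['message']}" for a in buckets["alerts"]]


step_a()
step_b()
notifications = step_c()
```

Each function only knows the buckets it reads and writes, which is what keeps the individual steps simple and focused.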
Settings
Each workflow step instance has settings that control its behavior. Settings are defined by the template and configured when you create an instance.
Template-Defined Settings
The workflow step template defines which settings are available. Common settings include:
| Setting Type | Examples |
|---|---|
| Filter conditions | Field comparisons, AND/OR logic |
| Aggregation rules | Group by fields, aggregation operations |
| Deduplication fields | Which fields to use for duplicate detection |
| Script expressions | YuzeScript code for custom transformations |
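Filter conditions that combine field comparisons with AND/OR logic can be represented as a small condition tree. The nested-dict format and the `matches` function below are assumptions for illustration, not the platform's configuration format; a datapoint that lacks a compared field is treated as not matching.

```python
import operator
from typing import Any, Callable

# Supported field comparisons (hypothetical operator names).
OPERATORS: dict[str, Callable[[Any, Any], Any]] = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def matches(datapoint: dict[str, Any], condition: dict[str, Any]) -> bool:
    """Evaluate a tree of {"and": [...]}, {"or": [...]}, or a single comparison."""
    if "and" in condition:
        return all(matches(datapoint, c) for c in condition["and"])
    if "or" in condition:
        return any(matches(datapoint, c) for c in condition["or"])
    name = condition["field"]
    if name not in datapoint:
        return False
    return bool(OPERATORS[condition["op"]](datapoint[name], condition["value"]))


# cpu > 90 AND (env == "prod" OR env == "staging")
condition = {"and": [
    {"field": "cpu", "op": ">", "value": 90},
    {"or": [{"field": "env", "op": "==", "value": "prod"},
            {"field": "env", "op": "==", "value": "staging"}]},
]}
datapoints = [
    {"host": "a", "cpu": 95, "env": "prod"},
    {"host": "b", "cpu": 95, "env": "dev"},
    {"host": "c", "cpu": 50, "env": "staging"},
    {"host": "d", "env": "prod"},
]
kept = [dp["host"] for dp in datapoints if matches(dp, condition)]
```

Only host `a` satisfies both branches: `b` fails the OR, `c` fails the CPU comparison, and `d` has no `cpu` field.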
Connector Settings
When a workflow step uses a connector, you configure which connector instance and operation to use. This links the workflow step to your deployed connectors.
Capacity Settings
For workflow steps that process large amounts of data or require more resources, you can configure capacity settings to control how the workflow step executes.
Execution Preference
| Mode | Description |
|---|---|
| In-Process | Runs within the standard processing infrastructure (default) |
| Out-of-Process | Runs in a dedicated container with configurable resources |
Compute Specifications
When using out-of-process execution, you can select a compute specification:
| Specification | CPU | Memory |
|---|---|---|
| Small | 0.25 cores | 0.5 GB |
| Medium | 1 core | 2 GB |
| Large | 2 cores | 4 GB |
| Extra Large | 4 cores | 8 GB |
Use higher capacity settings for workflow steps that:
- Process large batches of datapoints
- Perform complex transformations
- Call external systems with slow response times
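When choosing among the compute specifications, one simple rule is to take the smallest one that covers a step's estimated CPU and memory needs. The sketch below encodes the values from the specification table; the helper `smallest_spec` and the estimates passed to it are hypothetical.

```python
# Compute specifications from the table: (name, CPU cores, memory in GB).
SPECS = [
    ("Small", 0.25, 0.5),
    ("Medium", 1.0, 2.0),
    ("Large", 2.0, 4.0),
    ("Extra Large", 4.0, 8.0),
]


def smallest_spec(cpu_cores: float, memory_gb: float) -> str:
    """Hypothetical helper: return the smallest specification meeting both needs."""
    for name, cores, memory in SPECS:
        if cores >= cpu_cores and memory >= memory_gb:
            return name
    raise ValueError("no specification is large enough")
```

For example, a step estimated to need 1 core and 3 GB gets "Large": "Medium" has the cores but only 2 GB of memory.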
