Workflows

What is a Workflow?

A Workflow in YuzeData is a data integration process that orchestrates the flow of data between systems. Workflows connect datapoints, connectors, and master data to automate your data operations.

Each workflow is built from workflow steps: reusable units that perform specific operations, such as fetching data from an API, transforming values, or producing datapoints.

Workflows and Workflow Steps

Workflows and workflow steps work together:

[Figure: Workflow with workflow steps]

| Concept | Description |
|---|---|
| Workflow | The container that defines the overall data integration process. Created from scratch for your specific needs. |
| Workflow step | A step within a workflow. Built from templates that define what the step can do. |

Workflow Step Templates and Instances

Workflow steps follow a template/instance pattern:

  • Workflow step template: A blueprint defining a type of operation
  • Workflow step instance: A configured step in your workflow, created from a template with your specific settings

When you add a step to a workflow, you select a workflow step template and configure it to create an instance. This allows you to reuse the same template multiple times with different configurations.
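The template/instance relationship can be pictured with a small sketch. The class names, the `instantiate` method, and the `condition` setting below are hypothetical and are not part of any YuzeData API; they only illustrate one blueprint being configured twice.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class StepTemplate:
    """Hypothetical blueprint: names an operation and the settings it accepts."""
    name: str
    allowed_settings: FrozenSet[str]

    def instantiate(self, instance_name: str, **settings: Any) -> "StepInstance":
        # Reject settings that the template does not define.
        unknown = set(settings) - self.allowed_settings
        if unknown:
            raise ValueError(f"unknown settings for {self.name}: {sorted(unknown)}")
        return StepInstance(name=instance_name, template=self, settings=dict(settings))


@dataclass
class StepInstance:
    """Hypothetical configured step: a template plus concrete settings."""
    name: str
    template: StepTemplate
    settings: Dict[str, Any] = field(default_factory=dict)


# One template, reused for two differently configured instances.
filter_template = StepTemplate("Filter", frozenset({"condition"}))
hot = filter_template.instantiate("hot-readings", condition="temperature > 30")
cold = filter_template.instantiate("cold-readings", condition="temperature < 5")
```

Both instances share the same template object but carry their own settings, which is the reuse described above.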

Example Templates

Common workflow step templates include:

| Template | Description |
|---|---|
| Pull data feed | Pulls data via a configured connector into the platform as a feed |
| Push data feed upstream | Pushes a datafeed upstream via a configured connector |
| Pull and push data | Pulls data from a source system and immediately pushes it to a target system without storing it on the platform |
| Filter | Filters datapoints and creates a new stream from them |
| Datapoint aggregation | Consumes datapoints and aggregates them |
| Deduplication | Takes data from a source bucket and adds it to a target bucket while ensuring no duplicate records |
| Lookup | Retrieves data from an external system using a configured connector |
| YuzeScript | Executes a YuzeScript expression for custom transformations |
| Execute Connector | Executes a connector operation as fire-and-forget (useful for one-time queries or scheduled operations) |
| Import master data | Imports data from a connector and stores results as master data |
| Export master data | Exports master data to a connector |
| Map Master data items | Maps master data items between different systems by adding mapping identifiers from external systems |

Triggers

Workflow steps can be triggered in three ways:

| Trigger | Description |
|---|---|
| Schedule | Runs automatically at specified intervals (e.g., every hour, daily at midnight) |
| Data Feed | Triggered when new datapoints arrive in a consumed feed |
| Parent workflow step | Triggered by another workflow step, enabling workflow chaining |

Schedule Trigger

Schedule-based triggers run a workflow step at regular intervals. You can configure schedules like:

  • Every 15 minutes
  • Every hour
  • Daily at a specific time
  • Weekly on specific days
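As a rough illustration of how an interval schedule and a daily schedule resolve to a next run time, the sketch below uses only Python's standard library. The function names and the choice to align intervals to midnight are assumptions made for this example; the documentation does not describe how the platform computes schedule times.

```python
from datetime import datetime, time, timedelta


def next_interval_run(now: datetime, every: timedelta) -> datetime:
    """Next run strictly after `now`, aligned to multiples of `every` since midnight.

    The midnight alignment is an assumption of this sketch.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    periods_done = (now - midnight) // every  # timedelta // timedelta gives an int
    return midnight + (periods_done + 1) * every


def next_daily_run(now: datetime, at: time) -> datetime:
    """Next occurrence of the wall-clock time `at` strictly after `now`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

For example, with `now` at 10:07, a 15-minute interval resolves to 10:15 on the same day, and a daily schedule at midnight resolves to 00:00 on the following day.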

Data Feed Trigger

When a workflow step is configured to consume a datapoint feed, it can be set to trigger automatically when new datapoints arrive. This creates a reactive pipeline where data flows through workflow steps as it becomes available.

Parent Workflow Step Trigger

A workflow step can be configured to run after another workflow step completes. This enables chaining multiple steps together, where the output of one step feeds into the next.

Consuming and Producing Data

Workflow steps process data by consuming input and producing output. This is how data flows through your workflows.

Consuming Data

A workflow step can consume data from several sources:

| Source | Description |
|---|---|
| Datapoint Feed | Reads datapoints from a bucket with a specific schema |
| Connector | Fetches data directly from an external system via a connector operation |
| Master Data | Reads reference data items for enrichment or processing |
| Nothing | The workflow step does not consume any data |

When consuming from a datapoint feed, you configure:

  • Schema: Which schema's datapoints to subscribe to
  • Bucket: Which datapoint bucket to read from
  • Batch Size: How many datapoints to process per run
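To make the three feed settings concrete, here is a minimal sketch of a reader that filters a bucket by schema and hands datapoints to a step in batches. The `read_batches` function and the `"schema"` key on each datapoint are hypothetical, chosen only for illustration; the real platform's data model may differ.

```python
from typing import Dict, Iterable, Iterator, List

Datapoint = Dict[str, object]


def read_batches(
    bucket: Iterable[Datapoint], schema: str, batch_size: int
) -> Iterator[List[Datapoint]]:
    """Yield datapoints of one schema from a bucket, at most `batch_size` per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Datapoint] = []
    for datapoint in bucket:
        # Skip datapoints that belong to other schemas (assumed "schema" key).
        if datapoint.get("schema") != schema:
            continue
        batch.append(datapoint)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch
```

Each yielded list corresponds to what a single run of the step would process under this assumed model.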

When consuming from a connector, you can apply schema mappings to transform the connector's input and output data to match your workflow's schemas.

Producing Data

A workflow step can either produce datapoints or produce nothing:

| Strategy | Description |
|---|---|
| Datapoint Feed | Writes datapoints to a bucket with a specific schema |
| Nothing | The workflow step does not produce output (e.g., when writing directly to an external system) |

When producing to a datapoint feed, you configure:

  • Schema: The structure of the output data
  • Bucket: Which datapoint bucket to write to

Chaining Workflow Steps

Consuming and producing data enables chaining: connecting workflow steps so that data flows through multiple processing steps.

[Figure: Workflow step chaining]

  1. Workflow step A produces datapoints to bucket "metrics"
  2. Workflow step B consumes from bucket "metrics"
  3. Workflow step B processes the data and produces to bucket "alerts"
  4. Workflow step C consumes from bucket "alerts" and sends notifications

This pattern allows you to build complex data pipelines from simple, focused steps.
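The four-step chain above can be mimicked in a few lines. This is a toy model under stated assumptions: buckets are plain in-memory lists, the steps are ordinary functions, and the `cpu` field and the threshold of 90 are invented for the example. None of it reflects how the platform stores or routes datapoints.

```python
from collections import defaultdict
from typing import Dict, List

# Buckets modelled as named in-memory lists (an assumption of this sketch).
buckets: Dict[str, List[dict]] = defaultdict(list)
notifications: List[str] = []


def step_a() -> None:
    """Produce datapoints to the "metrics" bucket."""
    buckets["metrics"].extend([{"cpu": 42}, {"cpu": 97}])


def step_b() -> None:
    """Consume from "metrics", keep high readings, produce to "alerts"."""
    for datapoint in buckets["metrics"]:
        if datapoint["cpu"] > 90:
            buckets["alerts"].append({"message": f"cpu at {datapoint['cpu']}%"})


def step_c() -> None:
    """Consume from "alerts" and send notifications (here: collect strings)."""
    for alert in buckets["alerts"]:
        notifications.append(alert["message"])


# Run the chain in order; each step only knows the buckets it reads and writes.
for step in (step_a, step_b, step_c):
    step()
```

Because each step depends only on bucket names, a step can be swapped or inserted without changing the others.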

Settings

Each workflow step instance has settings that control its behavior. Settings are defined by the template and configured when you create an instance.

Template-Defined Settings

The workflow step template defines which settings are available. Common settings include:

| Setting Type | Examples |
|---|---|
| Filter conditions | Field comparisons, AND/OR logic |
| Aggregation rules | Group by fields, aggregation operations |
| Deduplication fields | Which fields to use for duplicate detection |
| Script expressions | YuzeScript code for custom transformations |
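As one example of how such a setting shapes a step's behavior, the sketch below shows what a deduplication-fields setting might mean in practice: two datapoints count as duplicates when they agree on every configured field. The `deduplicate` function and its signature are hypothetical, and field values are assumed to be hashable.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Datapoint = Dict[str, Hashable]


def deduplicate(
    source: List[Datapoint], target: List[Datapoint], key_fields: Sequence[str]
) -> int:
    """Append source datapoints to target unless their key fields already exist there.

    Returns the number of datapoints added.
    """
    def key_of(datapoint: Datapoint) -> Tuple[Hashable, ...]:
        return tuple(datapoint.get(name) for name in key_fields)

    # Keys already present in the target bucket.
    seen = {key_of(datapoint) for datapoint in target}
    added = 0
    for datapoint in source:
        key = key_of(datapoint)
        if key not in seen:
            seen.add(key)
            target.append(datapoint)
            added += 1
    return added
```

Choosing fewer key fields makes duplicate detection stricter, since more datapoints collapse onto the same key.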

Connector Settings

When a workflow step uses a connector, you configure which connector instance and operation to use. This links the workflow step to your deployed connectors.

Capacity Settings

For workflow steps that process large amounts of data or require more resources, you can configure capacity settings to control how the workflow step executes.

Execution Preference

| Mode | Description |
|---|---|
| In-Process | Runs within the standard processing infrastructure (default) |
| Out-of-Process | Runs in a dedicated container with configurable resources |

Compute Specifications

When using out-of-process execution, you can select a compute specification:

| Specification | CPU | Memory |
|---|---|---|
| Small | 0.25 cores | 0.5 GB |
| Medium | 1 core | 2 GB |
| Large | 2 cores | 4 GB |
| Extra Large | 4 cores | 8 GB |

Use higher capacity settings for workflow steps that:

  • Process large batches of datapoints
  • Perform complex transformations
  • Call external systems with slow response times
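When estimating which compute specification a step needs, a simple approach is to pick the smallest one that covers the estimated CPU and memory. The sketch below encodes the specification table from this section; the `smallest_fitting_spec` helper is hypothetical and is not a platform feature.

```python
from typing import List, Optional, Tuple

# (name, CPU cores, memory in GB), ordered from smallest to largest,
# taken from the compute specification table in this section.
SPECS: List[Tuple[str, float, float]] = [
    ("Small", 0.25, 0.5),
    ("Medium", 1.0, 2.0),
    ("Large", 2.0, 4.0),
    ("Extra Large", 4.0, 8.0),
]


def smallest_fitting_spec(cpu_cores: float, memory_gb: float) -> Optional[str]:
    """Return the smallest specification that meets both requirements, or None."""
    for name, cpu, memory in SPECS:
        if cpu >= cpu_cores and memory >= memory_gb:
            return name
    return None
```

A step estimated at 1 core and 3 GB would land on Large, because Medium covers the CPU but not the memory; a requirement beyond 4 cores or 8 GB returns `None`.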