Skip to content

Configuration section: transformations

Description

Transformations allow arbitrary conversions of Data Frames into a desired destination format. Transformations allow running actions against multiple Data Frames with a single Data Frame as output.

Transformations can be chained as resulting Data Frames can be consumed by either Destination Adapters or Data Validations

Specification

name (string)

A descriptive name for the transformation. This name is attached to the results of this transformation step as the transformation_name in the Results File.

input_data_frames (string[])

A list with names of Data Frames that are available as input to this transformation.

output_data_frame (string)

The name of the Data Frame where transformation results are delivered.

type ("sql_inline" | "sql_file")

The type of transformation. The format of the configuration section depends on the selected type.

sql_inline

Executes a SQL based transformation (based on the DuckDB SQL dialect). Frame names from input_data_frames are treated like table names and the output from the statement is delivered to the output_data_frame.

The SQL statement is provided into the Configuration file. Convenient for simple statements but for complex statements the sql_file type is recommended.

For details, see SQL transformations.

Example configuration

json
{
  ...,
  "type": "sql_inline",
  "configuration": {"statement": "SELECT * FROM employees"}
}

sql_file

Executes a SQL based transformation (based on duckdb SQL dialect). Frame names from input_data_frames are treated like table names and the output from the statement is delivered to the output_data_frame.

The SQL statement is provided in a dedicated file. Convenient for complex, multi-line statements.

The file containing the SQL statement shall have a .sql extension and is copied to the Drop Zone Folder before the Configuration file.

For details, see SQL transformations.

Example configuration

json
{
  ...,
  "type": "sql_file",
    "configuration": {"file_name": "employee_transformation.sql"}
}

employee_transformation.sql

SELECT *
FROM
  employees

configuration (object) — depends on type

Type-specific configuration. For details, refer to the respective type section above.

Example

json
{
  "version": 1,
  "transformations": [
    {
      "name": "Generate Unique French IDs",
      "input_data_frames": ["employees_france"],
      "output_data_frame": "employees_france_unique",
      "type": "sql_inline",
      "configuration": {"statement": "SELECT * EXCLUDE empid, 'FR-' || empid AS empid"}
    },
    {
      "name": "Combine datasets",
      "input_data_frames": ["employees_france_unique", "employees_germany"],
      "output_data_frame": "employees",
      "type": "sql_file",
      "configuration": {"file_name": "combine_datasets.sql"}
    }
  ]
}