Introduction
Features
The DataBridge Engine is a lightweight ETL engine. It has built-in support for:
- File ingestion through SFTP, with support for the following file formats:
  - CSV
  - Excel
  - JSON
- Source connectors
- Data Transformations
  - Configured through SQL statements
  - Configured through dedicated transformation functions
- Data Validation
  - Failed records are routed to a dead-letter queue, which can be delivered back to the user
- Destination connectors
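The dead-letter behaviour listed under Data Validation can be pictured with a minimal Python sketch. It is not the engine's implementation: `validate_records`, `has_email` and the sample records are hypothetical names chosen for illustration, and the rule itself is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


def validate_records(
    records: Iterable[Record],
    is_valid: Callable[[Record], bool],
) -> Tuple[List[Record], List[Record]]:
    """Split records into accepted rows and a dead-letter list (hypothetical helper)."""
    accepted: List[Record] = []
    dead_letter: List[Record] = []
    for record in records:
        # Records that fail the rule are kept rather than dropped,
        # so they can later be handed back to the user.
        if is_valid(record):
            accepted.append(record)
        else:
            dead_letter.append(record)
    return accepted, dead_letter


def has_email(record: Record) -> bool:
    """Illustrative validation rule: the 'email' field must be non-empty."""
    return bool(record.get("email"))


rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": ""}]
accepted, dead_letter = validate_records(rows, has_email)
print(len(accepted), "accepted,", len(dead_letter), "sent to the dead-letter list")
```

Keeping failed records in a separate list, instead of raising on the first failure, is what allows the remaining rows to continue through the pipeline.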
Data Frame
A core concept of the DataBridge Engine is the Data Frame. An Excel file, a CSV file, a JSON file, or data from a source system is converted to a Data Frame in the first stage of the pipeline. A concise definition is:
Data Frame
Two-dimensional, tabular data structure (columns and rows), similar to a spreadsheet or a SQL table
A Data Frame name must be between 1 and 63 characters long and may contain only the following characters:
a-z0-9_
As a demonstration, my-data-frame, my data frame and MY_DATA_FRAME are illegal Data Frame names, whereas my_data_frame is a legal Data Frame name. Files whose names follow these rules are handled transparently by the pipeline, so it is recommended to name your files accordingly (e.g. my_data.xlsx is automatically converted to the my_data Data Frame).
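The naming rule can be checked with a short regular expression. The following Python sketch is an illustration only; `is_valid_data_frame_name` is a hypothetical helper and not part of the DataBridge Engine.

```python
import re

# Naming rule from the text: 1 to 63 characters drawn from a-z, 0-9 and "_".
DATA_FRAME_NAME = re.compile(r"[a-z0-9_]{1,63}")


def is_valid_data_frame_name(name: str) -> bool:
    """Return True if `name` satisfies the Data Frame naming rule."""
    # fullmatch requires the whole string to match, so stray spaces,
    # hyphens, or upper-case letters anywhere in the name are rejected.
    return DATA_FRAME_NAME.fullmatch(name) is not None


for candidate in ["my-data-frame", "my data frame", "MY_DATA_FRAME", "my_data_frame"]:
    print(f"{candidate!r}: {is_valid_data_frame_name(candidate)}")
```

Of the four names above, only my_data_frame passes the check.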
Data Frames can be generated with or without a dedicated configuration, with the respective Data Frame types defined as follows:
Transparent Data Frame
A Data Frame generated from a source file with a name that complies with Data Frame naming rules and is either a CSV file, JSON file or a single-sheet, unencrypted Excel file.
In simple terms, the file my_data_frame.xlsx is automatically converted to the my_data_frame Data Frame for later steps in the data pipeline.
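As a rough sketch of this mapping, the Python function below derives a Data Frame name from a file name, or returns `None` when the file would need a Configured Data Frame instead. The function name and the exact set of accepted extensions (`.csv`, `.json`, `.xlsx`) are assumptions; the sketch also does not inspect the file contents, so the single-sheet and unencrypted conditions for Excel files are not checked here.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

NAME_RULE = re.compile(r"[a-z0-9_]{1,63}")

# Assumption: these extensions stand for the CSV, JSON and Excel formats.
TRANSPARENT_EXTENSIONS = {".csv", ".json", ".xlsx"}


def transparent_data_frame_name(filename: str) -> Optional[str]:
    """Return the Data Frame name implied by `filename`, or None if the
    file name does not qualify for transparent handling (hypothetical helper)."""
    path = PurePosixPath(filename)
    if path.suffix not in TRANSPARENT_EXTENSIONS:
        return None
    # The stem is the file name without its final extension.
    stem = path.stem
    return stem if NAME_RULE.fullmatch(stem) else None


print(transparent_data_frame_name("my_data_frame.xlsx"))
print(transparent_data_frame_name("Workers Full.xlsx"))
```

The first call yields the name my_data_frame; the second yields `None` because the stem contains upper-case letters and a space.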
Configured Data Frame
A Data Frame generated with the assistance of a configuration section in drop_zone_files. Reasons for using a Configured Data Frame may include:
- Limited control of input data files (e.g. the input data file is named Workers Full.xlsx)
- Input data file is encrypted (Excel password or PGP encryption)
- Input data file is an Excel file with multiple sheets
- Input data file is an Excel file where the header row is in row 2 (instead of row 1 as expected)
See drop_zone_files for a full list of configuration options.
Files that don't conform to Data Frame naming rules require an entry in the drop_zone_files configuration section.
Operation
The DataBridge Engine continuously monitors SFTP areas for a file named config.json. Once a file with that name is copied to the SFTP server, a DataBridge Engine Pipeline is executed according to instructions from the configuration file.
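The trigger mechanism can be pictured as a polling loop. The Python sketch below is not the engine's implementation: it uses a local directory as a stand-in for an SFTP area, and `find_trigger`, `wait_for_config`, the polling interval and the poll limit are all hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Optional

TRIGGER_FILE = "config.json"  # file name taken from the text


def find_trigger(drop_zone: Path) -> Optional[Path]:
    """Return the path of config.json if it exists in `drop_zone`."""
    candidate = drop_zone / TRIGGER_FILE
    return candidate if candidate.is_file() else None


def wait_for_config(
    drop_zone: Path,
    poll_seconds: float = 5.0,
    max_polls: int = 10,
) -> Optional[dict]:
    """Poll `drop_zone` until config.json appears, then return its parsed
    contents. Returns None if the file never shows up within `max_polls`."""
    for _ in range(max_polls):
        trigger = find_trigger(drop_zone)
        if trigger is not None:
            # Assumption: the file is fully uploaded by the time it is seen.
            return json.loads(trigger.read_text(encoding="utf-8"))
        time.sleep(poll_seconds)
    return None
```

A production watcher would also need to guard against reading a partially uploaded file, which this sketch leaves out.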
The operation is described with the diagram below:
