
Configuration section: drop_zone_files

Description

The structure of delivered Source Data Files is described in the Drop Zone Folder section. Advanced pre-processing of Source Data Files (e.g. decryption, file pre-processing, field type coercion) is configured in the drop_zone_files section of the configuration, an array with one entry per Source Data File.

Specification

file_name (string, required)

The name of the file this entry applies to, as delivered, e.g. employees.xlsx or, for a PGP-encrypted file, employees.xlsx.pgp.

data_tables (array, required)

A "Data Table" is either a single sheet in an Excel file or an entire JSON or CSV file. An Excel file can therefore contain one or more Data Tables, whereas a JSON or CSV file always contains exactly one.

The data_tables array shall be configured as follows for each file type:

  • CSV and JSON files: exactly one entry in data_tables, which specifies neither sheet_index nor sheet_name.
  • Excel files: one or more entries in data_tables, each of which specifies either sheet_index or sheet_name (not both).

At least one data_tables entry is required; an empty array is invalid and will be rejected.
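The rules above can be checked mechanically. The following is a minimal sketch, not the pipeline's actual validator: it assumes the file type is inferred from the file name's extension (ignoring a trailing .pgp), and the helper names file_kind and validate_data_tables, as well as the suffix sets, are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: the file type is inferred from the file name's extension, and a
# trailing ".pgp" (PGP-encrypted file) is ignored when doing so.
EXCEL_SUFFIXES = {".xlsx"}                 # hypothetical list; extend as needed
SINGLE_TABLE_SUFFIXES = {".csv", ".json"}  # one Data Table per file


def file_kind(file_name: str) -> str:
    """Classify a drop zone file as "excel" or "single" (CSV/JSON)."""
    path = PurePosixPath(file_name.lower())
    if path.suffix == ".pgp":
        path = path.with_suffix("")  # "employees-3.xlsx.pgp" -> "employees-3.xlsx"
    if path.suffix in EXCEL_SUFFIXES:
        return "excel"
    if path.suffix in SINGLE_TABLE_SUFFIXES:
        return "single"
    raise ValueError(f"unsupported file type: {file_name}")


def validate_data_tables(entry: dict) -> list[str]:
    """Return the data_tables rule violations for one drop_zone_files entry."""
    tables = entry.get("data_tables")
    if not tables:  # missing or empty array
        return ["data_tables must contain at least one entry"]
    kind = file_kind(entry["file_name"])
    errors = []
    if kind == "single" and len(tables) != 1:
        errors.append("CSV/JSON files must have exactly one data_tables entry")
    for i, table in enumerate(tables):
        has_index = "sheet_index" in table
        has_name = "sheet_name" in table
        if kind == "single" and (has_index or has_name):
            errors.append(f"data_tables[{i}]: sheet_index/sheet_name not allowed for CSV/JSON")
        elif kind == "excel" and has_index == has_name:  # neither or both
            errors.append(f"data_tables[{i}]: set exactly one of sheet_index or sheet_name")
    return errors


print(validate_data_tables({"file_name": "employees-4.csv",
                            "data_tables": [{"data_frame_name": "employees_4"}]}))
# []
print(validate_data_tables({"file_name": "employees-1.xlsx",
                            "data_tables": [{"data_frame_name": "employees_1"}]}))
# ['data_tables[0]: set exactly one of sheet_index or sheet_name']
```

Returning a list of messages rather than raising on the first problem lets a single pass report every violation in an entry.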

data_frame_name (string, required)

The name of the Data Frame, which can be referenced in later steps of the pipeline.

header_row (integer, optional)

The number of the row that contains the header (the first row is 1). If not specified, 1 is assumed.

data_start_row (integer, optional)

The number of the first row that contains data, counted the same way as header_row. If not specified, 2 is assumed.
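Because both settings are 1-based row numbers, row N sits at index N - 1 of a zero-based list of rows. The following is a minimal sketch of that mapping, assuming the file has already been parsed into a list of rows; split_rows is a hypothetical helper, not the pipeline's implementation, and rows between the header and the first data row are simply skipped.

```python
import csv
import io

# Documented defaults; both settings are 1-based row numbers.
DEFAULT_HEADER_ROW = 1
DEFAULT_DATA_START_ROW = 2


def split_rows(rows: list[list[str]], table: dict) -> tuple[list[str], list[list[str]]]:
    """Return (header, data_rows) for one data_tables entry."""
    header_row = table.get("header_row", DEFAULT_HEADER_ROW)
    data_start_row = table.get("data_start_row", DEFAULT_DATA_START_ROW)
    if header_row < 1 or data_start_row < 1:
        raise ValueError("header_row and data_start_row are 1-based")
    header = rows[header_row - 1]      # row N is at list index N - 1
    data = rows[data_start_row - 1:]   # everything from the first data row on
    return header, data


# A CSV with a title line above the header, so header_row = 2, data_start_row = 3.
raw = "Employee report\nid,name\n1,Ada\n2,Linus\n"
rows = list(csv.reader(io.StringIO(raw)))
header, data = split_rows(rows, {"data_frame_name": "employees", "header_row": 2, "data_start_row": 3})
print(header)  # ['id', 'name']
print(data)    # [['1', 'Ada'], ['2', 'Linus']]
```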

sheet_name (string, optional)

The name of the Excel sheet this entry applies to. Only applicable to Excel files. Cannot be specified together with sheet_index.

sheet_index (integer, optional)

The index of the Excel sheet this entry applies to (the first sheet is 1). Only applicable to Excel files. Cannot be specified together with sheet_name.
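Resolving an entry to a concrete sheet can be sketched as follows, assuming the workbook's sheet names are available in workbook order (for example, the list an Excel reader such as openpyxl exposes as Workbook.sheetnames). resolve_sheet is a hypothetical helper that enforces the mutual exclusion of sheet_name and sheet_index and converts the 1-based index.

```python
def resolve_sheet(sheet_names: list[str], table: dict) -> str:
    """Return the sheet name a data_tables entry selects.

    sheet_names lists the workbook's sheets in workbook order.
    """
    has_name = "sheet_name" in table
    has_index = "sheet_index" in table
    if has_name == has_index:  # neither or both given
        raise ValueError("set exactly one of sheet_name or sheet_index")
    if has_name:
        name = table["sheet_name"]
        if name not in sheet_names:
            raise KeyError(f"no sheet named {name!r}")
        return name
    index = table["sheet_index"]  # 1-based: the first sheet is 1
    if not 1 <= index <= len(sheet_names):
        raise IndexError(f"sheet_index {index} is outside 1..{len(sheet_names)}")
    return sheet_names[index - 1]


sheets = ["Summary", "Workers"]
print(resolve_sheet(sheets, {"sheet_index": 2}))         # Workers
print(resolve_sheet(sheets, {"sheet_name": "Summary"}))  # Summary
```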

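pgp_configuration (object, optional)

Configuration for decrypting a PGP-encrypted file. Specified at file level, alongside file_name, as an object with: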

pgp_key_secret_name (string, required) - The name of the secret that contains the PGP private key.

pgp_key_passphrase_secret_name (string, optional) - The name of the secret that contains the passphrase for the PGP private key.

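excel_password_configuration (object, optional)

Configuration for opening a password-protected Excel file. Specified at file level, alongside file_name, as an object with:

password_secret_name (string, required) - The name of the secret that contains the password for the encrypted Excel file.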
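Both encryption settings refer to secrets by name rather than embedding keys or passwords. A pre-flight check that every referenced secret exists might look like the sketch below; the secret store is modeled as a plain dictionary, and required_secrets and missing_secrets are hypothetical helpers, since how the pipeline actually looks up secrets is not described here.

```python
def required_secrets(entry: dict) -> list[str]:
    """List the secret names referenced by one drop_zone_files entry."""
    names = []
    pgp = entry.get("pgp_configuration")
    if pgp is not None:
        names.append(pgp["pgp_key_secret_name"])  # required: KeyError if absent
        passphrase = pgp.get("pgp_key_passphrase_secret_name")  # optional
        if passphrase is not None:
            names.append(passphrase)
    excel = entry.get("excel_password_configuration")
    if excel is not None:
        names.append(excel["password_secret_name"])  # required: KeyError if absent
    return names


def missing_secrets(entry: dict, secret_store: dict[str, str]) -> list[str]:
    """Return referenced secret names that the (hypothetical) store lacks."""
    return [name for name in required_secrets(entry) if name not in secret_store]


entry = {
    "file_name": "employees-3.xlsx.pgp",
    "pgp_configuration": {"pgp_key_secret_name": "pgp-key",
                          "pgp_key_passphrase_secret_name": "pgp-pass"},
    "data_tables": [{"data_frame_name": "employees_3", "sheet_name": "Workers"}],
}
store = {"pgp-key": "<private key>"}  # placeholder value, not a real key
print(missing_secrets(entry, store))  # ['pgp-pass']
```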

Example

json
{
  "version": 1,
  "drop_zone_files": [
    {
      "file_name": "employees-1.xlsx",
      "data_tables": [
        {
          "data_frame_name": "employees_1",
          "sheet_index": 1
        }
      ]
    },
    {
      "file_name": "employees-2.xlsx",
      "excel_password_configuration": {"password_secret_name": "excel-pass"},
      "data_tables": [
        {
          "data_frame_name": "employees_2",
          "sheet_name": "Workers"
        }
      ]
    },
    {
      "file_name": "employees-3.xlsx.pgp",
      "pgp_configuration": {"pgp_key_secret_name": "pgp-key", "pgp_key_passphrase_secret_name": "pgp-pass"},
      "data_tables": [
        {
          "data_frame_name": "employees_3",
          "sheet_name": "Workers"
        }
      ]
    },
    {
      "file_name": "employees-4.csv",
      "data_tables": [
        {
          "data_frame_name": "employees_4"
        }
      ]
    },
    {
      "file_name": "employees-5.xlsx",
      "data_tables": [
        {
          "data_frame_name": "employees_5_a",
          "field_configurations": [],
          "header_row": 1,
          "data_start_row": 2,
          "transpose": false,
          "sheet_index": 1
        },
        {
          "data_frame_name": "employees_5_b",
          "field_configurations": [],
          "header_row": 2,
          "data_start_row": 3,
          "transpose": false,
          "sheet_index": 2
        }
      ]
    }
  ]
}
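To see how the nested structure is navigated, the sketch below loads a trimmed copy of the example and maps each data_frame_name to the file and data_tables entry it comes from. data_frame_index is a hypothetical helper; treating a repeated data_frame_name as an error is an assumption of this sketch (later steps reference Data Frames by name), not a rule stated above.

```python
import json

# Trimmed copy of the example configuration above.
CONFIG = """
{
  "version": 1,
  "drop_zone_files": [
    {"file_name": "employees-1.xlsx",
     "data_tables": [{"data_frame_name": "employees_1", "sheet_index": 1}]},
    {"file_name": "employees-4.csv",
     "data_tables": [{"data_frame_name": "employees_4"}]}
  ]
}
"""


def data_frame_index(config: dict) -> dict[str, tuple[str, dict]]:
    """Map each data_frame_name to (file_name, data_tables entry)."""
    index = {}
    for file_entry in config["drop_zone_files"]:
        for table in file_entry["data_tables"]:
            name = table["data_frame_name"]
            if name in index:  # assumption: names must be unique
                raise ValueError(f"duplicate data_frame_name: {name}")
            index[name] = (file_entry["file_name"], table)
    return index


for name, (file_name, table) in data_frame_index(json.loads(CONFIG)).items():
    print(name, "<-", file_name)
# employees_1 <- employees-1.xlsx
# employees_4 <- employees-4.csv
```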