Configuration section: drop_zone_files
Description
The structure of delivered Source Data Files is described in the Drop Zone Folder section. The configuration for advanced pre-processing of Source Data Files (e.g. decryption, file pre-processing, field type coercion) is provided through the drop_zone_files section of the configuration.
Specification
file_name (string, required)
The name of the file this configuration applies to, e.g. employees.xlsx.
data_tables (array, required)
A "Data Table" is either a single sheet in an Excel file or the whole content of a JSON or CSV file. An Excel file can therefore contain one or more Data Tables, whereas a JSON or CSV file always contains exactly one.
The configuration for the respective source files shall be as follows:
- CSV and JSON files: A single entry in data_tables, which has neither sheet_index nor sheet_name.
- Excel files: One or more entries in data_tables; every entry has either sheet_index or sheet_name.
At least one data_tables entry is required; an empty array is invalid and will be rejected.
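The rules above can be checked mechanically. The following is a minimal sketch in Python, not the pipeline's actual validator. The function name validate_data_tables, the extension sets, and the handling of a trailing .pgp suffix are assumptions made for illustration only.

```python
import os

# Assumed extension sets; the specification does not list supported extensions.
EXCEL_EXTENSIONS = {".xlsx", ".xls"}
SINGLE_TABLE_EXTENSIONS = {".csv", ".json"}


def validate_data_tables(file_name, data_tables):
    """Return a list of rule violations for one drop_zone_files entry."""
    if not data_tables:
        return ["data_tables must contain at least one entry"]

    # Assumption: a trailing .pgp suffix is ignored when classifying the file.
    base = file_name[:-4] if file_name.lower().endswith(".pgp") else file_name
    extension = os.path.splitext(base)[1].lower()

    errors = []
    for position, table in enumerate(data_tables, start=1):
        has_index = "sheet_index" in table
        has_name = "sheet_name" in table
        if extension in SINGLE_TABLE_EXTENSIONS and (has_index or has_name):
            errors.append(
                f"entry {position}: sheet_index/sheet_name not allowed for CSV or JSON"
            )
        # Excel entries need exactly one selector: not both, not neither.
        if extension in EXCEL_EXTENSIONS and has_index == has_name:
            errors.append(
                f"entry {position}: exactly one of sheet_index or sheet_name is required"
            )

    if extension in SINGLE_TABLE_EXTENSIONS and len(data_tables) != 1:
        errors.append("CSV and JSON files take exactly one data_tables entry")
    return errors
```

An empty list means the entry satisfies the rules as stated here; how the real pipeline reports a rejection is not covered by this sketch.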
data_frame_name (string, required)
The name of the Data Frame, which can be referenced in later steps of the pipeline.
header_row (integer, optional)
The number of the row that contains the header (the first row is 1). Defaults to 1 if not specified.
data_start_row (integer, optional)
The number of the first row that contains data, counted the same way as header_row. Defaults to 2 if not specified.
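To illustrate how the two 1-based row settings interact, here is a small sketch in Python. The helper split_header_and_data is hypothetical, and the check that data_start_row comes after header_row is an assumption; the specification only states the defaults.

```python
def split_header_and_data(rows, header_row=1, data_start_row=2):
    """Split raw rows into (header, data) using 1-based row numbers."""
    # Assumption: the data must start after the header row.
    if header_row < 1 or data_start_row <= header_row:
        raise ValueError("expected 1 <= header_row < data_start_row")
    header = rows[header_row - 1]      # convert 1-based to 0-based
    data = rows[data_start_row - 1:]   # everything from the first data row on
    return header, data


# A sheet with a title line above the header, like employees_5_b in the example.
rows = [["Quarterly report"], ["id", "name"], [1, "Ada"], [2, "Lin"]]
header, data = split_header_and_data(rows, header_row=2, data_start_row=3)
```

With the defaults (header_row=1, data_start_row=2) the first row becomes the header and all remaining rows are data.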
sheet_name (string, optional)
The name of the Excel sheet this configuration shall be applied to. Only applicable to Excel files. Cannot be combined with sheet_index.
sheet_index (integer, optional)
The index of the Excel sheet this configuration shall be applied to (the first sheet is 1). Only applicable to Excel files. Cannot be combined with sheet_name.
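Because sheet_index is 1-based while most libraries count sheets from 0, a consumer of this configuration has to translate the selector. The sketch below shows one way to do that in Python; resolve_sheet_position is a hypothetical helper, and sheet_names stands for the ordered list of sheet names obtained from whatever Excel reader is in use.

```python
def resolve_sheet_position(table, sheet_names):
    """Return the 0-based position of the sheet selected by a data_tables entry."""
    has_index = "sheet_index" in table
    has_name = "sheet_name" in table
    if has_index == has_name:
        raise ValueError("specify exactly one of sheet_index or sheet_name")

    if has_name:
        if table["sheet_name"] not in sheet_names:
            raise ValueError(f"no sheet named {table['sheet_name']!r}")
        return sheet_names.index(table["sheet_name"])

    position = table["sheet_index"] - 1  # the configuration counts from 1
    if not 0 <= position < len(sheet_names):
        raise ValueError(f"sheet_index {table['sheet_index']} is out of range")
    return position
```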
pgp_configuration (object, optional)
A single object with:
pgp_key_secret_name (string, required) - The name of the secret that contains the PGP private key.
pgp_key_passphrase_secret_name (string, optional) - The name of the secret that contains the passphrase for the PGP private key.
excel_password_configuration (object, optional)
password_secret_name (string, required) - The name of the secret that contains the password for an encrypted Excel file.
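Both decryption objects only carry secret names, never the secret values themselves. As a rough sketch, the secrets a file entry depends on could be collected as shown below before any secret store is queried. The function required_secret_names is hypothetical, and how secrets are actually resolved is outside what this section specifies.

```python
def required_secret_names(file_entry):
    """List the secret names referenced by one drop_zone_files entry."""
    names = []

    pgp = file_entry.get("pgp_configuration")
    if pgp is not None:
        names.append(pgp["pgp_key_secret_name"])  # required within the object
        passphrase = pgp.get("pgp_key_passphrase_secret_name")  # optional
        if passphrase is not None:
            names.append(passphrase)

    excel = file_entry.get("excel_password_configuration")
    if excel is not None:
        names.append(excel["password_secret_name"])  # required within the object

    return names
```

A missing required key raises KeyError in this sketch, which keeps an incomplete configuration from passing silently.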
Example
```json
{
  "version": 1,
  "drop_zone_files": [
    {
      "file_name": "employees-1.xlsx",
      "data_tables": [
        {
          "data_frame_name": "employees_1",
          "sheet_index": 1
        }
      ]
    },
    {
      "file_name": "employees-2.xlsx",
      "excel_password_configuration": {"password_secret_name": "excel-pass"},
      "data_tables": [
        {
          "data_frame_name": "employees_2",
          "sheet_name": "Workers"
        }
      ]
    },
    {
      "file_name": "employees-3.xlsx.pgp",
      "pgp_configuration": {"pgp_key_secret_name": "pgp-key", "pgp_key_passphrase_secret_name": "pgp-pass"},
      "data_tables": [
        {
          "data_frame_name": "employees_3",
          "sheet_name": "Workers"
        }
      ]
    },
    {
      "file_name": "employees-4.csv",
      "data_tables": [
        {
          "data_frame_name": "employees_4"
        }
      ]
    },
    {
      "file_name": "employees-5.xlsx",
      "data_tables": [
        {
          "data_frame_name": "employees_5_a",
          "field_configurations": [],
          "header_row": 1,
          "data_start_row": 2,
          "transpose": false,
          "sheet_index": 1
        },
        {
          "data_frame_name": "employees_5_b",
          "field_configurations": [],
          "header_row": 2,
          "data_start_row": 3,
          "transpose": false,
          "sheet_index": 2
        }
      ]
    }
  ]
}
```
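As a quick way to inspect such a configuration, the sketch below parses a shortened version of the example with Python's standard json module and lists the Data Frame names each file produces. The helper data_frames_by_file and the CONFIG_TEXT constant are illustrative only and not part of the pipeline.

```python
import json

# Shortened copy of the example configuration, embedded for illustration.
CONFIG_TEXT = """
{
  "version": 1,
  "drop_zone_files": [
    {
      "file_name": "employees-4.csv",
      "data_tables": [{"data_frame_name": "employees_4"}]
    },
    {
      "file_name": "employees-5.xlsx",
      "data_tables": [
        {"data_frame_name": "employees_5_a", "sheet_index": 1},
        {"data_frame_name": "employees_5_b", "sheet_index": 2,
         "header_row": 2, "data_start_row": 3}
      ]
    }
  ]
}
"""


def data_frames_by_file(config):
    """Map each file_name to the data_frame_name values it defines."""
    return {
        entry["file_name"]: [t["data_frame_name"] for t in entry["data_tables"]]
        for entry in config["drop_zone_files"]
    }


frames = data_frames_by_file(json.loads(CONFIG_TEXT))
```

Here the CSV file maps to a single Data Frame and the Excel file to two, one per configured sheet.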