Avro Input
Apache Avro is a data serialization system. The Avro Input step decodes binary or JSON Avro data and extracts fields from the structure it defines. This step extracts data from an Avro file to be used in the PDI stream.
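To make "decoding binary Avro data" concrete, the following minimal sketch decodes one binary-encoded record by walking the schema's fields in order. It is not how PDI implements the step. The `Customer` schema, the payload bytes, and the helper names are hypothetical, and only the `long`, `int`, and `string` types are handled.

```python
import io
import json


def read_long(buf):
    """Read one Avro int/long: a variable-length, zigzag-encoded integer."""
    shift = 0
    raw = 0
    while True:
        byte = buf.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro data")
        raw |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:      # high bit clear: last byte of the number
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo the zigzag mapping


def read_string(buf):
    """Read one Avro string: a long byte length followed by UTF-8 bytes."""
    length = read_long(buf)
    return buf.read(length).decode("utf-8")


# Only a few primitive types are covered in this sketch.
READERS = {"int": read_long, "long": read_long, "string": read_string}


def decode_record(schema, payload):
    """Decode one record; fields appear in schema order with no delimiters."""
    buf = io.BytesIO(payload)
    return {f["name"]: READERS[f["type"]](buf) for f in schema["fields"]}


# Hypothetical schema, for illustration only.
SCHEMA = json.loads("""
{"type": "record", "name": "Customer",
 "fields": [{"name": "id", "type": "long"},
            {"name": "name", "type": "string"}]}
""")

# id = 42 is zigzag-encoded as 0x54; the string length 3 is encoded as 0x06.
PAYLOAD = bytes([0x54, 0x06]) + b"Ada"

record = decode_record(SCHEMA, PAYLOAD)
print(record)
```

The binary form carries no field names or types, which is why a schema (embedded or supplied separately) is needed to read it.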
General
The following fields and button are general to this transformation step:
Field | Description |
---|---|
Step Name | Specifies the unique name of the Avro Input step on the canvas. You can customize the name or leave it as the default. |
Location | Indicates the file system or specific cluster where the source file is located. |
Folder/File Name | The fully qualified URL of the source file for the input fields. |
Preview | Displays the rows generated by this step. |
Options
The Avro Input transformation step features several tabs with fields. Each tab is described below.
Fields Tab
The table in the Fields tab defines the following input fields from the Avro source:
Field | Description |
---|---|
Path | The location of the Avro source |
Name | The name of the input field |
Type | The type of the input field, such as ‘String’ or ‘Date’ |
The default format mask for the date type is yyyy-MM-dd. The default format mask for the timestamp type is yyyy-MM-dd HH:mm:ss.SSS. If a date or timestamp was stored as a string in any other format, the column data cannot be retrieved, and null is returned for that column.
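The null-on-mismatch behavior described above can be sketched in a few lines. This is an approximation under stated assumptions, not the step's actual code: the `strptime` patterns are only rough equivalents of the Java-style masks (Python is more lenient about zero padding and fraction length), and `parse_or_none` is a hypothetical helper.

```python
from datetime import datetime

# Assumed strptime approximations of the default masks.
DATE_MASK = "%Y-%m-%d"                    # yyyy-MM-dd
TIMESTAMP_MASK = "%Y-%m-%d %H:%M:%S.%f"   # yyyy-MM-dd HH:mm:ss.SSS


def parse_or_none(text, mask):
    """Return the parsed value, or None when the string does not fit the mask."""
    try:
        return datetime.strptime(text, mask)
    except ValueError:
        return None


ok_date = parse_or_none("2024-03-15", DATE_MASK)
ok_timestamp = parse_or_none("2024-03-15 08:30:00.250", TIMESTAMP_MASK)
bad_date = parse_or_none("15/03/2024", DATE_MASK)   # different format: None

print(ok_date, ok_timestamp, bad_date)
```

If source strings use another format, one option is to read the column as a String and convert it in a later step.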
You can manually define the fields in the table, or you can click Get Fields to populate them from the incoming PDI stream.
Schema Tab
This tab includes the following field to define the source for your Avro schema:
- File name: Specify the Avro schema file by entering its path as a fully qualified URL (for example, file:///C:/avro-output-schema) or by clicking Browse. A separate schema file is not required. If you do not specify a schema file, PDI attempts to retrieve the fields from the schema embedded in the Avro data file.
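For context on the embedded schema: an Avro object container file starts with a header that stores the schema as JSON under the `avro.schema` metadata key. The sketch below reads that header using only the standard library. It illustrates the file layout rather than PDI's implementation. The in-memory header, the `Customer` schema, and the helper names are hypothetical, and real files use a random sync marker instead of zeros.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(buf):
    """Read a variable-length, zigzag-encoded Avro long."""
    shift, raw = 0, 0
    while True:
        byte = buf.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro data")
        raw |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)


def read_bytes(buf):
    """Read Avro bytes or string data: a long length, then that many bytes."""
    return buf.read(read_long(buf))


def read_embedded_schema(buf):
    """Return the schema stored in the container file header."""
    if buf.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:                      # the metadata map is written in blocks
        count = read_long(buf)
        if count == 0:               # a zero count ends the map
            break
        if count < 0:                # negative count: a byte size follows
            read_long(buf)
            count = -count
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            meta[key] = read_bytes(buf)
    return json.loads(meta["avro.schema"])


def write_long(n):
    """Encode a long; used here only to build the demo header."""
    raw = (n << 1) ^ (n >> 63)
    out = bytearray()
    while raw & ~0x7F:
        out.append((raw & 0x7F) | 0x80)
        raw >>= 7
    out.append(raw)
    return bytes(out)


def avro_bytes(data):
    return write_long(len(data)) + data


# Hypothetical header built in memory so the example needs no file on disk.
schema_json = json.dumps({"type": "record", "name": "Customer",
                          "fields": [{"name": "id", "type": "long"}]}).encode()
header = (MAGIC + write_long(2)
          + avro_bytes(b"avro.schema") + avro_bytes(schema_json)
          + avro_bytes(b"avro.codec") + avro_bytes(b"null")
          + write_long(0) + bytes(16))   # placeholder 16-byte sync marker

schema = read_embedded_schema(io.BytesIO(header))
field_names = [f["name"] for f in schema["fields"]]
print(field_names)
```

For real files, a maintained Avro library is a better choice than hand-parsing; the sketch only shows why a separate schema file is optional.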
Metadata Injection Support
All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.