Avro Output
Apache Avro is a data serialization system. Avro relies on a schema to decode binary data and extract its fields. The Avro Output step serializes data from the PDI data stream into Avro binary or JSON format, then writes it to a file.
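To show why the schema is needed for decoding, here is a minimal Python sketch of Avro's binary encoding for a hypothetical record with a string field followed by a long field. It follows the encoding rules in the Avro specification: longs are zig-zag encoded and then written as variable-length integers, and strings are a length prefix followed by UTF-8 bytes. The function names are illustrative only, since the Avro Output step performs this encoding for you. The output contains no field names or type markers, so a reader cannot interpret it without the schema.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit int to an unsigned one so small magnitudes stay small."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Avro long: zig-zag, then 7 bits per byte, high bit set on all but the last byte."""
    z = zigzag(n)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length encoded as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(name: str, age: int) -> bytes:
    """Hypothetical record {name: string, age: long}: fields are concatenated in schema order."""
    return encode_string(name) + encode_long(age)


print(encode_user("foo", 1))  # b'\x06foo\x02'
```

Note that the bytes for "foo" and 1 carry no hint that they belong to fields called name and age; only the schema supplies that.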
This output step creates the following files:
- A file containing output data in Avro format
- An Avro schema file defined by the fields in this step
Fields can be defined manually or extracted from incoming steps.
General
Enter the following information in the transformation step fields:
Field | Description |
---|---|
Step Name | Specifies the unique name of the Avro Output step on the canvas. You can customize the name or leave it as the default. |
Location | Indicates the file system or specific cluster to which the output file is written. |
Folder/File Name | Specifies the name and/or location of the file or folder where the output is written. Click Browse to display the Open File window and navigate to the file or folder. |
Options
The Avro Output transformation step features several tabs with fields. Each tab is described below.
Fields Tab
The table in the Fields tab defines the following fields that make up the Avro schema created by this step:
Field | Description |
---|---|
Avro path | The name of the field as it will appear in the Avro data and schema files |
Name | The name of the PDI field |
Type | The data type of the field |
Default value | The default value of the field if it is null or empty |
Null | Specifies whether the field can contain null values. |
To prevent the transformation from failing, enter a Default value for every field where Null is set to No.
These fields can be defined manually, or you can click Get Fields to populate the fields from the incoming PDI stream.
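The following is a minimal Python sketch, under stated assumptions, of how one row of the Fields table could map to an Avro field definition, and of how the Default value and Null settings interact when a row is written. The FieldDef class and both functions are hypothetical; PDI's internal logic and the exact schema it emits may differ. Nullable fields are modeled here as unions with "null", ordered so that any default matches the first branch of the union, which is what the Avro specification requires for union defaults.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldDef:
    """Hypothetical representation of one row in the Fields tab."""
    avro_path: str                 # name in the Avro data and schema files
    name: str                      # PDI stream field name
    avro_type: str                 # Avro primitive type, e.g. "string" or "long"
    default: Optional[Any] = None  # Default value column (None = not set)
    nullable: bool = True          # Null column (True = Yes)


def to_avro_field(f: FieldDef) -> dict:
    """Build an Avro schema field entry for one Fields-tab row."""
    if f.nullable:
        # A union default must match the union's first branch.
        if f.default is None:
            return {"name": f.avro_path, "type": ["null", f.avro_type], "default": None}
        return {"name": f.avro_path, "type": [f.avro_type, "null"], "default": f.default}
    field = {"name": f.avro_path, "type": f.avro_type}
    if f.default is not None:
        field["default"] = f.default
    return field


def resolve_value(f: FieldDef, row: dict) -> Any:
    """Pick the value to write: substitute the default for null/empty input."""
    value = row.get(f.name)
    if value is None or value == "":
        if f.default is not None:
            return f.default
        if not f.nullable:
            raise ValueError(f"{f.name}: null value and no default for a non-nullable field")
        return None
    return value


age = FieldDef("age", "age", "long", default=0, nullable=False)
print(to_avro_field(age))
print(resolve_value(age, {"age": None}))
```

In this sketch, a row whose non-nullable field is null and has no default raises an error, which mirrors the failure the note about Default values warns about.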
Schema Tab
The following options in the Schema tab define how the Avro schema file will be created:
Option | Description |
---|---|
File Name | Specifies the fully qualified URL (for example, file:///C:/avro-output-schema) where the Avro schema file will be written. The URL format may differ depending on the file system type selected in the Location field. If a schema file already exists, it is overwritten. If you do not specify a separate schema file, PDI embeds the schema in the Avro data file. |
Namespace | Specifies the namespace that, together with the Record Name field, defines the full name of the schema (for example, example.avro). |
Record Name | Specifies the name of the Avro record (for example, User). |
Doc Value | Specifies the documentation provided for the schema. |
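As an illustration of how these options combine, the Python sketch below assembles a record schema from a namespace, record name, and doc string, and derives the schema's full name as namespace.name, as defined in the Avro specification. The helper names are hypothetical, and mapping Doc Value to the Avro "doc" attribute is an assumption; PDI may lay out its generated schema file differently.

```python
import json


def build_record_schema(namespace: str, record_name: str, doc: str, fields: list) -> dict:
    """Assemble an Avro record schema from Schema-tab style options (hypothetical helper)."""
    schema = {
        "type": "record",
        "name": record_name,
        "namespace": namespace,
        "fields": fields,
    }
    if doc:
        schema["doc"] = doc  # assumed mapping of Doc Value
    return schema


def full_name(schema: dict) -> str:
    """Avro full name: namespace + "." + name, or just the name if there is no namespace."""
    ns = schema.get("namespace")
    return f"{ns}.{schema['name']}" if ns else schema["name"]


schema = build_record_schema(
    "example.avro", "User", "Users exported from PDI",
    [{"name": "name", "type": "string"}],
)
print(full_name(schema))  # example.avro.User
print(json.dumps(schema, indent=2))
```

Writing the JSON text to the path given in File Name would produce a standalone schema file.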
Options Tab
The Compression option in this tab defines which of the following codecs is used to compress blocks in the Avro output file:
- None: No compression.
- Deflate: The data blocks are written using the deflate algorithm as specified in RFC 1951, and typically implemented using the zlib library.
- Snappy: The data blocks are written using Google's Snappy compression library, and are followed by the 4-byte, big-endian CRC32 checksum of the uncompressed data in each block.
See https://avro.apache.org/docs/1.8.1/s...ontainer+Files for additional information on these codecs.
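The Python standard library includes zlib but not Snappy, so this minimal sketch covers only what can be shown without third-party packages: raw DEFLATE blocks as used by the Deflate codec, and the 4-byte big-endian CRC32 trailer that follows each Snappy-compressed block. Passing wbits=-15 makes zlib emit a raw RFC 1951 stream without the zlib header and trailer. The function names are hypothetical and this is not how PDI itself is implemented.

```python
import struct
import zlib


def deflate_block(data: bytes) -> bytes:
    """Compress one block as raw DEFLATE (RFC 1951); negative wbits drops the zlib wrapper."""
    comp = zlib.compressobj(level=zlib.Z_DEFAULT_COMPRESSION, method=zlib.DEFLATED, wbits=-15)
    return comp.compress(data) + comp.flush()


def inflate_block(data: bytes) -> bytes:
    """Decompress a raw DEFLATE block."""
    return zlib.decompress(data, wbits=-15)


def snappy_checksum(uncompressed: bytes) -> bytes:
    """CRC32 of the uncompressed block as 4 big-endian bytes (appended after Snappy data)."""
    return struct.pack(">I", zlib.crc32(uncompressed) & 0xFFFFFFFF)


block = b"avro" * 100
packed = deflate_block(block)
print(len(block), len(packed))
print(snappy_checksum(b"123456789").hex())  # cbf43926
```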
Metadata Injection Support
All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.