
Hitachi Vantara Lumada and Pentaho Documentation

Using the Avro Output step on the Pentaho engine

If you are running your transformation on the Pentaho engine, use the following instructions to set up the Avro Output step.

General

Enter the following information in the transformation step fields:

Step name: Specify the unique name of the Avro Output step on the canvas. You can customize the name or leave it as the default.
Folder/File name: Specify the location and name of the output file or folder. You can also click Browse to navigate to the destination file or folder through your VFS connection. See Connecting to Virtual File Systems for more information. The Avro files are created at this location.
Overwrite existing output file: Select to overwrite an existing file that has the same file name and extension.

Options

The Avro Output transformation step features several tabs with fields. Each tab is described below.

Fields tab

Avro Output Fields tab
The table in the Fields tab defines the following fields, which make up the Avro schema created by this step:
Avro path: The name of the field as it will appear in the Avro data and schema files.
Name: The name of the PDI field.
Avro type: The Avro data type of the field.
Precision: The total number of digits in the number. Applies only to the Decimal Avro type. The default is 10.
Scale: The number of digits after the decimal point. Applies only to the Decimal Avro type. The default is 0.
Default value: The value used for the field if it is null or empty.
Null: Specify whether the field can contain null values.
Note: To avoid a transformation failure, make sure the Default value field contains a value for every field where Null is set to No.
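The rule in the note above can be checked before a run. The following is a minimal Python sketch and is not part of PDI: the `missing_defaults` helper and the dictionary keys (`avro_path`, `nullable`, `default`) are hypothetical names that mirror the columns of the Fields tab.

```python
def missing_defaults(fields):
    """Return the Avro paths of non-nullable fields that have no default value."""
    problems = []
    for field in fields:
        default = field.get("default")
        has_default = default is not None and str(default).strip() != ""
        # Null = No combined with an empty Default value is the failing case.
        if not field.get("nullable", True) and not has_default:
            problems.append(field["avro_path"])
    return problems


# Hypothetical rows, one per line of the Fields tab.
fields = [
    {"avro_path": "id", "nullable": False, "default": "0"},
    {"avro_path": "name", "nullable": False, "default": ""},
    {"avro_path": "email", "nullable": True, "default": None},
]

print(missing_defaults(fields))
```

Any path the helper reports needs either a default value or Null set to Yes.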
Note: You can click Get Fields to populate the fields from the incoming PDI stream, or you can define the fields manually. When fields are retrieved, each PDI type is converted to an appropriate Avro type, as shown in the following PDI-to-Avro type mapping. If desired, you can change the converted field type to another Avro type.
PDI Type       Avro Type
InetAddress    String
String         String
TimeStamp      TimeStamp
Binary         Bytes
BigNumber      Decimal
Boolean        Boolean
Date           Date
Integer        Long
Number         Double
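The mapping above can be expressed as a simple lookup. This is an illustrative Python sketch only: the `PDI_TO_AVRO` dictionary copies the listed pairs, while the `default_avro_type` helper and the sample stream are hypothetical and do not reflect how PDI implements Get Fields internally.

```python
# PDI type -> Avro type, copied from the mapping listed above.
PDI_TO_AVRO = {
    "InetAddress": "String",
    "String": "String",
    "TimeStamp": "TimeStamp",
    "Binary": "Bytes",
    "BigNumber": "Decimal",
    "Boolean": "Boolean",
    "Date": "Date",
    "Integer": "Long",
    "Number": "Double",
}


def default_avro_type(pdi_type):
    """Look up the Avro type listed for a PDI type."""
    try:
        return PDI_TO_AVRO[pdi_type]
    except KeyError:
        raise ValueError(f"No Avro mapping listed for PDI type {pdi_type!r}") from None


# Hypothetical incoming stream: (field name, PDI type).
stream = [("customer_id", "Integer"), ("balance", "BigNumber"), ("created", "Date")]
suggested = {name: default_avro_type(pdi_type) for name, pdi_type in stream}
print(suggested)
```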

Schema tab

Avro Output Schema tab

The following options in the Schema tab define how the Avro schema file will be created:

File name: Specify the fully qualified URL where the Avro schema file will be written. The URL format may differ depending on the file system type. You can also click Browse to navigate to the schema file on your file system. If a schema file already exists, it will be overwritten. If you do not specify a separate schema file for your output, PDI writes an embedded schema in your Avro data file.
Namespace: Specify the namespace that, together with the Record name field, defines the "full name" of the schema (example.avro, for example).
Record name: Specify the name of the Avro record (User, for example).
Doc value: Specify the documentation provided for the schema.
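To show how these options relate to one another, the sketch below assembles an Avro record schema as JSON in Python. It follows the general Avro schema conventions (a record with `namespace`, `name`, `doc`, and `fields`); the exact schema that PDI writes may differ, and the field list, the doc text, and the `build_schema` and `full_name` helpers are hypothetical.

```python
import json


def build_schema(namespace, record_name, doc, fields):
    """Assemble an Avro record schema as a plain dictionary."""
    return {
        "type": "record",
        "namespace": namespace,
        "name": record_name,
        "doc": doc,
        "fields": fields,
    }


def full_name(schema):
    """Avro full name: the namespace and the record name joined by a dot."""
    return f"{schema['namespace']}.{schema['name']}"


schema = build_schema(
    namespace="example.avro",
    record_name="User",
    doc="Hypothetical user record",
    fields=[
        {"name": "id", "type": "long"},
        # Nullable field: a union with "null", defaulting to null.
        {"name": "email", "type": ["null", "string"], "default": None},
        # Decimal logical type, here with precision 10 and scale 0.
        {
            "name": "balance",
            "type": {"type": "bytes", "logicalType": "decimal", "precision": 10, "scale": 0},
        },
    ],
)

print(full_name(schema))
print(json.dumps(schema, indent=2))
```

With the example values from the descriptions above, the full name is the namespace `example.avro` followed by a dot and the record name `User`.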

Options tab

Avro Output Step Options tab
Compression: Specify which of the following codecs is used to compress data blocks in the Avro output file:

  • None: No compression is used (default).
  • Deflate: The data blocks are written using the deflate algorithm as specified in RFC 1951, and typically implemented using the zlib library.
  • Snappy: The data blocks are written using Google's Snappy compression library, and are followed by the 4-byte, big-endian CRC32 checksum of the uncompressed data in each block.

See https://avro.apache.org/docs/1.8.1/spec.html#Object+Container+Files for additional information on these codecs.
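The two building blocks named in the codec descriptions can be tried with the Python standard library. This is a sketch for illustration, not the step's implementation: it produces a raw RFC 1951 deflate stream with `zlib` and computes a 4-byte, big-endian CRC32 of the uncompressed data. Snappy compression itself is not shown because it requires a third-party library, and the helper names are hypothetical.

```python
import struct
import zlib


def deflate_block(data):
    """Compress bytes as a raw RFC 1951 deflate stream (no zlib header or trailer)."""
    # A negative wbits value tells zlib to omit its header and checksum.
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def inflate_block(compressed):
    """Reverse deflate_block."""
    return zlib.decompress(compressed, wbits=-15)


def crc32_suffix(data):
    """Return the 4-byte, big-endian CRC32 of the uncompressed data."""
    return struct.pack(">I", zlib.crc32(data) & 0xFFFFFFFF)


block = b"avro " * 200
compressed = deflate_block(block)
print(len(block), len(compressed), crc32_suffix(block).hex())
```

Highly repetitive data such as the sample block compresses well; real savings depend on the data being written.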

Include date in filename: Add the system date on which the file was generated to the output file name, using the default format yyyyMMdd (20181231, for example).
Include time in filename: Add the system time at which the file was generated to the output file name, using the default format HHmmss (235959, for example).
Specify date time format: Select a different date and time format for the output file name from the options available in the drop-down list.
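The default date and time patterns can be illustrated in Python, where `%Y%m%d` corresponds to yyyyMMdd and `%H%M%S` to HHmmss. This is only a sketch: the `output_file_name` helper is hypothetical, and the underscore separator and the position of the stamp before the extension are assumptions, not behavior confirmed by this page.

```python
from datetime import datetime


def output_file_name(base, extension, when, include_date=False, include_time=False):
    """Build a file name with optional date and time stamps.

    The "_" separator and the stamp position are assumptions for illustration.
    """
    parts = [base]
    if include_date:
        parts.append(when.strftime("%Y%m%d"))  # yyyyMMdd
    if include_time:
        parts.append(when.strftime("%H%M%S"))  # HHmmss
    return "_".join(parts) + "." + extension


generated = datetime(2018, 12, 31, 23, 59, 59)
print(output_file_name("users", "avro", generated, include_date=True, include_time=True))
```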

Metadata injection support

All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.