Hitachi Vantara Lumada and Pentaho Documentation

Read metadata from Copybook

The Read metadata from Copybook step reads a binary fixed-length copybook definition file and outputs the file and column descriptor information as fields to PDI rows. You can then use these rows with the ETL Metadata Injection step to populate the Copybook Input step. Also, you can use this step to create a metadata template for multiple data files or to create a data model for a relational database. See Copybook steps in PDI for more information.

This step is required to use metadata injection with the Copybook Input step.

General

  • Step name: Specify the unique name of the step on the canvas. You can customize the name or leave it as the default.

Figure: Read metadata from Copybook step

Schema

These options define the location of the copybook definition file and include mapping options for the binary data files.

COBOL Copybook file path: Specify the file path to the copybook definition file. You can enter any VFS or SFTP file path, or click Browse to open the system file browser. After selecting a file, click Validate to verify that the definition file can be accessed and parsed.

COBOL Copybook line structure: Specify the line structure of the definition file:
  • Standard columns (6 to 72)

    Select this option when the definition file contains line numbers. The first 6 columns of text from each line are ignored. Any data beyond column 72 is ignored.

  • Full line

    Select this option when the definition file does not contain line numbers.
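
The effect of the two line structure options can be illustrated with a minimal Python sketch. It is not the step's actual implementation; the function name copybook_code_area and the ACCOUNT-RECORD sample line are hypothetical, and the slicing simply follows the description above (the first 6 columns and anything beyond column 72 are dropped).

```python
def copybook_code_area(line: str, standard_columns: bool = True) -> str:
    """Return the part of a copybook line that would be parsed.

    Hypothetical helper: mirrors the option descriptions, not PDI internals.
    """
    text = line.rstrip("\n")
    if not standard_columns:
        # "Full line": the whole line is treated as copybook text.
        return text
    # "Standard columns": skip the first 6 columns (line numbers)
    # and ignore everything beyond column 72.
    return text[6:72]


# A hypothetical 80-column line: line number, code area, trailing identifier.
code_area = " 01  ACCOUNT-RECORD.".ljust(66)
numbered_line = "000100" + code_area + "ID000001"

print(repr(copybook_code_area(numbered_line)))
print(repr(copybook_code_area(numbered_line, standard_columns=False)))
```

With the standard option, only the 66 characters of the code area survive; with the full line option, the line number and trailing identifier would be parsed as copybook text.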

Binary format

Use these options to describe the binary format of the selected file:

Source architecture: Select the machine architecture of the binary data source files. The values are:
  • Big endian (mainframe)

    The most significant byte is stored first and the least significant byte is stored last.

  • Little endian

    The least significant byte is stored first and the most significant byte is stored last.
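
To see why the architecture setting matters, the following Python sketch decodes the same two bytes under both byte orders. It only illustrates the general concept with the standard int.from_bytes function; the sample bytes are made up and nothing here reflects how PDI reads data internally.

```python
# The same two bytes, interpreted as a 16-bit unsigned binary integer.
raw = bytes([0x01, 0x02])

# Big endian: the most significant byte (0x01) comes first -> 0x0102.
big = int.from_bytes(raw, byteorder="big")

# Little endian: the least significant byte (0x01) comes first -> 0x0201.
little = int.from_bytes(raw, byteorder="little")

print(big, little)  # 258 513
```

Choosing the wrong architecture does not raise an error for binary fields; it silently produces different numbers.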

Source charset name: Select the character encoding set for the binary data file.

Mainframe EBCDIC is typically encoded using IBM037 or cp1047 character sets. For more information about character sets and their aliases, see Supported Encodings in the Oracle® documentation.
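
As a small illustration of why the charset must match the source, the Python sketch below decodes the same EBCDIC bytes with two encodings. It assumes Python's built-in cp037 codec, which corresponds to IBM037; cp1047 is left out because it is not assumed to be available in a standard Python installation. The sample bytes are made up.

```python
# Six bytes as they might appear in an EBCDIC (IBM037) text field.
raw = bytes([0xC1, 0xC2, 0xC3, 0xF1, 0xF2, 0xF3])

# Decoded with the matching EBCDIC code page.
as_ebcdic = raw.decode("cp037")

# Decoded with an unrelated single-byte charset, for comparison.
as_latin1 = raw.decode("latin-1")

print(as_ebcdic)  # ABC123
print(as_latin1)  # unrelated accented characters, not the original text
```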

Packed decimal (COMP-3) convention: Select how COMP-3 packed decimals are parsed when the Copybook Input step reads the binary data at runtime.

  • Strict

    The data must follow the IBM S370FPD specification to avoid validation errors. Validation is performed to verify that all nibbles (half-bytes), except the sign nibble, are decimal digits (0-9). This is the default value.

    • For signed packed decimals, the sign nibble must be C (positive) or D (negative).
    • For unsigned packed decimals, the sign nibble must be F.
  • Lenient

    Validation is performed to verify that all nibbles, except the sign nibble, contain decimal digits (0-9) and that the sign nibble contains a hexadecimal value of A-F. The sign nibble is only used to interpret a negative number if the value is D.

  • Lenient - unchecked

    No validation is performed on the source bytes. The sign nibble may contain any hexadecimal value 0-F, and the last nibble is not included in the result. The sign nibble is only used to interpret a negative number if the value is D.

Note: The selection of these options changes the output field_record_type field. See Example below.

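
The three packed decimal conventions can be sketched in Python as follows. This is a minimal illustration based only on the descriptions above, not the Copybook Input step's implementation. The function name unpack_comp3 and its mode and signed parameters are hypothetical, and the sketch ignores any implied decimal scale from the PIC clause.

```python
def unpack_comp3(data: bytes, mode: str = "strict", signed: bool = True) -> int:
    """Decode packed-decimal bytes to an int (hypothetical helper, scale ignored)."""
    if not data:
        raise ValueError("empty packed decimal")
    # Split every byte into its high and low nibble (half-byte).
    nibbles = [n for byte in data for n in (byte >> 4, byte & 0x0F)]
    *digits, sign = nibbles  # the last nibble carries the sign

    # Strict and Lenient both require decimal digits in the digit nibbles.
    if mode in ("strict", "lenient"):
        if any(d > 9 for d in digits):
            raise ValueError("digit nibble is not 0-9")

    if mode == "strict":
        # Signed: C (positive) or D (negative). Unsigned: F only.
        allowed = (0xC, 0xD) if signed else (0xF,)
        if sign not in allowed:
            raise ValueError(f"invalid sign nibble {sign:X}")
    elif mode == "lenient":
        # Any hexadecimal value A-F is accepted as a sign nibble.
        if sign < 0xA:
            raise ValueError(f"invalid sign nibble {sign:X}")
    elif mode != "unchecked":
        raise ValueError(f"unknown mode {mode!r}")

    value = 0
    for d in digits:
        value = value * 10 + d
    # Only a sign nibble of D marks a negative number.
    return -value if sign == 0xD else value


print(unpack_comp3(bytes([0x12, 0x3C])))                    # 123
print(unpack_comp3(bytes([0x12, 0x3D])))                    # -123
print(unpack_comp3(bytes([0x12, 0x3F]), mode="lenient"))    # 123
print(unpack_comp3(bytes([0x12, 0x34]), mode="unchecked"))  # 123
```

In this sketch, the bytes 0x12 0x3F are rejected in strict mode for a signed field because F is not a valid signed sign nibble, but they are accepted in lenient mode. In unchecked mode, the last nibble is dropped from the result whatever it contains.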
Output

Use this option to include the metadata of the parent group in the definition file.

  • Extract parent groups?: Select this check box if you want to include parent group metadata in the output stream. Clear this check box to exclude parent group metadata from the output stream.

Example

In this example, we are using the accounts.cbl sample copybook definition file, available at design-tools/data-integration/samples/transformations/copybook/redefines_example/accounts.cbl.

Figure: Sample copybook definition file

The Standard columns (6 to 72) option was selected to match the file format. The Extract parent groups option was selected to include the group information. The following image shows how the data appears in the PDI stream after running the transformation using the sample file.

Figure: Step output to PDI stream

The field_kettle_type column displays the data types that are generated for the PDI stream.

Metadata injection support

All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.