Hitachi Vantara Lumada and Pentaho Documentation

ETL metadata injection

The ETL Metadata Injection step injects metadata from multiple sources into another transformation at runtime. This injection reduces the need to repeat the same tasks each time a different input source is used.

In PDI, you can create a transformation to use as a template for your repetitive tasks. This transformation is known as the template transformation. The template transformation is a child transformation that the ETL Metadata Injection step reuses with the metadata created from various input sources. You then create another transformation that prepares the common values you want to use as metadata and injects them through the ETL Metadata Injection step into your template transformation, as shown in the following diagram:

ETL Metadata Injection Process

For example, you might have a simple transformation to load transaction data values from a supplier, filter specific values, and output them to a file. If you have more than one supplier, you would need to run this simple transformation for each supplier. With metadata injection, however, you can expand this simple repetitive transformation by inserting metadata from another transformation that contains the ETL Metadata Injection step. The ETL Metadata Injection step coordinates the data values from the various inputs through the metadata you define. This process reduces the need for you to adjust and run the repetitive transformation for each specific input. See the Example section for more details.

The following basic procedure is recommended for using this step to inject metadata:

  1. Optimize your data for injection, such as preparing folder structures and inputs.
  2. Develop transformations for the following tasks:
    • The repetitive process (the template transformation)
    • Metadata injection through the ETL Metadata Injection step
    • Handling of multiple inputs (as needed)

The metadata is injected into the template transformation through any step that supports metadata injection. For a list of these steps, see Steps supporting metadata injection.

General

Enter the following information in the transformation step fields:

Field | Description
Step Name | Specifies the unique name of the ETL Metadata Injection step on the canvas. You can customize the name or leave it as the default.
Transformation

Specify the transformation you want to use as a template for your repetitive tasks by entering its path. Click Browse to navigate to your template transformation in the VFS browser.

If you select a transformation that has the same root path as the current transformation, the variable ${Internal.Entry.Current.Directory} is automatically inserted in place of the common root path. For example, if the current transformation's path is /home/admin/transformation.ktr and you select the transformation /home/admin/path/sub.ktr, then the path is automatically converted to ${Internal.Entry.Current.Directory}/path/sub.ktr.

If you are working with a repository, specify the name of the template transformation in your repository. If you are not working with a repository, specify the XML file name of the template transformation on your system.

Template transformations previously specified by reference are automatically converted to be specified by name within the Pentaho Repository.
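The path substitution described above can be pictured with a minimal Python sketch. This only illustrates the rule as stated in the text and is not PDI's implementation; the function name to_relative_reference is hypothetical.

```python
import posixpath

# Variable name taken from the text above; everything else is illustrative.
CURRENT_DIR_VAR = "${Internal.Entry.Current.Directory}"


def to_relative_reference(current_ktr: str, selected_ktr: str) -> str:
    """Replace the shared root path with the directory variable, if there is one."""
    current_dir = posixpath.dirname(current_ktr)
    prefix = current_dir.rstrip("/") + "/"
    if selected_ktr.startswith(prefix):
        # The selected file lives under the current transformation's folder.
        return CURRENT_DIR_VAR + "/" + selected_ktr[len(prefix):]
    # No common root: this sketch leaves the path unchanged.
    return selected_ktr


print(to_relative_reference("/home/admin/transformation.ktr",
                            "/home/admin/path/sub.ktr"))
```

With the paths from the example above, the sketch prints ${Internal.Entry.Current.Directory}/path/sub.ktr.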

Options

The ETL Metadata Injection step features two tabs with fields. Each tab is described below.

Inject Metadata tab

ETL Metadata Injection Step Metadata Tab

The columns of the table in this tab specify what fields in the template transformation are injected with metadata. The following table describes these columns:

Column | Description
Target injection step key | The available fields in each step of the template transformation that can be injected with metadata.
Target description | How the target fields relate to their target steps.
Source step | The step associated with the fields to be injected into the target fields as metadata.
Source field | The fields to be injected into the target fields as metadata.

Specify the source field

To specify the source field as metadata to be injected, perform the following steps:

Procedure

  1. In the Target injection step key column, double-click the field for which you want to specify a source field.

    The Source field dialog box opens.
  2. Select a source field and click OK.

  3. (Optional) Select Use constant value to specify a constant value for the injected metadata through one of the following actions:

    • Manually entering a value.
    • Using an internal variable to set the value: ${Internal.Step.Unique.Count}, for example.
    • Using a combination of manually specified values and parameter values: ${FILE_PREFIX}_${FILE_DATE}.txt, for example.
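To show how a constant value that mixes literal text and parameters might resolve, here is a small Python sketch of ${NAME} substitution. It is an illustration only, not PDI's own resolver: the helper expand_variables and the sample parameter values are hypothetical, and leaving unknown names untouched is an assumption made for this sketch.

```python
import re

# Matches ${NAME} placeholders such as ${FILE_PREFIX}.
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def expand_variables(template: str, values: dict) -> str:
    """Replace each ${NAME} with its value; unknown names are left as-is (assumption)."""
    return _VARIABLE.sub(
        lambda match: str(values.get(match.group(1), match.group(0))),
        template,
    )


# Hypothetical parameter values for illustration.
values = {"FILE_PREFIX": "supplier_a", "FILE_DATE": "2024-01-31"}
print(expand_variables("${FILE_PREFIX}_${FILE_DATE}.txt", values))
```

With these sample values, the constant resolves to supplier_a_2024-01-31.txt.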

Injecting metadata into the ETL Metadata Injection step

For injecting metadata into the ETL Metadata Injection step itself, the following exceptions apply:

  • To inject a method for how to specify a field (such as by FILENAME, REPOSITORY_BY_NAME, or REPOSITORY_BY_REFERENCE), set a TRANS_SPECIFICATION_METHOD constant to the field of an input step. You can then map the field as a source to the TRANS_SPECIFICATION_METHOD constant in the ETL Metadata Injection step.
  • The target field for the ETL Metadata Injection step inserting the metadata into the original injection is defined by [GROUP NAME].[FIELD NAME]. For example, if the GROUP NAME is 'OUTPUT_FIELDS' and the FIELD NAME is 'OUTPUT_FIELDNAME', you would set the target field to 'OUTPUT_FIELDS.OUTPUT_FIELDNAME'.
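The [GROUP NAME].[FIELD NAME] convention can be expressed as a pair of tiny helpers. This is a sketch for illustration; both function names are hypothetical and are not part of PDI.

```python
def target_field_key(group_name: str, field_name: str) -> str:
    """Build a target field key in the form GROUP_NAME.FIELD_NAME."""
    return f"{group_name}.{field_name}"


def split_target_field_key(key: str) -> tuple:
    """Split a target field key at the first dot into (group, field)."""
    group_name, _, field_name = key.partition(".")
    return group_name, field_name


print(target_field_key("OUTPUT_FIELDS", "OUTPUT_FIELDNAME"))
```

For the group and field names used in the text, the key is OUTPUT_FIELDS.OUTPUT_FIELDNAME.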

Options tab

ETL Metadata Injection Step Options Tab

Enter the following optional settings:

Option | Description
Step to read from (optional) | Select a step in your template transformation to pass data directly to a step following the ETL Metadata Injection step in your current transformation.
Field name | If Step to read from is selected, enter the name of the field passed directly from the step in the template transformation.
Type | If Step to read from is selected, select the type of the field passed directly from the step in the template transformation.
Length | If Step to read from is selected, enter the length of the field passed directly from the step in the template transformation.
Precision | If Step to read from is selected, enter the precision of the field passed directly from the step in the template transformation.
Optional target file (KTR after injection) | For initial transformation development or debugging, specify an optional file for creating and saving a transformation of your template after metadata injection occurs. The resulting transformation will be your template transformation with the metadata already injected as constant values.
Streaming source step | Select a source step in your current transformation to directly pass data to the Streaming target step in the template transformation.
Streaming target step | Select the target step in your template transformation to receive data directly from the Streaming source step.
Run resulting transformation | Select to inject metadata and run the template transformation. If this option is not selected, metadata injection occurs, but the template transformation does not run.
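When debugging with the Optional target file (KTR after injection) option, it can help to list the steps in the saved file. The following Python sketch assumes the usual KTR XML layout, with a transformation root element whose step children each carry name and type elements. The sample XML is a simplified, hypothetical stand-in for a real injected file, and list_steps is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for a KTR saved after injection.
SAMPLE_KTR = """<transformation>
  <info><name>injected_template</name></info>
  <step><name>Read supplier file</name><type>ExcelInput</type></step>
  <step><name>Write output</name><type>TextFileOutput</type></step>
</transformation>"""


def list_steps(ktr_xml: str) -> list:
    """Return (step name, step type) pairs found in the KTR XML text."""
    root = ET.fromstring(ktr_xml)
    return [(step.findtext("name"), step.findtext("type"))
            for step in root.findall("step")]


# For a file on disk, read its text first, for example with open(path).read().
for name, step_type in list_steps(SAMPLE_KTR):
    print(f"{name}: {step_type}")
```

Comparing this listing, or the full step settings, against the original template is one way to confirm which values were injected.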

Example

In this example, you have a template transformation to load transaction data values from a supplier’s spreadsheet, filter specific values to examine, and output them to a text file. The template transformation is injected with metadata values stored in Microsoft spreadsheets.

The example is in the pentaho/design-tools/data-integration/samples/transformations/metadata-injection-example folder of your PDI distribution. The folder contains the following structure:

Metadata injection example folder structure

Microsoft spreadsheets containing input data are stored in the metadata-injection-example/data/in folder. Metadata values are stored in spreadsheets within the metadata-injection-example/metadata folder. The template and the transformation for injecting the metadata are in the metadata-injection-example/transformations folder.

NoteThis example assumes a basic understanding of working with transformations and steps.

Input data

Data files are frequently uploaded from multiple sources. This example models a situation where two suppliers have uploaded spreadsheets into the metadata-injection-example/data/in folder.

When using metadata injection, you usually want to focus on a subset of data values common to all your input files. Metadata for the following values are used in this example:

  • Transaction date
  • Transaction invoice number
  • Net value of the transaction
  • Currency used in the transaction

The metadata for these values and the output target text file are created and maintained in the metadata-injection-example/metadata folder.

Transformations

Metadata injection involves a main repetitive process. For this example, the 03_process_supplier_file transformation in the metadata-injection-example/transformation folder is the template transformation, which is applied to each supplier’s file. The 02_process_supplier transformation, which contains the ETL Metadata Injection step, injects metadata into the repetitive template transformation (03_process_supplier_file). Because this example processes data from multiple files, the 02_process_supplier transformation is called from another transformation (01_process_all_suppliers) once for each supplier file.

This example contains the following three transformations:

  • Transformation for all input sources (01_process_all_suppliers)

    The transformation going through all the suppliers’ spreadsheets, calling the metadata injection transformation per each supplier, and logging the entire process (for possible troubleshooting, if needed). Each input source is specified through a variable in a Transformation Executor step, which calls the 02_process_supplier transformation.

  • Metadata injection transformation (02_process_supplier)

    The transformation defining the structure of the metadata and how it is injected into the main transformation. For this example, the metadata values are in separate spreadsheet files. This transformation extracts these values, prepares them for the injection, and then inserts them into the template transformation through the ETL Metadata Injection step.

  • Template transformation (03_process_supplier_file)

    The main repetitive transformation for processing the data per each supplier’s spreadsheet. The settings for each step in this transformation pertain to metadata injection, instead of data values of a single specific source. For example, the supplier field is a variable that depends on which supplier’s data is being accessed at that time.
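The relationship between the three transformations can be modeled in plain Python. This is a conceptual sketch only and does not call PDI: each function stands in for one transformation, the field map plays the role of the injected metadata, and all column names and values are hypothetical.

```python
def template_transformation(rows, field_map):
    """Stand-in for 03_process_supplier_file: its field settings come from metadata."""
    return [{target: row[source] for target, source in field_map.items()}
            for row in rows]


def process_supplier(rows, field_map):
    """Stand-in for 02_process_supplier: injects the metadata, then runs the template."""
    return template_transformation(rows, field_map)


def process_all_suppliers(suppliers):
    """Stand-in for 01_process_all_suppliers: loops over every supplier input."""
    output = []
    for supplier in suppliers:
        output.extend(process_supplier(supplier["rows"], supplier["field_map"]))
    return output


# Two suppliers whose files use different column names (hypothetical data).
suppliers = [
    {"rows": [{"Date": "2024-01-05", "Invoice": "A-1", "Net": 100.0, "Cur": "EUR"}],
     "field_map": {"date": "Date", "invoice_number": "Invoice",
                   "net_value": "Net", "currency": "Cur"}},
    {"rows": [{"TxDate": "2024-01-06", "InvNo": "B-7", "Amount": 250.5, "Currency": "USD"}],
     "field_map": {"date": "TxDate", "invoice_number": "InvNo",
                   "net_value": "Amount", "currency": "Currency"}},
]

result = process_all_suppliers(suppliers)
for row in result:
    print(row)
```

The template logic is written once, and only the per-supplier metadata changes between runs.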

Results

You run the entire process for all the supplier files by running the 01_process_all_suppliers transformation, which calls the 02_process_supplier transformation for each supplier input file. The 02_process_supplier transformation then calls the template 03_process_supplier_file transformation through the ETL Metadata Injection step.

Perform the following steps to run this example:

Procedure

  1. In PDI, open the 01_process_all_suppliers.ktr file found in the metadata-injection-example/transformation folder.

  2. Run the 01_process_all_suppliers transformation.

  3. Examine the processed_data_{today’s date}.txt file in the metadata-injection-example/data/out folder and the log_{timestamp}.txt file in the metadata-injection-example/logging folder.

Results

These transformations create a single source text output file in the metadata-injection-example/data/out folder. The logs generated by the 01_process_all_suppliers transformation are in the metadata-injection-example/logging folder.

The output text file contains values from the input files for the following common values:

  • Transaction date
  • Transaction invoice number
  • Net value of the transaction
  • Currency used in the transaction

Reference links

Video

The following video provides more information about the ETL Metadata Injection step.

https://youtu.be/EjzgzOanq1o