Hitachi Vantara Lumada and Pentaho Documentation

Hierarchical JSON Input

 

You can use the Hierarchical JSON input step to load JSON data into PDI from a file. You can use filters to load only the desired data. The data can be split on a hierarchical data path using wildcards. You can specify the input file directly in this step or use a list of files from an input field. See Hierarchical data for an overview of hierarchical data in Pentaho.

You can use filters on the input even if you do not use the Split rows across path field, but in that case the filter paths must be rooted at the root level of the HDT you want to load. When you use the Split rows across path field, you must specify all filter paths rooted at the split path. If you do not use the Split rows across path field, a normal HDT extraction path is used. See Hierarchical data path specifications.
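
The rooting rule can be illustrated with a small Python sketch. It is not part of PDI: the helper name `absolute_filter_path` is hypothetical, and the sketch assumes that a filter rooted at the split path addresses the same elements as the split path followed by the remainder of the filter, which is what Examples 2 and 3 on this page suggest.

```python
def absolute_filter_path(split_path: str, filter_path: str) -> str:
    """Return the root-level path matching a filter rooted at split_path.

    Hypothetical helper for illustration only. It handles simple paths
    such as '$.a.b[*]' and validates nothing beyond the leading '$'.
    """
    if not split_path.startswith("$") or not filter_path.startswith("$"):
        raise ValueError("paths are expected to start with '$'")
    # Drop the filter's own '$' root and append the rest to the split path.
    return split_path + filter_path[1:]


# With a split path, filters are written relative to each split element.
print(absolute_filter_path("$.employees[*]", "$.name"))  # $.employees[*].name
print(absolute_filter_path("$.employees[*]", "$.age"))   # $.employees[*].age
```

Read in reverse, the same relationship shows how a root-level filter such as $.employees[*].name would be shortened to $.name once $.employees[*] is entered as the split path.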

General

 
  • Step name: Specifies the unique name of the Hierarchical JSON input step on the canvas. You can customize the name or leave it as the default.

Options

 

The Hierarchical JSON input step features several tabs with fields. Each tab is described below.

Source tab

[Figure: Hierarchical JSON Input step dialog box showing the Source tab]

  • From file: Select to specify the file path and name of the JSON file you want to load into PDI.
  • File name: The file path and name of the JSON file to load.
  • From field: Select to use an incoming field as the JSON file path.
  • Field with file name: The incoming field containing the JSON file path.

Output tab

[Figure: Hierarchical JSON Input step Output tab]

  • Output field: Specifies the field name for the output column.
  • Split rows across path: Specifies the JSON path to be parsed. See Hierarchical data path specifications.

Note: The Split rows across path option is especially useful when loading JSON array objects within large JSON files.

Filters tab

[Figure: Hierarchical JSON Input step Filters tab]

Use the Path field (optional) to specify the filters to apply while using the Split rows across path option to fetch a subset of a JSON file. See Hierarchical data path specifications.

Examples

 

The following data is example JSON data in a file that you can load into PDI:

{
    "employees": [
        {
            "name": "emp_name_1",
            "age": 35,
            "addresses": [
                {
                    "country": "Country_1"
                },
                {
                    "country": "Country_2"
                }
            ]
        },
        {
            "name": "emp_name_2",
            "age": 35,
            "addresses": [
                {
                    "country": "Country_3"
                },
                {
                    "country": "Country_4"
                }
            ]
        }
    ]
}

Example 1

The following data is extracted from this JSON file when you specify the Split rows across path option as $.employees[*] and do not specify any filters:

[Figure: Hierarchical JSON Input step example output]

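
As a rough illustration of this split, the following Python sketch approximates what a split path of $.employees[*] with no filters would be expected to yield: one row per element of the employees array, each carrying the whole employee object. It uses only the standard library and does not call PDI. The function name `split_rows` and the compact form of the sample document are assumptions made for illustration.

```python
import json

# Sample data with the same content as the example file above.
DOC = json.loads("""
{
  "employees": [
    {"name": "emp_name_1", "age": 35,
     "addresses": [{"country": "Country_1"}, {"country": "Country_2"}]},
    {"name": "emp_name_2", "age": 35,
     "addresses": [{"country": "Country_3"}, {"country": "Country_4"}]}
  ]
}
""")


def split_rows(doc: dict, array_key: str) -> list:
    """Mimic a split path of the form $.<array_key>[*]: one row per array element."""
    return list(doc[array_key])


rows = split_rows(DOC, "employees")
for number, row in enumerate(rows, start=1):
    print(f"Row {number}: {json.dumps(row)}")
```
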
Example 2

If you configure the step with a split path of $.employees[*] and are interested only in the name and age, use the filters $.name and $.age on the Filters tab. This produces two rows on the stream of the Hierarchical JSON Input step:

Row 1

{
    "name": "emp_name_1",
    "age": 35
}

Row 2

{
    "name": "emp_name_2",
    "age": 35
}

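
The combination of a split path and filters in Example 2 can be approximated in plain Python. This is a sketch under stated assumptions, not the step's implementation: `split_and_filter` is a hypothetical name, the filters $.name and $.age are represented simply by the key names they select, and only top-level keys of each split element are supported.

```python
# Sample data with the same content as the example file above.
DOC = {
    "employees": [
        {"name": "emp_name_1", "age": 35,
         "addresses": [{"country": "Country_1"}, {"country": "Country_2"}]},
        {"name": "emp_name_2", "age": 35,
         "addresses": [{"country": "Country_3"}, {"country": "Country_4"}]},
    ]
}


def split_and_filter(doc: dict, array_key: str, filter_keys: list) -> list:
    """Emit one row per array element, keeping only the filtered keys."""
    rows = []
    for element in doc[array_key]:
        # A filter such as $.name is rooted at the split element,
        # so it selects the "name" key of that element.
        rows.append({key: element[key] for key in filter_keys if key in element})
    return rows


rows = split_and_filter(DOC, "employees", ["name", "age"])
for number, row in enumerate(rows, start=1):
    print(f"Row {number}: {row}")
```
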
Example 3

If you want the filtered entries in a single HDT row, leave the Split rows across path field blank and use the following filter paths:

$.employees[*].name
$.employees[*].age

This results in a single row containing one HDT in which the input is not split:

{
    "employees": [
        {
            "name": "emp_name_1",
            "age": 35
        },
        {
            "name": "emp_name_2",
            "age": 35
        }
    ]
}
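
For comparison with Example 2, the unsplit case can be sketched the same way in plain Python. Again this is only an approximation that does not call PDI: `filter_without_split` is a hypothetical name, and the root-level filters $.employees[*].name and $.employees[*].age are represented by the array key plus the key names they select.

```python
# Sample data with the same content as the example file above.
DOC = {
    "employees": [
        {"name": "emp_name_1", "age": 35,
         "addresses": [{"country": "Country_1"}, {"country": "Country_2"}]},
        {"name": "emp_name_2", "age": 35,
         "addresses": [{"country": "Country_3"}, {"country": "Country_4"}]},
    ]
}


def filter_without_split(doc: dict, array_key: str, filter_keys: list) -> list:
    """Return a single row whose document keeps the array structure
    but retains only the filtered keys of each element."""
    filtered = [
        {key: element[key] for key in filter_keys if key in element}
        for element in doc[array_key]
    ]
    # No split path: everything stays inside one document, hence one row.
    return [{array_key: filtered}]


rows = filter_without_split(DOC, "employees", ["name", "age"])
print(len(rows))
print(rows[0])
```

The only difference from the split case is where the array boundary ends up: here the employees array stays inside one document, whereas a split path turns each array element into its own row.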