The ORC Input step reads field data from an Apache ORC (Optimized Row Columnar) file into the PDI data stream.
Before using the ORC Input step, you must configure a named connection for your distribution, even if you set your Location to Local. For information on named connections, see Set up the Pentaho Server to connect to a Hadoop cluster.
Enter the following information in the ORC Input step fields:
|Step name||Specify the unique name of the ORC Input step on the canvas. You can customize the name or use the provided default.|
|Folder/File Name||Specify the fully qualified URL of the source file or folder name for the input fields. Click Browse to display the Open File window and navigate to the file or folder. For the supported file system types, see Connecting to Virtual File Systems. The Pentaho engine reads a single ORC file as input.|
The Fields section contains the following items:
- A Pass through fields from the previous step option, which lets you read the fields from the input file without redefining them.
- A table defining data about the columns to read from the ORC file.
The table in the Fields section defines the fields to read as input from the ORC file, the associated PDI field name, and the data type of the field. Enter the information for the ORC Input step fields as shown in the following table:
|ORC path (ORC type)||Specify the name of the field as it appears in the ORC data file, and its ORC data type.|
|Name||Specify the name of the input field.|
|Type||Specify the data type of the input field.|
|Format||Specify the date format when the Type specified is Date.|
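The Format column takes a date mask. PDI date masks generally follow the Java `SimpleDateFormat` pattern syntax; assuming that also holds for this step, the sketch below shows how a mask such as `yyyy/MM/dd HH:mm:ss` is interpreted. The class name, method names, and sample values are hypothetical and are not part of PDI.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class DateFormatSketch {

    // Parses a raw string using a mask like the one entered in the Format column.
    public static Date parse(String raw, String mask) {
        SimpleDateFormat fmt = new SimpleDateFormat(mask, Locale.ROOT);
        fmt.setLenient(false); // reject out-of-range values such as month 13
        try {
            return fmt.parse(raw);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                "Value '" + raw + "' does not match mask '" + mask + "'", e);
        }
    }

    // Renders a date back to text with a (possibly different) mask.
    public static String render(Date value, String mask) {
        return new SimpleDateFormat(mask, Locale.ROOT).format(value);
    }

    public static void main(String[] args) {
        // Hypothetical sample value, used only to illustrate the mask.
        Date parsed = parse("2023/04/15 13:45:00", "yyyy/MM/dd HH:mm:ss");
        System.out.println(render(parsed, "yyyy-MM-dd"));
    }
}
```

In the mask, `MM` is the month and `mm` is the minute, so the letter case matters when you type the mask into the Format column.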
You can define the fields manually, or you can provide a path to an ORC data file and click Get Fields to populate all the fields. When the fields are retrieved, each ORC type is converted into an appropriate PDI type. You can change the PDI type by using the Type drop-down or by entering the type manually. To preview the data in the ORC file, click Preview.
The ORC to PDI data type values are shown in the table below:
|ORC Type||PDI Type|