Parquet Input
The Parquet Input step decodes Parquet data and extracts fields using the schema defined in the Parquet source files. Together, the Parquet Input and Parquet Output transformation steps gather data from various sources and move that data into the Hadoop ecosystem in the Parquet format.
Before using the Parquet Input step, you must configure a named connection for your distribution, even if your Location is set to Local. For information about named connections, see Connecting to a Hadoop cluster with the PDI client.
Select an Engine
You can run the Parquet Input step on the Pentaho engine or on the Spark engine. The transformation runs differently depending on the engine you select. Select one of the following options to learn how to set up the Parquet Input step for your engine.
- Using Parquet Input on the Pentaho engine: Learn how to set up this step when using the Pentaho engine.
- Using Parquet Input on the Spark engine: Learn how to set up this step when using the Spark engine.
For instructions on selecting an engine for your transformation, see Run configurations.