Using the Text File Output step on the Spark engine
You can set up the Text file output step to run on the Spark engine. Spark processes null values differently than the Pentaho engine, so you may need to adjust your transformation to process null values following Spark's processing rules.
General
Enter the following information in the transformation step name field:
- Step name: Specifies the unique name of the Text file output transformation step on the canvas. You can customize the name or leave it as the default.
Options
The Text file output step features several tabs. Each tab is described below.
File tab
On the File tab, you can define basic properties about the file being created by this step.
Option | Description |
Filename | Specify the name and location of the output file. Click Browse to display the Open File window and navigate to the file or folder. For the supported file system types, see Connecting to Virtual File Systems. Spark attempts to create a new directory structure based on the name entered. The new directory structure contains subdirectories and CSV files corresponding to the processing results. For example: /_SUCCESS, PART-00000-.CSV, PART-00001-.CSV, and so forth. Caution: If a directory with the name specified in this field already exists, Spark will not overwrite it and the transformation aborts. To proceed, you must either specify a different directory name or delete the existing directory structure, including all the subdirectories and files contained within it. If you are using this step to write data to Amazon Simple Storage Service (S3), specify the URI of the S3 system. Both S3 and S3n are supported. When the date and time are to be appended, do not include a file extension. |
Pass output to servlet | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
Create Parent folder | This option is ignored by the Spark engine. Spark attempts to create a new directory structure based on how it processes the Filename. |
Do not create file at start | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
Accept file name from field? | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
File name field | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
Extension | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
Include stepnr in filename? | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
Include partition nr in filename? | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
Include date in filename? | Select to include the system date in the file name (_20181231, for example). |
Include time in filename? | Select to include the system time in the file name (_235959, for example). |
Specify Date time format | Select to include the date time in the file name using a format from the Date time format drop-down list. |
Date time format | Select the date time format. |
Add filenames to result | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
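The date and time options above append suffixes such as _20181231 and _235959 to the base file name. The following sketch illustrates that behavior with a hypothetical helper (`output_filename` and its parameters are illustrative, not part of the product; a fixed timestamp is used so the output is deterministic):

```python
from datetime import datetime

def output_filename(base, include_date=False, include_time=False, fmt=None):
    """Build an output name the way the File tab options describe:
    optionally append the system date (_20181231), the system time
    (_235959), or a custom date-time format string."""
    now = datetime(2018, 12, 31, 23, 59, 59)  # fixed timestamp for the example
    name = base
    if fmt:  # "Specify Date time format" uses the chosen format instead
        name += "_" + now.strftime(fmt)
    else:
        if include_date:
            name += now.strftime("_%Y%m%d")
        if include_time:
            name += now.strftime("_%H%M%S")
    return name

print(output_filename("out", include_date=True))                     # out_20181231
print(output_filename("out", include_date=True, include_time=True))  # out_20181231_235959
```

Because the suffix becomes part of the directory name Spark creates, omitting the file extension (as noted in the Filename description) keeps the generated names consistent.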
Click Show filename(s) to display a list of the files that will be generated. This is a simulation and depends on the number of rows that go into each file.
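The directory layout Spark produces, and the need to remove a pre-existing output directory before rerunning a transformation, can be sketched as follows. This is a stdlib-only simulation of the structure described above, not Pentaho or Spark code; the function names are hypothetical, and real Spark part files also embed a task UUID in their names:

```python
import os
import shutil
import tempfile

def prepare_output_dir(path):
    """Delete an existing output directory so a Spark-style writer can
    create it fresh; Spark aborts if the directory already exists."""
    if os.path.isdir(path):
        shutil.rmtree(path)  # removes all part files and the _SUCCESS marker

def simulate_spark_output(path, partitions):
    """Create a directory layout like the one Spark produces:
    a _SUCCESS marker plus one part-<n> CSV file per partition."""
    os.makedirs(path)
    open(os.path.join(path, "_SUCCESS"), "w").close()
    names = []
    for n, rows in enumerate(partitions):
        name = f"part-{n:05d}.csv"  # real Spark names also include a task UUID
        with open(os.path.join(path, name), "w") as f:
            f.write("\n".join(rows) + "\n")
        names.append(name)
    return names

base = os.path.join(tempfile.mkdtemp(), "out")
prepare_output_dir(base)
files = simulate_spark_output(base, [["a;1", "b;2"], ["c;3"]])
print(files)  # ['part-00000.csv', 'part-00001.csv']
```

Each partition of the Spark job writes its own part file, which is why the number of output files depends on how the data is partitioned rather than on a single file name.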
Content tab
The Content tab allows you to include the following options for the output text file.
Option | Description |
Append | Select to append lines to the end of the file. |
Separator | Specify the character used to separate the fields in a single line of text, typically a semicolon or tab. Click Insert Tab to place a tab in the Separator field. The default value is semicolon (;). |
Enclosure | Specify a string used to enclose fields, which allows separator characters to appear within a field's value. This setting is optional and can be left blank. The default value is double quotes ("). |
Force the enclosure around fields? | Select to force all fields to be enclosed with the character specified in the Enclosure option. |
Disable the enclosure fix? | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
Header | Clear to indicate that the first line in the output file is not a header row. |
Footer | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
Format | Select the line-ending format. On the Spark engine, select LF (Unix). |
Compression | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
Encoding | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
Right pad fields | This option is supported on the Spark engine when you select the Minimal width button on the Fields tab. |
Fast data dump (no formatting) | On the Spark engine, select this option for fixed-length file types. |
Split every ... rows | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
Add Ending line of file | This field is either not used by the Spark engine or not implemented for Spark on AEL. |
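The interplay of the Separator, Enclosure, Force the enclosure around fields?, and Format options can be illustrated with Python's standard `csv` module. This is an illustrative sketch, not the step's implementation; the `write_rows` helper and its parameter names are assumptions made for the example:

```python
import csv
import io

def write_rows(rows, separator=";", enclosure='"', force_enclosure=False):
    """Write rows the way the Content tab options describe: a configurable
    field separator, an enclosure applied when a field contains the
    separator, and an option to force the enclosure around every field."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=separator,
        quotechar=enclosure,
        quoting=csv.QUOTE_ALL if force_enclosure else csv.QUOTE_MINIMAL,
        lineterminator="\n",  # LF (Unix) line endings, per the Format option
    )
    writer.writerows(rows)
    return buf.getvalue()

rows = [["id", "note"], ["1", "plain"], ["2", "has;separator"]]
print(write_rows(rows))
# id;note
# 1;plain
# 2;"has;separator"
print(write_rows(rows, force_enclosure=True))
# "id";"note"
# "1";"plain"
# "2";"has;separator"
```

With minimal quoting, only the field containing the separator is enclosed; forcing the enclosure quotes every field, which some downstream consumers require.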
Fields tab
The Fields tab allows you to define the properties for the fields being exported.
Column | Description |
Name | Specify the name of the field. |
Type | Select the field's data type from the drop-down list or enter it manually. |
Format | Select the format mask (number type) from the drop-down list or enter it manually. See Common Formats for information on common valid date and numeric formats you can use in this step. |
Length | Specify the length of the field. How the length is interpreted depends on the field's data type. |
Precision | Specify the number of floating point digits for number-type fields. |
Currency | Specify the symbol used to represent currencies (for example, $ or €). |
Decimal | Specify the symbol used to represent a decimal point, either a period (.) as in 10,000.00 or a comma (,) as in 5.000,00. |
Group | Specify the symbol used to separate units of thousands in numbers of four digits or larger, either a comma (,) as in 10,000.00 or a period (.) as in 5.000,00. |
Trim type | Select the trimming method (none, left, right, or both) to apply to a string, which truncates the field before processing. Trimming only works when no field length is specified. |
Null | Specify the string to insert into the output text file when the value of the field is null. |
Get Fields (button) | Click to retrieve a list of fields from the input stream. |
Minimal width (button) | Click to minimize the field length by removing unnecessary characters. If selected, string fields will no longer be padded to their specified length. |
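Several Fields-tab options (Null, Trim type, Length, and right-padding) can be sketched together in a small stdlib-only helper. The function `format_field` and its parameters are hypothetical names chosen for this example; it mirrors the rule stated above that trimming only applies when no field length is specified:

```python
def format_field(value, length=0, trim="none", null_repr="", pad=False):
    """Apply Fields-tab style options to one value: null substitution,
    trimming (none/left/right/both), and optional right-padding to a
    fixed length."""
    if value is None:
        return null_repr  # the Null column's replacement string
    s = str(value)
    if length == 0:  # trimming only works when no field length is given
        if trim in ("left", "both"):
            s = s.lstrip()
        if trim in ("right", "both"):
            s = s.rstrip()
    if pad and length > 0:
        s = s.ljust(length)  # Right pad fields, up to the specified length
    return s

print(repr(format_field("  text  ", trim="both")))   # 'text'
print(repr(format_field(None, null_repr="N/A")))     # 'N/A'
print(repr(format_field("ab", length=5, pad=True)))  # 'ab   '
```

Clicking Minimal width corresponds to dropping the padding behavior, so string fields are emitted at their natural length.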
See also
If you want to use a text file output to run a command, script, or database bulk loader, see the Text File Output (Legacy) transformation step.
Metadata injection support
All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.