ElasticSearch Bulk Insert
Elastic is a platform that consists of products that search, analyze, and visualize data. The Elastic platform includes ElasticSearch, which is a Lucene-based, multi-tenant capable, and distributed search and analytics engine. The ElasticSearch Bulk Insert step sends one or more batches of records to an ElasticSearch server for indexing. Because you can specify the size of a batch, you can use this step to send one, a few, or many records to ElasticSearch for indexing.
Use this step if you have records that you want to submit to an ElasticSearch server to be indexed. When record data flows out of the ElasticSearch Bulk Insert step, PDI sends it to ElasticSearch along with the metadata that you specify, such as the index and type. This step is commonly used to send a batch of data to an ElasticSearch server and create a new index of a certain type (category). It is also used to add a batch of data to an existing index or category.
Because this is an output step, it is often placed at the end of the transformation.
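To illustrate how records and their index and type metadata travel together, the following minimal sketch builds a request body in the newline-delimited JSON shape that the ElasticSearch 6.x Bulk API accepts. It is not the step's internal implementation. The function name `build_bulk_body`, the sample rows, and the "twitter"/"tweet" values are assumptions made for this example.

```python
import json


def build_bulk_body(rows, index, doc_type, id_field=None):
    """Build an NDJSON bulk body: one action line and one source line per row."""
    lines = []
    for row in rows:
        # Action metadata: index and doc_type stand in for the step's
        # Index and Type options.
        meta = {"_index": index, "_type": doc_type}
        if id_field is not None and id_field in row:
            meta["_id"] = str(row[id_field])
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(row))
    # The Bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical sample rows for illustration only.
rows = [
    {"id": 1, "user": "kimchy", "message": "trying out bulk indexing"},
    {"id": 2, "user": "kimchy", "message": "another tweet"},
]
body = build_bulk_body(rows, index="twitter", doc_type="tweet", id_field="id")
print(body, end="")
```

Each record becomes two lines: an action line that names the index, type, and optional ID, followed by the document itself.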
Before you begin
You need the following items:
- A working server that has ElasticSearch version 6.4.2 already installed. You should be able to connect to ElasticSearch from the computer that you are running PDI on.
- Insert, Update, and Create privileges for the directories on the ElasticSearch server that you need to access.
- Files or data you want ElasticSearch to index.
Enter the following information in the transformation step field.
- Step Name: Specifies the unique name of the ElasticSearch Bulk Insert step on the canvas. You can customize the name or leave it as the default.
This step consists of four tabs: General, Servers, Fields, and Settings.
General tab
|Index||Specifies the name of the index you want to add data to. If an index with that name doesn't yet exist in ElasticSearch, one is created.|
|Type||Indicates the category the data should be placed in. You define the category, and by convention the type describes the data. For example, if the index is "twitter", the type might be "tweet".|
|Test Index||Checks whether the index exists in ElasticSearch.|
|Batch Size||Indicates the number of items in a batch. A batch size of one sends each record on its own, so it is not a bulk insert; any higher number is.|
|Stop on Error||Stops processing if an error occurs, such as a problem adding a document, a failed bulk push to the index, or JSON that is not well-formed. If this option is not selected and an error occurs, the affected row is not processed, but the transformation keeps running so that the other rows are processed.|
|Batch Timeout||Indicates how long a batch can be processed before it times out and processing ends.|
|ID Field||Indicates the name of the ID Field in the file.|
|Overwrite if exists||If the output file exists because this transformation was run before, allows the output to be overwritten.|
|Output Rows||Sends the rows that ElasticSearch processes successfully to the next step (or the output). If you have selected Stop on Error, the rows that succeeded before the error occurred are sent to the next step (or the output). Otherwise, all rows that ElasticSearch processes successfully are sent to the next step (or the output).|
|ID Output Field||Indicates the name of the ID field in the output. If this is left blank, the value in the ID Field is used instead.|
|JSON Input||Indicates whether the input is a JSON file.|
|JSON Field||Indicates the JSON node from which processing should begin.|
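The Batch Size, Stop on Error, and Output Rows options described above interact, and a small sketch can make the behavior concrete. This is only an illustration of the semantics in the table, not PDI's implementation. The names `batches`, `run_step`, and `fake_send_batch` are hypothetical, and `send_batch` stands in for whatever actually submits a batch to the server.

```python
from itertools import islice


def batches(rows, batch_size):
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


def run_step(rows, batch_size, send_batch, stop_on_error):
    """Return the rows that would be passed on as output rows.

    send_batch is a hypothetical callable that takes a list of rows and
    returns one boolean per row (True if the row was indexed).
    """
    succeeded = []
    for chunk in batches(rows, batch_size):
        for row, ok in zip(chunk, send_batch(chunk)):
            if ok:
                succeeded.append(row)
            elif stop_on_error:
                # Stop at the first failure; keep only the earlier successes.
                return succeeded
            # Otherwise the failed row is skipped and processing continues.
    return succeeded


def fake_send_batch(chunk):
    # Stand-in for a real bulk request: rows flagged "bad" are rejected.
    return [not row.get("bad", False) for row in chunk]


rows = [{"n": 1}, {"n": 2, "bad": True}, {"n": 3}, {"n": 4}]
print(run_step(rows, 2, fake_send_batch, stop_on_error=False))
print(run_step(rows, 2, fake_send_batch, stop_on_error=True))
```

With Stop on Error off, the sketch passes on rows 1, 3, and 4. With it on, only row 1 is passed on, because processing ends at the first failed row.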
Servers tab
|#||Number of the server entry.|
|Address||IP address of the server you want to connect to.|
|Port||Port number for the server you want to connect to.|
|Test Connection||Verifies that the connection can be made to the servers listed in this tab.|
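If you want to confirm outside of PDI that each address and port in the server list is reachable, a plain TCP check is often enough. The sketch below uses only the Python standard library and does not reproduce what the Test Connection button does internally. The function name `can_connect` and the sample address and port are assumptions for this example.

```python
import socket


def can_connect(address, port, timeout=2.0):
    """Return True if a TCP connection to address:port can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Hypothetical server list mirroring the Address and Port columns;
# the address and port values are examples only.
servers = [("127.0.0.1", 9300)]
for address, port in servers:
    status = "reachable" if can_connect(address, port) else "unreachable"
    print(f"{address}:{port} is {status}")
```

A successful TCP connection shows only that something is listening on that port; it does not prove that the listener is an ElasticSearch node.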
Fields tab
|#||Number of the field entry.|
|Name||Name from the input.|
|Target Name||Output field name.|
|Get Fields||Retrieves the fields from the input.|
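The Name and Target Name columns amount to a rename map from input fields to document fields. The sketch below shows that idea under two assumptions that the table does not state: that only listed fields are sent, and that a blank target name falls back to the input name. The function name `map_fields` and the sample data are hypothetical.

```python
def map_fields(row, field_map):
    """Build the document to index from an input row.

    field_map maps each input field name to its target name.
    Assumption: a blank target name keeps the input name.
    Assumption: fields not listed in field_map are left out.
    """
    document = {}
    for name, target_name in field_map.items():
        if name not in row:
            continue
        document[target_name or name] = row[name]
    return document


# Hypothetical input row and mapping for illustration only.
row = {"usr": "kimchy", "msg": "hello", "internal_flag": 1}
field_map = {"usr": "user", "msg": ""}
print(map_fields(row, field_map))
```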
Settings tab
|#||Number of the settings entry.|
|Setting||Name of the batch.|
|Value||Value for the batch.|
Elastic, which is the company that makes ElasticSearch, has an API as well as user documentation that can give you more background on the fields in this step.
- ElasticSearch reference information can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html.
- The Bulk API is here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html.