Skip to main content

Pentaho+ documentation has moved!

The new product documentation portal is here. Check it out now at docs.hitachivantara.com

 

Hitachi Vantara Lumada and Pentaho Documentation

Kafka Offset

 

You can use the Kafka Offset job entry to reset the offset value of all partitions in a topic based upon a given timestamp or epoch value.

General

 

The following field is general to this job entry:

  • Job Name: Specify the unique name of the Kafka Offset entry on the canvas. You can customize the name or leave it as the default.

Options

 

The Kafka Offset entry includes three tabs for the step. Each tab is described below.

Setup tab

 
Setup tab for Kafka Offset job

In this tab, define the connections used for receiving messages, topics to which you want to subscribe, and the consumer group for the topics.

Option Description
Bootstrap servers

Specify the Bootstrap servers from which you want to receive the Kafka streaming data.

Topics Enter the name of each Kafka topic from which you want to consume streaming data (messages). You must include all topics that you want to consume.
Consumer group

Enter the name of the group of which you want this consumer to be a member. Each Kafka Consumer step will start a single thread for consuming.

When part of a consumer group, each consumer is assigned a subset of the partitions from topics it has subscribed to, which locks those partitions. Each instance of a Kafka Consumer step will only run a single consumer thread.

Options tab

 
Options tab for Kafka Offset job

Use this tab to send the parameters to the broker when connecting. The parameter values can be encrypted. A few of the most common options are included for your convenience. You can enter any desired Kafka property. For further information on these input names, see the Apache Kafka documentation site: https://kafka.apache.org/documentation/.

Offset Settings tab

 

To reset the offset value, set the start date in the Offset timestamp field and select the format in the Timestamp format field.

Offset settings tab for Kafka Offset job
Option Description
Offset timestamp

Specifies the start date timestamp to use as the offset. The timestamp can be a past or future time. When this job is run, the offset value in each partition will be reset to timestamp given.

Timestamp format

Specifies the timestamps format for the offset.

When epoch value is given, the timestamp format is not needed.

Examples

 

The kafka_job_abort.kjb sample job in the design-tools/data-integration/plugins/kafka-job-plugins-zip-9.5.1.0-110/kafka-offset-job/samples/transformations directory demonstrates the capabilities of this job using the Abort step.Kafka Offset example kafka_job_abort

The kafka_job.kjb sample job in the design-tools/data-integration/plugins/pentaho-streaming-kafka-plugin-zip-9.5.1.0-110/pentaho-streaming-kafka-plugin/samples/transformations directory demonstrates in the kafka_job.kjb job with both successes and failures. Comments in the transformations explain how the steps are used.

Kafka offset example kafka_job.kjb