Hitachi Vantara Lumada and Pentaho Documentation

Set Up a Carte Cluster

If you want to speed up the processing of your transformations, consider setting up a Carte cluster. A Carte cluster consists of two or more Carte slave servers and a Carte master server. When you run a transformation, its parts are distributed across the Carte slave server nodes for processing, while the Carte master server node tracks the progress.

Carte Cluster Configuration

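There are two types of Carte clusters: static and dynamic. A static Carte cluster has a fixed schema that specifies one master node and two or more slave nodes. In a static cluster, you specify the nodes at design time, before you run the transformation or job. In a dynamic cluster, one Carte master server controls multiple slave Carte instances; see Configure a Dynamic Carte Cluster.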

Configure a static Carte cluster

Follow the directions below to set up static Carte slave servers:

Procedure

  1. Copy over any required JDBC drivers and PDI plugins from your development instances of PDI to the Carte instances.

  2. Run the Carte script with an IP address, hostname, or domain name of this server, and the port number you want it to be available on.

    ./carte.sh 127.0.0.1 8081
  3. If you will be executing content stored in a Pentaho Repository, copy the repositories.xml file from the .kettle directory on your workstation to the same location on your Carte slave. Without this file, the Carte slave will be unable to connect to the Pentaho Repository to retrieve content.

  4. Ensure that the Carte service is running as intended, accessible from your primary PDI development machines, and that it can run your jobs and transformations.

  5. To start this slave server every time the operating system boots, create a startup or init script to run Carte at boot time with the same options you tested with.

    Pentaho Server Considerations

    Note: Any action performed through the Carte server embedded in the Pentaho Server is controlled by the /pentaho/server/pentaho-server/pentaho-solutions/system/kettle/slave-server-config.xml file. You must stop the Pentaho Server before you modify slave-server-config.xml.
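
Step 4 of the procedure above asks you to confirm that the Carte service is running and reachable. The following is a minimal sketch of such a check, not an official tool. It assumes that curl is installed, that the Carte status page is served under /kettle/status/ (verify this against your Carte version), and that the instance uses the cluster/cluster credentials shown in the example configuration files. The function names are hypothetical.

```shell
# Build the status URL for a Carte instance.
# $1 = host, $2 = port (the same values passed to carte.sh).
# Assumption: the status page lives under /kettle/status/.
carte_status_url() {
  printf 'http://%s:%s/kettle/status/?xml=Y\n' "$1" "$2"
}

# Return 0 if the instance answers the status request, non-zero otherwise.
# $3 and $4 are the optional user name and password; both default to
# "cluster", the value used in the example configuration files.
carte_is_up() {
  curl --silent --fail --max-time 5 \
    --user "${3:-cluster}:${4:-cluster}" \
    "$(carte_status_url "$1" "$2")" > /dev/null
}
```

For the instance started with ./carte.sh 127.0.0.1 8081, you could run carte_is_up 127.0.0.1 8081 && echo "Carte is reachable" from your PDI development machine.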

Configure a Dynamic Carte Cluster

This procedure is only necessary for dynamic cluster scenarios in which one Carte server will control multiple slave Carte instances.

Note: The following instructions explain how to create the carte-master-config.xml and carte-slave-config.xml files. You can rename these files, but their content must follow the instructions.

Configure a Carte Master Server

Follow the process below to configure the Carte Master Server.

Procedure

  1. Copy over any required JDBC drivers from your development instances of PDI to the Carte instances.

  2. Create a carte-master-config.xml configuration file using the following example as a template:

    <slave_config>
    <!-- on a master server, the slaveserver node contains information about this Carte instance -->
        <slaveserver>
            <name>Master</name>
            <hostname>yourhostname</hostname>
            <port>9001</port>
            <username>cluster</username>
            <password>cluster</password>
            <master>Y</master>
        </slaveserver>
    </slave_config>
    Note: The <name> of the Master server must be unique among all Carte instances in the cluster.
  3. Run the Carte script with the carte-master-config.xml parameter. If you placed the carte-master-config.xml file in a different directory than the Carte script, add the path to the file to the command.

    ./carte.sh carte-master-config.xml
  4. Ensure that the Carte service is running as intended.

  5. To start this master server every time the operating system boots, create a startup or init script to run Carte at boot time.

Results

You now have a Carte master server to use in a dynamic cluster. Next, configure the Carte slave servers.
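
If you set up masters on several hosts, writing the file by hand can be replaced by a small helper. The sketch below generates a file with the same content as the template in step 2. The function name write_master_config and its argument order are hypothetical, and the cluster/cluster credentials are simply copied from the template.

```shell
# Write a master configuration equivalent to the template in step 2.
# $1 = output file, $2 = hostname, $3 = port, $4 = instance name.
# The credentials are the template's example values; change them for real use.
write_master_config() {
  cat > "$1" <<EOF
<slave_config>
    <!-- on a master server, the slaveserver node describes this Carte instance -->
    <slaveserver>
        <name>$4</name>
        <hostname>$2</hostname>
        <port>$3</port>
        <username>cluster</username>
        <password>cluster</password>
        <master>Y</master>
    </slaveserver>
</slave_config>
EOF
}
```

For example, write_master_config carte-master-config.xml yourhostname 9001 Master followed by ./carte.sh carte-master-config.xml reproduces steps 2 and 3.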

Configure Carte Slave Servers

Follow the directions below to set up Carte slave servers for a dynamic cluster.

Procedure

  1. Follow the process to configure the Carte master server (see above).

  2. Make sure the master server is running.

  3. Copy over any required JDBC drivers from your development instances of PDI to the Carte instances.

  4. In the /pentaho/design-tools/ directory, create a carte-slave-config.xml configuration file using the following example as a template:

    <slave_config>
    <!-- the masters node defines one or more load balancing Carte instances that will manage this slave -->
        <masters>
            <slaveserver>
                <name>Master</name>
                <hostname>yourhostname</hostname>
                <!-- must match the port the master listens on (9001 in the carte-master-config.xml example) -->
                <port>9001</port>
                <!-- uncomment the next line if you want the Pentaho Server to act as the load balancer -->
                <!-- <webAppName>pentaho</webAppName> -->
                <username>cluster</username>
                <password>cluster</password>
                <master>Y</master>
            </slaveserver>
        </masters>
        <report_to_masters>Y</report_to_masters>
    <!-- the slaveserver node contains information about this Carte slave instance -->
        <slaveserver>
            <name>SlaveOne</name>
            <hostname>yourhostname</hostname>
            <port>9001</port>
            <username>cluster</username>
            <password>cluster</password>
            <master>N</master>
        </slaveserver>
    </slave_config>
    Note: The slaveserver <name> must be unique among all Carte instances in the cluster.
  5. If you want a slave server to use the same kettle properties as the master server, add the <get_properties_from_master> and <override_existing_properties> tags between the <slaveserver> and </slaveserver> tags for the slave server. Put the name of the master server between the <get_properties_from_master> and </get_properties_from_master> tags. Here is an example.

    <!-- the slaveserver node contains information about this Carte slave instance -->
        <slaveserver>
            <name>SlaveOne</name>
            <hostname>yourhostname</hostname>
            <port>9001</port>
            <username>cluster</username>
            <password>cluster</password>
            <master>N</master>
            <get_properties_from_master>Master</get_properties_from_master>
            <override_existing_properties>Y</override_existing_properties>
        </slaveserver>
  6. Save and close the file.

  7. Run the Carte script with the carte-slave-config.xml parameter. Note that if you placed the carte-slave-config.xml file in a different directory than the Carte script, you will need to add the path to the file to the command.

    ./carte.sh carte-slave-config.xml
  8. If you will be executing content stored in a Pentaho Repository, copy the repositories.xml file from the .kettle directory on your workstation to the same location on your Carte slave. Without this file, the Carte slave will be unable to connect to the Pentaho Repository to retrieve PDI content.

  9. Stop, then start the master and slave servers.

  10. Stop, then start the Pentaho Server.

  11. Ensure that the Carte service is running as intended. If you want to start this slave server every time the operating system boots, create a startup or init script to run Carte at boot time.
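
The notes above require every Carte instance name in the cluster to be unique. The following sketch checks a set of configuration files for duplicate names before you start the servers. It relies on an assumption taken from the templates in this article: the <slaveserver> node that describes the instance itself is the last one in the file, and each <name> element sits on its own line. The function names are hypothetical.

```shell
# Print the name of the Carte instance described by a config file.
# Assumption: as in the templates, the instance's own <slaveserver> node
# comes last, so its <name> is the last <name> element in the file.
carte_instance_name() {
  sed -n 's:.*<name>\(.*\)</name>.*:\1:p' "$1" | tail -n 1
}

# Print every instance name that appears in more than one config file.
# Prints nothing when all names are unique.
duplicate_carte_names() {
  for config_file in "$@"; do
    carte_instance_name "$config_file"
  done | sort | uniq -d
}
```

Running duplicate_carte_names carte-master-config.xml carte-slave-config.xml should print nothing when the names do not clash.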

Tuning Options

The table below shows the three configurable settings for schedule and remote execution logging in the slave-server-config.xml file.

Note: To make modifications to slave-server-config.xml, you must stop the Pentaho Server.

| Property | Values | Description |
| --- | --- | --- |
| max_log_lines | Any value of 0 (zero) or greater. 0 indicates that there is no limit. | Truncates the execution log when it goes beyond this many lines. |
| max_log_timeout_minutes | Any value of 0 (zero) or greater. 0 indicates that there is no timeout. | Removes lines from each log entry if it is older than this many minutes. |
| object_timeout_minutes | Any value of 0 (zero) or greater. 0 indicates that there is no timeout. | Removes entries from the list if they are older than this many minutes. |

The following code block is an example of the slave-server-config.xml file:

<slave_config>
  <max_log_lines>0</max_log_lines>
  <max_log_timeout_minutes>0</max_log_timeout_minutes>
  <object_timeout_minutes>0</object_timeout_minutes>
</slave_config>

Configuring Carte Servers for SSL

Carte SSL uses the JKS format for keystores, which is the default format created by the keytool command-line utility. It is a best practice to locate the keystore file in a directory that has restricted access. Carte runs on a Jetty server. For more information on how to use SSL certificates in the Jetty server, read https://wiki.eclipse.org/Jetty/Howto/Configure_SSL.
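
If you do not yet have a keystore, the sketch below creates a JKS keystore with a self-signed key pair using keytool, which is suitable for testing only. The function name, the alias carte, and the 365-day validity are illustrative choices, not values required by Carte. The store type is passed explicitly because newer JDKs create PKCS12 keystores by default. The same password is used for the keystore and the key, so the keyPassword value can be omitted from the Carte configuration.

```shell
# Create a JKS keystore holding a self-signed key pair for a Carte host.
# $1 = keystore path, $2 = password (at least 6 characters), $3 = host name.
# The alias and validity period are illustrative, not Carte requirements.
make_carte_keystore() {
  keytool -genkeypair -alias carte -keyalg RSA -keysize 2048 \
    -validity 365 -storetype JKS \
    -keystore "$1" -storepass "$2" -keypass "$2" \
    -dname "CN=$3"
}
```

For example, make_carte_keystore /path/to/restricted/dir/carte.jks 'yourpassword' yourhostname writes the keystore to a directory of your choosing; the path shown is a placeholder.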

To configure Carte servers to use SSL, complete these steps:

Procedure

  1. Stop the Carte server if it is running.

  2. Open the carte-master-config.xml configuration file.

  3. Add the keyStore, keyStorePassword, and, optionally, keyPassword values between the <sslConfig> and </sslConfig> tags in the master server configuration section. If you do not include the keyStore and keyStorePassword values in the file, Carte will not start. Here is an example of how to add the values. Adjust the values to match your environment.

    Note: You can use the encr tool, which is in the data-integration directory, to generate obfuscated passwords. To use the tool, open a command prompt or shell and type encr.bat -carte <password>. (Use encr.sh if you are using Linux.) You can then paste the obfuscated value into the file instead of the clear-text password.
    <slave_config>
    <!-- on a master server, the slaveserver node contains information about this Carte instance -->
        <slaveserver>
            <name>Master</name>
            <hostname>yourhostname</hostname>
            <port>9001</port>
            <username>cluster</username>
            <password>cluster</password>
            <master>Y</master>
            <sslConfig>
                <keyStore>D:\KEY_STORE\Pentaho</keyStore>
                <keyStorePassword>OBF:1x8g1toc1u301z0f1u2a1toi1x8e</keyStorePassword>
                <keyPassword>OBF:1iun1i9a1lfk1w261w1c1lby1i6o1irz</keyPassword>
            </sslConfig>
        </slaveserver>
    </slave_config>
    | Parameter | Description | Required |
    | --- | --- | --- |
    | keyStore | Path to the keystore file. | Yes |
    | keyStorePassword | Password for the keystore. | Yes |
    | keyPassword | Password for the key. If the keyStorePassword and keyPassword are the same, omit the keyPassword parameter from the file. | No |
  4. Save and close the carte-master-config.xml file.

  5. Open the carte-slave-config.xml file for the slave servers and add the same values.

  6. When finished, save and close the carte-slave-config.xml file.

  7. Start the Carte server.

    A message like the following appears in the console.
    2015/02/17 11:23:54 - Carte - Using SSL mode.
  8. To access Carte, type the following in a browser, substituting <host> and <port> for valid values that are in your environment:

    https://<host>:<port>/

Change Jetty Server Parameters

Carte runs on a Jetty server. You do not need to do anything to configure the Jetty server for Carte to work. But if you want to make changes to the default connection parameters, complete the steps in one of the subsections that follow.

| Jetty Server Parameter | Definition |
| --- | --- |
| acceptors | The number of threads dedicated to accepting incoming connections. The number of acceptors should be less than or equal to the number of CPUs. |
| acceptQueueSize | The number of connection requests that can be queued before the operating system starts to send rejections. |
| lowResourcesMaxIdleTime | Allows the server to rapidly close idle connections in order to gracefully handle high-load situations. |

Note: To learn more about these options, see the Jetty documentation at http://wiki.eclipse.org/Jetty/Howto/Configure_Connectors#Configuration_Options. For more information about a high-load setup, read https://wiki.eclipse.org/Jetty/Howto/High_Load.
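
The guidance that acceptors should not exceed the number of CPUs can be checked before you edit any configuration. The sketch below is one way to do that on Linux or macOS. It assumes getconf is available and reports the online CPU count through _NPROCESSORS_ONLN; the function names are hypothetical.

```shell
# Print the number of CPUs currently online.
cpu_count() {
  getconf _NPROCESSORS_ONLN
}

# Succeed if the proposed acceptors value ($1) is at least 1
# and does not exceed the number of online CPUs.
acceptors_ok() {
  [ "$1" -ge 1 ] && [ "$1" -le "$(cpu_count)" ]
}
```

For example, acceptors_ok 2 || echo "acceptors exceeds the CPU count" warns you before you put the value 2 into the configuration.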

In the Carte Configuration file

To change the Jetty server parameters in the carte-slave-config.xml file, complete these steps.

Procedure

  1. In the /pentaho/design-tools/ directory, open the carte-slave-config.xml file and add the <jetty_options> section shown below between the <slave_config> and </slave_config> tags.

    <slave_config>
    ...
        <!-- Carte uses an embedded jetty server. Include this next section only if you want to change the default jetty configuration options.-->
        <jetty_options>
            <acceptors>2</acceptors>
            <acceptQueueSize>2</acceptQueueSize>
            <lowResourcesMaxIdleTime>2</lowResourcesMaxIdleTime>
        </jetty_options>
    </slave_config>
  2. Adjust the values for the parameters as necessary, then save and close the file.

In the Kettle Configuration file

To change the Jetty server parameters in the kettle.properties file, configure the following parameters to the numeric value you want. See Set Kettle Variables if you need more information on how to do this.

| Kettle Variable in kettle.properties | Jetty Server Parameter |
| --- | --- |
| KETTLE_CARTE_JETTY_ACCEPTORS | acceptors |
| KETTLE_CARTE_JETTY_ACCEPT_QUEUE_SIZE | acceptQueueSize |
| KETTLE_CARTE_JETTY_RES_MAX_IDLE_TIME | lowResourcesMaxIdleTime |
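
As an alternative to editing the file by hand, the sketch below sets one of these variables in a properties file, replacing an earlier definition if one exists. The helper name set_kettle_prop is hypothetical, and the sketch assumes entries are written as KEY=VALUE with no spaces around the equals sign. The kettle.properties file is typically found in the .kettle directory; adjust the path for your installation.

```shell
# Set KEY=VALUE in a properties file, replacing any existing KEY line.
# $1 = properties file, $2 = key, $3 = value.
# Assumption: entries use the form KEY=VALUE without surrounding spaces.
set_kettle_prop() {
  prop_file="$1"
  prop_key="$2"
  prop_value="$3"
  touch "$prop_file"
  # Keep every line that does not define this key (grep exits non-zero
  # when nothing is kept, which is not an error here) ...
  grep -v "^${prop_key}=" "$prop_file" > "${prop_file}.tmp" || true
  # ... then append the new definition and swap the file into place.
  printf '%s=%s\n' "$prop_key" "$prop_value" >> "${prop_file}.tmp"
  mv "${prop_file}.tmp" "$prop_file"
}
```

For example, set_kettle_prop "$HOME/.kettle/kettle.properties" KETTLE_CARTE_JETTY_ACCEPTORS 2 sets the acceptors value; restart Carte afterwards so that the new value is read.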

Initialize Slave Servers

Follow the instructions below to configure PDI to work with Carte slave servers.

Procedure

  1. Open a transformation.

  2. In the Explorer View in the PDI client (Spoon), select the Slave tab.

  3. Select the New button.

    The Slave Server dialog window appears.
  4. In the Slave Server dialog window, enter the appropriate connection information for the Pentaho (or Carte) slave server.

    | Option | Description |
    | --- | --- |
    | Server name | The name of the slave server. |
    | Hostname or IP address | The address of the device to be used as a slave. |
    | Port (empty is port 80) | Defines the port used for communicating with the remote server. If you leave the port blank, 80 is used. |
    | Web App Name (required for Pentaho Server) | Leave this blank if you are setting up a Carte server. This field is used for connecting to the Pentaho Server. |
    | User name | Enter the user name for accessing the remote server. |
    | Password | Enter the password for accessing the remote server. |
    | Is the master | Enables this server as the master server in any clustered executions of the transformation. |

    Note: When executing a transformation or job in a clustered environment, you should have one server set up as the master and all remaining servers in the cluster as slaves.

    Below are the proxy tab options:

    | Option | Description |
    | --- | --- |
    | Proxy server hostname | Sets the host name for the proxy server you are using. |
    | The proxy server port | Sets the port number used for communicating with the proxy. |
    | Ignore proxy for hosts: regexp \| separated | Specify the server(s) for which the proxy should not be active. This option supports specifying multiple servers using regular expressions. You can also add multiple servers and expressions separated by the vertical bar (pipe) character. |
  5. Click OK to exit the dialog box. Notice that a plus sign (+) appears next to Slave Server in the Explorer View.

Create a cluster schema

Clustering allows transformations and transformation steps to be executed in parallel on more than one Carte server. The clustering schema defines which slave servers you want to assign to the cluster and a variety of clustered execution options.

Begin by selecting the Kettle cluster schemas node in the PDI client Explorer View. Right-click and select New to open the Clustering Schema dialog box.

| Option | Description |
| --- | --- |
| Schema name | The name of the clustering schema. |
| Port | Specify the port from which to start numbering ports for the slave servers. Each additional clustered step executing on a slave server will consume an additional port. To avoid networking problems, make sure no other network services use ports in the same range. |
| Sockets buffer size | The internal buffer size to use. |
| Sockets flush interval rows | The number of rows after which the internal buffer is sent completely over the network and emptied. |
| Sockets data compressed? | When enabled, all data is compressed using the Gzip compression algorithm to minimize network traffic. |
| Dynamic cluster | If checked, a master Carte server will perform failover operations, and you must define the master as a slave server in the field below. If unchecked, the PDI client will act as the master server, and you must define the available Carte slaves in the field below. |
| Slave Servers | A list of the servers to be used in the cluster. You must have one master server and any number of slave servers. To add servers to the cluster, click Select slave servers to select from the list of available slave servers. |

Run transformations in a cluster

  • To run a transformation on a cluster, access the Run Options window. You can access the Run Options window through the context menu next to the Run icon in the toolbar or by pressing F8. In the Run Options window, specify your run configuration that controls running the transformation in a clustered environment. To set up run configurations, see Run configurations.
  • To run a clustered transformation via a job, access the Transformation job entry details screen and select the Advanced tab, then select Run this transformation in a clustered mode?.
  • To assign a cluster to an individual transformation step, right-click on the step and select Clusters from the context menu. That option brings up the cluster schema list. Select a schema, then click OK.
  • When running transformations in a clustered environment, you have the option to Show transformations, which displays the generated (converted) transformations that will be executed on the cluster.