Pentaho Worker Nodes system best practices
This article lists hardware, networking, and operating system recommendations for running the Pentaho Worker Nodes product on one or more instances.
Resource best practices
This section provides basic resource requirements and best practices for running Pentaho Worker Nodes on one or more instances. You can scale your worker node environment based on your work item load.
This guideline uses the following definitions:
Minimum guidelines
The minimum amounts of RAM, CPU, and available disk space required to run a system instance.
Best practice guidelines
The best practice amounts of RAM, CPU, and available disk space for system instances.
A system in which all instances meet or exceed these best practices can handle more work items, and can process them faster, than a system in which the instances meet only the minimum amounts.
Resource | Minimum guidelines | Best practice guidelines |
RAM | 16 GB | 32 GB |
CPU | 4 cores | 8 cores |
Available disk space | 50 GB | 500 GB |
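As a rough illustration of how the table above might be applied, the following Python sketch classifies one instance against the two guideline levels. The function names `classify` and `measure_local` are hypothetical helpers written for this example; they are not part of Pentaho Worker Nodes. The measurement helper assumes a Linux host.

```python
import os
import shutil

# Thresholds copied from the table above: (minimum, best practice).
GUIDELINES = {
    "ram_gb": (16, 32),
    "cpu_cores": (4, 8),
    "disk_gb": (50, 500),
}


def classify(ram_gb, cpu_cores, disk_gb):
    """Return 'below minimum', 'minimum', or 'best practice' for one instance."""
    measured = {"ram_gb": ram_gb, "cpu_cores": cpu_cores, "disk_gb": disk_gb}
    # Any single resource under its minimum disqualifies the instance.
    if any(measured[key] < GUIDELINES[key][0] for key in GUIDELINES):
        return "below minimum"
    # Every resource must reach the higher level to count as best practice.
    if all(measured[key] >= GUIDELINES[key][1] for key in GUIDELINES):
        return "best practice"
    return "minimum"


def measure_local(path="/"):
    """Read total RAM, CPU count, and free disk space (GiB) on a Linux host."""
    gib = 1024 ** 3
    ram = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / gib
    disk = shutil.disk_usage(path).free / gib
    return ram, os.cpu_count() or 0, disk
```

Calling `classify(*measure_local())` on a candidate server gives a quick first impression. The operating system usually reports slightly less RAM than the nominal installed amount, so a result just under a threshold deserves a manual check.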
Single-instance systems versus multi-instance systems
A system can consist of a single instance or of multiple instances (four or more nodes). Each instance must meet the minimum RAM, CPU, and disk space requirements.
Single-instance system
A single-instance system is useful for testing and demonstration purposes. It requires only a single server and can perform all product functionality. However, a single-instance system has the following drawbacks:
- It has a single point of failure. If the instance hardware fails, you lose access to the system.
- With no additional instances, you cannot choose where to run services. All services run on that one instance.
Multi-instance system
A multi-instance system is a best practice for use in a production environment because it offers the following advantages:
- You can control how services are distributed across the multiple instances, providing improved service redundancy, scale-out, and availability.
- A multi-instance system can survive instance outages. For example, with a four-instance system running the default distribution of services, the system can lose one instance and remain available.
- Performance is improved since work is performed in parallel across instances.
- You can add additional instances to the system at any time.
You cannot convert a single-instance system to a production-ready multi-instance system by adding new instances, because the system does not support adding master instances. Master instances are special instances that run a particular set of system services. Single-instance systems have one master instance. Multi-instance systems have a minimum of three master instances.
If you add instances to a single-instance system, the system still has only one master instance, so the essential services that only a master instance can run still have a single point of failure.
A multi-instance system should have a minimum of three master instances. You can add non-master (worker) instances to a multi-instance system, provided that the system started with at least three master instances.
For information on adding instances to an existing system, see the Administrator Help, which is available from the Administration App.
Size of cluster
The number of nodes or masters in the cluster corresponds to the amount of fault tolerance you want to build into your system.
In a multi-node cluster, ZooKeeper maintains a quorum of nodes: the minimum number of nodes required for the cluster to function. ZooKeeper defines the quorum as a strict majority, floor(N/2) + 1, where N is the number of masters. Example clusters using this formula are as follows:
- For 2 masters, quorum size is 2
- For 3 masters, quorum size is 2
- For 5 masters, quorum size is 3
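The quorum sizes listed above correspond to a strict majority (more than half of the masters). The short Python sketch below reproduces those values and also derives how many masters can fail before the quorum is lost. The function names are illustrative only.

```python
def quorum_size(masters):
    """Smallest number of masters that forms a strict majority."""
    if masters < 1:
        raise ValueError("a cluster needs at least one master")
    return masters // 2 + 1


def tolerated_failures(masters):
    """Number of masters that can be lost while a quorum remains."""
    return masters - quorum_size(masters)


for n in (2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Two masters tolerate no failures, which is no better than one master, while three masters tolerate one failure and five tolerate two. This is why an odd number of masters, starting at three, is the usual choice.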
Docker and operating system requirements
To be a system instance, each server or virtual machine you provide must meet the following requirements:
- Must have Docker version 1.13.1 or later installed.
- Must run a 64-bit Linux distribution.
For more information about the Docker versions suggested by various operating systems, refer to the Administrator Help, which is available from the Administration App.
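One way to check the Docker requirement across many servers is to compare the text printed by `docker --version` against the 1.13.1 minimum. The following Python sketch does only the parsing and comparison; the helper names and the sample version strings are assumptions made for this example.

```python
import re

# Minimum Docker version stated in the requirements above.
MINIMUM = (1, 13, 1)


def parse_docker_version(text):
    """Extract (major, minor, patch) from text such as 'Docker version 18.09.7, build 2d0083d'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text):
    """True when the reported version is 1.13.1 or later."""
    # Tuples compare element by element, so (18, 9, 7) >= (1, 13, 1) is True.
    return parse_docker_version(text) >= MINIMUM
```

Comparing integer tuples avoids the mistake of comparing version strings alphabetically, where "1.9.0" would wrongly sort after "1.13.1".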
Networking
The following sections describe the network usage and requirements for both system instances and services. Keep the following in mind:
- You must configure the network settings for each service when you install the system. You cannot change these settings after the system is up and running.
- If your networking environment changes after you deploy the system, such that the system can no longer function with its current networking configuration, you need to reinstall the system. For more information about networking, see the installation guide included with your installation.
For more information about adding network security, see Enabling secure communication for Pentaho Worker Nodes.
Instance IP address requirements
All instance IP addresses must be static, including both internal and external network IP addresses, if applicable to your system.
If the IP address of any instance changes, see the installation guide included with your installation.
Network types
Each system service can bind to one type of network, either internal or external, for receiving incoming traffic. If your network infrastructure supports having two networks, you may want to isolate the traffic for most system services to a secured internal network that has limited access.
You can use either a single network type for all services or a mix of both types. If you want to use both types, every instance in your system must be addressable by two IP addresses: one on your internal network and one on your external network. If you use only one network type, each instance needs only one IP address.
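The addressing rule above can be checked before installation with a small script. This Python sketch assumes a hypothetical inventory format (a dictionary that maps each instance name to its addresses per network type) and reports any instance that lacks a valid IP address for a network type you plan to use.

```python
import ipaddress


def missing_addresses(instances, network_types):
    """Return {instance: [network types without a valid IP address]}."""
    problems = {}
    for name, addresses in instances.items():
        missing = []
        for net in network_types:
            value = addresses.get(net)
            if value is None:
                missing.append(net)
                continue
            try:
                # Raises ValueError when the text is not an IPv4 or IPv6 address.
                ipaddress.ip_address(value)
            except ValueError:
                missing.append(net)
        if missing:
            problems[name] = missing
    return problems


# Hypothetical inventory; the names and addresses are placeholders.
inventory = {
    "node1": {"internal": "10.0.0.11", "external": "192.0.2.11"},
    "node2": {"internal": "10.0.0.12"},
}
print(missing_addresses(inventory, ["internal", "external"]))
```

With both network types in use, `node2` is reported because it has no external address. With only the internal type in use, the same inventory passes.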
Allowing access to external resources
Regardless of whether you are using a single network type or a mix of types, you need to configure your network environment to ensure that all instances have outgoing access to the external resources you want to use, including:
- The data sources where your data is stored.
- Identity providers for user authentication.
- Email servers that you want to use for sending email notifications.
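To confirm outgoing access from an instance to resources such as those listed above, you could run a simple TCP connection test on each instance. The Python sketch below is one possible approach; the helper names are made up for this example, and the host names and ports you pass in depend entirely on your environment. A successful TCP connection shows only that the network path is open, not that authentication or the service itself works.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable(resources, timeout=3.0):
    """Given {name: (host, port)}, return the names that could not be reached."""
    return [
        name
        for name, (host, port) in resources.items()
        if not can_connect(host, port, timeout)
    ]
```

For example, `unreachable({"mail": ("smtp.example.com", 25)})` returns an empty list when the placeholder mail server is reachable from the instance where the script runs.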
Ports
Each service binds to a number of ports for receiving incoming traffic.
Before installing the system, you can configure the services to use different ports, or use the default values shown below.
External ports
The following table contains information about the service ports that users use to interact with the system. On every instance in the system, each of these ports must be accessible from:
- Any network that requires administrative or search access to the system.
- Every other instance in the system.
Default Port Value | Service | Purpose |
8000 | Admin-App | Access to administrative interfaces |
38080 | Content Execution Router | Entry point to Worker Node (non-secure setup) |
38443 | Content Execution Router | Entry point to Worker Node (secure setup) |
If you are enabling security, you need to indicate a port value for secure communication. See Enabling secure communication for Pentaho Worker Nodes for more information.
Internal ports
Determine which ports each system service should use. You can use the default ports for each service or specify different ones. In either case, these restrictions apply:
- Every port must be accessible from all instances in the system.
- Some ports must be accessible from outside the system.
- All port values must be unique; no two services can share the same port.
- For information on port usage and requirements for each service, see Ports.
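The uniqueness restriction above is easy to check in a planned port layout before installation. The following Python sketch takes a list of (service, port) pairs and reports any port that more than one service claims. The `port_conflicts` helper and the `Hypothetical-Service` entry are illustrative assumptions; only the Admin-App and Content Execution Router defaults come from this article.

```python
from collections import defaultdict


def port_conflicts(assignments):
    """Return {port: [services]} for each port claimed by more than one service."""
    by_port = defaultdict(set)
    for service, port in assignments:
        if not 1 <= port <= 65535:
            raise ValueError(f"{service}: {port} is not a valid TCP port")
        by_port[port].add(service)
    return {
        port: sorted(services)
        for port, services in by_port.items()
        if len(services) > 1
    }


planned = [
    ("Admin-App", 8000),
    ("Content Execution Router", 38080),
    ("Content Execution Router", 38443),
    ("Hypothetical-Service", 8000),  # deliberately clashes with Admin-App
]
print(port_conflicts(planned))
```

A service that uses several different ports, as the Content Execution Router does, is not a conflict. Only a port shared between different services is reported.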
You can find more information on how these ports are used in the documentation for the third-party software underlying each service.
Next step: Installing Pentaho Worker Nodes
You can now begin the setup steps for Installing the Pentaho Worker Nodes product.
Install and set up Pentaho Worker Nodes
Complete the instructions in the following articles to set up Pentaho Worker Nodes:
Run and administer the Pentaho Worker Nodes product
Once you have Pentaho Worker Nodes set up and configured, use the following articles to learn how to run and administer work items: