Pentaho Worker Nodes System Recommendations
There are several hardware, networking, and operating system recommendations for running the Pentaho Worker Nodes Product on one or more instances.
Resource Recommendations
This section provides the basic resource requirements for running the Pentaho Worker Nodes Product on an HCI instance. You can scale your worker node environments based on your work item load. When aligning your available resources with the work item load you want to run, keep the following guidelines in mind:
- A single instance of HCI requires 8 GB of RAM and 2 cores per machine, plus 50 GB of available disk space.
Resource | Required amounts to run a single HCI instance |
---|---|
RAM | 8 GB |
CPU | 2 cores |
Available disk space | 50 GB |
- A single worker node is configured for 8 GB of RAM and 2 cores from the cluster, plus approximately 8 GB of available disk space.
Resource | Required amounts to run a single worker node |
---|---|
RAM | 8 GB |
CPU | 2 cores |
Available disk space | Approximately 8 GB |
Given the above guidelines, a machine with 32 GB of RAM and 8 cores must reserve 8 GB and 2 cores for running the required HCI instance. You can, therefore, allocate the remaining 24 GB and 6 cores to running three worker nodes simultaneously.
Resource | Required amounts to run 3 worker nodes simultaneously on a single HCI instance |
---|---|
RAM | 32 GB = 8 GB (HCI instance) + 24 GB (8 GB per worker node) |
CPU | 8 cores = 2 cores (HCI instance) + 6 cores (2 cores per worker node) |
Available disk space | Approximately 75 GB = 50 GB (HCI instance) + approximately 25 GB (about 8 GB per worker node) |
Use these guidelines to scale your resources. The more worker nodes you run simultaneously, the more hardware and resources you may need.
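The arithmetic in the example above can be generalized to other machine sizes. The following is a minimal sketch that uses only the figures from the tables in this section. The function names are illustrative and are not part of HCI or Pentaho, and the per-node disk figure is the approximate 8 GB value, so the computed disk total for three worker nodes is 74 GB rather than the rounded 75 GB shown in the table.

```python
# Figures taken from the resource tables in this section.
HCI_RAM_GB, HCI_CORES, HCI_DISK_GB = 8, 2, 50
WORKER_RAM_GB, WORKER_CORES, WORKER_DISK_GB = 8, 2, 8  # disk figure is approximate


def max_worker_nodes(ram_gb: int, cores: int, disk_gb: int) -> int:
    """Return how many worker nodes fit after reserving the HCI instance's share."""
    by_ram = (ram_gb - HCI_RAM_GB) // WORKER_RAM_GB
    by_cores = (cores - HCI_CORES) // WORKER_CORES
    by_disk = (disk_gb - HCI_DISK_GB) // WORKER_DISK_GB
    # The scarcest resource sets the limit; never report a negative count.
    return max(0, min(by_ram, by_cores, by_disk))


def required_resources(worker_nodes: int) -> tuple:
    """Return (RAM GB, cores, disk GB) for one HCI instance plus worker nodes."""
    return (
        HCI_RAM_GB + worker_nodes * WORKER_RAM_GB,
        HCI_CORES + worker_nodes * WORKER_CORES,
        HCI_DISK_GB + worker_nodes * WORKER_DISK_GB,
    )
```

For the machine described above, `max_worker_nodes(32, 8, 75)` returns 3, which matches the three worker nodes in the table.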
Single-Instance Systems Versus Multi-Instance Systems
An HCI system can have either a single instance or multiple instances (four or more). Each instance must meet the minimum RAM, CPU, and disk space requirements.
Three instances are sufficient to perform leader election for distributing work. However, a multi-instance system requires a minimum of four instances because, with the minimum hardware requirements, three instances are not sufficient for running all HCI services at their recommended distributions.
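This documentation does not describe how HCI performs leader election. As an illustration only, and assuming a majority-vote scheme of the kind commonly used for leader election, the following sketch shows why three voting instances can tolerate the loss of one. The function names are hypothetical.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can fail while a majority can still be formed.

    Assumption: leader election needs a strict majority of voters.
    This is a common design, not a documented HCI behavior.
    """
    return voters - majority(voters)
```

Under this assumption, one voter tolerates no failures and three voters tolerate one, which is consistent with three instances being enough for leader election.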
Single-Instance System
A single-instance system is useful for testing and demonstration purposes. It requires only a single server and can perform all HCI functionality.
However, a single-instance system has the following drawbacks:
- It has a single point of failure. If the instance hardware fails, you lose access to HCI.
- With no additional instances, you cannot choose where to run HCI services. All services run on that one instance.
Multi-Instance System
A multi-instance system is recommended for use in a production environment because it offers the following advantages:
- You can control how services are distributed across the multiple instances, providing improved service redundancy, scale-out, and availability.
- A multi-instance system can survive instance outages. For example, with a four-instance system running the default distribution of services, the system can lose one instance and remain available.
- Performance is improved since work is performed in parallel across instances.
- You can add instances to the system at any time.
You cannot convert a single-instance system to a production-ready multi-instance system by adding new instances, because HCI does not support adding master instances. Master instances are special instances that run a particular set of HCI services. Single-instance systems have one master instance. Multi-instance systems have a minimum of three master instances.
If you add instances to a single-instance system, the system still has only one master instance, which leaves a single point of failure for the essential services that only a master instance can run.
A multi-instance system should have a minimum of three master instances. You can add a non-master instance, such as a worker node, to a multi-instance system if the system started with at least three master instances.
The three master instance IP values should be determined before you run the Setup script. Once HCI is installed, any IP changes would require the complete removal and re-installation of HCI to enact the changes, such as changing single-instance IP values to multi-instance IP values.
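Because IP values cannot be changed after installation without reinstalling, it can help to check the planned master addresses before you run the Setup script. The following is a minimal sketch that uses the Python standard library. The function name is hypothetical, and the check is not part of the HCI Setup script.

```python
import ipaddress


def validate_master_ips(addresses):
    """Check that exactly three distinct, well-formed IP addresses are given.

    Returns the addresses in normalized string form, or raises ValueError.
    """
    if len(addresses) != 3:
        raise ValueError(f"expected 3 master IP addresses, got {len(addresses)}")
    # ipaddress.ip_address raises ValueError for malformed input.
    parsed = [ipaddress.ip_address(a.strip()) for a in addresses]
    if len(set(parsed)) != len(parsed):
        raise ValueError("master IP addresses must be distinct")
    return [str(ip) for ip in parsed]
```

For example, `validate_master_ips(["192.0.2.10", "192.0.2.11", "192.0.2.12"])` returns the same three addresses. The addresses shown come from a range reserved for documentation and are placeholders.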
For information on adding instances to an existing HCI system, see the HCI Administrator Help, which is available from the Administration App.
Docker and Operating System Requirements
To be an HCI instance, each server you provide must meet the following requirements:
- Must have Docker version 1.10.3 or later installed
- Must run a 64-bit Linux distribution
You must install the current Docker version suggested by your operating system, unless that version is earlier than 1.10.3. HCI cannot run with Docker versions prior to 1.10.3.
For more information about the Docker versions suggested by various operating systems, refer to the HCI Install Guide included with your installation.
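The version requirement can be checked from the text printed by `docker --version`. The following is a minimal sketch, assuming the output contains a three-part version number such as `Docker version 1.10.3, build ...`. The function names are illustrative. Comparing versions as integer tuples avoids the mistake of treating 1.9 as later than 1.10.

```python
import re

MINIMUM_DOCKER_VERSION = (1, 10, 3)  # minimum stated in this section


def parse_docker_version(text: str) -> tuple:
    """Extract (major, minor, patch) from text such as `docker --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text: str) -> bool:
    """Return True if the reported version is 1.10.3 or later."""
    return parse_docker_version(text) >= MINIMUM_DOCKER_VERSION
```

You can pass the function the text you get from running `docker --version` on each server.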
Docker Considerations
Ensure that the Docker storage driver is configured correctly on each instance before installing HCI. After HCI is installed, changing the Docker storage driver requires a reinstallation of HCI.
To view the current Docker storage driver on an instance, run the command:
docker info
Do not run the Docker Device Mapper storage driver in loop-lvm mode on a production system, because it can slow system performance. On certain Linux distributions, your system may not have enough space to run it.
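The storage driver check can be scripted by reading the text that `docker info` prints. The following is a minimal sketch with illustrative function names. It assumes that the output contains a `Storage Driver:` line and that, for Device Mapper in loop-lvm mode, it also contains a `Data loop file:` line. Verify both assumptions against the output of your Docker version.

```python
def storage_driver(info_text: str):
    """Return the value of the `Storage Driver:` line, or None if absent."""
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Storage Driver":
            return value.strip()
    return None


def uses_loop_lvm(info_text: str) -> bool:
    """Return True if the output suggests Device Mapper in loop-lvm mode.

    Assumption: loop-lvm mode is indicated by a `Data loop file:` line.
    """
    if storage_driver(info_text) != "devicemapper":
        return False
    return any(
        line.strip().startswith("Data loop file:")
        for line in info_text.splitlines()
    )
```

Pass the functions the captured output of `docker info` from each instance. A result of `True` from `uses_loop_lvm` indicates a configuration to avoid on a production system.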
The Docker installation directory on each instance must have at least 20 GB available for storing the HCI Docker images.
Networking
The following describes the network usage and requirements for both system instances and services.
Notes
- You must configure the network settings for each service when you install the system. You cannot change these settings after the system is up and running.
- If your networking environment changes after you deploy HCI, such that HCI can no longer function with its current networking configuration, you need to reinstall the HCI system. For more information about networking, refer to the HCI Install Guide included with your installation.
For more information about adding network security, see Enabling Secure Communication for Pentaho Worker Nodes.
Instance IP Address Requirements
All instance IP addresses must be static, including both internal and external network IP addresses, if applicable to your system.
If the IP address of any instance changes, refer to the HCI Install Guide included with your installation.
Network Types
Each HCI service can bind to one type of network, either internal or external, for receiving incoming traffic. If your network infrastructure supports having two networks, you may want to isolate the traffic for most system services to a secured internal network that has limited access. You can then leave only the Search-App and Admin-App services on your external network for user access.
You can use either a single network type for all services or a mix of both types. If you want to use both types, every instance in your system must be addressable by two IP addresses: one on your internal network and one on your external network. If you use only one network type, each instance needs only one IP address.
Allowing Access to External Resources
Regardless of whether you are using a single network type or a mix of types, you need to configure your network environment to ensure that all instances have outgoing access to the external resources you want to use, including:
- The data sources where your data is stored.
- Identity providers for user authentication.
- Email servers that you want to use for sending email notifications.
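A basic way to confirm outgoing access from an instance is to attempt a TCP connection to each external resource. The following is a minimal sketch that uses the Python standard library. The function name is hypothetical, and a successful TCP connection shows only that the port is reachable, not that the service behind it is configured correctly.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run the check from every instance against the host and port of each data source, identity provider, and email server you plan to use.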
Ports
Each service binds to a number of ports for receiving incoming traffic.
Before installing HCI, you can configure the services to use different ports, or use the default values shown below. For more information, see Optional: Set Up Networking for System Services.
System-External Ports
The following table contains information about the service ports that users use to interact with the system. On every instance in the system, each of these ports must be accessible from:
- Any network that requires administrative or search access to the system.
- Every other instance in the system.
Default Port Value | Service | Purpose |
---|---|---|
8000 | Admin-App | Access to administrative interfaces |
If you are enabling security, you must specify a port value for secure communication. For more information, see Enabling Secure Communication for Pentaho Worker Nodes.
System-Internal Ports
Determine which ports each HCI service should use. You can use the default ports for each service or specify different ones. In either case, these restrictions apply:
- Every port must be accessible from all instances in the system.
- Some ports must be accessible from outside the system.
- All port values must be unique; no two services, whether system services or HCI services, can share the same port.
For information on port usage and requirements for each HCI service, see Ports.
You can find more information on how these ports are used in the documentation for the third-party software underlying each service. Refer to “Appendix B: Services” in the HCI Install Guide included with your installation.
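The uniqueness restriction is straightforward to check before installation. The following is a minimal sketch that reports any port assigned to more than one service. The function name is illustrative. In the usage example, only the Admin-App default of 8000 comes from this document, and the other service names and ports are placeholders.

```python
from collections import defaultdict


def find_port_conflicts(service_ports: dict) -> dict:
    """Map each port used by more than one service to the services that share it."""
    by_port = defaultdict(list)
    for service, port in service_ports.items():
        by_port[port].append(service)
    return {
        port: sorted(names)
        for port, names in by_port.items()
        if len(names) > 1
    }


# Placeholder plan: only Admin-App on 8000 is taken from the table above.
planned_ports = {
    "Admin-App": 8000,
    "Example-Service-A": 9000,
    "Example-Service-B": 8000,
}
conflicts = find_port_conflicts(planned_ports)
```

An empty result means that no two services in the plan share a port. Here, `conflicts` reports that port 8000 is assigned to both Admin-App and Example-Service-B.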
Set Up HCI and Pentaho for Worker Nodes
Complete the instructions in the following articles to set up HCI and Pentaho to use worker nodes:
Run and Administer the Pentaho Worker Nodes Product
Use the following articles to assist you in running and administering Pentaho Worker Nodes: