Hitachi Vantara Lumada and Pentaho Documentation

Requirements

Pentaho Data Catalog requires specific external components and applications to operate optimally. This article provides a list of those components and applications along with details of their use and the versions Data Catalog supports.

Environment considerations

To support sound software development and deployment, maintain two separate environments as a best practice:

  • Development or Staging
  • Production

System requirements

This section outlines the software, hardware, and access requirements you should have before you install Data Catalog.

Checklist for infrastructure requests

Perform the following tasks as needed to prepare your environment for Data Catalog:

  • Request a virtual machine (VM) on Azure, on AWS, or on-premises.
  • Request IDs with remote access permissions to the VM on your cloud or on-premises.
  • Request necessary access to systems, applications, and data sources.
  • Request VDI or VPN access for Data Catalog data engineers to enable remote access to the VM.
  • Request a database user account (service account) or logins for connecting to the data sources.
  • Make sure the database user account has read-only permissions for the database objects, including system catalog tables.
  • Make sure that your system owner or Database Administrator (DBA) has copied or extracted any required data or files.
  • Obtain an SSL certificate from a certificate authority. If required by your organization's security policy, raise an infrastructure support request for an SSL certificate. The certificate authority will give you a key file and a certificate file.
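Once the certificate authority has delivered the key file and the certificate file, it can be worth confirming that the two belong together before installation. The following is a minimal sketch, not part of the Data Catalog installer; the function name cert_matches_key is hypothetical, and it assumes the openssl command-line tool is installed and that the key is not passphrase-protected (otherwise openssl prompts for the passphrase).

```shell
#!/usr/bin/env bash
# Sketch: confirm that a certificate and a private key form a pair.

# cert_matches_key CERT_FILE KEY_FILE
# Returns 0 when both files carry the same public key, 1 when they differ,
# and 2 when openssl cannot read one of the files.
cert_matches_key() {
  local cert=$1 key=$2 cert_pub key_pub
  # Public key embedded in the certificate
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey) || return 2
  # Public key derived from the private key
  key_pub=$(openssl pkey -in "$key" -pubout) || return 2
  [ "$cert_pub" = "$key_pub" ]
}
```

For example, `cert_matches_key server.crt server.key && echo "pair OK"`, where server.crt and server.key are placeholder file names.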

Hardware requirements

Your server and network must meet the following requirements:

Category        Description
CPU             16 cores (minimum), 32 cores (recommended)
RAM             64 GB (minimum), 128 GB (recommended)
Disk storage    1 TB (minimum)
Network         1 Gbps
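The minimums above can be compared against a running Linux server with standard tools. This is a rough sketch rather than an official preflight check: the function names are hypothetical, /var/lib/docker is assumed as the storage path to measure, 1 TB is treated as 1000 GiB, and only a single file system is measured. It relies on nproc, /proc/meminfo, and GNU df.

```shell
#!/usr/bin/env bash
# Sketch: compare this server with the documented minimums
# (16 cores, 64 GB RAM, 1 TB disk). Function names are illustrative.

# meets_min LABEL ACTUAL REQUIRED
# Prints one result line and returns 0 when ACTUAL >= REQUIRED (integers).
meets_min() {
  local label=$1 actual=$2 required=$3
  if [ "$actual" -ge "$required" ]; then
    echo "OK    $label: $actual (minimum $required)"
  else
    echo "FAIL  $label: $actual (minimum $required)"
    return 1
  fi
}

# report_hardware [PATH]
# PATH is the file system to measure; /var/lib/docker is an assumption.
report_hardware() {
  local path=${1:-/var/lib/docker} cores mem_gb disk_gb status=0
  [ -d "$path" ] || path=/
  cores=$(nproc)
  # MemTotal is in kB and excludes memory reserved by the kernel, so a
  # nominal 64 GB VM can report slightly less than 64 here.
  mem_gb=$(awk '/^MemTotal:/ {print int(($2 + 1048575) / 1048576)}' /proc/meminfo)
  # Total size, in whole GiB, of the file system that holds PATH (GNU df).
  disk_gb=$(df -BG --output=size "$path" | tail -n 1 | tr -dc '0-9')
  meets_min "CPU cores" "$cores" 16 || status=1
  meets_min "RAM (GB)" "$mem_gb" 64 || status=1
  meets_min "Disk (GB)" "$disk_gb" 1000 || status=1
  return "$status"
}
```

Calling `report_hardware` prints one line per category. Treat a narrow miss on RAM as a prompt to confirm the VM size rather than as a definite failure.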

If the server is running on AWS or Azure, review the following requirements.

AWS EC2 details

An AWS EC2 virtual machine (VM) has the following requirements:

          Minimum Requirements    Preferred Requirements
Size      m5.4xlarge              m5.8xlarge
vCPU      16 cores                32 cores
Memory    64 GB                   128 GB

Note: You can attach Amazon Elastic Block Store (EBS) storage to these VMs.

Azure VM details

An Azure VM has the following requirements:

          Minimum Requirements    Preferred Requirements
Size      B_16s_v2                B_32s_v2
vCPU      16 cores                32 cores
Memory    64 GB                   128 GB

Note: You can attach standard SSD, standard HDD, and premium SSD disk storage to these VMs.

Server storage requirements

The server file systems and storage must meet the following requirements:

  • At least 10 GB of storage should be allocated for the root file system.
  • Ample storage should be mounted in the designated Docker storage area (typically the default location, /var/lib/docker, on Linux servers).

Note: Any POSIX-compliant file system can be used, but XFS, the standard file system in RHEL, is well-tested.

Operating system requirements

You must have dedicated servers available in a hosting environment. The hosting environment can be on-premises or in the cloud, using platforms such as Azure or AWS.

The server must run one of the following amd64 architecture Linux operating systems:

  • Amazon Linux 2 (AWS only)
  • CentOS 7 or 8

Linux kernel version

Version 4.0 or higher of the Linux kernel is required. For RHEL, use version 3.10.0-514 of the kernel or a higher version.
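The kernel requirement can be checked by comparing the output of uname -r with the minimum version. The sketch below uses GNU sort -V for the comparison because kernel releases do not compare correctly as plain strings; the function name kernel_at_least is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: test whether a kernel release meets a minimum version.

# kernel_at_least RELEASE MINIMUM
# Returns 0 when RELEASE sorts at or after MINIMUM in version order.
kernel_at_least() {
  local release=$1 minimum=$2 lowest
  # sort -V orders version strings numerically field by field;
  # if MINIMUM comes out first (or ties), RELEASE is new enough.
  lowest=$(printf '%s\n' "$minimum" "$release" | sort -V | head -n 1)
  [ "$lowest" = "$minimum" ]
}
```

On most distributions, run `kernel_at_least "$(uname -r)" 4.0`. On RHEL, pass 3.10.0-514 as the minimum instead.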

Note: The overlay and overlay2 storage drivers are supported on XFS backing file systems only when d_type support is enabled, which on XFS means that the ftype option is set to 1.
  • To verify that the ftype option is set to 1, run the xfs_info command and check its output. To format an XFS file system correctly, use the -n ftype=1 flag.
  • To make sure Docker starts automatically after the dedicated server restarts, enable the Docker services by running the following commands:
    sudo systemctl enable docker.service
    sudo systemctl enable containerd.service
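The ftype check described above can be scripted. The following sketch is an illustration only: the function names are hypothetical, and it assumes that Docker is already installed and that findmnt and xfs_info are available on the server.

```shell
#!/usr/bin/env bash
# Sketch: check that the XFS file system behind Docker has ftype=1.

# xfs_ftype_enabled
# Reads xfs_info output on stdin; returns 0 when it reports ftype=1.
xfs_ftype_enabled() {
  grep -Eq '(^|[[:space:],])ftype=1([[:space:],]|$)'
}

# check_docker_storage
# Looks up Docker's data directory and inspects its file system.
check_docker_storage() {
  local root mount fstype
  root=$(docker info --format '{{.DockerRootDir}}') || return 2
  mount=$(findmnt -n -o TARGET --target "$root")
  fstype=$(findmnt -n -o FSTYPE --target "$root")
  if [ "$fstype" != "xfs" ]; then
    echo "$root is on $fstype, so the ftype check does not apply"
    return 0
  fi
  if xfs_info "$mount" | xfs_ftype_enabled; then
    echo "OK    $mount has ftype=1"
  else
    echo "FAIL  $mount does not report ftype=1"
    return 1
  fi
}
```

After enabling the services, `systemctl is-enabled docker.service containerd.service` should print enabled for each unit.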
    

Network security and firewall requirements

The network security and firewall must meet the following requirements:

  • Ports 80 and 443 should be open for inbound traffic.
  • The application server must have network connectivity to the database server and port.
Note: The default installation includes a signed certificate for HTTPS enablement on port 443. However, if desired, you can obtain an SSL certificate from a certificate authority.
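Connectivity to the required ports can be probed with a short TCP connection test. This sketch uses the /dev/tcp redirection built into bash together with the coreutils timeout command; the function name port_open and the host names in the usage note are placeholders, not values from the product.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port accepts connections.

# port_open HOST PORT
# Returns 0 when a TCP connection to HOST:PORT succeeds within 3 seconds.
port_open() {
  local host=$1 port=$2
  # /dev/tcp/HOST/PORT is handled by bash itself, so no extra tool is needed.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

To test the inbound rules, run `port_open catalog.example.com 443` from a client machine, because a test run on the server itself bypasses the firewall. To test database connectivity, run `port_open db.example.com 5432` from the application server, substituting your database host and port.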

User account

The server user account used for the installation must either be the root user or have appropriate permissions to run Docker. To set up Docker permissions for non-root users, see the official Docker documentation at https://docs.docker.com/engine/install/linux-postinstall/.
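A quick way to see whether an account is likely to meet this requirement is to check for root or for membership in the docker group, which is the group the Docker post-installation steps use. The sketch below is an assumption-based check with a hypothetical function name; the definitive test remains whether `docker info` succeeds for that user.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an account can be expected to run Docker.

# can_run_docker UID GROUP_LIST
# Returns 0 when UID is 0 (root) or GROUP_LIST contains the docker group.
can_run_docker() {
  local uid=$1 group_list=$2 g
  [ "$uid" -eq 0 ] && return 0
  # GROUP_LIST is space-separated, as printed by `id -nG`.
  for g in $group_list; do
    [ "$g" = "docker" ] && return 0
  done
  return 1
}
```

For the current user, call `can_run_docker "$(id -u)" "$(id -nG)"`.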

Software requirements

Before you install Data Catalog, Docker must already be installed on the server and configured to start on boot. See the official Docker documentation at https://docs.docker.com/engine/install/ for instructions on installing Docker.

Name              Requirements
Docker            Version 22.0+
Docker Compose    Version 2.22.0+
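The installed versions can be compared with these minimums from the command line. The following sketch assumes the Docker daemon is running and the Compose plugin is installed; version_ge is a hypothetical helper that relies on GNU sort -V, since a plain string comparison would rank 2.9.0 above 2.22.0.

```shell
#!/usr/bin/env bash
# Sketch: compare installed Docker versions with the documented minimums.

# version_ge CURRENT REQUIRED
# Returns 0 when CURRENT >= REQUIRED in version order.
version_ge() {
  local current=${1#v} required=$2 lowest   # drop a leading "v" if present
  lowest=$(printf '%s\n' "$required" "$current" | sort -V | head -n 1)
  [ "$lowest" = "$required" ]
}

# check_docker_versions
# Queries the Docker Engine and Compose plugin versions and reports each.
check_docker_versions() {
  local engine compose status=0
  engine=$(docker version --format '{{.Server.Version}}') || return 2
  compose=$(docker compose version --short) || return 2
  version_ge "$engine" 22.0 || { echo "Docker $engine is older than 22.0"; status=1; }
  version_ge "$compose" 2.22.0 || { echo "Docker Compose $compose is older than 2.22.0"; status=1; }
  return "$status"
}
```

Running `check_docker_versions` prints nothing and returns 0 when both versions meet the minimums.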

Additional software

For seamless SSH connectivity and secure file transfer between your machine and the server, it is a best practice to install the following software on your machine:

  • An SSH client. PuTTY, which is widely used on Windows, is recommended.
  • WinSCP for a graphical user interface to securely transfer files between the client and the server using SSH.

Data source connectivity

The following table contains the supported data sources and respective requirements to connect with Data Catalog.

Data source Requirements
AWS S3
  • AWS region where the S3 bucket was created
  • Access key and secret access key
  • Read-only permissions to the S3 bucket
Azure Blob Storage
  • Account Fully Qualified Domain Name (FQDN)
  • Client ID and client key
  • authTokenEndpoint
HCP
  • AWS region where the S3 bucket was created
  • Access key and secret access key
  • Read-only permissions to the S3 bucket
RDBMS
  • To enable Data Catalog to perform data profiling, grant read-only access to all database objects and system catalog tables.
SMB/CIFS
  • URI should provide hostname and share folder details
  • Username and password to access the SMB/CIFS Share Directory
  • Path of directory that needs to be scanned
  • Read-only access is required

(Optional) Client Virtual Desktop Infrastructure (VDI)

The following table contains the client’s VDI requirements.

Category    Requirements
Server configuration
  • Windows operating system
  • 16 GB RAM
Disk or storage
  • 100 GB minimum
Others
  • Internet connectivity
  • Google Chrome browser
  • Permission to download files from the FTP server (secure FTP access)