Hitachi Vantara Lumada and Pentaho Documentation

Install Pentaho Data Catalog

This article covers the installation of Data Catalog using the release package. You can install Pentaho Data Storage Optimizer in a new or existing Data Catalog environment.

Installing Data Catalog

Before you begin, you must have root privileges or have the necessary permissions to run Docker as part of the installation process.

Important: Before installing a new version, save a copy of your conf/.env file to preserve any environment customizations you have made, in case the file is overwritten during installation. During installation, Data Catalog checks for a PDC_DATA_ENCRYPTION_KEY environment variable in the conf/.env file. If the variable exists, the conf/.env file is retained. If the variable does not exist, Data Catalog generates a new .env file containing a PDC_DATA_ENCRYPTION_KEY environment variable. If needed, you can then add your custom environment variable settings from the saved copy back into the new .env file.
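The backup described above can be scripted. The following is a minimal sketch, not part of the release package: backup_env and has_encryption_key are hypothetical helper names, and the check assumes the variable appears in the file as a plain PDC_DATA_ENCRYPTION_KEY=... line at the start of a line.

```shell
#!/bin/sh
# Hypothetical helper: copy an env file to a timestamped backup beside it.
# Prints the backup path on success; returns non-zero if the file is missing.
backup_env() {
    env_file="$1"
    [ -f "$env_file" ] || { echo "no such file: $env_file" >&2; return 1; }
    backup="${env_file}.bak.$(date +%Y%m%d%H%M%S)"
    # -p keeps the original mode and timestamps on the copy.
    cp -p "$env_file" "$backup" && echo "$backup"
}

# Hypothetical helper: succeed if the file contains a line that starts with
# PDC_DATA_ENCRYPTION_KEY= (assumed format of the entry).
has_encryption_key() {
    grep -q '^PDC_DATA_ENCRYPTION_KEY=' "$1" 2>/dev/null
}
```

After the functions are defined in your shell, they could be used from the deployment directory before you install:

    backup_env ./conf/.env
    has_encryption_key ./conf/.env && echo "key found, conf/.env should be retained"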

Perform the following steps to install Data Catalog:

Procedure

  1. Open a terminal window on your dedicated Data Catalog deployment server.

  2. Save the Data Catalog release package on the Data Catalog server.

  3. Extract the files from the release package to the /opt directory using the following command:

    tar -xvf [name of release package].tar.gz -C /opt

    The command creates a pentaho directory and extracts the contents of the deployment into a pdc-docker-deployment subdirectory.
  4. From the pentaho/pdc-docker-deployment directory under /opt, start all the Docker containers using the following command:

    sh pdc.sh up
  5. (Optional) If you are installing Pentaho Data Storage Optimizer, copy and paste the following commands to generate the required tokens and add them, together with the other required environment variables, to the conf/.env file:

    echo RULES_PDC_AUTH_TOKEN=\"$(./pdc.sh get-jwt-token RULES_ENGINE)\" >> ./conf/.env
    echo PDSO_PDC_AUTH_TOKEN=\"$(./pdc.sh get-jwt-token PDSO)\" >> ./conf/.env
    echo PDSO_VFS_EXTERNAL_HOST_IP=\"$(hostname -I | awk '{print $1}')\"  >> ./conf/.env
    echo PDC_FE_PDSO_URL=/pdso/ >> ./conf/.env
    echo COMPOSE_PROFILES=mongodb,collab,pdso >> ./conf/.env
    Caution: Modifying these settings can have Pentaho product implications, and incorrect changes may negatively impact the functionality of other products. It is a best practice to collaborate with your Pentaho Data Catalog partner to ensure that any modifications align with your intended objectives.
  6. Restart the Docker containers to update them with the new environment changes:

    sh pdc.sh up

    The installation script uses the packaged Docker images for the Data Catalog release and the Data Storage Optimizer release, if installed, to create and run Docker containers on your dedicated server. The installation finishes when each Docker container has successfully started.
  7. Access Data Catalog and Data Storage Optimizer, if installed, through your browser (the Chrome browser is recommended) using the server name or IP address and confirm that the applications are successfully installed and running.

    Note: For new installations, you are redirected to the Create Admin Account page.
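The echo commands in step 5 append to conf/.env with >>, so running them a second time leaves more than one line for the same variable. If you expect to repeat the step, a helper that replaces an existing entry may be easier to reason about. The following is a sketch under that assumption: set_env_var is a hypothetical name, not a pdc.sh command.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in an env file, replacing any existing
# KEY= line instead of appending a duplicate.
# Usage: set_env_var FILE KEY VALUE
set_env_var() {
    _sev_file="$1"; _sev_key="$2"; _sev_value="$3"
    _sev_tmp="${_sev_file}.tmp.$$"
    touch "$_sev_file"
    # Keep every line that does not start with "KEY=".
    awk -v k="$_sev_key" 'index($0, k "=") != 1' "$_sev_file" > "$_sev_tmp"
    printf '%s=%s\n' "$_sev_key" "$_sev_value" >> "$_sev_tmp"
    # Write back in place so the original file keeps its owner and mode.
    cat "$_sev_tmp" > "$_sev_file"
    rm -f "$_sev_tmp"
}
```

For example, the last two commands of step 5 could be written as follows. Values that the procedure wraps in literal double quotes, such as the tokens, would need the same escaped quotes passed in the VALUE argument.

    set_env_var ./conf/.env PDC_FE_PDSO_URL /pdso/
    set_env_var ./conf/.env COMPOSE_PROFILES mongodb,collab,pdso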

Results

Data Catalog is successfully installed.
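Because the installation finishes only when each Docker container has started, it can help to check container status from the terminal as well as in the browser. The sketch below is one possible check, not a feature of pdc.sh: not_running is a hypothetical helper that reads the name and status columns of docker ps and lists any container whose status does not begin with "Up".

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>status" lines on stdin, as produced by
#   docker ps -a --format '{{.Names}}\t{{.Status}}'
# and print the name of every container whose status does not start with "Up".
# Returns non-zero if at least one such container is found.
not_running() {
    awk -F '\t' 'NF && $2 !~ /^Up/ { print $1; bad = 1 } END { exit bad + 0 }'
}
```

A possible invocation on the deployment server is shown below. Note that docker ps -a lists every container on the host, so containers unrelated to Data Catalog also appear in the result.

    docker ps -a --format '{{.Names}}\t{{.Status}}' | not_running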

Next steps

After installing Data Catalog, you may need to set up other components, depending on your environment. For more information, see Advanced configuration.

Installing Data Storage Optimizer into a Data Catalog deployment

The process below installs Data Storage Optimizer into an existing Data Catalog deployment.

Note: Data Storage Optimizer can be installed into Data Catalog version 10.0.1.

Perform the following steps to install Data Storage Optimizer into Data Catalog:

Procedure

  1. Open a terminal window on your dedicated Data Catalog deployment server.

  2. On the Data Catalog server, navigate to the pentaho/pdc-docker-deployment directory using the following command:

    cd pentaho/pdc-docker-deployment
  3. Start all the Docker containers using the following command:

    sh pdc.sh up
  4. Copy and paste the following commands to generate the required tokens and add them to the environment files:

    echo RULES_PDC_AUTH_TOKEN=\"$(./pdc.sh get-jwt-token RULES_ENGINE)\" >> ./conf/.env
    echo PDSO_PDC_AUTH_TOKEN=\"$(./pdc.sh get-jwt-token PDSO)\" >> ./conf/.env
    echo PDSO_VFS_EXTERNAL_HOST_IP=\"$(hostname -I | awk '{print $1}')\"  >> ./conf/.env
    echo PDC_FE_PDSO_URL=/pdso/ >> ./conf/.env
    echo COMPOSE_PROFILES=mongodb,collab,pdso >> ./conf/.env

    Caution: Modifying these settings can have Pentaho product implications, and incorrect changes may negatively impact the functionality of the other products. It is a best practice to collaborate with your Pentaho Data Catalog partner to ensure that any modifications align with your intended objectives.
  5. Re-run the installation script:

    ./pdc.sh up

    The installation script uses the packaged Docker images for the Data Storage Optimizer release to create and run Docker containers on your dedicated server. The installation finishes when each Docker container has been successfully started.

    Data Storage Optimizer is successfully installed.
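If the containers do not come up as expected, one thing worth checking is whether all five variables from step 4 actually reached conf/.env, since a failed token command would still write a line with an empty value. The following sketch assumes each entry is a KEY=value line at the start of a line. missing_pdso_vars is a hypothetical helper, not part of the release package.

```shell
#!/bin/sh
# Hypothetical helper: for each variable written in step 4, report whether it
# is missing from the given env file or present with an empty value
# (KEY= or KEY=""). Prints nothing when all five look populated.
missing_pdso_vars() {
    for _key in RULES_PDC_AUTH_TOKEN PDSO_PDC_AUTH_TOKEN \
                PDSO_VFS_EXTERNAL_HOST_IP PDC_FE_PDSO_URL COMPOSE_PROFILES; do
        if ! grep -q "^${_key}=" "$1" 2>/dev/null; then
            echo "$_key missing"
        elif grep -Eq "^${_key}=(\"\")?\$" "$1"; then
            echo "$_key empty"
        fi
    done
}
```

From the pentaho/pdc-docker-deployment directory, the check could be run as:

    missing_pdso_vars ./conf/.env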

Results

To begin using Data Storage Optimizer, you must log in. User authentication, roles, and permissions for Data Storage Optimizer are provided by Data Catalog. Use the login credentials shared by your Data Catalog administrator for access. Click the App Switcher in Data Catalog and select Data Storage Optimizer to access the Data Storage Optimizer application.