
Solutions configuration reference


As the LDOS administrator, you can configure solutions for your organization. Current configuration information is stored in the values.yaml file in the Configuration tab of each solution.

Changes to the values.yaml file are saved as a new configuration in the Configuration tab. New run resources are created and previously used resources are destroyed. The History tab tracks the configuration changes, including information about the revision, status, and description of the action taken.

To edit the configuration of a solution, click inside the editor and change the currently running configuration parameters in the values.yaml file. When finished, click Save to save the file and restart the service. Click Reset to discard any unsaved changes in the configuration file.

If you already saved your changes but want to revert to the previous values, use the Rollback action on the History tab. For more information, see Rollback a revision.

Caution: This task may impact your initial installation and configuration settings. As a best practice, perform this task only with the guidance of your IT administrator or your Hitachi Vantara Customer Success representative.

Messaging Service

The Messaging Service solution provides the communications infrastructure for the Dataflow Engine and Dataflow Studio solutions in Lumada DataOps Suite. You can scale the messaging queues by configuring the number of replicas (message brokers) and the size of the persistent message stores.

This solution is shared by other solutions. As a best practice, install only one version and do not append the timestamp to the solution name, which simplifies its discovery by other solutions.

The user-configurable values for the Messaging Service are defined below:

Parameter: kafka.replicas
Description: The number of replicas (message brokers) used for the Messaging Service solution.
Default: 3

Parameter: kafka.Persistence.size
Description: The size of the persistent volumes. This value can only be changed during installation.
Caution: The persistent volumes and all messages are lost if the installation is removed.
Default: 2 GB
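
For example, a values.yaml fragment that scales the Messaging Service to five brokers might look like the following sketch. The key nesting and the 2Gi size notation are assumptions drawn from the parameter names above, not a complete configuration; verify them against your installed solution.

   kafka:
     replicas: 5          # number of message brokers; the default is 3
     Persistence:
       size: 2Gi          # persistent volume size; can only be set during installation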

Establishing messaging safeguards

Your dataflows rely on messages continually flowing through the Messaging Service solution. How well the Messaging Service solution performs depends on whether you balance the messaging load and how you configure its properties to keep persistent storage from filling up.

Load balance the Messaging Service solution

As a best practice, you should balance the message load to ensure the following results:
  • You can upgrade the Messaging Service solution without having to re-run your active dataflows.
  • You can recover from a loss of all broker pods without affecting dataflows.

Perform the following steps to make sure the Messaging Service solution uses load balancing:

Procedure

  1. Open Solution management from Lumada DataOps Suite.

  2. Navigate to the Configuration tab in the Messaging Service solution under installed solutions.

  3. Verify the following lines of code are in the values.yaml textbox under the Configuration tab:

    kafka:
      ...
      external:
        enable: true
        type: LoadBalancer
  4. Add the above lines of code if they do not appear in the values.yaml textbox.

Results

This Kafka load balancer setting ensures upgrades and recovery can occur without affecting dataflows.

Prevent the persistent message storage from filling up

As a best practice, adjust the default Messaging Service solution configuration overrides so that newly created topics cannot fill up the persistent message storage. You can override select configurations by using the configurationOverrides keyword. You must set two types of overrides to delete segments before they fill up the persistent message storage: set log.flush.interval to make the segments available for deletion, and set log.retention and log.cleanup.policy to physically delete the segments.

The following overrides are set by default:

   configurationOverrides:
      log.flush.interval.messages: 10000
      log.retention.bytes: "699050667"
      log.retention.minutes: 30
      log.cleanup.policy: delete

Setting log.flush.interval to 10000 messages makes segments available for deletion after every 10,000 messages are processed, or approximately 20 MB at a 2 KB average message size. Setting log.retention to 699050667 bytes and 30 minutes defines a size and time window during which segments are retained (not yet deleted). The example size of 699050667 bytes is calculated from the maximum size of a topic containing the log messages from every transform run, and accounts for over 98% of the total messages by size. The example time of 30 minutes ensures a segment is deleted only after 30 minutes have elapsed since its oldest message and it is no longer the active segment. Setting log.cleanup.policy to delete keeps persistent messages from filling up the storage by removing segments rather than compacting them. Adjust these overrides for your system.

Perform the following steps to adjust the overrides so that segments are deleted after a retention interval appropriate for your system, preventing the persistent message storage from filling up:

Procedure

  1. Open Solution management from Lumada DataOps Suite.

  2. Navigate to the Configuration tab in the Messaging Service solution under installed solutions.

  3. Verify that the following configurationOverrides section, with settings for log.flush.interval, log.retention, and log.cleanup.policy, exists in the values.yaml textbox under the Configuration tab:

       configurationOverrides:
          log.flush.interval.messages: 10000
          log.retention.bytes: "699050667"
          log.retention.minutes: 30
          log.cleanup.policy: delete
  4. Adjust any of these overrides as needed for your system. See the Kafka documentation for details about these overrides.
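
As an illustration only, suppose your dataflows produce larger topics and you want roughly a one-hour retention window. The following sketch shows how the overrides might be adjusted; the byte value is hypothetical (about double the default) and should be replaced with a figure derived from your own topic sizes, as described above.

       configurationOverrides:
          log.flush.interval.messages: 10000   # segments become eligible for deletion every 10,000 messages
          log.retention.bytes: "1398101334"    # hypothetical size window (~2x the default)
          log.retention.minutes: 60            # hypothetical time window of one hour
          log.cleanup.policy: delete           # delete old segments instead of compacting them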

Dataflow Importer

The Dataflow Importer solution is the background service that automatically imports your staged dataflow files into Dataflow Studio. You must ensure that the files are tagged correctly and placed in the applicable network file system (NFS) folder for successful importing. See Importing dataflows for details.

You can configure the scanning interval of the Dataflow Importer for ingestion of new, revised, and deleted files. As a best practice, begin with an interval of 6 minutes (360000 milliseconds) and then add minutes as your expected file count increases. Use an interval proportional to the number of files imported to allow the process to complete. For example, if you have 100,000 files to import, your setting should be much longer than the interval used to import 1,000 files.

To access the Interval setting, from the Solution management window, click Installed in the navigation pane and then click Dataflow Importer. Click the Configuration tab to view the setting details in the values.yaml file.

Parameter: components:importer:interval
Description: The time, in milliseconds, that elapses before the Dataflow Importer scans the staging folder. This setting should always be greater than 0, but less than 2147483647.
Default: 10000
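
For example, to use the recommended starting interval of 6 minutes, the values.yaml entry might look like the following sketch. The nesting mirrors the components:importer:interval parameter named above; confirm the exact structure against your installed solution.

   components:
     importer:
       interval: 360000   # scan the staging folder every 6 minutes (in milliseconds)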

Data Processing Service

Lumada Data Catalog uses the Data Processing Service for multi-node processing with a secure Spark 3.1.1 history server and S3 storage that you provide.

The Data Processing Service provides a Spark history server instance for convenience, but you do not have to use this instance. The Data Processing Service history server is configured to connect to a valid S3 filesystem and a valid S3 path. Amazon Web Services (AWS) S3 and MinIO are examples of valid S3 filesystems. If a valid S3 path is not provided during installation, the Data Processing Service will not install successfully.
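
As an orientation only, the following sketch shows the kind of Spark properties that an S3-backed history server ultimately needs. The sparkConf wrapper key is hypothetical and the bucket, endpoint, and credentials are placeholders; the actual values.yaml keys exposed by the Data Processing Service may differ, so use this only to understand what a valid S3 filesystem and path mean here.

   sparkConf:                                                            # hypothetical wrapper key
     spark.history.fs.logDirectory: "s3a://your-bucket/spark-history"    # valid S3 path for history files
     spark.hadoop.fs.s3a.endpoint: "https://s3.example.com"              # AWS S3 or MinIO endpoint
     spark.hadoop.fs.s3a.access.key: "<access-key>"                      # placeholder credentials
     spark.hadoop.fs.s3a.secret.key: "<secret-key>"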