Solutions configuration reference
As the LDOS administrator, you can configure solutions for your organization. Current configuration information is stored in the values.yaml file in the Configuration tab of each solution.
Changes to the values.yaml file are saved as a new configuration in the Configuration tab. New run resources are created and previously used resources are destroyed. The History tab tracks the configuration changes, including information about the revision, status, and description of the action taken.
To edit the configuration of a solution, click inside the editor and change the currently running configuration parameters in the values.yaml file. When finished, click Save to save the file and restart the service. Clicking Reset discards any unsaved changes in the configuration file.
If you have already saved your changes but want to revert to the previous values, use the Rollback action on the History tab. For more information, see Rollback a revision.
Messaging Service
The Messaging Service solution provides the communications infrastructure for the Dataflow Engine and Dataflow Studio solutions in Lumada DataOps Suite. You can scale the messaging queues by configuring the number of replicas (message brokers) and the size of the persistent message stores.
This solution is shared by other solutions. As a best practice, install only one version and do not append a timestamp to the solution name; this simplifies discovery by the other solutions that depend on it.
The user-configurable values for the Messaging Service are defined in the following table:
| Parameter | Description | Default |
| --- | --- | --- |
| kafka.replicas | The number of replicas (message brokers) used for the Messaging Service solution. | 3 |
| kafka.persistence.size | The size of the persistent volumes. This value can only be changed during installation. Caution: The persistent volumes and all messages are lost if the installation is removed. | 2 GB |
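For reference, a minimal sketch of how these parameters might appear in the values.yaml file, assuming the nested kafka block used elsewhere in this section; the 2Gi notation is an assumption based on common Kubernetes volume sizing, so confirm the exact keys and units against your installed chart:

```yaml
kafka:
  replicas: 3        # number of message brokers
  persistence:
    size: 2Gi        # persistent volume size; can only be set during installation
```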
Establishing messaging safeguards
Your dataflows rely on messages continually flowing through the Messaging Service solution. How well the solution performs depends on whether you balance the messaging load and on how you configure its properties to prevent the persistent storage from filling up.
Load balance the Messaging Service solution
Configuring the Messaging Service solution to use load balancing provides the following benefits:
- You can upgrade the Messaging Service solution without having to re-run your active dataflows.
- You can recover from a loss of all broker pods without affecting dataflows.
Perform the following steps to make sure the Messaging Service solution uses load balancing:
Procedure
Open Solution management from Lumada DataOps Suite.
Navigate to the Configuration tab in the Messaging Service solution under installed solutions.
Verify the following lines of code are in the values.yaml textbox under the Configuration tab:

```yaml
kafka:
  ...
  external:
    enable: true
    type: LoadBalancer
```

Add the above lines of code if they do not appear in the values.yaml textbox.
Results
The Messaging Service solution now uses load balancing for its message brokers.
Prevent the persistent message storage from filling up
The overrides that control message segments are set under the configurationOverrides keyword in the values.yaml file. You must set two types of overrides to delete segments before they fill up the persistent message storage: set log.flush.interval to make the segments available for deletion, and set log.retention and log.cleanup.policy to physically delete the segments. The following overrides are set by default:

```yaml
configurationOverrides:
  log.flush.interval.message: 10000
  log.retention.bytes: "699050667"
  log.retention.minutes: 30
  log.cleanup.policy: delete
```
Setting log.flush.interval to 10000 messages makes the segments available for deletion after processing 10,000 messages, or approximately 20 MB based on 2 KB log message sizes. Setting log.retention to 699050667 bytes and 30 minutes defines a size and time window during which the segments are still available (not yet deleted). The example size of 699050667 bytes is calculated from the maximum size of a topic containing the log messages from every transform run, and accounts for over 98% of the total messages by size. The example time of 30 minutes ensures a segment is only deleted after 30 minutes have elapsed since its oldest message and it is no longer the active segment. Setting log.cleanup.policy to delete prevents persistent messages from filling up the storage by removing segments rather than compacting them. You should adjust these overrides for your system.
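For example, a hedged sketch of how these overrides might be adjusted for a system with larger messages; the figures below are illustrative assumptions (roughly 4 KB messages, about 1 GiB and one hour of retention), not recommendations:

```yaml
configurationOverrides:
  log.flush.interval.message: 5000     # ~5,000 msgs x ~4 KB = ~20 MB before flush
  log.retention.bytes: "1073741824"    # keep at most ~1 GiB of segments
  log.retention.minutes: 60            # keep segments for up to one hour
  log.cleanup.policy: delete           # delete (not compact) expired segments
```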
Perform the following steps to adjust these overrides so that segments are deleted after a retention interval appropriate for your system, preventing the persistent message storage from filling up:
Procedure
Open Solution management from Lumada DataOps Suite.
Navigate to the Configuration tab in the Messaging Service solution under installed solutions.
Verify the following configurationOverrides section, with settings for log.flush.interval, log.retention, and log.cleanup.policy, exists in the values.yaml textbox under the Configuration tab:

```yaml
configurationOverrides:
  log.flush.interval.message: 10000
  log.retention.bytes: "699050667"
  log.retention.minutes: 30
  log.cleanup.policy: delete
```
Adjust any of these overrides as needed for your system. See the Kafka documentation for details about these overrides.
Dataflow Importer
The Dataflow Importer solution is the background service that automatically imports your staged dataflow files into Dataflow Studio. You must ensure that the files are tagged correctly and placed in the applicable network file system (NFS) folder for successful importing. See Importing dataflows for details.
You can configure the scanning interval of the Dataflow Importer for ingestion of new, revised, and deleted files. As a best practice, begin with an interval of 6 minutes (360000 milliseconds) and then add minutes as your expected file count increases. Use an interval proportional to the number of files imported to allow the process to complete. For example, if you have 100,000 files to import, your setting should be much longer than the interval used to import 1,000 files.
To access the Interval setting, from the Solution management window, click Installed in the navigation pane and then click Dataflow Importer. Click the Configuration tab to view the setting details in the values.yaml file.
| Parameter | Description | Default |
| --- | --- | --- |
| components:importer:interval | The time, in milliseconds, that elapses before the Dataflow Importer scans the staging folder. This setting should always be greater than 0, but less than 2147483647. | 10000 |
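For illustration, the setting as it might appear in the values.yaml file, assuming the colon-separated parameter path components:importer:interval maps to nested YAML keys; verify the structure against your own configuration:

```yaml
components:
  importer:
    interval: 360000   # 6 minutes, the suggested starting interval above
```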
Data Processing Service
Lumada Data Catalog uses the Data Processing Service for multi-node processing using a secure Spark 3.1.1 history server and S3 storage that you provide.
The Data Processing Service provides a Spark history server instance for convenience, but you are not required to use it. The history server must be configured to connect to a valid S3 filesystem and a valid S3 path. Amazon Web Services (AWS) S3 and Minio are examples of valid S3 filesystems. If a valid S3 path is not provided during installation, the Data Processing Service will not install successfully.
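As a purely illustrative sketch, the S3 connection might be expressed in values.yaml with keys like the following; these key names are hypothetical and will likely differ in your installation, so consult the values.yaml shipped with the Data Processing Service:

```yaml
# Hypothetical key names for illustration only; the actual keys in your
# installation's values.yaml may differ.
historyServer:
  s3:
    endpoint: https://minio.example.internal:9000   # example S3 filesystem (Minio)
    path: s3a://spark-history/events                # example S3 path for history data
```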