Data Integration Operations Mart
The PDI Operations Mart is a centralized data mart that stores job and transformation log data for auditing, reporting, and analysis. It enables you to collect and query Data Integration log data and then use Pentaho Server tools to examine that data in reports, charts, and dashboards. The data mart is a collection of tables organized as a data warehouse using a star schema. Together, the dimension tables and a fact table represent the logging data. These tables must be created in the PDI Operations Mart database. Pentaho provides SQL scripts to create these tables for each supported database type. A Data Integration job populates the time and date dimensions.
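To picture the layout, here is a simplified, hypothetical star-schema sketch. The table and column names are illustrative only; the real Operations Mart tables are created by the Pentaho-supplied scripts in Step 2.

-- Hypothetical star schema for illustration; not the actual Operations Mart DDL.
CREATE TABLE dim_date (
    date_tk       INTEGER PRIMARY KEY,  -- surrogate key
    calendar_date DATE NOT NULL
);

CREATE TABLE dim_transformation (
    trans_tk   INTEGER PRIMARY KEY,
    trans_name VARCHAR(255) NOT NULL
);

-- Central fact table: one row per logged execution, keyed to the dimensions.
CREATE TABLE fact_execution (
    exec_date_tk  INTEGER REFERENCES dim_date (date_tk),
    trans_tk      INTEGER REFERENCES dim_transformation (trans_tk),
    duration_secs NUMERIC,
    rows_written  BIGINT
);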
Note: For optimal performance, be sure to clean the operations mart periodically.
Getting Started
Installation of DI Operations Mart depends on the following conditions and prerequisites:
Database Requirement
Before proceeding with the DI Operations Mart installation steps below, ensure that your Pentaho Server and Repository are configured with one of the following database types:
- PostgreSQL
- MySQL
- Oracle
- MS SQL Server
If you need to review the Pentaho Server installation method, see Pentaho Installation.
Existing 8.2 Installation
If you have an existing 8.2 installation of the PDI client (Spoon) and the Pentaho Server, you must configure them to use the DI Operations Mart by following the Installation Steps below.
Installation Steps
To install the DI Operations Mart, you will perform the following steps:
- Step 1: Get the DI Operations Mart Files
- Step 2: Run the Setup Script
- Step 3: Set the Global Kettle Logging Variables
- Step 4: Add Logging and Operations Mart Connections
- Step 5: Add a JNDI Connection for the Pentaho Server
- Step 6: Add the DI Operations Mart ETL Solution and Sample Reports to the Repository
- Step 7: Initialize the DI Operations Mart
- Step 8: Verify the DI Operations Mart is Working
Step 1: Get the DI Operations Mart Files
The DI Ops Mart files are available for download from the Pentaho Customer Support Portal.
- On the Customer Portal home page, sign in using the Pentaho support user name and password provided in your Pentaho Welcome Packet.
- Click Downloads, then click Pentaho 8.2 GA Release in the 8.x list.
- At the bottom of the Pentaho 8.2 GA Release page, browse the folders in the Box widget to find the file you need, located in the Operations Mart folder:
- pentaho-operations-mart-5.0.0-dist.zip
- Unzip the Pentaho Operations Mart file. Inside is the packaged Operations Mart installation file.
- Unpack the installation file by running the installer for your environment.
- In the IZPack window, read the license agreement, select I accept the terms of this license agreement, and then click Next.
- In the Select the installation path text box, browse to or enter the directory location where you want to unpack the files, then click Next.
- If you choose an existing directory, a warning appears stating that the directory already exists. Click Yes; any existing files in the directory are retained.
- When the installation is complete, click Quit. Your directory now contains the setup scripts and files used to create the default content in the following steps.
Step 2: Run the Setup Script
Depending on your database repository type, run the appropriate setup script to create the tables that capture the activity of transformations and jobs.
The pentaho-operations-mart-ddl-5.0.0.zip file contains a folder for each supported database type; each folder holds the setup script for that database.
Database Type | Script Directory |
---|---|
PostgreSQL | /pentaho-server/data/postgresql |
MySQL | /pentaho-server/data/mysql5 |
Oracle | /pentaho-server/data/oracle10g |
Microsoft SQL Server | /pentaho-server/data/sqlserver |
Note: This setup script can optionally be run during the Pentaho Server installation itself (either Windows or Linux).
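After the script completes, you can spot-check that the tables were created. A minimal sketch, assuming a PostgreSQL repository (the catalog query differs on Oracle; adjust for your database):

-- List the logging and mart tables created by the setup script.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema IN ('pentaho_dilogs', 'pentaho_operations_mart')
ORDER BY table_schema, table_name;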
Step 3: Set the Global Kettle Logging Variables
Perform this step on the computer where you have installed your Pentaho Data Integration (PDI) client and Pentaho Server.
When you run PDI for the first time, the kettle.properties file is created and stored in the $USER_HOME/.kettle directory.
- In the PDI client, choose Edit > Edit the kettle.properties file.
- Add or edit the variables and values to match those shown in the following table (an example excerpt follows the table):
Note: For Oracle and Microsoft SQL Server, leave the value blank for variables that contain SCHEMA in the name.
Variable | Value |
---|---|
KETTLE_CHANNEL_LOG_DB | live_logging_info |
KETTLE_CHANNEL_LOG_TABLE | channel_logs |
KETTLE_CHANNEL_LOG_SCHEMA | pentaho_dilogs |
KETTLE_JOBENTRY_LOG_DB | live_logging_info |
KETTLE_JOBENTRY_LOG_TABLE | jobentry_logs |
KETTLE_JOBENTRY_LOG_SCHEMA | pentaho_dilogs |
KETTLE_JOB_LOG_DB | live_logging_info |
KETTLE_JOB_LOG_TABLE | job_logs |
KETTLE_JOB_LOG_SCHEMA | pentaho_dilogs |
KETTLE_METRICS_LOG_DB | live_logging_info |
KETTLE_METRICS_LOG_TABLE | metrics_logs |
KETTLE_METRICS_LOG_SCHEMA | pentaho_dilogs |
KETTLE_STEP_LOG_DB | live_logging_info |
KETTLE_STEP_LOG_TABLE | step_logs |
KETTLE_STEP_LOG_SCHEMA | pentaho_dilogs |
KETTLE_TRANS_LOG_DB | live_logging_info |
KETTLE_TRANS_LOG_TABLE | trans_logs |
KETTLE_TRANS_LOG_SCHEMA | pentaho_dilogs |
KETTLE_TRANS_PERFORMANCE_LOG_DB | live_logging_info |
KETTLE_TRANS_PERFORMANCE_LOG_TABLE | transperf_logs |
KETTLE_TRANS_PERFORMANCE_LOG_SCHEMA | pentaho_dilogs |
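Each variable is stored in kettle.properties as a simple key=value line. For example, the channel logging entries from the table above look like this (again, leave the SCHEMA value empty on Oracle and Microsoft SQL Server):

KETTLE_CHANNEL_LOG_DB=live_logging_info
KETTLE_CHANNEL_LOG_TABLE=channel_logs
KETTLE_CHANNEL_LOG_SCHEMA=pentaho_dilogs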
Step 4: Add Logging and Operations Mart Connections
This section explains how to add the logging (live_logging_info) and Operations Mart (PDI_Operations_Mart) connections for a PDI client.
- Navigate to the pentaho/design-tools/data-integration/simple-jndi directory.
- Open the jdbc.properties file with a text editor.
- Depending on your repository database type, update the values (URL, user, password) as shown in the applicable sample below:
You may need to obtain the URLs, user names, and passwords from your system administrator.
PostgreSQL:
PDI_Operations_Mart/type=javax.sql.DataSource
PDI_Operations_Mart/driver=org.postgresql.Driver
PDI_Operations_Mart/url=jdbc:postgresql://localhost:5432/hibernate?searchpath=pentaho_operations_mart
PDI_Operations_Mart/user=hibuser
PDI_Operations_Mart/password=password
live_logging_info/type=javax.sql.DataSource
live_logging_info/driver=org.postgresql.Driver
live_logging_info/url=jdbc:postgresql://localhost:5432/hibernate?searchpath=pentaho_dilogs
live_logging_info/user=hibuser
live_logging_info/password=password
MySQL:
PDI_Operations_Mart/type=javax.sql.DataSource
PDI_Operations_Mart/driver=com.mysql.jdbc.Driver
PDI_Operations_Mart/url=jdbc:mysql://localhost:3306/pentaho_operations_mart
PDI_Operations_Mart/user=hibuser
PDI_Operations_Mart/password=password
live_logging_info/type=javax.sql.DataSource
live_logging_info/driver=com.mysql.jdbc.Driver
live_logging_info/url=jdbc:mysql://localhost:3306/pentaho_dilogs
live_logging_info/user=hibuser
live_logging_info/password=password
Oracle:
PDI_Operations_Mart/type=javax.sql.DataSource
PDI_Operations_Mart/driver=oracle.jdbc.OracleDriver
PDI_Operations_Mart/url=jdbc:oracle:thin:@localhost:1521/XE
PDI_Operations_Mart/user=pentaho_operations_mart
PDI_Operations_Mart/password=password
live_logging_info/type=javax.sql.DataSource
live_logging_info/driver=oracle.jdbc.OracleDriver
live_logging_info/url=jdbc:oracle:thin:@localhost:1521/XE
live_logging_info/user=pentaho_dilogs
live_logging_info/password=password
Microsoft SQL Server:
PDI_Operations_Mart/type=javax.sql.DataSource
PDI_Operations_Mart/driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
PDI_Operations_Mart/url=jdbc:sqlserver://10.0.2.15:1433;DatabaseName=pentaho_operations_mart
PDI_Operations_Mart/user=pentaho_operations_mart
PDI_Operations_Mart/password=password
live_logging_info/type=javax.sql.DataSource
live_logging_info/driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
live_logging_info/url=jdbc:sqlserver://10.0.2.15:1433;DatabaseName=pentaho_dilogs
live_logging_info/user=dilogs_user
live_logging_info/password=password
Step 5: Add a JNDI Connection for the Pentaho Server
This section explains how to add a JNDI connection for the Pentaho Server. Perform this task on the computer where you have installed the Pentaho Server.
- Navigate to the pentaho/server/pentaho-server/tomcat/webapps/pentaho/META-INF/ folder.
- Open the context.xml file with a text editor.
- Depending on your database type, edit the file to reflect the values in the applicable example:
PostgreSQL:
<Resource name="jdbc/PDI_Operations_Mart" auth="Container" type="javax.sql.DataSource" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxActive="20" minIdle="0" maxIdle="5" initialSize="0" maxWait="10000" username="hibuser" password="password" driverClassName="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/hibernate" validationQuery="select 1"/> <Resource name="jdbc/pentaho_operations_mart" auth="Container" type="javax.sql.DataSource" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxActive="20" minIdle="0" maxIdle="5" initialSize="0" maxWait="10000" username="hibuser" password="password" driverClassName="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/hibernate" validationQuery="select 1"/> <Resource name="jdbc/live_logging_info" auth="Container" type="javax.sql.DataSource" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxActive="20" minIdle="0" maxIdle="5" initialSize="0" maxWait="10000" username="hibuser" password="password" driverClassName="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/hibernate?searchpath=pentaho_dilogs" validationQuery="select 1"/>
MySQL:
<Resource name="jdbc/PDI_Operations_Mart" auth="Container" type="javax.sql.DataSource" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxActive="20" maxIdle="5" maxWait="10000" username="hibuser" password="password" driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/pentaho_operations_mart" jdbcInterceptors="ConnectionState" defaultAutoCommit="true" validationQuery="select 1"/> <Resource name="jdbc/pentaho_operations_mart" auth="Container" type="javax.sql.DataSource" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxActive="20" maxIdle="5" maxWait="10000" username="hibuser" password="password" driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/pentaho_operations_mart" jdbcInterceptors="ConnectionState" defaultAutoCommit="true" validationQuery="select 1"/> <Resource name="jdbc/live_logging_info" auth="Container" type="javax.sql.DataSource" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxActive="20" maxIdle="5" maxWait="10000" username="hibuser" password="password" driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/pentaho_dilogs" jdbcInterceptors="ConnectionState" defaultAutoCommit="true" validationQuery="select 1"/>
Oracle:
<Resource validationQuery="select 1 from dual" url="jdbc:oracle:thin:@localhost:1521/orcl" driverClassName="oracle.jdbc.OracleDriver" password="password" username="pentaho_operations_mart" initialSize="0" maxActive="20" maxIdle="10" maxWait="10000" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" type="javax.sql.DataSource" auth="Container" connectionProperties="oracle.jdbc.J2EE13Compliant=true" name="jdbc/pentaho_operations_mart"/> <Resource validationQuery="select 1 from dual" url="jdbc:oracle:thin:@localhost:1521/orcl" driverClassName="oracle.jdbc.OracleDriver" password="password" username="pentaho_operations_mart" initialSize="0" maxActive="20" maxIdle="10" maxWait="10000" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" type="javax.sql.DataSource" auth="Container" connectionProperties="oracle.jdbc.J2EE13Compliant=true" name="jdbc/PDI_Operations_Mart"/> <Resource validationQuery="select 1 from dual" url="jdbc:oracle:thin:@localhost:1521/XE" driverClassName="oracle.jdbc.OracleDriver" password="password" username="pentaho_dilogs" maxWaitMillis="10000" maxIdle="5" maxTotal="20" jdbcInterceptors="ConnectionState" defaultAutoCommit="true" factory="org.apache.commons.dbcp.BasicDataSourceFactory" type="javax.sql.DataSource" auth="Container" name="jdbc/live_logging_info"/>
Microsoft SQL Server:
<Resource name="jdbc/PDI_Operations_Mart" auth="Container" type="javax.sql.DataSource" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxTotal="20" maxIdle="5" maxWaitMillis="10000" username="pentaho_operations_mart" password="password" jdbcInterceptors="ConnectionState" defaultAutoCommit="true" driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://10.0.2.15:1433;DatabaseName=pentaho_operations_mart" validationQuery="select 1"/> <Resource name="jdbc/pentaho_operations_mart" auth="Container" type="javax.sql.DataSource" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxTotal="20" maxIdle="5" maxWaitMillis="10000" username="pentaho_operations_mart" password="password" jdbcInterceptors="ConnectionState" defaultAutoCommit="true" driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://10.0.2.15:1433;DatabaseName=pentaho_operations_mart" validationQuery="select 1"/> <Resource name="jdbc/live_logging_info" auth="Container" type="javax.sql.DataSource" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxTotal="20" maxIdle="5" maxWaitMillis="10000" username="dilogs_user" password="password" jdbcInterceptors="ConnectionState" defaultAutoCommit="true" driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://10.0.2.15:1433;DatabaseName=pentaho_dilogs" validationQuery="select 1"/>
Step 6: Add the DI Operations Mart ETL Solution and Sample Reports to the Repository
- Stop the Pentaho Server.
- Depending on your repository database type, copy the following ETL solution and sample reports (downloaded in Step 1: Get the DI Operations Mart Files) to: $PENTAHO_HOME/pentaho-server/pentaho-solutions/system/default-content
pentaho-operations-mart-etl-5.0.0-dist.zip may already be in this directory. If you are using a repository database type other than PostgreSQL, remove it.
- PostgreSQL: pentaho-operations-mart-etl-5.0.0-dist.zip
- MySQL: pentaho-operations-mart-etl-mysql5-5.0.0-dist.zip
- Oracle: pentaho-operations-mart-etl-oracle10g-5.0.0-dist.zip
- Microsoft SQL Server: pentaho-operations-mart-etl-mssql-5.0.0-dist.zip
- Place these two files in the directory as well:
- DI Operations Mart sample reports: pentaho-operations-mart-operations-di-5.0.0-dist.zip
- BA Operations Mart sample reports: pentaho-operations-mart-operations-bi-5.0.0-dist.zip
- Start the Pentaho Server.
Step 7: Initialize the DI Operations Mart
- Launch the PDI client (Spoon).
- Connect to the Pentaho Repository via the Pentaho Server.
- At the Main Menu, select File > Open.
- Select Browse Files > Public > Pentaho Operations Mart > DI Ops Mart ETL, then open the Fill_in_DIM_DATE_and_DIM_TIME job and run it.
- At the Main Menu, select File > Open.
- Select Public > Pentaho Operations Mart > DI Ops Mart ETL, then open the Update_Dimensions_then_Logging_Datamart job and run it.
Step 8: Verify the DI Operations Mart is Working
- From the Pentaho User Console, select Browse Files > Public > Pentaho Operations Mart > DI Audit Reports > Last_Run and open it.
- You should see the jobs and transformations that were run in Step 7. For an additional database-level check, see the query below.
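A minimal sketch of such a check, assuming a PostgreSQL repository and that the mart's fact table is named fact_execution (verify the actual table names in your pentaho_operations_mart schema):

-- Row counts should be non-zero once the Step 7 jobs have completed.
SELECT COUNT(*) AS logged_executions
FROM pentaho_operations_mart.fact_execution;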
Give Users Access to the PDI Operations Mart
By default, only users who have the Admin role can access the Pentaho Operations Mart. The Admin role has access to all capabilities within all Pentaho products, including the Pentaho Operations Mart. If you want to allow users to view and run the Pentaho Operations Mart only, you can assign them the Pentaho Operations role. For example, a user who has been assigned the Pentaho Operations user role is able to open and view a report within the PDI Operations Mart, but does not have the ability to delete it.
To give users access to view the PDI Operations Mart, assign the Pentaho Operations role to those users as follows:
- From within the Pentaho User Console, select the Administration tab.
- From the left panel, select Security > Users/Roles.
- Select the Roles tab.
- Add the new role called Pentaho Operations by following the instructions in Adding Roles.
- Assign the appropriate users to the new role, as described in Adding Users to Roles.
- Advise these users to log in to the Pentaho User Console, create a Pentaho Analyzer or Pentaho Interactive Report, and ensure that they can view the Pentaho Operations Mart in the Select a Data Source dialog.
Charts, Reports, and Dashboards Using PDI Operations Mart Data
Once you have created and populated your Data Integration Operations Mart with log data, the features of the User Console enable you to examine this data and create reports, charts, and dashboards. We provide many pre-built reports, charts, and dashboards that you can modify.
To help understand the contents of the log, see DI Operations Mart Reference.
Clean Up Operations Mart Tables
Cleaning up the PDI Operations Mart consists of running either a job or a transformation that deletes data older than a specified maximum age. The cleanup job and transformation are located in the etl folder.
Perform the following steps to clean up the PDI Operations Mart:
- Using the PDI client (Spoon), open either Clean_up_PDI_Operations_Mart.kjb (job) or Clean_up_PDI_Operations_Mart_fact_table.ktr (transformation).
- Set the following parameters:
  - max.age.days (required): the maximum age, in days, of the data to keep.
  - schema.prefix (optional): for PostgreSQL databases, enter the schema name followed by a period (.); it is applied to the SQL statements. For other databases, leave the value blank.
- Run the job or transformation. This deletes job and transformation data older than the maximum age from the data mart; the sketch below shows the idea.
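Conceptually, the cleanup issues deletes equivalent to the following sketch. This assumes a PostgreSQL mart, a fact_execution table, and a hypothetical exec_start_date column; the shipped job covers all mart tables and honors the parameters above.

-- Illustrative only: with max.age.days = 90, rows older than 90 days are removed.
DELETE FROM pentaho_operations_mart.fact_execution
WHERE exec_start_date < CURRENT_DATE - INTERVAL '90 days';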
To schedule regular clean up of the PDI Operations Mart, see Schedule Perspective in the PDI Client.