Data Integration Operations Mart
The PDI Operations Mart is a centralized data mart that stores job and transformation log data for auditing, reporting, and analysis. It enables you to collect and query Data Integration log data and then use the Pentaho Server tools to examine that data in reports, charts, and dashboards. The data mart is a collection of tables organized as a data warehouse using a star schema. Together, the dimension and fact tables represent the logging data. These tables must be created in the PDI Operations Mart database. Pentaho provides SQL scripts to create these tables for the PostgreSQL database. A Data Integration job populates the time and date dimensions.
Note: For optimal performance, be sure to clean the operations mart periodically.
For DI Ops Mart installation instructions, see Install DI Ops Mart.
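Because the Operations Mart is an ordinary star schema, you can also query it directly with SQL. The following PostgreSQL sketch counts daily executions and errors for a single transformation by joining the fact table to two of its dimensions. It assumes the default table and column names documented later in this section, with each `*_tk` fact column joining to the dimension key of the corresponding name; `weekly_sales_load` is a hypothetical transformation name.

```sql
-- Daily execution and error counts for one transformation,
-- joined across the star schema (fact table plus two dimensions).
SELECT d.ymd,
       COUNT(*)      AS executions,
       SUM(f.errors) AS total_errors
FROM   fact_execution f
JOIN   dim_date     d ON d.date_tk     = f.execution_date_tk
JOIN   dim_executor e ON e.executor_tk = f.executor_tk
WHERE  e.executor_name = 'weekly_sales_load'  -- hypothetical name
GROUP  BY d.ymd
ORDER  BY d.ymd;
```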
Give Users Access to the PDI Operations Mart
By default, only users who have the Admin role can access the Pentaho Operations Mart. The Admin role has access to all capabilities within all Pentaho products, including the Pentaho Operations Mart. If you want users to be able only to view and run the Pentaho Operations Mart, assign them the Pentaho Operations role. For example, a user with the Pentaho Operations role can open and view a report within the PDI Operations Mart, but cannot delete it.
To give users access to view the PDI Operations Mart, assign the Pentaho Operations role to those users as follows:
- From within the Pentaho User Console, select the Administration tab.
- From the left panel, select Security > Users/Roles.
- Select the Roles tab.
- Add the new role called Pentaho Operations by following the instructions in Adding Roles.
- Assign the appropriate users to the new role, as described in Adding Users to Roles.
- Advise these users to log in to the Pentaho User Console, create a Pentaho Analyzer report or Pentaho Interactive Report, and confirm that they can see the Pentaho Operations Mart in the Select a Data Source dialog.
Charts, Reports, and Dashboards Using PDI Operations Mart Data
Once you have created and populated your Data Integration Operations Mart with log data, you can use the User Console to examine this data and create reports, charts, and dashboards. Pentaho provides many pre-built reports, charts, and dashboards that you can modify.
Logging Table Status for the PDI Operations Mart
The following tables are contained in the PDI Operations Mart:
Transformation Log Tables
The status values and their descriptions are in the following table:
Status Display | Description |
---|---|
start | Indicates the transformation was started. When no logging interval is set, the transformation retains this status until it ends. |
end | Transformation ended successfully. |
stop | Indicates the user stopped the transformation. |
error | Indicates an error occurred when attempting to run the transformation. |
running | Indicates the transformation is running. This status appears only briefly after starting and is not shown unless a logging interval is set. |
paused | Indicates the user paused the transformation. This status is not shown unless a logging interval is set. |
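To see how these statuses are distributed across your transformations, a query along the following lines should work, assuming the fact and dimension tables documented later in this section join on their TK columns:

```sql
-- Count transformation executions by final status.
SELECT x.execution_status,
       COUNT(*) AS executions
FROM   fact_execution f
JOIN   dim_execution x ON x.execution_tk = f.execution_tk
JOIN   dim_executor  e ON e.executor_tk  = f.executor_tk
WHERE  e.executor_type = 'transformation'
GROUP  BY x.execution_status;
```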
Jobs Log Tables
The status values and their descriptions are in the following table:
Status Display | Description |
---|---|
start | Indicates the job was started. When no logging interval is set, the job retains this status until it ends. |
end | Job ended successfully. |
stop | Indicates the user stopped the job. |
error | Indicates an error occurred when attempting to run the job. |
running | Indicates the job is running. This status appears only briefly after starting and is not shown unless a logging interval is set. |
paused | Indicates the user paused the job. This status is not shown unless a logging interval is set. |
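Filtering on the `error` status and the `job` executor type (both documented in the dimension tables below) surfaces recent failures. A sketch, under the same join assumptions as above:

```sql
-- The 20 most recent job executions that ended in error.
SELECT e.executor_name,
       f.execution_timestamp,
       f.errors
FROM   fact_execution f
JOIN   dim_execution x ON x.execution_tk = f.execution_tk
JOIN   dim_executor  e ON e.executor_tk  = f.executor_tk
WHERE  e.executor_type    = 'job'
AND    x.execution_status = 'error'
ORDER  BY f.execution_timestamp DESC
LIMIT  20;
```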
Logging Dimensions and Metrics for the Data Integration Operations Mart
These tables identify the various dimensions and metrics that can be used to create new ETL log charts and reports:
Fact Table
(fact_execution)
Field Name | Description |
---|---|
execution_date_tk | A technical key (TK) linking the fact to the date when the transformation/job was executed. |
execution_time_tk | A technical key (TK) linking the fact to the time-of-day when the transformation/job was executed. |
batch_tk | A technical key (TK) linking the fact to batch information for the transformation/job. |
execution_tk | A technical key (TK) linking the fact to execution information about the transformation/job. |
executor_tk | A technical key (TK) linking the fact to information about the executor (transformation or job). |
parent_executor_tk | A technical key (TK) linking the fact to information about the parent transformation/job. |
root_executor_tk | A technical key (TK) linking the fact to information about the root transformation/job. |
execution_timestamp | The date and time when the transformation/job was executed. |
duration | The length of time (in seconds) between when the transformation was logged (LOGDATE) and the maximum dependency date (DEPDATE). |
rows_input | The number of lines read from disk or the network during the execution of the transformation/job. Can be input from files, databases, etc. |
rows_output | The number of rows output during the execution of the transformation/job. |
rows_read | The number of rows read in from the input stream of the specified step. |
rows_written | The number of rows written during the execution of the transformation/job. |
rows_rejected | The number of rows rejected during the execution of the transformation/job. |
errors | The number of errors that occurred during the execution of the transformation/job. |
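The duration and row metrics above can be aggregated per executor to rank transformations and jobs by average runtime. The following is a sketch, assuming the TK join documented for dim_executor:

```sql
-- Average duration and total row throughput per executor.
SELECT e.executor_name,
       AVG(f.duration)     AS avg_duration_secs,
       SUM(f.rows_written) AS total_rows_written,
       SUM(f.errors)       AS total_errors
FROM   fact_execution f
JOIN   dim_executor e ON e.executor_tk = f.executor_tk
GROUP  BY e.executor_name
ORDER  BY avg_duration_secs DESC;
```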
Batch Dimension
(dim_batch)
Field Name | Description |
---|---|
batch_tk | A technical key (TK) for linking facts to batch information. |
batch_id | The ID number for the batch. |
logchannel_id | A string representing the identifier for the logging channel used by the batch. |
parent_logchannel_id | A string representing the identifier for the parent logging channel used by the batch. |
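A minimal sketch of how this dimension is typically used: grouping fact rows by batch shows which executions ran together and under which logging channel, assuming batch_tk joins the fact table as documented above:

```sql
-- Executions grouped by batch, with their logging channels.
SELECT b.batch_id,
       b.logchannel_id,
       COUNT(*) AS executions_in_batch
FROM   fact_execution f
JOIN   dim_batch b ON b.batch_tk = f.batch_tk
GROUP  BY b.batch_id, b.logchannel_id
ORDER  BY b.batch_id;
```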
Date Dimension
(dim_date)
Field Name | Description |
---|---|
date_tk | A technical key (TK) for linking facts to date information. |
date_field | A Date object representing a particular day (year, month, day). |
ymd | A string representing the date value in year-month-day format. |
ym | A string representing the date value in year-month format. |
year | An integer representing the year value. |
quarter | An integer representing the number of the quarter (1-4) to which this date belongs. |
quarter_code | A 2-character string representing the quarter (Q1-Q4) to which this date belongs. |
month | An integer representing the number of the month (1-12) to which this date belongs. |
month_desc | A string representing the month (“January”..”December”) to which this date belongs. |
month_code | A string representing the shortened month code (“JAN”..”DEC”) to which this date belongs. |
day | An integer representing the day of the month (1-31) to which this date belongs. |
day_of_year | An integer representing the day of the year (1-366) to which this date belongs. |
day_of_week | An integer representing the day of the week (1-7) to which this date belongs. |
day_of_week_desc | A string representing the day of the week (“Sunday”..”Saturday”) to which this date belongs. |
day_of_week_code | A string representing the shortened day-of-week code (“SUN”..”SAT”) to which this date belongs. |
week | An integer representing the week of the year (1-53) to which this date belongs. |
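The pre-computed calendar attributes make period rollups straightforward; for instance, a quarterly execution count needs no date arithmetic. A sketch, assuming the documented join between fact_execution and dim_date:

```sql
-- Execution counts rolled up by quarter.
SELECT d.year,
       d.quarter_code,
       COUNT(*) AS executions
FROM   fact_execution f
JOIN   dim_date d ON d.date_tk = f.execution_date_tk
GROUP  BY d.year, d.quarter_code
ORDER  BY d.year, d.quarter_code;
```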
Execution Dimension
(dim_execution)
Field Name | Description |
---|---|
execution_tk | A technical key (TK) for linking facts to execution information. |
execution_id | A unique string identifier for the execution. |
server_name | The name of the server associated with the execution. |
server_host | The host name of the server associated with the execution. |
executing_user | The name of the user who initiated the execution. |
execution_status | The status of the execution (start, stop, end, error). |
Executor Dimension
This dimension contains information about an executor, which is either a job or a transformation (dim_executor):
Field Name | Description |
---|---|
executor_tk | A technical key (TK) for linking facts to executor information. |
version | An integer corresponding to the version of the executor. |
date_from | A date representing the minimum date for which the executor is valid. |
date_to | A date representing the maximum date for which the executor is valid. |
executor_id | A string identifier for the executor. |
executor_source | The source location (either file- or repository-relative) for the executor. |
* executor_environment | The file server or repository name related to the executor_source. *Reserved for future use. |
executor_type | The executor type (“job” or “transformation”). |
executor_name | The name of the executor (for example, the transformation name). |
executor_desc | A string description of the executor (for example, the job description). |
executor_revision | A string representing the revision of the executor (for example, “1.3”). |
executor_version_label | A string describing the revision (that is, change comments). |
exec_enabled_table_logging | Whether table logging is enabled for this executor. Values are “Y” if enabled, “N” otherwise. |
exec_enabled_detailed_logging | Whether detailed (step or job entry) logging is enabled for this executor. Values are “Y” if enabled, “N” otherwise. |
exec_enabled_perf_logging | Whether performance logging is enabled for this executor. Values are “Y” if enabled, “N” otherwise. |
exec_enabled_history_logging | Whether historical logging is enabled for this executor. Values are “Y” if enabled, “N” otherwise. |
last_updated_date | The date the executor was last updated. |
last_updated_user | The name of the user who last updated the executor. |
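The exec_enabled_* flags make it easy to audit logging coverage. For example, the following sketch lists executors with performance logging enabled, along with who last updated them:

```sql
-- Executors that have performance logging switched on.
SELECT executor_name,
       executor_type,
       last_updated_date,
       last_updated_user
FROM   dim_executor
WHERE  exec_enabled_perf_logging = 'Y';
```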
Log Table
The log table (dim_log_table) contains data used by the Pentaho Operations Mart.
Do not modify the Log table. The following table is for reference only:
Field Name | Description |
---|---|
log_table_tk | A technical key (TK) for linking. |
object_type | The type of PDI object being logged (for example, “job”, “transformation”, or “step”). |
table_connection_name | The name of the database connection corresponding to the location of the transformation/job logging table. |
table_name | The name of the table containing the transformation/job logging information. |
schema_name | The name of the database schema corresponding to the location of the transformation/job logging table. |
step_entry_table_conn_name | The name of the database connection corresponding to the location of the step/entry logging table. |
step_entry_table_name | The name of the table containing the step/entry logging information. |
step_entry_schema_name | The name of the database schema corresponding to the location of the step/entry logging table. |
perf_table_conn_name | The name of the database connection corresponding to the location of the performance logging table. |
perf_table_name | The name of the table containing the performance logging information. |
perf_schema_name | The name of the database schema corresponding to the location of the performance logging table. |
Time-of-Day Dimension
This dimension contains entries for every second of a day from midnight to midnight (dim_time).
Field Name | Description |
---|---|
time_tk | A technical key (TK) for linking facts to time-of-day information. |
hms | A string representing the time of day as hours-minutes-seconds (for example, "00:01:35"). |
hm | A string representing the time of day as hours-minutes (for example, "23:59"). |
ampm | A string representing whether the time-of-day is AM or PM. Values are “am” or “pm”. |
hour | The integer number corresponding to the hour of the day (0-23). |
hour12 | The integer number corresponding to the hour of the day with respect to AM/PM (1-12). |
minute | The integer number corresponding to the minute of the hour (0-59). |
second | The integer number corresponding to the second of the minute (0-59). |
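Joining this dimension to the fact table lets you profile activity by time of day; for example, an hourly histogram of executions highlights scheduling hot spots. A sketch, assuming time_tk joins execution_time_tk as documented:

```sql
-- Execution counts per hour of day.
SELECT t.hour,
       COUNT(*) AS executions
FROM   fact_execution f
JOIN   dim_time t ON t.time_tk = f.execution_time_tk
GROUP  BY t.hour
ORDER  BY t.hour;
```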
Step Fact Table
This fact table contains facts about individual step executions (fact_step_execution).
Field Name | Description |
---|---|
execution_date_tk | A technical key (TK) linking the fact to the date when the step was executed. |
execution_time_tk | A technical key (TK) linking the fact to the time-of-day when the step was executed. |
batch_tk | A technical key (TK) linking the fact to batch information for the step. |
executor_tk | A technical key (TK) linking the fact to information about the executor (transformation). |
parent_executor_tk | A technical key (TK) linking the fact to information about the parent transformation. |
root_executor_tk | A technical key (TK) linking the fact to information about the root transformation/job. |
execution_timestamp | The date and time when the step was executed. |
step_tk | A technical key (TK) linking the fact to information about the step. |
step_copy | The step copy number. This is zero if there is only one copy of the step, or (0 to N-1) if N copies of the step are executed. |
rows_input | The number of lines read from disk or the network by the step. Can be input from files, databases, etc. |
rows_output | The number of lines written to disk or the network by the step. Can be output to files, databases, etc. |
rows_read | The number of rows read in from the input stream of the step. |
rows_written | The number of rows written to the output stream of the step. |
rows_rejected | The number of rows rejected during the execution of the step. |
errors | The number of errors that occurred during the execution of the step. |
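A typical use of this fact table is to locate problem steps across all executions. The following sketch ranks steps by total errors, assuming step_tk joins the dim_step dimension described next:

```sql
-- Top 10 steps by total errors across all executions.
SELECT s.step_id,
       SUM(fs.errors)       AS total_errors,
       SUM(fs.rows_written) AS total_rows_written
FROM   fact_step_execution fs
JOIN   dim_step s ON s.step_tk = fs.step_tk
GROUP  BY s.step_id
ORDER  BY total_errors DESC
LIMIT  10;
```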
Step Dimension
This dimension contains information about individual steps and job entries (dim_step).
Field Name | Description |
---|---|
step_tk | A technical key (TK) for linking facts to step/entry information |
step_id | The string name of the step/entry |
* original_step_name | The name of the step/entry template used to create this step/entry (for example, “Table Input”). *Reserved for future use. |
Job Entry Fact Table
This fact table contains facts about individual job entry executions (fact_jobentry_execution).
Field Name | Description |
---|---|
execution_date_tk | A technical key (TK) linking the fact to the date when the job entry was executed. |
execution_time_tk | A technical key (TK) linking the fact to the time-of-day when the job entry was executed. |
batch_tk | A technical key (TK) linking the fact to batch information for the job entry. |
executor_tk | A technical key (TK) linking the fact to information about the executor (transformation or job). |
parent_executor_tk | A technical key (TK) linking the fact to information about the parent transformation/job. |
root_executor_tk | A technical key (TK) linking the fact to information about the root transformation/job. |
step_tk | A technical key (TK) linking the fact to information about the job entry. |
execution_timestamp | The date and time when the job entry was executed. |
rows_input | The number of lines read from disk or the network by the job entry. Can be input from files, databases, etc. |
rows_output | The number of lines written to disk or the network by the job entry. Can be output to files, databases, etc. |
rows_read | The number of rows read in from the input stream of the job entry. |
rows_written | The number of rows written to the output stream of the job entry. |
rows_rejected | The number of rows rejected during the execution of the job entry. |
errors | The number of errors that occurred during the execution of the job entry. |
result | Whether the job entry finished successfully or not. Values are “Y” (successful) or “N” (otherwise). |
nr_result_rows | The number of result rows after execution. |
nr_result_files | The number of result files after execution. |
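Because the result column records success as “Y” or “N”, failed job entries can be isolated directly. A sketch, assuming the documented join to dim_step:

```sql
-- Job entries that finished unsuccessfully, newest first.
SELECT s.step_id,
       fj.execution_timestamp,
       fj.errors,
       fj.nr_result_rows
FROM   fact_jobentry_execution fj
JOIN   dim_step s ON s.step_tk = fj.step_tk
WHERE  fj.result = 'N'
ORDER  BY fj.execution_timestamp DESC;
```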
Execution Performance Fact Table
This fact table contains facts about the performance of steps in transformation executions (fact_perf_execution).
Field Name | Description |
---|---|
execution_date_tk | A technical key (TK) linking the fact to the date when the transformation was executed. |
execution_time_tk | A technical key (TK) linking the fact to the time-of-day when the transformation was executed. |
batch_tk | A technical key (TK) linking the fact to batch information for the transformation. |
executor_tk | A technical key (TK) linking the fact to information about the executor (transformation). |
parent_executor_tk | A technical key (TK) linking the fact to information about the parent transformation/job. |
root_executor_tk | A technical key (TK) linking the fact to information about the root transformation/job. |
step_tk | A technical key (TK) linking the fact to information about the step. |
seq_nr | The sequence number. This is an identifier differentiating performance snapshots for a single execution. |
step_copy | The step copy number. This is zero if there is only one copy of the step, or (0 to N-1) if N copies of the step are executed. |
execution_timestamp | The date and time when the transformation was executed. |
rows_input | The number of rows read from the input file, database, or network during the interval. |
rows_output | The number of rows written to output file, database, or network during the interval. |
rows_read | The number of rows read from previous steps during the interval. |
rows_written | The number of rows written to following steps during the interval. |
rows_rejected | The number of rows rejected by the step’s error handling during the interval. |
errors | The number of errors that occurred during the execution of the transformation/job. |
input_buffer_rows | The size of the step’s input buffer in rows at the time of the snapshot. |
output_buffer_rows | The size of the output buffer in rows at the time of the snapshot. |
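The buffer metrics are useful for spotting bottlenecks: a step whose input buffer stays full while its output buffer stays empty is usually the slow link in the transformation. The following sketch reads buffer fill across snapshots for one step; `Lookup customer` is a hypothetical step name:

```sql
-- Buffer fill per performance snapshot for a single step.
SELECT fp.seq_nr,
       fp.input_buffer_rows,
       fp.output_buffer_rows,
       fp.rows_read,
       fp.rows_written
FROM   fact_perf_execution fp
JOIN   dim_step s ON s.step_tk = fp.step_tk
WHERE  s.step_id = 'Lookup customer'  -- hypothetical step name
ORDER  BY fp.seq_nr;
```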
Clean Up Operations Mart Tables
Cleaning the PDI Operations Mart consists of running either a job or a transformation that deletes data older than a specified maximum age. The transformation and job for cleaning up the PDI Operations Mart are located in the `etl` folder.
Perform the following steps to clean up the PDI Operations Mart:
- Using the PDI Client (Spoon), open either `Clean_up_PDI_Operations_Mart.kjb` for jobs or `Clean_up_PDI_Operations_Mart_fact_table.ktr` for transformations.
- Set the following parameters:
  - `max.age.days` (required): the maximum age, in days, of the data to keep.
  - `schema.prefix` (optional): for PostgreSQL databases, enter the schema name followed by a period (.); it is applied to the SQL statements. For other databases, leave the value blank.
- Run the job or transformation. This deletes job and transformation data older than the maximum age from the data mart (an illustrative SQL sketch follows these steps).
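For illustration only, the statement below sketches the kind of age-based delete the cleanup performs on a PostgreSQL mart; use the shipped job and transformation instead, because they clean all of the mart tables consistently. Here `pentaho_operations_mart.` stands in for the schema.prefix parameter and 365 for max.age.days; both are hypothetical values:

```sql
-- Illustrative only: age-based purge of one fact table.
-- 'pentaho_operations_mart.' is a placeholder schema prefix.
DELETE FROM pentaho_operations_mart.fact_execution
WHERE  execution_timestamp < NOW() - INTERVAL '365 days';
```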
To schedule a regular cleanup of the PDI Operations Mart, see Schedule Perspective in the PDI Client.