Hitachi Vantara Lumada and Pentaho Documentation

Pentaho Server performance tips

The Pentaho Server ships in a condition designed to work well for the majority of customers. However, deployments which are either very large or very small will need to adjust certain settings, and possibly even remove certain unused functionality, in order to achieve the desired performance goals without adding hardware.

Read through the sections below and decide which ones apply to your situation.

Set timeouts

To help you maintain the health of your Pentaho system, we provide tips to help you diagnose processing errors and monitor the Pentaho Server performance.

Disable server and session-related timeouts to debug

Follow the instructions below to disable server and session timeouts associated with the User Console.
Note: These instructions are applicable when you are in a test environment. Once you go live, we recommend you set your timeouts to five or ten minutes so that sensitive Pentaho Server-related data can be protected. The time must be expressed in minutes.

Procedure

  1. Open the server.xml file located under: pentaho-server/tomcat/conf

  2. Find the connectionTimeout="20000" parameter and change its value to: 0 (zero)

    If this value is set to a negative number, the connection never times out.
  3. Open the web.xml file, located under: pentaho-server/tomcat/webapps/pentaho/WEB-INF/web.xml

  4. Find the session-timeout parameter and change its value to: -1 (negative one)

  5. Save the file and refresh the User Console.
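
After editing both files, it can help to confirm which values are actually in place. The following is a minimal, read-only sketch in Python that reports the `connectionTimeout` attribute of each Connector in server.xml and the `session-timeout` value in web.xml. The function names are hypothetical, and the relative paths assume the script runs from the directory that contains pentaho-server.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def connector_timeouts(server_root):
    """Return the connectionTimeout attribute of every <Connector> element.

    Connectors that do not set the attribute are reported as None.
    """
    return [c.get("connectionTimeout") for c in server_root.iter("Connector")]


def session_timeout(web_root):
    """Return the <session-timeout> text, or None if the element is absent."""
    for element in web_root.iter():
        if local_name(element.tag) == "session-timeout":
            return (element.text or "").strip()
    return None


if __name__ == "__main__":
    # Paths follow the procedure above; adjust them to your install location.
    server_root = ET.parse("pentaho-server/tomcat/conf/server.xml").getroot()
    web_root = ET.parse(
        "pentaho-server/tomcat/webapps/pentaho/WEB-INF/web.xml"
    ).getroot()
    print("connectionTimeout:", connector_timeouts(server_root))
    print("session-timeout:", session_timeout(web_root))
```

The script only reads the files, so it is safe to run before and after the change to compare values.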

Define result row limit and timeout

When a query in the User Console returns an unusually large number of rows, this may impact server performance. To limit the number of rows returned by a query and to set up a timeout, you must create two custom properties, max_rows and timeout, in the Metadata Editor.

The values you define for the row limit (max_rows) and timeout properties are passed to the JDBC driver.

To define max rows and timeout:

Procedure

  1. In the Metadata Editor, expand the Business Model node and select Orders.

  2. Right-click Orders and choose Edit.

    The Business Model Properties page displays a list of properties that were previously defined.
  3. In the Business Model Properties page, click the Add icon.

    The Add New Property dialog box appears.
  4. Enable Add a custom property.

  5. In the ID text box, type: max_rows

    Important: The ID is case-sensitive and must be typed exactly as shown.
  6. Click the down-arrow in the Type field and choose: Numeric

    The Business Model Properties page appears. The max_rows property is listed under Custom in the navigation tree.
  7. In the right pane, under Custom, enter a value for your max_rows property.

    If you enter 3000 as your value, the number of rows allowed to display in a query result is constrained to 3,000.
  8. Repeat steps 3 through 6 to create the timeout custom property.

  9. In the right pane, under Custom, enter a value for your timeout property.

    The timeout property requires a numeric value expressed in seconds. For example, if you enter 3600, queries time out after one hour.
  10. Click OK in the Business Model Properties page to save your newly created properties.

Move Pentaho managed data sources to JNDI

Most production BI environments have finely-tuned data sources for reporting and analysis. If you haven't done any data warehouse performance-tuning, you may want to consult Mondrian performance tips for basic advice before proceeding.

Pentaho provides a Data Source Wizard in the Pentaho User Console that enables business users to develop rapid prototype data sources for interactive reporting and analysis. This is a great way to get off the ground quickly, but data sources created this way are "quick and dirty" and are not tuned for performance. For maximum performance, you should establish your own JNDI data connections at the Web application server level, and tune them for your database.

JNDI data sources can be configured for Pentaho client tools by adding connection details to the ~/.pentaho/simple-jndi/default.properties file on Linux, or the %userprofile%\.pentaho\simple-jndi\default.properties file on Windows.
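
As an illustration of what such an entry might look like, the Python sketch below renders one data source as `name/key=value` lines. The key layout (`type`, `driver`, `url`, `user`, `password`) is an assumption that you should check against the default.properties file shipped with your client tools. The data source name, driver class, URL, and credentials are placeholders, and `simple_jndi_entry` is a hypothetical helper.

```python
def simple_jndi_entry(name, driver, url, user, password):
    """Render one data source in the key layout assumed for default.properties."""
    fields = {
        "type": "javax.sql.DataSource",  # assumed type value; verify locally
        "driver": driver,
        "url": url,
        "user": user,
        "password": password,
    }
    return "\n".join(f"{name}/{key}={value}" for key, value in fields.items())


if __name__ == "__main__":
    # Placeholder connection details for illustration only.
    print(
        simple_jndi_entry(
            name="SalesDW",
            driver="org.postgresql.Driver",
            url="jdbc:postgresql://dbhost.example.com:5432/sales",
            user="report_user",
            password="change_me",
        )
    )
```

Printing the entry rather than writing it lets you review the lines before pasting them into the properties file by hand.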

Manual cleanup of the Temporary folder

Every time you generate content on the Pentaho Server, temporary files are created on the local file system in the /pentaho-solutions/system/tmp/ folder. In some cases, the Pentaho Server may not properly purge that temporary content, leaving behind orphaned artifacts that can slowly build up and reduce performance on the volume that contains the pentaho-solutions folder. One way to address this issue is to mount the /tmp folder on a separate volume, thereby siphoning off all disk trash associated with creating new content. However, you will still have to perform a manual garbage collection procedure on this folder on a regular basis. You can accomplish this via a script that runs through your system scheduler; it should be safe to delete any content files in this directory that are more than a week old.
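
A cleanup script of that kind could look like the following Python sketch. It assumes the temporary folder path given above and a seven-day threshold. The function name `purge_old_files` is hypothetical, and you would register the script with cron, Task Scheduler, or whichever scheduler you use.

```python
import os
import time


def purge_old_files(folder, max_age_days=7, now=None):
    """Delete files under folder last modified more than max_age_days ago.

    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    removed = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed.append(path)
            except OSError:
                # The file vanished or is locked by the server; skip it.
                continue
    return removed


if __name__ == "__main__":
    # Path taken from the text above; adjust it to your installation.
    for path in purge_old_files("/pentaho-solutions/system/tmp"):
        print("removed", path)
```

The sketch leaves directories in place and removes only files, which keeps the folder structure the server expects.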

Memory optimization for the Geo Service plugin

The Pentaho Geo Service enables Geo Map visualizations in Analyzer.

If you do not use Analyzer, or do not use the Geo Service in Analyzer, you can free up approximately 200MB of RAM by removing the Geo Service plugin. Shut down the Pentaho Server and delete the /pentaho/server/pentaho-server/pentaho-solutions/system/pentaho-geo/ directory.

If you frequently use the Geo Service, update the cache setting for pentaho-geo roles in the ehcache.xml file. This file can be found in the /pentaho/server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/classes directory. For example, if you add a large number of municipalities, you may want to increase the number of municipalities cached in memory or configure overflowToDisk for the pentaho-geo-municipality role.

Turn off audit logging

While audit logging can be useful for monitoring Pentaho Server activity and performance, the act of collecting the necessary audit data can introduce significant memory overhead with the solution database. Your Pentaho Server must be stopped before performing this procedure. Follow the instructions below to disable audit logging in the Pentaho Server.
Note: Performing this task will disable all audit functions in the Pentaho Server's administration interface.

Procedure

  1. Open the /pentaho-solutions/system/pentahoObjects-spring.xml file with a text editor.

  2. Locate the following line:

    <bean id="IAuditEntry" class="org.pentaho.platform.engine.services.audit.AuditSQLEntry" scope="singleton" />
    
  3. Replace that line with the following one:

    <bean id="IAuditEntry" class="org.pentaho.platform.engine.core.audit.NullAuditEntry" scope="singleton" />
    
  4. Save and close the file.

  5. Using a database management tool or command line interface, connect to the Pentaho Hibernate database.

  6. Truncate (but do not drop) the following table:

    • PRO_AUDIT
  7. Exit your database utility and restart the Pentaho Server.
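
If you manage several servers, steps 1 through 4 can be scripted. The Python sketch below is one possible approach, not an official tool: it swaps the class name only on the line that declares the `IAuditEntry` bean and writes a backup copy first. The function name and the `.bak` suffix are hypothetical choices. The server must still be stopped, and the database steps are still performed separately.

```python
import shutil

AUDIT_SQL = "org.pentaho.platform.engine.services.audit.AuditSQLEntry"
AUDIT_NULL = "org.pentaho.platform.engine.core.audit.NullAuditEntry"


def disable_audit_bean(spring_xml_text):
    """Return the XML text with the IAuditEntry bean pointed at NullAuditEntry.

    Raises ValueError if no IAuditEntry line uses AuditSQLEntry, so an
    already-edited or unexpected file is never rewritten silently.
    """
    changed = False
    lines = []
    for line in spring_xml_text.splitlines(keepends=True):
        if 'id="IAuditEntry"' in line and AUDIT_SQL in line:
            line = line.replace(AUDIT_SQL, AUDIT_NULL)
            changed = True
        lines.append(line)
    if not changed:
        raise ValueError("IAuditEntry bean using AuditSQLEntry was not found")
    return "".join(lines)


if __name__ == "__main__":
    # Path from step 1; adjust it to your installation.
    path = "/pentaho-solutions/system/pentahoObjects-spring.xml"
    shutil.copy2(path, path + ".bak")  # keep a copy so the change can be undone
    with open(path, encoding="utf-8") as f:
        updated = disable_audit_bean(f.read())
    with open(path, "w", encoding="utf-8") as f:
        f.write(updated)
```

Editing the file as text rather than re-serializing it with an XML library preserves the comments and formatting of the original.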

Test Pentaho Server scalability

Improper scalability testing can give you the wrong idea about changes you have made to your Pentaho Server instance. Before testing, ensure that you are reusing sessions, instead of creating successive new sessions. Creating multiple unnecessary sessions causes the Pentaho Server to run out of memory unless the session timeout in web.xml is set extremely low (1 minute, for instance). The default is 30 minutes.

Logging into the Pentaho Server is resource-intensive. You must authenticate the user, create a bunch of session data, and run all startup action sequences, which usually store data in the user's session. So, if during testing, you simply string together a bunch of URLs and ignore the established session, you will create a series of 30-minute sessions and almost certainly run out of memory.

The correct way to test the server is to mimic a user's actions from a browser.

Sessions and URLs

Most stress test tools (LoadRunner, JMeter, etc.) have session/cookie management options to ensure that they behave like a human user. However, if you're creating your own test scripts, you should follow this process:

Procedure

  1. Log into the server.

  2. Execute a URL that contains the userid and password parameters.

    &userid=administrator&password=password
  3. Using the same session, submit other URLs without the userid/password.

Next steps

Use this process for as many users as you need to test with.

To log out of a session, you can use the http://localhost:8080/pentaho/Logout URL. This will invalidate the session if you append the userid and password values of the logged-in user. Without passing those parameters (or, alternatively, specifying the session ID or cookie) on the Logout URL, you will create another new session instead of closing an old one.

This means that two back-to-back wget commands in Linux will create two different HTTP sessions on the server unless one of the following conditions is met:

  • Session 1: --cookies=on is specified for both wget commands
  • Session 2: --save-cookies is used on the first wget command to save the cookies to a file, and --load-cookies is used on the second wget command to load the session state
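
If you write your own test script in Python, the same session reuse can be sketched with the standard library's cookie jar. This is a minimal sketch, not a full load-test harness. The `/Home` path is a hypothetical page, the credentials are the sample values shown earlier, and `make_session_opener` and `fetch` are illustrative names.

```python
import http.cookiejar
import urllib.request


def make_session_opener():
    """Build an opener whose cookie jar carries the session cookie forward."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def fetch(opener, url):
    """GET a URL through the opener so stored cookies are sent and updated."""
    with opener.open(url, timeout=30) as response:
        return response.read()


if __name__ == "__main__":
    base = "http://localhost:8080/pentaho"
    opener, jar = make_session_opener()  # one opener per simulated user
    # The first request carries the credentials and establishes the session.
    fetch(opener, base + "/Home?userid=administrator&password=password")
    # Later requests reuse the session cookie, so no credentials are needed.
    fetch(opener, base + "/Home")
    # Log out through the same opener so the existing session is invalidated.
    fetch(opener, base + "/Logout")
    print("cookies held:", [cookie.name for cookie in jar])
```

Creating one opener per simulated user, and reusing it for every request that user makes, keeps the number of server-side sessions equal to the number of users under test.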

Memory and sessions

Out of memory errors can happen because of what your test script is doing, not necessarily because of any weakness in the Pentaho platform. You can see just how robust the Pentaho platform is by taking a look at a production server's natural (human user) load. The following URL will show you what each day's maximum and present number of HTTP sessions are: http://testserver.example.com/pentah...ic/UserService

You can see the Java virtual machine memory settings by examining the options passed to the Tomcat or JBoss start scripts, or by looking at the CATALINA_OPTS system variable, if there is one. The Xms and Xmx options define the minimum and maximum amount of memory assigned to the application server. The default settings are not particularly high, and even if you have adjusted them, take note of the number of sessions it takes to use up all of the memory. Also take note of the fact that closing sessions after an out of memory error will return the memory to the available pool, proving that there are no memory leaks or zombie sessions inherent in the Pentaho platform.
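
To read the heap settings without opening the start scripts, a small Python sketch such as the one below can extract the Xms and Xmx values from an options string. It assumes the flags use the common `-Xms<size>` and `-Xmx<size>` form with an optional k, m, or g suffix. `heap_settings` is a hypothetical helper, and the script only inspects CATALINA_OPTS, so flags set elsewhere in the start scripts will not appear.

```python
import os
import re

# Matches -Xms or -Xmx followed by a number and an optional k/m/g unit.
_HEAP_FLAG = re.compile(r"-(Xms|Xmx)(\d+)([kKmMgG]?)")
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def heap_settings(java_opts):
    """Return a dict mapping 'Xms' and 'Xmx' to sizes in bytes.

    Only flags present in java_opts are returned. If a flag appears more
    than once, the last occurrence is kept.
    """
    found = {}
    for flag, number, unit in _HEAP_FLAG.findall(java_opts):
        found[flag] = int(number) * _UNIT_BYTES[unit.lower()]
    return found


if __name__ == "__main__":
    opts = os.environ.get("CATALINA_OPTS", "")
    settings = heap_settings(opts)
    if not settings:
        print("No -Xms/-Xmx flags found in CATALINA_OPTS")
    for flag, size in settings.items():
        print(f"-{flag}: {size / 1024**2:.0f} MB")
```

Recording these values next to the session count at which memory runs out makes test runs comparable after you change the settings.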

Use Apache HTTPd with SSL for delivering static content

To improve performance, you may want to use the Apache HTTPd Web server to handle delivery of static content and facilitation of socket connections.
