Pentaho Server performance tips
The Pentaho Server ships in a condition designed to work well for the majority of customers. However, deployments which are either very large or very small will need to adjust certain settings, and possibly even remove certain unused functionality, in order to achieve the desired performance goals without adding hardware.
Read through the sections below and decide which ones apply to your situation.
Set timeouts
To help you maintain the health of your Pentaho system, we provide tips to help you diagnose processing errors and monitor the Pentaho Server performance.
Disable server and session-related timeouts to debug
Procedure
Open the server.xml file located under: pentaho-server/tomcat/conf
Find the connectionTimeout="20000" parameter and change its value to: 0 (zero)
If this value is set to a negative number, the connection never times out.
Open the web.xml file located under: pentaho-server/tomcat/webapps/pentaho/WEB-INF
Find the session-timeout parameter and change its value to: -1 (negative one)
Save the file and refresh the User Console.
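The two edits above can also be scripted. The following is a minimal sketch, not an official Pentaho tool: the function name disable_timeouts is hypothetical, and it assumes that the session-timeout element sits on a single line in web.xml. It keeps a .bak copy of each file so that the original timeouts can be restored after debugging.

```shell
#!/bin/sh
# Sketch only: disable_timeouts is a hypothetical helper, not part of Pentaho.
# Assumes <session-timeout>N</session-timeout> is written on one line.

disable_timeouts() {
  server_xml="$1"   # e.g. pentaho-server/tomcat/conf/server.xml
  web_xml="$2"      # e.g. pentaho-server/tomcat/webapps/pentaho/WEB-INF/web.xml

  # Keep backups so the original values can be restored after debugging.
  cp "$server_xml" "$server_xml.bak"
  cp "$web_xml" "$web_xml.bak"

  # connectionTimeout="20000" becomes connectionTimeout="0"
  sed 's/connectionTimeout="[0-9-]*"/connectionTimeout="0"/' \
    "$server_xml.bak" > "$server_xml"

  # <session-timeout>30</session-timeout> becomes <session-timeout>-1</session-timeout>
  sed 's|<session-timeout>[^<]*</session-timeout>|<session-timeout>-1</session-timeout>|' \
    "$web_xml.bak" > "$web_xml"
}
```

Call it with the two file paths, for example: disable_timeouts pentaho-server/tomcat/conf/server.xml pentaho-server/tomcat/webapps/pentaho/WEB-INF/web.xml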
Define result row limit and timeout
The values you define for the row number limit (max-rows) and timeout properties are passed to the JDBC driver.
To define max rows and timeout:
Procedure
In the Metadata Editor, expand the Business Model node and select Orders.
Right-click Orders and choose Edit.
The Business Model Properties page displays a list of properties that were previously defined.
In the Business Model Properties page, click the Add icon.
The Add New Property dialog box appears.
Enable Add a custom property.
In the ID text box, type: max_rows
Important: The ID is case-sensitive and must be typed exactly as shown.
Click the down-arrow in the Type field and choose: Numeric
The Business Model Properties page appears. The max_rows property is listed under Custom in the navigation tree.
In the right pane, under Custom, enter a value for your max_rows property.
If you enter 3000 as your value, the number of rows allowed to display in a query result is constrained to 3,000.
Repeat steps 3 through 6 for the timeout custom property.
In the right pane, under Custom, enter a value for your timeout property.
The timeout property requires a numeric value defined in number of seconds. For example, if you enter 3600, the time limit for a query is one hour.
Click OK in the Business Model Properties page to save your newly created properties.
Move Pentaho managed data sources to JNDI
Pentaho provides a Data Source Wizard in the Pentaho User Console that enables business users to develop rapid prototype data sources for interactive reporting and analysis. This is a great way to get off the ground quickly, but these data sources are "quick and dirty" and not performant. For maximum performance, you should establish your own JNDI data connections at the Web application server level, and tune them for your database.
JNDI data sources can be configured for Pentaho client tools by adding connection details to the ~/.pentaho/simple-jndi/default.properties file on Linux, or the %userprofile%\.pentaho\simple-jndi\default.properties file on Windows.
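As an illustration for the client tools, the sketch below appends one connection to a simple-jndi properties file using the name/key=value layout (type, driver, url, user, password). The helper name add_jndi_source and every connection value shown in the usage line are placeholders, not values from your environment.

```shell
#!/bin/sh
# Sketch only: add_jndi_source is a hypothetical helper.
# It appends one data source definition to a simple-jndi properties file.

add_jndi_source() {
  props="$1"; name="$2"; driver="$3"; url="$4"; user="$5"; password="$6"

  # Create the simple-jndi directory if it does not exist yet.
  mkdir -p "$(dirname "$props")"

  # Each key is prefixed with the JNDI name of the data source.
  cat >> "$props" <<EOF
$name/type=javax.sql.DataSource
$name/driver=$driver
$name/url=$url
$name/user=$user
$name/password=$password
EOF
}
```

A call on Linux might look like this, with all arguments after the file path being example values: add_jndi_source "$HOME/.pentaho/simple-jndi/default.properties" SalesDW org.postgresql.Driver jdbc:postgresql://dbhost:5432/sales report_user secret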
Manual cleanup of the Temporary folder
Memory optimization for the Geo Service plugin
If you do not use Analyzer, or do not use the Geo Service in Analyzer, you can free up approximately 200MB of RAM by removing the Geo Service plugin. Shut down the Pentaho Server and delete the /pentaho/server/pentaho-server/pentaho-solutions/system/pentaho-geo/ directory.
If you frequently use the Geo Service, update the cache setting for pentaho-geo roles in the ehcache.xml file. This file can be found in the /pentaho/server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/classes directory. For example, if you add a large number of municipalities, you may want to increase the number of municipalities cached in memory or configure overflowToDisk for the pentaho-geo-municipality role.
Turn off audit logging
Procedure
Open the /pentaho-solutions/system/pentahoObjects-spring.xml file with a text editor.
Locate the following line:
<bean id="IAuditEntry" class="org.pentaho.platform.engine.services.audit.AuditSQLEntry" scope="singleton" />
Replace that line with the following one:
<bean id="IAuditEntry" class="org.pentaho.platform.engine.core.audit.NullAuditEntry" scope="singleton" />
Save and close the file.
Using a database management tool or command line interface, connect to the Pentaho Hibernate database.
Truncate (but do not drop) the following table:
- PRO_AUDIT
Exit your database utility and restart the Pentaho Server.
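The bean swap in the procedure above can be scripted as in the following sketch. The function name disable_audit_logging is hypothetical, and the sketch assumes the bean definition is on a single line, as shown in the procedure. It does not truncate the PRO_AUDIT table, because the database client to use depends on where your Hibernate database is hosted.

```shell
#!/bin/sh
# Sketch only: disable_audit_logging is a hypothetical helper.
# Assumes the IAuditEntry bean is declared on one line.

disable_audit_logging() {
  spring_xml="$1"   # path to pentahoObjects-spring.xml

  # Keep a backup so audit logging can be turned back on later.
  cp "$spring_xml" "$spring_xml.bak"

  # On the IAuditEntry bean line only, swap the SQL-backed class
  # for the no-op NullAuditEntry class.
  sed '/id="IAuditEntry"/s/org\.pentaho\.platform\.engine\.services\.audit\.AuditSQLEntry/org.pentaho.platform.engine.core.audit.NullAuditEntry/' \
    "$spring_xml.bak" > "$spring_xml"
}
```

Run it while the Pentaho Server is stopped, then complete the database steps and restart the server as described above.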
Test Pentaho Server scalability
Improper scalability testing can give you the wrong idea about changes you have made to your Pentaho Server instance. Before testing, ensure that you are reusing sessions, instead of creating successive new sessions. Creating multiple unnecessary sessions causes the Pentaho Server to run out of memory unless the session timeout in web.xml is set extremely low (1 minute, for instance). The default is 30 minutes.
Logging into the Pentaho Server is resource-intensive. The server must authenticate the user, create session data, and run all startup action sequences, which usually store data in the user's session. So, if during testing you simply string together a series of URLs and ignore the established session, you will create a series of 30-minute sessions and almost certainly run out of memory.
The correct way to test the server is to mimic a user's actions from a browser.
Sessions and URLs
Procedure
Log into the server.
Execute a URL that contains the userid and password parameters.
&userid=administrator&password=password
Using the same session, submit other URLs without the userid/password.
Next steps
Use this process for as many users as you need to test with.
To log out of a session, you can use the http://localhost:8080/pentaho/Logout URL. This will invalidate the session if you append the userid and password values of the logged-in user. Without passing those parameters (or, alternatively, specifying the session ID or cookie) on the Logout URL, you will create another new session instead of closing an old one.
This means that two back-to-back wget commands in Linux will create two different HTTP sessions on the server unless one of the following conditions is met:
- --cookies=on is specified for both wget commands
- --save-cookies is used on the first wget command to save the cookies to a file, and --load-cookies is used on the second wget command to load the session state
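The cookie-file approach can be sketched as follows. The function name run_user_session and the WGET override variable are hypothetical. The sketch assumes the first URL already contains a query string, so that the userid and password parameters can be appended with &. It also passes --keep-session-cookies, because wget does not write session cookies (cookies without an expiry date) to the cookie file unless that option is given.

```shell
#!/bin/sh
# Sketch only: run_user_session is a hypothetical helper for load tests.
# WGET can be overridden, for example to point at a different wget binary.
WGET="${WGET:-wget}"

run_user_session() {
  jar="$1"; userid="$2"; password="$3"; first_url="$4"
  shift 4

  # First request: authenticate and store the session cookie in the jar.
  $WGET -q -O /dev/null --keep-session-cookies --save-cookies "$jar" \
    "${first_url}&userid=${userid}&password=${password}"

  # Remaining requests: reuse the same session, without credentials.
  for url in "$@"; do
    $WGET -q -O /dev/null --keep-session-cookies \
      --load-cookies "$jar" --save-cookies "$jar" "$url"
  done
}
```

Use one cookie file per simulated user, for example run_user_session /tmp/user1.cookies administrator password "$FIRST_URL" "$SECOND_URL", where the two URL variables stand for whatever pages your test plan exercises.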
Memory and sessions
You can see the Java virtual machine memory settings by examining the options passed to the Tomcat or JBoss start scripts, or by looking at the CATALINA_OPTS system variable, if there is one. The -Xms and -Xmx options define the minimum and maximum amount of heap memory assigned to the application server. The default settings are not particularly high, and even if you have adjusted them, take note of the number of sessions it takes to use up all of the memory. Also take note of the fact that closing sessions after an out-of-memory error will return the memory to the available pool, proving that there are no memory leaks or zombie sessions inherent in the Pentaho platform.
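As a quick way to read those values, the sketch below filters the heap options out of an option string such as CATALINA_OPTS. The function name heap_settings is hypothetical, and the sketch assumes the options are separated by whitespace.

```shell
#!/bin/sh
# Sketch only: heap_settings is a hypothetical helper.
# Prints the -Xms and -Xmx options found in a JVM option string, one per line.

heap_settings() {
  # $1 is intentionally unquoted so the string splits on whitespace.
  for opt in $1; do
    case "$opt" in
      -Xms*|-Xmx*) echo "$opt" ;;
    esac
  done
}
```

For example, heap_settings "$CATALINA_OPTS" prints nothing if the variable is unset or carries no heap options, in which case the values are set in the start script instead.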