Pentaho Reporting Performance Tips
Pentaho Reporting's default configuration makes certain assumptions about system resources and the size, features, and details of reports that may not meet your specific requirements. If you have large inline subreports, or many parameters, you can run into performance bottlenecks. Fortunately, many performance problems can be mitigated through specific engine and report options. Refer to the sections below that apply to your scenario.
Cache Report Content
You can cache the result sets of parameterized reports so that every time you change a parameter during your user session (all caching is on a per-session basis) you do not have to retrieve a new result set. By default, Pentaho Reporting has result set caching turned on, but you may find some advantage in turning it off or changing the cache thresholds and settings.
When you publish a report to the Pentaho Server, you switch cache and engine configurations from the local Report Designer versions of classic-engine.properties to the server's version inside the Pentaho WAR. These configurations may not be the same, so if you have made changes to the result set cache settings locally, you may want to port those changes over to the Pentaho Server as well.
Result Set Caching
When rendered, a parameterized report must account for every dataset required for every parameter. Every time a parameter field changes, every dataset is recalculated, which can negatively impact performance.
You can avoid gratuitous dataset recalculations by caching parameter datasets. This is accomplished through the EHcache framework built into the Pentaho Server. You can configure specific settings for published reports by editing the ehcache.xml file in the /WEB-INF/classes/ folder inside of the pentaho.war. The relevant element is:
<!-- Defines a cache used by the reporting engine to hold small datasets.
     This cache can be configured to have a separate instance for each logged in user via the global report configuration.
     This per-user cache is required if role or other security and filter information is used in ways invisible for the reporting engine. -->
<cache name="report-dataset-cache"
       maxElementsInMemory="50"
       eternal="false"
       overflowToDisk="false"
       timeToIdleSeconds="300"
       timeToLiveSeconds="600"
       diskPersistent="false"
       diskExpiryThreadIntervalSeconds="120" />
Results that contain complex objects (CLOB and BLOB data types) are not cached. Neither are results coming from a scripting dataset, a Java method call, a table data source, an external data source (computed in an action sequence), or a CDA data source. In all of these cases, either caching would be more expensive than recalculating, or the involved parameters do not provide enough hints.
However, if cached data is kept for too long, the report output may not reflect recent changes in the data source because it is still using old data. You must therefore balance performance against accuracy according to your needs.
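To make that trade-off concrete, the following is a minimal sketch of how the two timeouts in the report-dataset-cache element interact. It assumes standard Ehcache semantics: an entry expires once it has gone longer than timeToIdleSeconds without being read, or once more than timeToLiveSeconds have passed since it was created, whichever happens first, and a value of 0 disables that limit. The function name and the timestamps are illustrative only and are not part of Pentaho or Ehcache.

```python
def is_expired(created_at, last_access_at, now, time_to_idle=300, time_to_live=600):
    """Return True if a cached dataset would be considered expired at `now`.

    All times are in seconds. The defaults mirror timeToIdleSeconds="300"
    and timeToLiveSeconds="600" from the cache element. A limit of 0 is
    treated as "no limit" (assumed Ehcache behaviour).
    """
    idle_for = now - last_access_at   # time since the entry was last read
    age = now - created_at            # time since the entry was created
    idle_expired = time_to_idle > 0 and idle_for > time_to_idle
    live_expired = time_to_live > 0 and age > time_to_live
    return idle_expired or live_expired


# Read 250 s ago, created 500 s ago: still served from the cache.
print(is_expired(created_at=0, last_access_at=250, now=500))   # False
# Read 11 s ago, but created 601 s ago: too old, the query runs again.
print(is_expired(created_at=0, last_access_at=590, now=601))   # True
# Created 400 s ago, but not read for 350 s: dropped for being idle.
print(is_expired(created_at=0, last_access_at=50, now=400))    # True
```

Raising timeToLiveSeconds keeps result sets around longer and saves queries, at the cost of showing data that may be up to that many seconds old.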
Result Set Cache Options
classic-engine.properties options control result set caching in parameterized reports.
||Number of rows in the dataset that will be cached; the higher the number, the larger the cache and the more disk space is used while the cache is active.||Integer; default value is 1000.|
Streamline Printed Output
Pentaho Reporting's overall performance is chiefly affected by the amount of printed content that it has to generate. The more content you generate, the more time the Reporting engine will take to perform all layout computations.
Large inline subreports are notorious for poor performance because the output layout of an inline subreport is always stored in memory. Layout processing of the master report pauses until the subreport is fully generated; the subreport is then inserted into the master report's layout model and printed. Memory consumption for this layout model is high because the full model is kept in memory until the report is finished. If the subreport contains a large amount of content, you may run into "out of memory" exceptions.
An inline subreport that consumes the full width of the root-level band should be converted into a banded subreport. Banded subreports are laid out, and all of their output is generated, while the subreport is processed. The memory footprint is small because only the active band or the active page has to be held in memory.
When images are embedded from remote servers (HTTP/FTP sources), you must ensure that the server reports a last-modified date (for HTTP, the Last-Modified response header). The Reporting engine uses that date as part of its caching system, and if it is missing, the remote images will not be cached, forcing the engine to retrieve them every time they're needed.
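One way to check an image server before relying on it is to look at its response headers. The sketch below covers HTTP sources only and assumes the server answers HEAD requests and uses the standard HTTP Last-Modified header. The helper names are hypothetical and are not part of Pentaho.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def last_modified(headers):
    """Return the parsed Last-Modified timestamp, or None if missing or invalid."""
    value = headers.get("Last-Modified")
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Header is present but not a valid HTTP date.
        return None


def fetch_headers(url, timeout=10):
    """Issue a HEAD request and return the response headers."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers


# Usage (the URL is a placeholder for one of your own image sources):
#   stamp = last_modified(fetch_headers("http://images.example.com/logo.png"))
#   print("cacheable" if stamp else "no usable Last-Modified header")
```

If the function returns None for an image used in a report, that image would be fetched again on each use according to the behavior described above.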
Caching must be configured properly via a valid ehcache configuration file, which is stored in the Pentaho Web app in the /WEB-INF/classes/ directory. If caching is disabled or misconfigured, then there will be performance problems when loading reports and resources.
A pageable report generates a stream of pages. Each page has the same height, even if the page is not fully filled with content. When a page is filled, the laid-out page is passed to the output target, which renders it in either a Graphics2D or a streaming output (PDF, Plaintext, HTML, etc.) context.
Page Break Methods
When the content contains a manual page break, the page will be considered full. If the page break is a before-print break, the break is converted to an after-break, the internal report states are rolled back, and report processing restarts to regenerate the layout with the new constraints. A similar rollback happens if the current band does not fit on the page. Because before-print breaks trigger this rollback, you would generally prefer break-after over break-before.
So for large reports, you might consider removing manual page breaks and limiting the width of bands.
When processing a pageable report, the reporting engine assumes that the report will be run in interactive mode, which allows for parameterization control. To make browsing through the pages faster, a number of page states will be stored to allow report end-users to restart output processing at the point in the report where they adjust the parameters.
Reports that are run to fully export all pages usually do not need to store those page states. A series of Report engine settings controls the number and frequency of the page states stored:
The Reporting engine uses three lists to store page states. The default configuration looks as follows:
- The first 20 states (Pages 1 to 20) are stored in the primary pool. All states are stored with strong references and will not be garbage collected.
- The next 400 states (pages 21 to 420) are stored in the secondary pool. Of those, every fourth state is stored with a strong reference and cannot be garbage collected as long as the report processor is open.
- All subsequent states (pages > 420) are stored in the tertiary pool, and every tenth state is stored with a strong reference.
So for a 2000-page report, a total of about 280 states (20 + 100 + 158) will be stored with strong references.
In server mode, the settings could be cut down to:
org.pentaho.reporting.engine.classic.core.performance.pagestates.PrimaryPoolSize=1
org.pentaho.reporting.engine.classic.core.performance.pagestates.SecondaryPoolFrequency=1
org.pentaho.reporting.engine.classic.core.performance.pagestates.SecondaryPoolSize=1
org.pentaho.reporting.engine.classic.core.performance.pagestates.TertiaryPoolFrequency=100
This reduces the number of states stored for a 2000-page report to 22, cutting the memory consumption for the page states to roughly one tenth.
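The pool arithmetic can be sketched as a small estimator, which is handy for trying other pool sizes before editing the properties. This is a rough model built only from the description above: it assumes the primary pool keeps every state, and that the secondary and tertiary pools keep one strong reference per "frequency" states, rounded up. The engine's exact bookkeeping may differ by a few states, and the function and parameter names are illustrative.

```python
from math import ceil


def strong_page_states(pages, primary_size=20, secondary_size=400,
                       secondary_frequency=4, tertiary_frequency=10):
    """Estimate how many page states are held with strong references."""
    # Pages are assigned to the pools in order: primary, secondary, tertiary.
    in_primary = min(pages, primary_size)
    in_secondary = min(max(pages - primary_size, 0), secondary_size)
    in_tertiary = max(pages - primary_size - secondary_size, 0)
    return (in_primary
            + ceil(in_secondary / secondary_frequency)
            + ceil(in_tertiary / tertiary_frequency))


# Default configuration, 2000 pages: 20 + 100 + 158
default_estimate = strong_page_states(2000)
# Reduced server-mode settings, 2000 pages: 1 + 1 + 20
server_estimate = strong_page_states(2000, primary_size=1, secondary_size=1,
                                     secondary_frequency=1, tertiary_frequency=100)
print(default_estimate, server_estimate)  # 278 22
```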
In the current version, full exports do not generate page states, so these settings have no effect on such exports. They still affect interactive mode.
A table export produces tabular output from a fully laid-out display model. A table export cannot handle overlapping elements and therefore has to remove them.
To support layout debugging, the Reporting engine stores a lot of extra information in the layout model. This increases memory consumption but makes it easier to develop Reporting solutions. These Reporting engine debug settings should never be enabled in production environments:
These settings are false by default. Report Designer comes with its own method to detect overlapping elements and does not rely on these settings.
In HTML exports, there are a few Reporting engine settings that can affect export performance. The first is CopyExternalImages:
This controls whether images from HTTP/HTTPS or FTP sources are linked from their original source or copied (and possibly re-encoded) into the output directory. The default is true; this ensures that reports always have the same image. Set to false if the image is dynamically generated, in which case you'd want to display the most recent view.
The Style and ForceBufferedWriting settings control how stylesheets are produced and whether the generated HTML output will be held in a buffer until the report processing is finished:
Style information can be stored inline, or in the <head> element of the generated HTML file:
Or in an external CSS file:
ForceBufferedWriting should be set to true if a report uses an external CSS file. Browsers request all resources they find in the HTML stream, so if a browser requests a style sheet that has not yet been fully generated, the report cannot display correctly. It is safe to disable buffering if the styles are inline because the browser will not need to fetch an external style sheet in that case. Buffered content will appear slower to the user than non-buffered content because browsers render partial HTML pages while data is still being received from the server. Buffering will delay that rendering until the report is fully processed on the server.
Pentaho Reporting Configuration Files
The following files contain various configuration options for Pentaho Reporting. The options are not particularly self-explanatory and their value limits are not obvious; therefore, you shouldn't change any options in these files unless you are following guidelines from Pentaho documentation or are assisted by Pentaho Support or a consulting representative.
Contains options for the Report Designer client tool. It does not change any report options.
Contains global report rendering options for reports generated locally from Report Designer. Some of these options can be overridden in individual reports.
Contains global report rendering options for published reports that are generated on the Pentaho Server. Some of these options can be overridden in individual reports.