Pentaho Reporting performance tips
Pentaho Reporting's default configuration makes certain assumptions about system resources and the size, features, and details of reports that may not meet your specific requirements. If you have large inline subreports, or many parameters, you can run into performance bottlenecks. Fortunately, many performance problems can be mitigated through specific engine and report options. Refer to the sections below that apply to your scenario.
Cache report content
You can cache the result sets of parameterized reports so that every time you change a parameter during your user session (all caching is on a per-session basis) you do not have to retrieve a new result set. By default, Pentaho Reporting has result set caching turned on, but you may find some advantage in turning it off or changing the cache thresholds and settings.
Result set caching
You can avoid gratuitous dataset recalculations by caching parameter datasets. This is accomplished through the Ehcache framework built into the Pentaho Server. You can configure specific settings for published reports by editing the ehcache.xml file in the /WEB-INF/classes/ folder inside the pentaho.war. The relevant element is:
<!-- Defines a cache used by the reporting engine to hold small datasets.
     This cache can be configured to have a separate instance for each logged in user via the global report configuration.
     This per-user cache is required if role or other security and filter information is used in ways invisible for the reporting engine. -->
<cache name="report-dataset-cache"
       maxElementsInMemory="50"
       eternal="false"
       overflowToDisk="false"
       timeToIdleSeconds="300"
       timeToLiveSeconds="600"
       diskPersistent="false"
       diskExpiryThreadIntervalSeconds="120" />
However, if cached data is kept for too long, changes in the underlying data source may not be reflected in the report output because the report is still using the old data. You must therefore balance performance against accuracy according to your needs.
Anything containing complex objects (CLOB and BLOB data types) is not cached. Neither are results coming from a scripting dataset, a Java method call, a table data source, an external data source (computed in an action sequence), or a CDA data source. In all of these cases, either caching would be more expensive than recalculating, or the involved parameters do not provide enough hints for caching.
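The timeToIdleSeconds and timeToLiveSeconds attributes of the cache element govern how long a cached dataset can survive. The following is a minimal sketch of that expiry rule, assuming the usual Ehcache convention that an entry expires when it has been idle longer than timeToIdleSeconds or has existed longer than timeToLiveSeconds, and that 0 means no limit. The function name is_expired is hypothetical and is not part of Ehcache or Pentaho.

```python
def is_expired(created_at, last_access_at, now,
               time_to_idle=300, time_to_live=600, eternal=False):
    """Return True if a cache entry would be treated as stale.

    All times are in seconds. A limit of 0 is read as "no limit"
    (an assumption based on Ehcache's documented convention).
    """
    if eternal:
        return False  # eternal entries ignore both limits
    idle_too_long = time_to_idle > 0 and (now - last_access_at) > time_to_idle
    lived_too_long = time_to_live > 0 and (now - created_at) > time_to_live
    return idle_too_long or lived_too_long


# Entry created at t=0 and read regularly: it is never idle for 300 seconds,
# but it is still dropped once it is more than 600 seconds old.
print(is_expired(0, 400, 500))   # False
print(is_expired(0, 600, 700))   # True
# Entry created at t=0 and never read again: dropped after 300 idle seconds.
print(is_expired(0, 0, 301))     # True
```

With the values shown in the cache element, a dataset is therefore at most 600 seconds out of date. Lowering timeToLiveSeconds favors accuracy, while raising it favors performance.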
Result set cache options
Option | Purpose | Possible Values |
org.pentaho.reporting.platform.plugin.cache.PentahoDataCache.CachableRowLimit | Number of rows in the dataset that will be cached; the higher the number, the larger the cache and the more disk space is used while the cache is active. | Integer; default value is: 1000 |
Streamline printed output
Pentaho Reporting's overall performance is chiefly affected by the amount of printed content that it has to generate. The more content you generate, the more time the Pentaho Reporting engine will take to perform all layout computations.
Large inline subreports are notorious for poor performance. This is because the output layout of an inline subreport is always stored in memory. The master report's layouting pauses until the subreport is fully generated; the subreport is then inserted into the master report's layout model and subsequently printed. Memory consumption for this layouting model is high because the full layout model is kept in memory until the report is finished. If there is a large amount of content in the subreport, you will run into out-of-memory errors.
An inline subreport that consumes the full width of the root-level band should be converted into a banded subreport. Banded subreports are layouted and all output is generated while the subreport is processed. The memory footprint for that is small because only the active band or the active page has to be held in memory.
When images are embedded from remote servers (HTTP/FTP sources), you must ensure that the server reports a last-modified date (for HTTP, the Last-Modified response header). The Pentaho Reporting engine uses that date as part of its caching system. If it is missing, the remote images will not be cached, forcing the engine to retrieve them every time they're needed.
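To check whether an image server supplies the modification date the engine relies on, you can inspect the response headers. The following sketch assumes an HTTP source and assumes that the date in question is the standard Last-Modified response header; the helper names are hypothetical, and FTP sources are not covered.

```python
import urllib.request


def has_last_modified(headers):
    """True if the headers contain a non-empty Last-Modified value.

    Accepts any mapping with .items(), such as a dict or the headers
    object returned by urllib. Header names are compared case-insensitively.
    """
    for name, value in headers.items():
        if name.lower() == "last-modified" and str(value).strip():
            return True
    return False


def fetch_headers(url, timeout=10):
    """Issue a HEAD request and return the response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers


# Offline demonstration with hand-written headers:
print(has_last_modified({"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}))  # True
print(has_last_modified({"Content-Type": "image/png"}))                       # False
```

Against a live server, you would call has_last_modified(fetch_headers(url)) with the URL of an image used in the report. A result of False suggests that the image is fetched again on every use.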
Caching must be configured properly via a valid ehcache configuration file, which is stored in the Pentaho Web app in the /WEB-INF/classes/ directory. If caching is disabled or misconfigured, then there will be performance problems when loading reports and resources.
Paginated exports
A pageable report generates a stream of pages. Each page has the same height, even if the page is not fully filled with content. When a page is filled, the layouted page will be passed over to the output target to render it in either a Graphics2D or a streaming output (PDF, Plaintext, HTML, etc.) context.
Page break methods
For large reports, consider removing manual page breaks and limiting the width of bands.
Page states
Reports that are run to fully export all pages usually do not need to store those page states. A series of Pentaho Reporting engine settings controls the number and frequency of the page states stored:
- org.pentaho.reporting.engine.classic.core.performance.pagestates.PrimaryPoolSize=20
- org.pentaho.reporting.engine.classic.core.performance.pagestates.SecondaryPoolFrequency=4
- org.pentaho.reporting.engine.classic.core.performance.pagestates.SecondaryPoolSize=100
- org.pentaho.reporting.engine.classic.core.performance.pagestates.TertiaryPoolFrequency=10
With these default settings, page states are stored as follows:
- The first 20 states (pages 1 to 20) are stored in the primary pool. All states are stored with strong references and will not be garbage collected.
- The next 400 states (pages 21 to 420) are stored in the secondary pool. Of those, every fourth state is stored with a strong reference and cannot be garbage collected as long as the report processor is open.
- All subsequent states (pages > 420) are stored in the tertiary pool, and every tenth state is stored with a strong reference.
In server mode, the settings could be cut down to:
org.pentaho.reporting.engine.classic.core.performance.pagestates.PrimaryPoolSize=1
org.pentaho.reporting.engine.classic.core.performance.pagestates.SecondaryPoolFrequency=1
org.pentaho.reporting.engine.classic.core.performance.pagestates.SecondaryPoolSize=1
org.pentaho.reporting.engine.classic.core.performance.pagestates.TertiaryPoolFrequency=100
This reduces the number of states stored for a 2000-page report to 22, cutting the memory consumption for the page states to roughly one tenth.
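The effect of the four pool settings can be estimated with a little arithmetic. The sketch below models the three pools as described in this section; it is not the engine's actual implementation, and the function names are hypothetical. It assumes that the secondary pool spans SecondaryPoolSize × SecondaryPoolFrequency pages and that a partial group at the end of a pool still stores one state.

```python
def ceil_div(a, b):
    """Integer division rounded up, for non-negative a and positive b."""
    return -(-a // b)


def strong_page_states(pages, primary_size=20, secondary_freq=4,
                       secondary_size=100, tertiary_freq=10):
    """Estimate how many page states are held with strong references."""
    # Primary pool: every state is kept.
    primary = min(pages, primary_size)
    remaining = pages - primary

    # Secondary pool: covers secondary_size * secondary_freq pages,
    # keeping one state out of every secondary_freq.
    secondary_span = min(remaining, secondary_size * secondary_freq)
    secondary = ceil_div(secondary_span, secondary_freq)
    remaining -= secondary_span

    # Tertiary pool: everything else, keeping one state out of every
    # tertiary_freq.
    tertiary = ceil_div(remaining, tertiary_freq)
    return primary + secondary + tertiary


defaults = strong_page_states(2000)
reduced = strong_page_states(2000, primary_size=1, secondary_freq=1,
                             secondary_size=1, tertiary_freq=100)
print(defaults, reduced)  # 278 22
```

Under this model, a 2000-page report keeps 278 strongly referenced states with the default settings and 22 with the reduced server settings.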
Table exports
To support layout debugging, the Pentaho Reporting engine stores a lot of extra information in the layout model. This increases memory consumption but makes it easier to develop Reporting solutions. These Pentaho Reporting engine debug settings should never be enabled in production environments:
- org.pentaho.reporting.engine.classic.core.modules.output.table.base.ReportCellConflicts
- org.pentaho.reporting.engine.classic.core.modules.output.table.base.VerboseCellMarkers
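Before deploying, it can be worth checking that neither debug setting is switched on. The following sketch scans the text of a properties file for the two keys listed above. It is a simplified reader that assumes one key=value pair per line and ignores the line continuations and escapes that the full properties format allows. The function name and the sample text are illustrative only.

```python
DEBUG_KEYS = (
    "org.pentaho.reporting.engine.classic.core.modules.output.table.base.ReportCellConflicts",
    "org.pentaho.reporting.engine.classic.core.modules.output.table.base.VerboseCellMarkers",
)


def enabled_debug_settings(properties_text):
    """Return the debug keys that are set to true in the given text."""
    enabled = []
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        key = key.strip()
        if separator and key in DEBUG_KEYS and value.strip().lower() == "true":
            enabled.append(key)
    return enabled


# Illustrative file content, not taken from a real installation.
sample = """
# table export debugging
org.pentaho.reporting.engine.classic.core.modules.output.table.base.ReportCellConflicts=true
org.pentaho.reporting.engine.classic.core.modules.output.table.base.VerboseCellMarkers=false
"""
for key in enabled_debug_settings(sample):
    print("Debug setting enabled:", key)
```

To check a real installation, read the contents of the relevant classic-engine.properties file and pass the text to enabled_debug_settings.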
HTML exports
org.pentaho.reporting.engine.classic.core.modules.output.table.html.CopyExternalImages=true
This controls whether images from HTTP/HTTPS or FTP sources are linked from their original source or copied (and possibly re-encoded) into the output directory. The default is true; this ensures that reports always have the same image. Set to false if the image is dynamically generated, in which case you'd want to display the most recent view.
The Style and ForceBufferedWriting settings control how stylesheets are produced and whether the generated HTML output will be held in a buffer until the report processing is finished:
org.pentaho.reporting.engine.classic.core.modules.output.table.html.ForceBufferedWriting=true
Style information can be stored inline, or in the <head> element of the generated HTML file:
org.pentaho.reporting.engine.classic.core.modules.output.table.html.InlineStyles=true
Or in an external CSS file:
org.pentaho.reporting.engine.classic.core.modules.output.table.html.ExternalStyle=true
ForceBufferedWriting should be set to true if a report uses an external CSS file. Browsers request all resources they find in the HTML stream, so if a browser requests a style sheet that has not yet been fully generated, the report cannot display correctly. It is safe to disable buffering if the styles are inline because the browser will not need to fetch an external style sheet in that case. Buffered content will appear slower to the user than non-buffered content because browsers render partial HTML pages while data is still being received from the server. Buffering will delay that rendering until the report is fully processed on the server.
Pentaho Reporting configuration files
File | Purpose |
/pentaho/design-tools/report-designer/resources/report-designer.properties | Contains options for the Report Designer client tool. It does not change any report options. |
/pentaho/design-tools/report-designer/resources/classic-engine.properties | Contains global report rendering options for reports generated locally from Report Designer. Some of these options can be overridden in individual reports. |
/tomcat/webapps/pentaho/WEB-INF/classes/classic-engine.properties | Contains global report rendering options for published reports that are generated on the Pentaho Server. Some of these options can be overridden in individual reports. |