
Hitachi Vantara Lumada and Pentaho Documentation

Jackrabbit repository performance tuning

These topics help you get the best performance out of your Jackrabbit repository. They provide background information on known Jackrabbit performance issues and step-by-step instructions for addressing them in the Pentaho software.

Jackrabbit hangs on unused data

The Jackrabbit repository (JCR) often retains a lot of unused data if you perform migrations from the same repository multiple times. This unused data increases table sizes and slows down the repository.

You can clean up this unused data in the JCR by enabling a system listener designed for this purpose. The cleanup can run only when no users are logged into the JCR, and the repository remains locked while the process is running.

Procedure

  1. Stop the Pentaho Server.

  2. Locate the pentaho-server/pentaho-solutions/system directory and open the systemListeners.xml with any text editor.

  3. Add this bean as the last item within the list tags.

    <bean id="repositoryCleanerSystemListener" class="org.pentaho.platform.plugin.services.repository.RepositoryCleanerSystemListener">
        <property name="gcEnabled" value="true"/>
        <property name="execute" value="now"/>
    </bean>

  4. Save and close the systemListeners.xml file and restart the Pentaho Server.

Next steps

You can customize the settings for the repositoryCleanerSystemListener by editing these properties. We recommend cleaning up the Jackrabbit repository on a regular schedule.

Property     Description
gcEnabled    A Boolean flag that turns the listener on (true) or off (false).
execute      Determines when the listener runs:
               • now: runs once during server start-up
               • weekly: runs on the first day of each week (Sunday)
               • monthly: runs on the first day of each month
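
As an illustration of a regular schedule, the bean from the procedure could be adjusted along the lines of the following sketch. It reuses the bean definition shown earlier and changes only the execute value to weekly, one of the values listed above. Treat it as an example to adapt, not a required setting.

```xml
<!-- Sketch: the same listener bean as in the procedure, scheduled weekly
     instead of once at server start-up. -->
<bean id="repositoryCleanerSystemListener" class="org.pentaho.platform.plugin.services.repository.RepositoryCleanerSystemListener">
    <!-- true turns the listener on; false turns it off -->
    <property name="gcEnabled" value="true"/>
    <!-- weekly: runs on the first day of each week (Sunday) -->
    <property name="execute" value="weekly"/>
</bean>
```

Because the repository remains locked while the cleanup runs and no users can be logged in, pick the schedule that best matches a period when the server is idle.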

Jackrabbit runs slowly with too many home directories

Before Pentaho 6.1, the Jackrabbit repository ran slowly when there were too many home directories. Jackrabbit scanned every home directory on the first login after a server restart, calling UserDetailService for each home directory owner.

As of Pentaho 6.1, a flag skips user verification on principal creation by default. With this flag enabled, user details are retrieved from the user cache only, which speeds up repository loading.

You may need to restore the old behavior if your authorization system expects the Pentaho Server to load all of the user information on startup. To restore the old behavior, set the skipUserVerificationOnPrincipalCreation property to false. User verification then operates the same way it did before 6.1.

Procedure

  1. Navigate to the pentaho-solutions/system/jackrabbit directory.

  2. Open the security.properties file with any text editor.

  3. Locate the skipUserVerificationOnPrincipalCreation property and set the value as needed.

  4. Save and close the file.

Next steps

If you discover that you need to re-enable the old mode of verification, an issue likely exists with your authentication system. We recommend contacting Pentaho Support if you need help.

Jackrabbit Lucene SearchIndex slows server performance

The purpose of the Jackrabbit SearchIndex tag is to index property values and node names when data is saved or whenever a data transaction is performed. With the Pentaho Platform, Jackrabbit’s Lucene tries to index all of the text from every file in the repository. The SearchIndex tag has been disabled for Pentaho 6.1 and higher to improve overall repository performance.

When you upgrade to Pentaho 6.1 or higher and bring your previous repository.xml forward to the new version, your server starts and functions as it did in your previous version of Pentaho, so the SearchIndex tag remains enabled. The resulting Jackrabbit Lucene indexing can degrade repository performance.

If you are bringing forward your repository.xml, you will need to disable the SearchIndex tag within Jackrabbit. Depending on whether you have a custom-configured repository or a default repository, follow one of these procedures for disabling the SearchIndex tag.

If you have a custom-configured repository XML file

If you have a custom-configured repository.xml file, follow these steps to disable the SearchIndex tag:

Procedure

  1. Navigate to the pentaho-solutions/system/jackrabbit directory.

  2. Open the repository.xml file with any text editor.

  3. Search for the SearchIndex tag.

  4. You should find it in two places: within the <Workspace> tag and within the <Repository> tag.

  5. Delete or comment out only the SearchIndex tag within the <Repository> tag. Do not change the SearchIndex tag within the <Workspace> tag.

  6. Save and close the repository.xml file.
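
As a rough illustration of the edit described in these steps, a repository.xml file might look like the following sketch afterwards. Only the element nesting is the point here. The contents of the <Workspace> tag, including its name attribute and the path value of its SearchIndex, are assumptions based on a typical Jackrabbit configuration and will differ in a custom file. Omitted settings are marked with comments, so this is not a complete configuration.

```xml
<Repository>
    <!-- ... other repository-level settings omitted ... -->

    <Workspace name="${wsp.name}">
        <!-- ... other workspace settings omitted ... -->

        <!-- Workspace-level SearchIndex: leave unchanged.
             The path value shown here is an assumed, typical value. -->
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
        </SearchIndex>
    </Workspace>

    <!-- Repository-level SearchIndex: commented out to disable it.
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${rep.home}/repository/index"/>
        <param name="supportHighlighting" value="true"/>
    </SearchIndex>
    -->
</Repository>
```

Commenting the tag out rather than deleting it keeps the original settings in the file in case you need to refer to them later.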

Next steps

Whenever you make any changes to the Jackrabbit repository.xml file, you need to delete the pentaho-solutions/system/jackrabbit/repository folder and restart your Pentaho Server. The folder will be recreated with your new repository.xml settings upon server restart.

If you are using the default repository XML file

If you have a default repository.xml file, follow these steps to disable the SearchIndex tag:

Procedure

  1. Navigate to the pentaho-solutions/system/jackrabbit directory.

  2. Open the repository.xml file with any text editor.

  3. Search for the following instance of the SearchIndex tag:

    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${rep.home}/repository/index"/>
        <param name="supportHighlighting" value="true"/>
    </SearchIndex>

  4. Delete or comment out that SearchIndex tag.

  5. Save and close the repository.xml file.

Next steps

Whenever you make any changes to the Jackrabbit repository.xml file, you need to delete the pentaho-solutions/system/jackrabbit/repository folder and restart your Pentaho Server. The folder will be recreated with your new repository.xml settings upon server restart.