Use Pentaho Repositories in PDI
The PDI client (also known as Spoon) offers several types of file storage. A Pentaho Repository stores transformations, jobs, and schedules in a central environment through the Pentaho Server. It is the recommended, fully supported option for enterprise deployments.
Get Started with Pentaho Repositories
If your team needs a collaborative ETL (Extract, Transform, and Load) environment, we recommend using one or more Pentaho Repositories. In addition to storing and managing your jobs and transformations, Pentaho Repositories provide full revision history for you to track changes, compare revisions, and revert to previous versions when necessary. These features, along with enterprise security and content locking, make using a Pentaho Repository an ideal platform for collaboration.
Create a Connection in the PDI Client
If you want to access the repository items through the PDI client, perform the following steps to create a connection to a Pentaho Repository:
- Verify the Pentaho Server is running, and start the PDI client.
- Click the Connect link in the upper right corner of the PDI client toolbar. The Pentaho Repository welcome dialog box appears.
If Connect is replaced by a different link, you are already connected to a repository.
- Click Get Started.
- Enter or update the Display Name property.
- Modify the URL associated with your repository, if necessary.
- Click Finish to test the connection to your repository. If the test fails, make sure that the port number in the URL is correct. If you installed PDI using the Pentaho Installation Wizard, the correct port should appear in the installation-summary.txt file, which is located in the root directory where you installed PDI.
- If the test is successful, you can click Connect Now, click Manage Connections, or click Finish to close the dialog box. If you click Finish, you can connect to the repository later through the menu next to the Connect link in the upper right corner of the PDI client toolbar.
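If the connection test fails, one quick check outside the PDI client is whether anything is listening on the host and port named in the repository URL. The following is a minimal Python sketch of that check, not part of PDI. The function names `host_and_port` and `port_is_open` are hypothetical, and an open port only shows that some service accepts connections there, not that the service is a Pentaho Server.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(url):
    """Return (host, port) for an http(s) URL, using the scheme's default port if none is given."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def port_is_open(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host name did not resolve.
        return False
```

For example, `port_is_open("http://localhost:8080/pentaho")` returns `True` only if something accepts TCP connections on port 8080 of the local machine.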
Connect to a Pentaho Repository
Once a repository is created, a menu appears next to the Connect link. You can use this menu to connect to the repository.
If you are creating your first repository, selecting Connect Now takes you directly to the second step below, where you log on to the repository.
- Select a repository in the Connect menu.
- Log on to the repository by entering your User Name and Password credentials. For example, User Name = admin, Password = password.
- Click OK to exit the Repository Configuration dialog box. Your user name and repository display name will appear in the upper right corner of the PDI client toolbar.
If you want the Repository Connection window to automatically appear when the PDI client starts, go to Tools > Options and click Show repository dialog at startup.
Manage Repositories in the PDI Client
After a repository is created, a menu appears next to the Connect link. You can use the menu to connect to any repository you created. If you connect to a repository, the Connect link in the PDI client toolbar is replaced by your user name and the display name of the repository.
This menu can also be used to access the Repository Manager or disconnect from your current repository.
Repository Manager
You can Add, Edit, or Delete your repositories through the Repository Manager dialog box.
If you hover over an item in the list, you can set that repository to Launch connection on startup of the PDI client. If you set a repository as the default on startup, you can clear this behavior by checking Launch connection on startup again.
You can also click on an item in the list to select it. Once selected, you can either Edit or Delete that repository. If you choose Edit, the Connection Details dialog box will appear.
Connection Details
Use the Connection Details dialog box to specify the settings of your repository.
Setting | Description |
---|---|
Display Name | Identifies the repository within the PDI client. |
URL | Defines the web address of the repository. The default value is http://localhost:8080/pentaho. You can change this setting to any web address pertaining to your specific collaboration project. |
Description | Describes the repository, such as its type and any other useful information. |
Launch connection on startup | Indicates the repository should open by default when starting the PDI client. |
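As an illustration only, the settings in the table can be modeled as a small record with basic sanity checks. This Python sketch is not how the PDI client stores connections. The class name `ConnectionDetails`, its field names, and the `problems` method are hypothetical, and only the default URL comes from the table above.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

DEFAULT_URL = "http://localhost:8080/pentaho"  # default value listed in the table


@dataclass
class ConnectionDetails:
    """Hypothetical mirror of the Connection Details settings."""
    display_name: str
    url: str = DEFAULT_URL
    description: str = ""
    launch_on_startup: bool = False

    def problems(self):
        """Return a list of problems found; an empty list means the settings look plausible."""
        found = []
        if not self.display_name.strip():
            found.append("Display Name is empty")
        parts = urlsplit(self.url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            found.append("URL is not an http(s) web address")
        return found
```

A record such as `ConnectionDetails("Dev repository")` uses the default URL and reports no problems, while a blank display name or a URL without an `http` or `https` scheme is flagged.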
Unsupported Repositories
You can also create either a database repository (which uses a central relational database to store your ETL metadata) or a file repository (which uses your local file system to store the metadata). You can create these types of repositories through the Other Repositories link in the Pentaho Repository welcome dialog box.
From the Other Repositories dialog box, you can Get Started by selecting either the Database Repository or the File Repository from the list.
Database and file repositories are not supported or recommended for production use.
Database Repository
As with the Pentaho Repository, you connect to a database repository through the Connection Details dialog box:
- Enter a Display Name.
- Select Database Connection. The Select a database connection dialog box appears with a list of connections. From this dialog box, you can create a new connection, or Edit or Delete an existing one.
- If you create a new connection or click Edit, the Database Connection dialog box appears. Specify your database connection, click Test, and then click OK.
- In the Select a database connection dialog box, click the database connection you want to use, and then click Back to return to the Connection Details dialog box.
- After both the Display Name and the Database Connection are specified, click Finish to test the connection to the repository.
File Repository
In addition to entering a Display Name, you need to specify the Location on the local file system that you want to use as a file repository. You can Browse to this location from the Connection Details dialog box. After you specify a display name and file system location, click Finish to test the connection. Unlike with other repositories, when you connect to a file repository, the link in the upper right corner shows only the display name of the file repository.
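Because a file repository lives on the local file system, its contents can be inspected with ordinary file tools. The Python sketch below groups the files under a repository Location by kind. It assumes that transformations are saved with the `.ktr` extension and jobs with the `.kjb` extension, which this page does not state, and the function name `list_repository_files` is hypothetical.

```python
from pathlib import Path

# Assumption: transformations are saved as .ktr files and jobs as .kjb files.
KINDS = {".ktr": "transformations", ".kjb": "jobs"}


def list_repository_files(location):
    """Group the .ktr and .kjb files under `location` by kind, as sorted relative paths."""
    root = Path(location)
    found = {kind: [] for kind in KINDS.values()}
    for path in root.rglob("*"):
        kind = KINDS.get(path.suffix.lower())
        if kind and path.is_file():
            found[kind].append(path.relative_to(root).as_posix())
    for names in found.values():
        names.sort()
    return found
```

Calling `list_repository_files` with the folder chosen as the Location returns a dictionary with a `transformations` list and a `jobs` list. Files with other extensions are ignored.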
Use the Repository Explorer
The Repository Explorer contains options for managing connections, clusters, security, partitions, access control, and version history.
Advanced Topics
The following topics help to extend your knowledge of Pentaho Repositories beyond basic setup and use:
- Import and Export PDI Content
Repository content can also be imported and exported through either the PDI client or a command line interface.
- Purge Transformations, Jobs, and Shared Objects from the Pentaho Repository
If the Pentaho Repository becomes too large for effective system performance, consider purging some of the data.
- Backup and Restore Pentaho Repositories
Perform routine backups to minimize potential data loss through machine failure, theft, disaster, or accidental change.