Inspect Your Data
When working with your transformation, you can gain valuable insights by visualizing and interacting with your data. You can quickly inspect step data, reducing the amount of iterative work needed while building your transformation. Then you can rapidly publish a data source to share with your teams or across your organization.
Depending on your operating system, you may need to upgrade your web browser for the full experience. See the list of supported components.
Begin inspecting your data by clicking a step in the transformation.
The fly-out inspection bar appears at the top of the transformation canvas. The bar displays the name of the selected step and offers two options:
- Run and Inspect Data - Runs the transformation up to the selected step, then lets you inspect your data.
- Inspect Data - Lets you inspect the data of a step after the transformation has run.
Note: If the transformation has not been run yet, this option runs it first.
After the transformation runs, your step data is displayed in Stream View as a flat table, with all available fields selected.
Additionally, you can begin inspecting data using these other methods:
- Step Context Menu - Right-click on a step and choose either Inspect Data or Run and Inspect Data.
- Preview Data Panel - Select the Preview Data tab. Click the Inspect Data button located at the top right of the Preview Data bar.
- Actions Menu - Select a step. From the Menu bar, click Action>Inspect Data or Action>Run and Inspect Data.
- Keyboard Shortcuts - Select a step. Then using your keyboard, do the following:
- In Windows and Ubuntu, press either Shift+Ctrl+F9 (Inspect Data) or Ctrl+F9 (Run and Inspect Data).
- In OS X, press Shift+Command+F9 (Inspect Data) or Command+F9 (Run and Inspect Data).
Tour the Environment
The following illustration shows selected data visualized as a bar chart in Model View.
Use the number locators in the preceding illustration to reference the sections of the inspection environment in the table below.
|Header bar||Use the Header bar to access:
- Stream View / Model View - Toggle between the Stream View and Model View modes to inspect data and build visualizations based on the sampled data. When a visualization mode is not supported, that view is disabled.|
|Search Box||Use the Search Box to find a specific field in the list of available fields. This feature is especially useful in Stream View, where the order of the fields is determined solely by the transformation.|
|Available Fields panel||The Available Fields panel lists all available fields from the subset of data being inspected. Field types are assigned automatically as the step data are ingested. From this panel, you can select the specific fields you want to inspect and exclude the others. Selected fields display with a blue disc icon to the left of their names. Click a field to select or clear it, or drag a field into the Layout panel.|
|Visualization Selector||Use the Visualization Selector to choose a visualization type. Selecting a visualization from the drop-down menu displays it in the Canvas area.|
|Layout panel||Displays the available drop zones and associated field types needed for the selected visualization. Click the header to collapse this panel and expand the Filters panel, if needed.|
|Filters panel||Displays all filters applied to a visualization. Click the header to collapse this panel and expand the Layout panel, if needed. To apply a filter, drag a field from the Available Fields panel into the Filters panel. Keyboard shortcuts are available for many filter options, and some filtering actions can be applied by clicking the visualization. See the Use Filters to Explore Your Data article for more information.|
|Canvas||The Canvas displays the selected visualization.|
|Tabs bar||Use the Tabs bar to manage and navigate the tabs.|
Data visualizations have two modes: Stream View and Model View. You can switch between these modes to inspect data and shape visualizations based on the sampled data. Stream View generates SQL queries, of the kind used in entity-relationship modeling, that are executed in a relational database. Model View builds on the same tables as Stream View and layers a dimensional model on top of them. This enables multidimensional queries, which run in the background as MDX queries against a Mondrian engine.
The first view provided during data inspection is a Stream View of your step data, shown as a flat table on the Canvas. To deselect a field, click anywhere on its name in the Available Fields panel. The blue disc icon to the left of the name disappears, indicating that the field is no longer selected. To change the visualization type, use the Visualization Selector. If you select a visualization that requires a model, the mode automatically switches to Model View. Otherwise, the mode remains Stream View, and you can switch to Model View manually if it is available.
Drag the fields you want to visualize from the Available Fields panel and drop them into the drop zones of the Layout panel. The drop zones and the data they accept are determined by the visualization type. To explore your data with additional visualization types, create additional tabs.
You can further customize your visualization by keeping or excluding fields, by drilling down into data points in the visualization (including a chart's legend or axis labels), and by using other filtering options. When you filter, the filter is applied to the data, and the Filters panel and the visualization update automatically. For more information, see the Filters article.
Once you are satisfied with your step data and model, you can make the content available for collaboration by publishing a data source.
Save Your Inspection Session
You can save your data inspection session for later use and sharing. After you have made changes to the generated data and exited the application, the inspection icon appears on the step in the transformation canvas to indicate that the step has a remembered session. When you save, the session is stored as a Kettle transformation (.ktr) file. You can then restore the session by reopening the saved file and inspecting the step again.
When you open a file saved in an older format, it is automatically updated to the current format. After this conversion, the file can be opened only in the current version of PDI.
Use Tabs to Create Multiple Visualizations
A tab is created when you run and inspect your data, add a new tab, or duplicate a tab. With multiple tabs, you can build different visualizations to inspect differences, spot trends, and develop insights about your data. You can add a new tab to build a new visualization, or duplicate an existing tab to investigate the results of small changes to your data. A tab is initially named after its visualization type (Table, Stacked Bar, Geo Map, etc.), but you can rename it. More than one tab can have the same name, but a tab name cannot be blank.
Perform the following steps to change a tab name:
- Double-click the tab (or select Rename from the Tab menu).
- Type the new tab name in the text box, and then click outside the text box (or press Enter).
Press Esc if you want to cancel your changes.
Tabs remain open between sessions, so you can return to the inspection canvas at any time to fine-tune your transformation until you are satisfied with the results. Tabs can become invalid when you reopen a remembered inspection session if, for example, some of the selected fields in the transformation or step were removed, renamed, or changed in relation to the hierarchy. Tabs can also become invalid when a field's metadata changes. To revalidate those tabs, either clear the invalid fields from the visualization in the inspection canvas, or exit your session and revert the changes before reentering the inspection environment. In the flat table, invalid fields are removed automatically.
Publish for Collaboration
When you are ready to make your content available to others, you can publish it as a data source. The data source uses a data service that is created automatically on the step and is available to other tools. You must be connected to your repository to publish the data source.
Perform the following steps to publish your content:
- Click the Publish button at the top right of the Header bar to open the Publish Data Source window.
- Click Get Started to open the Publish Details window.
- Enter the data source information in the following fields:
|Data Source Name||The name used by other Pentaho applications when accessing your data source.|
|Server||The default value for this field is your current repository. If you have created other repository connections in the Repository Manager, you can select one of them instead.|
|URL||The base URL string used to connect to the server.|
|User Name||The user name required to access the server. The user must also have publishing permissions.|
|Password||The password associated with the provided user name.|
- When you are done, click Finish.
- After your data source is created, a confirmation appears, and the data source should now be available on the server. Click Close to continue inspecting your data, or click View this in User Console to open a new browser window and work with the data source in Analyzer.
For more information on inspecting your data, see the following articles: