Hitachi Vantara Lumada and Pentaho Documentation

Inspect your data


When working with your transformation, you can gain valuable insights by visualizing and interacting with your data.

You can quickly inspect step data, reducing the amount of iterative work needed while building your transformation. Then you can rapidly publish a data source to share with your teams or across your organization.

Note: Depending on your operating system, you may need to upgrade your web browser to use all data inspection features. See the list of supported components.

Get started

Begin inspecting your data by clicking on a step in the transformation.

[Figure: Transformation with Data Inspection]

The fly-out inspection bar appears at the top of the transformation canvas. It displays the name of the selected step and contains two data inspection buttons:

  • Run and Inspect Data button: Runs the transformation up to the selected step, then lets you inspect your data.
  • Inspect Data button: Lets you inspect the data of a step after the transformation has run.

Note: The Inspect Data option runs your transformation only if it has not already been run.

After the transformation runs, your step data is displayed in Stream View as a flat table, with all available fields selected.

[Figure: Data Explorer Sample Transformation]

Additionally, you can begin inspecting data using these other methods:

  • Transformation menu: Right-click a step and choose either Inspect Data or Run and Inspect Data.
  • Preview Data panel: Select the Preview Data tab, and then click the Inspect Data button at the top right of the Preview Data panel.
  • Actions menu: Select a step. From the menu bar, click Action > Inspect Data or Action > Run and Inspect Data.
  • Keyboard shortcuts: Select a step, and then do one of the following:
    • In Windows and Ubuntu, press Shift+Ctrl+F9 to inspect data, or Ctrl+F9 to run and inspect data.
    • In OS X, press Shift+Command+F9 to inspect data, or Command+F9 to run and inspect data.

Tour the environment

The following illustration shows selected data visualized as a Bar chart in Model View.

[Figure: Data inspection features]

Use the numbered items in the illustration above to locate each section of the inspection environment described below.
Item 1: Header bar

The header bar contains:

  • The title of the step being inspected.
  • The number of rows sampled, up to a default maximum of 50,000 rows.
  • The Publish data source button, which creates a data source that others can use later through a data service.
  • The Return to transformation button, which returns you to your transformation.
Item 2: Stream View / Model View

Toggle between the Stream View and Model View modes to inspect data and build visualizations based on the data sampled.

  • Use Stream View to inspect the data using data types and formats from the PDI data stream.
  • Use Model View to inspect the data using a dimensional model that can be adjusted with the Annotate Stream step.
Note: When a visualization mode is not supported, its view is disabled.
Search box

Use the Search box to find a specific field in the list of available fields. This is especially useful in Stream View, where the order of the fields is determined solely by the transformation.
Available fields panel

The available fields panel lists all fields in the subset of data being inspected. Field types are assigned automatically as the step data is ingested, and include:

  • Default fields, whose contents depend on the view:
    • In Stream View, data that is not numeric, date, or timestamp, such as string, Boolean, and other types.
    • In Model View, data that is not a measure and is not annotated as a location or time hierarchy.
  • Date fields, which contain date data (Stream View only).
  • Numeric fields, which contain numeric data (Stream View only).
  • Geographic fields, which contain location data (Model View only).
  • Measure fields, which contain quantitative data (Model View only).
  • Time fields, which contain time data (Model View only).

From this panel, you can select the fields you want to inspect and exclude the others. Selected fields display a blue disc icon to the left of their names. Click a field to select or clear it, or drag a field into the Layout panel.

  • Click Clear All to remove all fields from the Layout panel, clear all filters from the Filters panel, and clear the canvas.
  • For a flat table in Stream View, click Select All to include all fields in the flat table in the order they are listed.
Item 3: Visualization selector

Use the visualization selector to choose a visualization type. The visualization you select from the drop-down menu is displayed on the canvas.
Item 4: Layout panel

Displays the drop zones available for the selected visualization and the field types each one accepts. Click the header to collapse this panel and expand the Filters panel, if needed.
Item 5: Filters panel

Displays all filters applied to a visualization. Click the header to collapse this panel and expand the Layout panel, if needed. To apply a filter, drag a field from the available fields panel into the Filters panel. Keyboard shortcuts are available for many filter options, and some filtering actions can be applied by clicking the visualization. See the Use Filters to Explore Your Data article for more information.
Item 6: Canvas

The canvas displays the visualization you are using for data inspection.
Item 7: Tabs bar

Use the Tabs bar to manage and navigate the tabs:

  • The active tab is always indicated with a blue highlight.
  • Create a tab for another data visualization by duplicating an existing tab or by adding a new tab.
  • Rename a tab.
  • Scroll multiple tabs.
  • Delete tabs you no longer need.
  • Open the tab menu, which contains options for the selected tab (Duplicate, Delete, and Rename).
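
The field-type rules listed for the available fields panel can be summarized as two small classification functions, one per view. This is a minimal, hypothetical Python sketch, not how PDI implements it: the type names (Number, Integer, BigNumber, Date, Timestamp) are assumed PDI-style identifiers, and the order in which the Model View annotations are checked is also an assumption.

```python
from enum import Enum
from typing import Optional


class FieldKind(Enum):
    DEFAULT = "default"
    DATE = "date"
    NUMERIC = "numeric"
    GEOGRAPHIC = "geographic"
    MEASURE = "measure"
    TIME = "time"


# Assumed PDI-style type names; the article describes numeric, date,
# timestamp, string, and Boolean data but does not list exact identifiers.
NUMERIC_TYPES = {"Number", "Integer", "BigNumber"}
DATE_TYPES = {"Date", "Timestamp"}


def stream_view_kind(pdi_type: str) -> FieldKind:
    """Stream View: classify a field from its data type alone."""
    if pdi_type in NUMERIC_TYPES:
        return FieldKind.NUMERIC
    if pdi_type in DATE_TYPES:
        return FieldKind.DATE
    # String, Boolean, and any other type become default fields.
    return FieldKind.DEFAULT


def model_view_kind(is_measure: bool, hierarchy: Optional[str]) -> FieldKind:
    """Model View: classify a field from its model annotations.

    hierarchy is a hypothetical label: "location", "time", or None.
    Checking measures first is an assumption of this sketch.
    """
    if is_measure:
        return FieldKind.MEASURE
    if hierarchy == "location":
        return FieldKind.GEOGRAPHIC
    if hierarchy == "time":
        return FieldKind.TIME
    # Non-measure fields without a location or time hierarchy are default fields.
    return FieldKind.DEFAULT


print(stream_view_kind("Timestamp").value)        # date
print(model_view_kind(False, "location").value)   # geographic
```

The same field can land in different categories depending on the view, which is why the panel's icons change when you toggle between Stream View and Model View.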

Use visualizations

Data visualizations have two modes: Stream View and Model View. You can switch between these modes to inspect data and shape visualizations based on the sampled set. Stream View generates SQL queries, as used in entity-relationship modeling, that are executed against a relational database. Model View builds on the same tables as Stream View, laying a dimensional model on top of them to enable multidimensional queries, which are executed in the background as MDX queries against a Mondrian engine.

The first view shown during data inspection is a Stream View of your step data as a flat table on the canvas. To deselect a data field, click anywhere on its name in the available fields panel. The blue disc icon to the left of the name disappears, indicating that the field is no longer selected. To change the visualization type, use the visualization selector. If you select a visualization that requires a model, the view automatically switches to Model View. Otherwise, it remains in Stream View, and you can switch to Model View manually if it is available.
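
The view-switching behavior described above can be modeled as a tiny state machine. This is a hypothetical Python sketch, not PDI code: the InspectionState type and both function names are illustrative, and because this article does not list which visualizations require a model, the caller passes that as a flag. Keeping the current view when no model is required is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class InspectionState:
    view: str = "stream"          # inspection opens in Stream View...
    visualization: str = "Table"  # ...showing a flat table


def select_visualization(state: InspectionState, visualization: str,
                         requires_model: bool) -> InspectionState:
    # A visualization that needs a dimensional model forces Model View;
    # otherwise the current view is kept (an assumption of this sketch).
    view = "model" if requires_model else state.view
    return replace(state, view=view, visualization=visualization)


def toggle_view(state: InspectionState, target: str,
                supported: bool) -> InspectionState:
    # An unsupported view is disabled, so toggling to it has no effect.
    if not supported:
        return state
    return replace(state, view=target)


# "Geo Map" is assumed here to require a model.
state = select_visualization(InspectionState(), "Geo Map", requires_model=True)
print(state.view)  # model
state = toggle_view(state, "stream", supported=False)
print(state.view)  # model
```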

Drag the fields you want to visualize from the available fields panel and drop them into the drop zones of the Layout panel. The drop zones and the data they accept are determined by the visualization type. To explore your data with additional visualization types, create additional tabs.

You can further customize your visualization by keeping or excluding fields, by drilling down into data points in the visualization (including the legend or axis labels of a chart), and through other filtering options. When you apply a filter, it is applied to the data, and both the Filters panel and the visualization update automatically. For more information, see the Filters article.
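
Keep and exclude filters can be pictured as predicates applied to the sampled rows. The following is a minimal Python sketch under stated assumptions: the Filter type, the row shape, and the rule that multiple filters are combined with AND are hypothetical and are not taken from PDI. Drilling down on a data point would correspond to a keep filter on that value.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class Filter:
    field: str
    values: FrozenSet[Any]
    keep: bool = True  # True: keep only these values; False: exclude them


def apply_filters(rows: List[Row], filters: List[Filter]) -> List[Row]:
    """Return the rows that pass every filter (assumed AND semantics)."""
    def passes(row: Row) -> bool:
        for f in filters:
            matched = row.get(f.field) in f.values
            # A keep filter requires a match; an exclude filter forbids one.
            if matched != f.keep:
                return False
        return True

    return [row for row in rows if passes(row)]


rows = [
    {"region": "East", "sales": 10},
    {"region": "West", "sales": 7},
    {"region": "North", "sales": 3},
]
excluded = apply_filters(rows, [Filter("region", frozenset({"North"}), keep=False)])
print([r["region"] for r in excluded])  # ['East', 'West']
```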

Once you are satisfied with your step data and model, you can make the content available for collaboration by publishing a data source.

Save your inspection session

You can save your data inspection session for later use and sharing. After you make changes to the generated data and exit the application, an inspection icon appears on the step in the transformation canvas to indicate that the step has a remembered session. When you save the transformation, this session is stored in its Kettle transformation (.ktr) file. You can restore the session later by reopening the saved file and inspecting the step again.

Note: When you open a file saved in an older format, it is automatically updated to the current format. After this conversion, the file can be opened only in the current version of PDI.

Use tabs to create multiple visualizations

A tab is created when you run and inspect your data, add a new tab, or duplicate a tab. By using multiple tabs, you can create distinct visualizations to compare differences, spot trends, and develop insights about your data. You can add a new tab to build a new visualization, or duplicate an existing tab to investigate the results of small changes to your data. A tab is initially named after its visualization type (Table, Stacked Bar, Geo Map, and so on), but you can rename it. More than one tab can have the same name, but a name cannot be blank.

Perform the following steps to change a tab name:

Procedure

  1. Double-click the tab (or select Rename from the tab menu).

  2. Type the new tab name in the text box, and then click outside the text box (or press Enter).

    Note: Press Esc to cancel your changes.
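
The naming rules for tabs (duplicates allowed, blanks not allowed, Esc cancels) can be expressed as a small validation function. This is a hypothetical Python sketch: rename_tab is an illustrative name, None stands in for pressing Esc, and treating a whitespace-only name as blank and keeping the old name when a blank is entered are assumptions.

```python
from typing import List, Optional


def rename_tab(names: List[str], index: int, new_name: Optional[str]) -> List[str]:
    """Return the tab names after a rename attempt.

    new_name is None when the user pressed Esc (hypothetical convention).
    """
    if new_name is None:       # Esc: cancel, keep the old name
        return list(names)
    if not new_name.strip():   # blank names are rejected (assumed behavior)
        return list(names)
    updated = list(names)
    updated[index] = new_name  # duplicate names are allowed: no uniqueness check
    return updated


print(rename_tab(["Table", "Stacked Bar"], 1, "Table"))  # ['Table', 'Table']
print(rename_tab(["Table"], 0, "   "))                   # ['Table']
```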

Results

Tabs remain open between sessions, so you can return to the inspection canvas and fine-tune your transformation at any time until you are satisfied with the results. When you reopen a remembered inspection session, a tab can become invalid if, for example, some of its selected fields were removed, renamed, or changed in relation to a hierarchy in the transformation or step. A tab can also become invalid when a field's metadata changes. To revalidate such a tab, clear the invalid fields from the visualization in the inspection canvas, or exit your session and revert the changes before reentering the inspection environment. In the flat table, invalid fields are removed automatically.
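
The invalidation rules above amount to comparing the fields a tab saved against the step's current fields. The following hypothetical Python sketch models each field as a name mapped to a metadata signature (for example its data type); the function names and the signature format are illustrative assumptions. A renamed or removed field disappears from the current fields, and a metadata change produces a mismatch, so both are reported as invalid. The flat-table function shows the automatic removal.

```python
from typing import Dict, Set


def invalid_fields(tab_fields: Dict[str, str], step_fields: Dict[str, str]) -> Set[str]:
    """Fields used by a tab that are missing now or whose metadata changed."""
    return {name for name, meta in tab_fields.items()
            if step_fields.get(name) != meta}


def tab_is_valid(tab_fields: Dict[str, str], step_fields: Dict[str, str]) -> bool:
    return not invalid_fields(tab_fields, step_fields)


def prune_flat_table(tab_fields: Dict[str, str],
                     step_fields: Dict[str, str]) -> Dict[str, str]:
    """A flat table drops invalid fields automatically, keeping column order."""
    bad = invalid_fields(tab_fields, step_fields)
    return {name: meta for name, meta in tab_fields.items() if name not in bad}


saved = {"region": "String", "sales": "Number", "order_date": "Date"}
current = {"sales_region": "String", "sales": "Integer", "order_date": "Date"}
print(sorted(invalid_fields(saved, current)))  # ['region', 'sales']
print(prune_flat_table(saved, current))        # {'order_date': 'Date'}
```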

Publish for collaboration

When you are ready to make your content available to others, you can publish it as a data source. The data source uses a data service that is automatically created on the step and is available to other tools. You must be connected to your repository to publish the data source.

Procedure

  1. Click the Publish data source button at the top right of the header bar to open the Publish Data Source window.

  2. Click Get Started to open the Publish Details window.

    Enter the data source information in the following fields:

    • Data Source Name: The name used by other Pentaho applications when accessing your data source.
    • Server: The default value for this field is your current repository. You can select other repository connections, if you have created them, through the Repository Manager.
    • URL: The base URL string used to connect to the server.
    • User Name: The user name required to access the server. This user must also have publishing permissions.
    • Password: The password associated with the provided user name.
  3. When you are done, click Finish.

  4. After your data source is created, a confirmation message appears, and the data source should now be available on the server. Click Close to continue inspecting your data, or click View this in User Console to open a new browser window and work with the data source in Analyzer.

Learn more

For more information on inspecting your data, see the following articles: