Hitachi Vantara Lumada and Pentaho Documentation

Query HCP

The Query HCP step uses the Metadata Query Engine (MQE) to query your Hitachi Content Platform (HCP) repository for objects, their URLs, and system metadata properties. You can use the resulting object URL to define HCP object and custom metadata locations for the PDI Read metadata from HCP and Write metadata to HCP steps.

Note: After you modify an HCP object, the change may not appear in your query results until the search index has been updated.

Before you begin

In the HCP Tenant Management Console and System Management Console, perform the following steps to set up HCP for queries from PDI:

Procedure

  1. In the Tenant Management Console at the tenant level, verify that the following options are selected for your namespace:

    1. Under the Protocols tab, select Enable REST API.

    2. Under the Settings tab, select Enable ACLs.

    See Enabling REST, S3 compatible, HSwift, and WebDAV access to namespace and Enabling the use of ACLs for more details.
  2. In the System Management Console, go to the Services tab and verify that the following options for MQE (Metadata Query Engine) are selected:

    • Enable metadata query API
    • Enable indexing
    • Enable indexing of custom metadata

    See Configuring the metadata query engine in the HCP documentation for more details.

  3. In the Tenant Management Console at the namespace level, go to the Services tab and verify that the following options for Search are selected:

    • Enable search
    • Enable indexing
    • Enable indexing of custom metadata
    • Enable full custom metadata indexing

    See Setting search and indexing options for more details.

General

The following field is general to this transformation step:

  • Step name: Specify the unique name of the transformation step on the canvas. You can customize the name or leave it as the default.

Options

The Query HCP step features Query and Output tabs. Each tab is described below.

Query tab

Query tab in the Query HCP step

In this tab, specify the VFS connection to connect to your HCP repository, and then refine your search with a query statement and other options.

HCP VFS Connection: From the drop-down list, select the VFS connection you created for this transformation to connect to your HCP repository.
Query statement: Specify your HCP Metadata Query Engine (MQE) search statement for finding particular objects. For example, to find objects of 2000 MBs or greater, specify +(size[2000 TO *]).

You can use the metadata query engine to generate a Query statement for PDI.

  1. In the HCP Namespace Browser, specify your filters in the Structured Query tab.
  2. Click Show as advanced query. The related query statement is generated under the Advanced Query tab in the HCP Namespace Browser.
  3. Copy the resulting HCP query statement into this PDI Query statement option.
See Working with structured searches and Working with advanced searches for more details.
Sort results by: Select which object properties you want to use to sort the results in alphabetical order.

You can sort in Ascending or Descending order.

Number of rows to skip: (Optional) Specify the number of resulting objects to skip. This option is useful when setting up a paged query. See the "Paging through objects" section below for information about paged queries.
Number of results to return: (Optional) Specify the maximum number of objects to return. The maximum is 10,000 objects. If you do not specify a value, HCP returns the properties of up to 100 objects.

This option also accepts the following special values:

  • -1

    Returns the properties of all the objects up to 10,000 objects.

  • 0

    Returns just the total count of objects in the HCP repository.

Hitachi Vantara recommends working with only a few thousand results at a time for the best performance.
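The values described above for Number of results to return can be summarized in a short Python sketch. It is illustrative only and is not part of PDI or HCP: the function name `effective_result_limit` and both constants are hypothetical, and rejecting values above 10,000 is an assumption, since the text states the maximum but not how a larger value is handled.

```python
from typing import Optional

DEFAULT_RESULTS = 100   # applied when the option is left empty
MAX_RESULTS = 10_000    # stated upper limit for a single query


def effective_result_limit(setting: Optional[int]) -> int:
    """Return the largest number of objects a query could return.

    None -> option left empty, up to 100 objects
    -1   -> all objects, up to 10,000
    0    -> count only, so no objects are returned
    """
    if setting is None:
        return DEFAULT_RESULTS
    if setting == -1:
        return MAX_RESULTS
    if setting < -1:
        raise ValueError("values below -1 are not described for this option")
    if setting > MAX_RESULTS:
        # Assumption: treat an oversized value as invalid rather than clamping it.
        raise ValueError("the maximum is 10,000 objects")
    return setting


if __name__ == "__main__":
    for value in (None, -1, 0, 2500):
        print(value, "->", effective_result_limit(value))
```

A helper like this could be used to sanity-check a variable before it is passed to the step option.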

Paging through objects

You can use Number of rows to skip and Number of results to return to page through the objects in your HCP repository. First, query the repository with Number of results to return set to 0 to obtain the total count of objects. Then, query the repository for a specific number of objects at an offset location by setting Number of rows to skip. Within a PDI transformation, you can loop over the Query HCP step with a variable offset to page through the objects in the HCP repository. See Paged queries with object-based requests for more details on how to page through an HCP repository.
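The offset arithmetic behind such a loop can be sketched in Python. This is a minimal illustration, not PDI code: the function name `page_offsets` and the example numbers are hypothetical, and the total count is assumed to come from an earlier query that had Number of results to return set to 0.

```python
from typing import Iterator, Tuple

MAX_RESULTS_PER_QUERY = 10_000  # stated limit for Number of results to return


def page_offsets(total_count: int, page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (rows_to_skip, results_to_return) pairs that cover total_count objects."""
    if total_count < 0:
        raise ValueError("total_count must not be negative")
    if not 1 <= page_size <= MAX_RESULTS_PER_QUERY:
        raise ValueError("page_size must be between 1 and 10,000")
    for skip in range(0, total_count, page_size):
        # The final page may hold fewer objects than page_size.
        yield skip, min(page_size, total_count - skip)


if __name__ == "__main__":
    # Example: 4,500 matching objects read 2,000 at a time.
    for skip, count in page_offsets(total_count=4500, page_size=2000):
        print(f"Number of rows to skip = {skip}, Number of results to return = {count}")
```

Each yielded pair corresponds to one pass through the Query HCP step. A page size of a few thousand follows the performance recommendation given for Number of results to return.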

Output tab

Output tab in the Query HCP step

In this tab, specify the PDI fields for the resulting object URLs and system metadata properties.

Outgoing field for Object URL: Specify the name of the PDI field that will contain the URL of a resulting object.
Return all object properties: Select this option to include the system metadata properties associated with the resulting objects. If this option is cleared, properties are not included with the resulting objects.
Outgoing field for object properties: If you select Return all object properties, you must specify the name of the PDI field that will contain the system metadata properties of a resulting object.