Skip to main content

Pentaho+ documentation has moved!

The new product documentation portal is here. Check it out now at docs.hitachivantara.com

 

Hitachi Vantara Lumada and Pentaho Documentation

Add a MongoDB data source

Parent aricle

You can define a MongoDB data source query, then display query results in a report.

To establish a connection to a MongoDB data source, specify the input options and queries when you select Configure Fields in the MongoDB Input step. For more information about these fields, see MongoDB Input.

Optionally, you can connect to a MongoDB data source using the Connection String option in the step. See the Configure connection tab of the MongoDB Input step for details.

Before you begin

To complete this task, you must have MongoDB database connection information, such as host names, port numbers and authentication credentials. If you need these items, contact your system administrator for help before you begin.

Procedure

  1. Open the Pentaho Report Designer window and either create a report or open an existing one.

  2. Select Data Add Datasource MongoDB from the menu.

    The MongoDB Data Source window appears.
  3. Click Add New Query to name the MongoDB data source definition.

    Fields in the window become active.
  4. Type the name of the query in the Name field.

  5. Configure the connection

    1. Enter the host name and port number for your MongoDB database.

      You can also specify a different port number for each host name by separating the host name and port number with a colon, and separating each combination of host name and port number with a comma like this: localhost1:27017,localhost2:27018. If you specify the port in the host field, leave the port field empty.
    2. If you want to MongoDB to automatically sense and attempt to connect to available hosts, even if one is down, select the Use all replica set members checkbox.

    3. Type the user name needed to access the MongoDB database in the User field, then type the password of the user in the Password field.

    4. Indicate how long the database should wait before terminating the connection attempt.

      If you do not want the database to ever terminate the connection, leave the Connection timeout field blank. Otherwise, enter a numerical value in milliseconds.
    5. In the Socket timeout field, indicate how long the database should wait for a write operation to occur before it terminates it.

      If you do not want the database to ever terminate it, leave this field blank.
  6. To query the MongoDB server for available databases and collections, click the Input Options tab.

    You can also set the read preference and tag set specification, in this tab.
    1. Click Get DBs to populate the drop-down menu with names of available databases then select the appropriate database, or enter the database name.

    2. Click Get collections to populate the drop-down menu with names of available Mongo collections.

      If an error message appears, check the host name and port numbers in the Configure connection tab.
    3. Select a database from the Database drop down menu.

    4. Click the Get Collections button, then select a collection from the Collection drop down menu.

    5. Indicate the read preference in the Read field.

    6. If you want to specify a tag set, click the Get tags button.

      Tag sets that have been specified on the MongoDB database appear in the Tag Set section of the window. If you want to append tag sets together so that they are processed at one time, select the tag sets, then click the Join tags button. Click the Test tag sets button to see a list of nodes that match the tag set criteria.
  7. Click the Query tab.

    You can formulate a query using two different methods. You can either create the query as a JSON Query expression, or use the Aggregation Framework. The Aggregation Framework is explained in detail in the MongoDB Aggregation Framework documentation.
    • JSON Query Expression

      Using JSON Query Expressions is analogous to using the MongoDB find() command documented on the http://docs.mongodb.org/manual/core/read-operations page. The query argument to find is entered in the Query expression (JSON) field. The projection argument is supplied in the Fields expression (JSON) text box. In order to use the JSON Query Expression mode, ensure that the Query is aggregation pipeline checkbox is not selected.

    • Aggregation Framework

      To query MongoDB using the Aggregation Framework http://docs.mongodb.org/manual/core/aggregation/ click the Query is aggregation pipeline checkbox. Enter a sequence of pipeline operations in the Query expression (JSON) field. This mode uses the same syntax as the MongoDB aggregation() command.

  8. Click the Fields tab to view the fields that are in the database and collection you specified.

    You can also edit field names that appear in Report Designer, edit the path to the field that you want to include in the report, and make changes to the type.
    1. Click the Get Fields button.

      Pentaho's Schema on Read functionality samples the documents in the collection to determine which fields are available what their data types are. The fields are displayed.
    2. If desired, edit the names of the fields.

      The names are what the fields will be called in Data Integration and Report Designer.
    3. If desired, edit the path to the field in the MongoDB database.

      If an array was returned, you can specify the element in the array by indicating the number of the element in brackets, like this: $.myArrayElement[0]. In this example, which is of a zero-based array, the content of the first field is returned.
      NoteIf you want to return all of the items in an array, place an asterisk in the bracket, like this:$.myArrayElement[*]
    4. Edit the data type if necessary.

  9. You can paramaterize both JSON Query Expressions and Aggregation Pipeline queries using simple string replacement.

    Parameters are specified using Pentaho Reporting's parameter syntax ${param} where param is the name of the parameter containing the data that you want to replace the param name. For example, if you have defined a report parameter named state, that you want to use to select documents for that state, your query could look like this: {$match : { state : "${State}" }}. So, if you set the state parameter to FL at runtime, the resulting query submitted to MongoDB would look like this: {$match : { state : "FL" }} If the parameter name used in the ${} matches the name of existing parameter, the linkage between the parameter and the query are automatic. If you prefer to use a different name in the query, click the Edit Parameter button.
    1. Click the Edit Parameter button.

    2. In the Transformation Parameters window, click the Add a New Parameter button to add a row to the table.

    3. In the DataRow column, choose the parameter you want to add from the drop down list.

    4. Select the Transformation Parameter from the drop down list.

    5. If desired, add a transformation argument by clicking the Add a New Transformation Argument button and adding the argument in the row that appears.

    6. When complete, click OK.

  10. Click Preview to test the connection and to see what the data will look like when it is brought into the report designer. When complete, click OK.

Results

Report Designer can now access your MongoDB data source.