Hitachi Vantara Lumada and Pentaho Documentation

Google BigQuery

You can use Google BigQuery as a data source with the Pentaho User Console or with the PDI client.

Before you begin

To connect to Google BigQuery, you must have a Google account and service account credentials in the form of a JSON key file. To create service account credentials, see https://cloud.google.com/storage/docs/authentication.

Additionally, you must set permissions for your BigQuery and Google Cloud accounts. To configure your service account authentication, see https://www.simba.com/products/BigQuery/doc/v1/JDBC_InstallGuide/content/jdbc/bq/authenticating/serviceaccount.htm.
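
Before configuring the connection, it can help to confirm that the downloaded key file looks like a service account key. The following is a minimal sketch in Python and is not part of Pentaho. The function name check_key_file and the fields it checks are assumptions, based on the usual layout of a Google service account JSON key (type, client_email, private_key).

```python
import json
from pathlib import Path

# Fields normally present in a Google service account key file.
# This subset is an assumption chosen for illustration.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_key_file(path):
    """Return the service account email if the key file looks valid.

    Raises ValueError when the file is not a JSON object, a required
    field is missing or empty, or the key is not a service account key.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("key file does not contain a JSON object")
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    if missing:
        raise ValueError("key file is missing fields: " + ", ".join(missing))
    if data["type"] != "service_account":
        raise ValueError("unexpected key type: " + repr(data["type"]))
    return data["client_email"]
```

The returned address is the value you would later enter for the OAuthServiceAcctEmail parameter, and the file path is the value for OAuthPvtKeyPath.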

Perform the following steps to create a JDBC connection to a Google BigQuery data source from the User Console or PDI client:

Procedure

  1. Stop the Pentaho Server.

  2. Download the ZIP file containing the Simba JDBC 4.2 driver for Google BigQuery from https://cloud.google.com/bigquery/partners/simba-drivers.

  3. Extract the following files to the server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/lib folder for the User Console or the design-tools/data-integration/lib directory for the PDI client:

    • GoogleBigQueryJDBC42.jar
    • google-http-client-1.22.0.jar
    • google-http-client-jackson2-1.22.0.jar
    • google-oauth-client-1.22.0.jar
    • google-api-client-1.22.0.jar
    • google-api-services-bigquery-v2-rev355-1.22.0.jar
    Note: The Google BigQuery connection name will not display in the User Console Database Connection dialog box until you copy these files.
  4. Restart the Pentaho Server.

  5. Log on to the User Console or the PDI client, then open the Database Connection dialog box.

    See Define Data Connections for more information.
  6. In the Database Connection dialog, select General, then Google BigQuery as the Database Type.

  7. In the Settings area, enter the information for your Google BigQuery account.

    • The Host Name is the URL to Google's BigQuery web services API. For example, https://www.googleapis.com/bigquery/v2
    • The Project ID field (in the PDI client) and the Database name field (in the User Console) refer to the same value.
    • The Port Number should be 443.
  8. Click Options and add the following parameters and values:

    Parameter               Value
    OAuthType               0 (zero)
    OAuthServiceAcctEmail   Specify your service account email address.
    OAuthPvtKeyPath         Specify the path to your private key credential file.
    Timeout                 Specify the amount of time, in seconds, before the server closes the connection. The recommended value is 120 seconds.
  9. Click Test to verify that you can connect to your data.
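
The settings from steps 7 and 8 correspond to a single JDBC connection string. As a rough sketch of how the pieces fit together, the Python helper below assembles a URL in the jdbc:bigquery://host:port;Property=Value form used by the Simba driver. The function name build_bigquery_jdbc_url and the sample project ID, email address, and key path are hypothetical, and the exact URL that Pentaho generates internally may differ.

```python
def build_bigquery_jdbc_url(project_id, service_account_email, key_path,
                            host="https://www.googleapis.com/bigquery/v2",
                            port=443, timeout_seconds=120):
    """Assemble a Simba-style BigQuery JDBC URL from the dialog settings."""
    properties = [
        ("ProjectId", project_id),
        ("OAuthType", 0),  # 0 selects service account authentication
        ("OAuthServiceAcctEmail", service_account_email),
        ("OAuthPvtKeyPath", key_path),
        ("Timeout", timeout_seconds),
    ]
    # Each property is appended as ";Name=Value" after the host and port.
    suffix = "".join(f";{name}={value}" for name, value in properties)
    return f"jdbc:bigquery://{host}:{port}{suffix}"


if __name__ == "__main__":
    # Placeholder values; replace them with your own project and key file.
    print(build_bigquery_jdbc_url(
        "my-project",
        "svc@my-project.iam.gserviceaccount.com",
        "/path/to/key.json",
    ))
```

Seeing the settings as one string can make connection errors easier to diagnose, because a wrong host, port, or key path shows up in a single place.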