Skip to main content

Pentaho+ documentation is moving!

The new product documentation portal is here. Check it out now at docs.hitachivantara.com

 

Hitachi Vantara Lumada and Pentaho Documentation

Google BigQuery

Parent article
You can use Google BigQuery as a data source with the Pentaho User Console or with the PDI client.

Before you begin

You must have a Google account and must create service account credentials in the form of a key file in JSON format to connect to Google BigQuery. To create service account credentials, see the Google Cloud Storage Authentication documentation.

Additionally, you must set permissions for your BigQuery and Google Cloud accounts. To configure your service account authentication, see the Google Service Account documentation.

Perform the following steps to create a JDBC connection to a Google BigQuery data source from the User Console or PDI client.

Procedure

  1. Stop the Pentaho Server.

  2. Download the ZIP file containing the Simba version 1.2.2.1004 JDBC 4.2 driver for Google BigQuery from the Google Cloud Simba Drivers site.

  3. Navigate to the server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/lib directory for the User Console or the design-tools/data-integration/lib directory for the PDI client and delete any files associated with previous versions of Google BigQuery.

    Visually verify each file to ensure the older version is deleted.
  4. Extract the following files to the server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/lib folder for the User Console or the design-tools/data-integration/lib directory for the PDI client.

    • animal-sniffer-annotations-1.14.jar
    • api-common-1.7.0.jar
    • avro-1.9.0.jar
    • checker-compat-qual-2.5.2.jar
    • error_prone_annotations-2.1.3.jar
    • gax-1.42.0.jar
    • gax-grpc-1.42.0.jar
    • google-api-client-1.28.0.jar
    • google-api-services-bigquery-v2-rev426-1.25.0.jar
    • google-auth-library-credentials-0.15.0.jar
    • google-auth-library-oauth2-http-0.13.0.jar
    • GoogleBigQueryJDBC42.jar
    • google-cloud-bigquerystorage-0.85.0-alpha.jar
    • google-cloud-core-1.67.0.jar
    • google-cloud-core-grpc-1.67.0.jar
    • google-http-client-1.29.0.jar
    • google-http-client-apache-2.0.0.jar
    • google-http-client-jackson2-1.28.0.jar
    • google-oauth-client-1.28.0.jar
    • grpc-alts-1.18.0.jar
    • grpc-auth-1.18.0.jar
    • grpc-context-1.18.0.jar
    • grpc-core-1.18.0.jar
    • grpc-google-cloud-bigquerystorage-v1beta1-0.50.0.jar
    • grpc-grpclb-1.18.0.jar
    • grpc-netty-shaded-1.18.0.jar
    • grpc-protobuf-1.18.0.jar
    • grpc-protobuf-lite-1.18.0.jar
    • grpc-stub-1.18.0.jar
    • gson-2.7.jar
    • j2objc-annotations-1.1.jar
    • javax.annotation-api-1.3.2.jar
    • jsr305-3.0.2.jar
    • opencensus-api-0.18.0.jar
    • opencensus-contrib-grpc-metrics-0.18.0.jar
    • opencensus-contrib-http-util-0.18.0.jar
    • protobuf-java-3.7.0.jar
    • protobuf-java-util-3.7.0.jar
    • proto-google-cloud-bigquerystorage-v1beta1-0.50.0.jar
    • proto-google-common-protos-1.15.0.jar
    • proto-google-iam-v1-0.12.0.jar
    • threetenbp-1.3.3.jar
    NoteThe Google BigQuery connection name does not display in the User Console Database Connection dialog box until you copy these files.
  5. Restart the Pentaho Server.

  6. Log on to the User Console or the PDI client, then open the Database Connection dialog box.

    See Define Data Connections for more information.
  7. In the Database Connection dialog box, select General, then select Google BigQuery as the Database Type.

  8. In the Settings area, enter the information for your Google BigQuery account.

    • The Host Name is the URL to Google's BigQuery web services API. For example, https://www.googleapis.com/bigquery/v2
    • The Project ID in the PDI client and the Database name in the User Console are identical.
    • The Port Number is 443.
  9. Click Options, then add the following parameters and values.

    ParameterValue
    OAuthType0 (Zero)
    OAuthServiceAcctEmailSpecify your service account email address.
    OAuthPvtKeyPathSpecify the path to your private key credential file.
    TimeoutSpecify the amount of time, in seconds, before the server closes the connection. The recommended value is 120 seconds.
  10. Click Test to verify that you can connect to your data.