Hitachi Vantara Lumada and Pentaho Documentation

Support statement for Analyzer on Impala

These are the minimum requirements for Analyzer to work with Impala:

  • Pentaho 7.1 or later
  • Cloudera CDH 5.x or CDH 6.1, with Impala 1.3.x or later
  • The Parquet compressed file format is recommended for tables in Impala
  • Recommendations for the Hive and Simba drivers. The driver to use depends on the following scenarios:
    Scenario                                          Recommended driver
    Pentaho 8.3 or later with the CDH 5.14 shim       Impala JDBC Connector 2.5.43 (Cloudera driver)
    Pentaho 8.3 or later with the CDH 6.1 driver      Impala JDBC Connector 2.6.4 (Cloudera driver)
    Pentaho 9.0 or later with the CDH 6.1 driver      Impala JDBC Connector 2.6.4 (Cloudera driver)
    Pentaho 9.1 or later with the CDP 7.1.4 driver    Impala JDBC Connector 2.6.4 (Cloudera driver)
  • Make sure that the JDBC driver is copied into both the Pentaho Server and Schema Workbench directories
  • Turn off connection pooling in Pentaho Server
  • Set a global ORDER BY limit in Cloudera Manager
  • In Mondrian schemas, divide dimension tables with high cardinality into several levels
Note: As with any data source, the performance of Pentaho Analyzer on Impala depends on the data shape, Impala’s configuration, and the types of queries. See the best practice "Pentaho Analyzer with Impala as a Data Source" at https://support.pentaho.com/hc/en-us/articles/208652846, or download the PDF.
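
The driver recommendations listed in the requirements can be expressed as a small lookup, which may be handy when checking an environment against this statement. The following is a minimal sketch in Python, not part of any Pentaho or Cloudera tooling. The names RECOMMENDATIONS, parse_version, and recommended_driver are hypothetical, and the platform labels are simply the strings used in the scenarios above.

```python
# Driver recommendations transcribed from the scenarios in this support statement.
# Each row: (minimum Pentaho version, platform label, Impala JDBC Connector version).
# The structure and names here are illustrative assumptions, not a Pentaho API.
RECOMMENDATIONS = [
    ((8, 3), "CDH 5.14", "2.5.43"),
    ((8, 3), "CDH 6.1", "2.6.4"),
    ((9, 0), "CDH 6.1", "2.6.4"),
    ((9, 1), "CDP 7.1.4", "2.6.4"),
]


def parse_version(text):
    """Turn a dotted version such as '9.1' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def recommended_driver(pentaho_version, platform):
    """Return the recommended connector version, or None if no scenario matches."""
    version = parse_version(pentaho_version)
    for minimum, label, driver in RECOMMENDATIONS:
        # A scenario applies when the platform matches and the Pentaho
        # version is at least the minimum stated for that scenario.
        if label == platform and version >= minimum:
            return driver
    return None


if __name__ == "__main__":
    print(recommended_driver("9.1", "CDP 7.1.4"))
    print(recommended_driver("8.3", "CDH 5.14"))
```

A None result only means that the combination is not listed in this statement. It says nothing about whether that combination works in practice.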

Compiled results from the Mondrian automated test suite are available for Analyzer on Impala with the OEM Simba driver, as well as the community Apache Hive driver: