Support Statement for Analyzer on Impala
Using Analyzer on Impala
These are the minimum requirements for Analyzer to work with Impala.
- Use Pentaho BA Suite EE 5.1 or later.
- Use Cloudera CDH 5.x with Impala 1.3.x or later.
- Use the Parquet compressed file format for tables in Impala (recommended).
- Choose between the Hive and Simba drivers according to your scenario:
| Scenario | Recommended Driver |
|---|---|
| Pentaho 5.4 with CDH 5.3 or earlier | Apache Hive JDBC that was distributed as part of the CDH shim |
| Pentaho 6.0 with CDH 5.4 shim | Impala JDBC Connector 2.5.24 Cloudera Simba driver |
| Pentaho 6.1 with CDH 5.5 shim | Impala JDBC Connector 2.5.29 Cloudera Simba driver |
| Pentaho 6.1 with CDH 5.7 shim | Impala JDBC Connector 2.5.31 Cloudera Simba driver |
- Make sure that the JDBC driver is copied into both the BA Server and Schema Workbench directories.
- Turn off connection pooling in Pentaho BA Server.
- Set a global ORDER BY limit in Cloudera Manager.
- In Mondrian schemas, divide dimension tables with high cardinality into several levels.
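
The driver recommendations in the table above can be sketched as a small lookup, which may be handy in a deployment script that checks which driver a given environment should carry. This is only an illustration: the names `recommend_driver`, `RECOMMENDED_DRIVERS`, and `HIVE_DRIVER` are hypothetical and not part of any Pentaho or Cloudera tool, and the sketch assumes versions are compared by major.minor only. Pairings that the table does not list return `None` rather than a guess.

```python
from typing import Optional, Tuple

# Text of the Hive recommendation, copied from the table.
HIVE_DRIVER = "Apache Hive JDBC that was distributed as part of the CDH shim"

# Simba recommendations from the table, keyed by
# (Pentaho major.minor, CDH shim major.minor).
RECOMMENDED_DRIVERS = {
    ((6, 0), (5, 4)): "Impala JDBC Connector 2.5.24 Cloudera Simba driver",
    ((6, 1), (5, 5)): "Impala JDBC Connector 2.5.29 Cloudera Simba driver",
    ((6, 1), (5, 7)): "Impala JDBC Connector 2.5.31 Cloudera Simba driver",
}


def _major_minor(version: str) -> Tuple[int, ...]:
    """Parse '5.3.1' into (5, 3); patch levels are ignored (an assumption)."""
    return tuple(int(part) for part in version.split("."))[:2]


def recommend_driver(pentaho: str, cdh: str) -> Optional[str]:
    """Return the driver listed for this pairing, or None if it is not in the table."""
    pentaho_mm = _major_minor(pentaho)
    cdh_mm = _major_minor(cdh)
    # Row 1: Pentaho 5.4 with CDH 5.3 or earlier uses the Hive driver.
    if pentaho_mm == (5, 4) and cdh_mm <= (5, 3):
        return HIVE_DRIVER
    # Remaining rows are exact Pentaho/CDH shim pairings.
    return RECOMMENDED_DRIVERS.get((pentaho_mm, cdh_mm))


if __name__ == "__main__":
    print(recommend_driver("6.1", "5.5"))
```

Returning `None` for unlisted pairings keeps the helper from implying support for combinations that the table does not cover.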
As with any data source, the performance of Pentaho Analyzer on Impala depends on the shape of the data, Impala’s configuration, and the types of queries. For best practices on running Pentaho Analyzer on Impala, see https://support.pentaho.com/hc/en-us/articles/208652846.
Compiled results from the Mondrian automated test suite are available for Analyzer on Impala with the OEM Simba driver, as well as with the community Apache Hive driver: