Skip to main content
Hitachi Vantara Lumada and Pentaho Documentation

Redesign Your Data Warehouse

Note: The advice in this section applies to building and optimizing data warehouses in general and is not specific to Analysis. However, since poor data warehouse design is so frequently a significant source of performance loss for Pentaho Analysis, it is listed in this section.

A data warehouse exists to consolidate and partition transactional data into one streamlined, organized, canonical source for reporting and analysis. Some guidelines to follow in data warehouse design are:

  • Be open to modifying the original design to meet adjusted requirements from business users (iterative design).
  • Remove data that is not actually used by business users.
  • Optimize for the right purpose. There are basically two use cases to consider: analysis (slice/dice/pivot) and static reporting. You could also use a data warehouse to cleanse and consolidate transactional data for data mining, but this model would almost certainly be inappropriate for analysis or reporting.
  • Avoid creating high-cardinality dimensions (putting too many records into fact tables). High-cardinality dimensions will never perform well.
  • If there is a lot of unrelated information in your data warehouse, consider breaking it up into more topic-specific data marts.
  • Create indexes for large fact tables.
  • Create aggregate tables for frequently-computed views.