Best Practices in Modelling IBM Cognos 8 Semantic Layers

In October 2008, we published the below Insight "Best Practices in Modelling IBM Cognos 8 Semantic Layers" on our website. Since then, this page has become the most visited page on our website, with over 50.000 pageviews since publication, clearly showing the need for the topic. Whilst most of the concepts of Framework modelling in Cognos 8 still are valid for Cognos 10, we decided to update the Insight to incorporate new IBM Cognos 10 functionality. We thus are proud to present : "Best Practices in Modelling IBM Cognos 10.2 Semantic Layers". Below you still can find the original IBM Cognos 8 Insight.

Cognos 8 BI Stack

The IBM Cognos 8 Corporate Performance Management system offers report authors a single platform to create reports, dashboards, events and perform analysis on multidimensional data. Cognos uses a three tier architecture to service the report consumers and authors. The client connects using a zero footprint web-browser to Cognos Connection. Zero Footprint means no additional software or applets at all are installed on the client PC.

 
From the end-user perspective, the Cognos platform is split into a number of studios:
  • Query Studio to perform basic reporting using basic queries and formatting
  • Report Studio to perform advanced, pixel perfect reporting with complex queries
  • Analysis Studio to perform multidimensional analysis
  • Metrics Studio to follow metric performance over time and define actions as necessary
  • Event Studio to create agents to follow up on triggers
The metadata used to create reports is created by the modeller and is called a semantic layer. The same layer of metadata can be used in the different end-user studios. This Insight will discuss the best modelling techniques for the metadata layer. But first, what exactly is a semantic layer?

Semantic Layers

The purpose of a semantic layer is to create a business representation of corporate data. This representation hides database complexity to the end-user by creating an intuitive model and by using common business names across the organisation.
 
The semantic layer maps complex data into business terms and shields cryptic database language from the end-user. Furthermore, a semantic layer can handle multilingual features and consolidate different database sources and OLAP cubes. This enables the use of different databases –even from different vendors- and OLAP cubes in a single semantic layer, enabling the ability for use in a single report and in a transparent uniform manner.
 
The semantic layer insulates business users from underlying data complexity, while ensuring the business is accessing the correct data sources and using consistent terminology. This improves end-user productivity and enables greater business autonomy from the Information Services Department in accessing data.
 
The main metadata modelling tool within Cognos 8 is called Framework Manager. Framework Manager is used to create relational and dimensionally modelled relational models, called frameworks. Other metadata modelling tools include Cognos Transformer, used to create OLAP-cubes and Cognos Metric Designer. The metadata modelling tools within Cognos are client-server applications.

Flexible Models

Model flexibility can be defined from two different points of view. How easily can the model be adapted to changing conditions and how easily can the user generate ad hoc query requests? Both questions can be answered by using dimensional modelling.
 
The dimensionally modelled database is ideal for reporting and is often referred to as a data warehouse. In a data warehouse facts and dimensions are established and data is stored at the lowest granular level. In every data warehouse a number of star schema's are present. The central table represents the fact table and only contains numeric and additive measures. The satellite tables represent the set of dimensions that can be used to look at the measures from different angles.

By using conformed dimensions, a "data warehouse bus” is established. Conformed dimensions are dimensions used by multiple fact tables. This method of modelling enables executing multifact, multigrain queries ensuring a predictable, clean set of results. When new facts or dimensions are added, they can be quite easily added to the model, representing a new star schema.

However not all frameworks are build on dimensionally modelled databases. Quite often a data warehouse is not available and reporting is enabled directly on an OLTP (On Line Transaction Processing)-database, used in an operational system like an invoicing or order entry system, or an operational data store. These types of databases are modelled relationally and are highly normalized.

There are a number of drawbacks for reporting on a relational model. The first drawback is query performance, a highly normalized model will lead to dozens of tables in a single SQL statement, leading to large execution plans and potentially slow performance. Doing such queries on an application production environment could even lead to problems with the applications operational performance. Relational data sources also pose a number of modelling challenges for the framework modeller to create predictable query results.

Therefore it is recommended to always use a dimensionally modelled database for querying.

Framework Objects

A framework uses a number of objects to create a structured model. A namespace creates a qualifying container for objects, avoiding naming conflicts. Within a namespace the modeller can use folders to group filters or query subjects. There are three different types of query subjects :
  • Data source query subject: performs a query on the underlying data source
  • Model query subject: refers to an existing query subject in the model.
  • Stored procedure query subject: used to retrieve data from stored procedures.
Figure 1: Cognos 8 BI Framework Manager - Overview of the Data Foundation View
     

    Query subjects are linked together using relationships. A query subject can be edited by replacing the standard SQL with custom written SQL. The same can be done for relationships between query subjects. For maintenance purposes, it is however recommended never to make any changes at all in freehand SQL. When a physical table is changed, the new definition has to be manually adjusted or the table needs to be imported again. Importing is by far the easiest way to allow for bulk changes. When freehand SQL was entered, all these changes are lost following a new import.

    Calculations and determinants can be added without making changes to the SQL code by using the proper tab pages. A determinant is needed to identify certain levels of aggregation within the query subject. This is a particularly useful feature while dealing with multifact, multigrain queries.

    For OLAP functionality, two additional objects are available: a measure dimension and a regular dimension. A measure dimension contains a collection of "facts”. The regular dimension provides the accompanying set of descriptions and identifiers. The measure and regular dimensions are linked with scope relationships to define the level at which the measures are available for reporting.

    Creating Durable Models

    While creating a model it is important to create a proper structure. The use of a multi-tier structure will shield the end-user from changes at database level such as migration to a different database technology or version or changes to column or table names. By creating an efficient layered structure, relational models can be modelled into virtual star schemas, providing predictable and reliable query results to the end user.

    The first step in creating the framework model is importing the metadata. This can be handled by using the Metadata Wizard. It is good practice to create a separate namespace for every data source that is needed in the framework. On top of the namespace for the data source, a global namespace should be created: the Data Foundation View.
     
    Figure 2: Cognos 8 BI Framework Manager - Query subjects and setting basic properties
    Best Practices in Modelling IBM Cognos 8 Semantic Layers
    click to enlarge Best Practices in Modelling IBM Cognos 8 Semantic Layers
     
    When all data source objects are imported, the model should be scrutinized concerning relations between objects. Not only objects within the same data source should be linked together but relationships between different data sources are needed as well. This step will be much harder to do with a relational modelled data source then with a dimensionally modelled data source.

    In the Data Foundation view, calculations, embedded and standalone filters, determinants are added. Database column names are translated to more understandable business names in multiple languages if needed.

    For every query item, the modeller should check if the usage is set correctly. The usage of a field can be identifier, attribute or fact. Facts are numeric, usually additive or semi-additive data. All indexed columns or columns containing business keys should be set as identifier. Attributes are typically all other strings.

    For every fact column, the aggregate should be set. Other options that should be set are the format, screen tip, description … These properties are inherited by derived objects later on in the modelling process. A number of reporting traps is handled in this layer by creating model query subjects.
     
    If you are not the database owner, it is good practice to import all the metadata once more in a Reference View. In this view no changes at all are made, the objects are not used beyond this view in the framework. The Reference View represents the original state of the data source at the time of the model creation. Not only very useful as reference to the original data for the modeller, but this will also allow for detecting undocumented database changes. Other levels in the structure are the Consolidation View and the Presentation View, but first let's delve into modelling a bit deeper.

    Model for predictable results

    The greatest challenge for the model developer lies in creating a model that returns proper query results at all times, no matter what columns were selected in the report by the user. The objective is to model a relationally modelled data source as a virtual star schema. Most of what is following does not apply to a dimensionally modelled data source such as a data warehouse. In this type of data source most reporting traps were already handled by the data warehouse design & team.

    Simplify with dimensional concepts

    When importing from a relational data source, cardinality is detected based on a set of rules specified by the modeller. The best option is to use the primary and foreign key constraints present in the data source.

    Framework Manager uses the following rules:
    • cardinality is always applied in the scope of a query performed by the user
    • 1 or 0-to-n relationship implies a fact but only if ALL relationships to that query subject are 1 or 0-to-n
    • 1 or 0 to 1 implies a dimension
    This means it is possible that a query subject will behave as a dimension in one query and as fact in another query. This is typically the case with master – detail tables, such as INVOICE_HEADER and INVOICE_LINE. This situation can be handled by using a model query subject. The model query subject will logically condense INVOICE_HEADER and INVOICE_LINE into one model query subject INVOICE_FACT, thus enforcing the correct context in every Invoice related query.
     
    Condensing this master-detail relation will avoid unwanted query splits and blind spots. Blind spots occur when there is a missing relationship between two tables due to the omission of a query subject in a query. To avoid cross-joins between two fact tables, all dimensional information from the fact table should be removed. The fact table should only contain keys and numeric values. The removed items are used to create a new dimension using a new model query subject. Facts should always be separated from another fact by a common dimension.

    For clarity towards end users it is recommended to collapse any hierarchical structure between different query subjects such as countries – districts – branches into one model query subject. Dimensions should always be condensed and only have relationships to other facts. It is good practice to isolate the original data source query subjects that were used to build model query subjects in a separate folder. The newly created model query subjects should be in a separate folder as well and will be used as shortcuts in the Consolidation View.

    Handle Ambiguous Relationships

    There are two types of relationships that could provide inconsistent result sets if not handled by the modeller. The first occurs when there are multiple valid relationships. This typically occurs between facts and dimensions. In the invoice table, a number of dates are present: invoice date, ship date, order date… All point to the time dimension. For every role, a query shortcut or model query subject should be created with a single relationship specific for the role to the dimension.

    Another issue occurs when handling recursive relationships. The classic example is the manager – employee relation. An employee has a manager. The manager is an employee and also has a manager that again is an employee. This issue can be solved using a model query subject or by using database specific features such as for example CONNECT BY in the case of Oracle-technology.

    Multi fact multi grain queries

    The key of performing multi fact multi grain queries is by using a conformed dimension shared between both fact tables. When retrieving 2 measures from two different fact tables using a different granularity, Cognos can determine the correct aggregation when determinants are specified. A determinant will specify what set of columns will uniquely define a dataset. This will enable the report developer to create a report showing revenue at week level versus Full Time Equivalents at month level without double counting the lowest grain fact ! Cognos uses the mechanism of stitch queries to perform this type of requests. A stitch query will perform a full outer join to break queries into multiple selects, one for each fact table and then stitch the data back together.

    It is recommended to use the Cognos Model Advisor. This tool provides proven-practice modelling guidelines to identify modelling inconsistencies for faster development.

    Consolidate

    When all data related issues and reporting traps are handled, the next step into the modelling process is creating a Consolidation View. The consolidation view usually is split up into a relational view and a dimensionally modelled relational (DMR) view of the metadata.The first is used for normal reporting in Report or Query Studio. The second is used in Analysis Studio and resembles an OLAP cube. The main difference between a Dimensionally Modelled Relation model (DMR) and an OLAP-cube is that the latter is physically stored in a multidimensional way. The DMR-model is a virtual way of modelling the data source and does not physically stores data.

    Relational

    In this view the modeller will create a model query subject for every data source query subject in the Data Foundation View that is not yet handled by a model query subject. A model query subject is also the best place to use macro functions and parameter maps to handle multilingual tables, containing a field for every language.

    Model query subjects can also be used to override settings such as relations at the Data Foundation level. By using the calculations in Model Query Subjects, modellers can refrain from entering freehand SQL in data source query subjects, which should be avoided at all times for maintenance purposes.

    For all query subjects where no model query subject was yet defined, a model query subject has to be defined in the Consolidation Views. For those model query subjects in the Data Foundation View, a shortcut will be defined to the Consolidation View. The relational view should only contain model query subjects or shortcuts to model query subjects created in the Data Foundation view.

    DMR

    Dimensionally Modelled Relational models are virtual OLAP cubes. The data is presented in an OLAP-style, but is not physically stored on the server. Instead at every user request, a query is executed. This style of modelling is used when you want to enable analysis, using drill up / drill down in Analysis Studio.
     
    However, OLAP cubes and a relational star schema model will yield better performance due to the complex nature of a DMR model.
     
    DMR models are made up out of regular dimensions (dimensions) and measure dimensions (facts). A regular dimension consists of one or more defined hierarchies containing levels, keys, captions and attributes. Level information is used to roll up the measures. Each level should have the key and caption defined. If there is a Unique key, the bottom level should be marked Unique, otherwise, the combination of all upper levels is used to identify a member. If the star schema is modelled in its final form, regular dimensions can be quickly generated by using Merge in New Regular Dimension on the relational view of the Consolidation view.
     
    If a regular dimension is based on a query subject that has determinants specified, it is recommended that one level corresponds to each determinant and that the order of levels is equal to the order of determinants. Create a determinant for every level needed. Multiple hierarchies can be specified, but you cannot use them together in a single report query. If this would be a necessity, create a regular dimension for every hierarchy.

    A measure dimension is a logical grouping of facts which enables OLAP-styled querying of a relational database. Measure dimensions and regular dimensions are related through scope relations; however the underlying query subject joins remain in use. A scope relation will specify what levels of a dimension are ‘in scope' for a certain measure. A scope relation is mandatory and will be created automatically using the underlying query subject joins.

    Presentation

    The final step in modelling a framework is creating a presentation layer. The presentation layer is build from several star schema groupings. Star schema groupings make the model more intuitive to the end user. For every star schema grouping a different namespace is created, showing the end user functional business area's of which to select elements in the query. Using the wizard star schema groupings can be created quickly.

    Remember to run the impact analysis before publishing to gain insight at what reports will be affected. A single framework can be published in multiple packages, each containing part of the framework. This is however not a requirement. A single package can contain the entire presentation layer.

    Multilingual

    Framework Manager allows the modeller to translate static report content such as field names and descriptions. This is done by adding languages to the framework. The design language however can never be changed ! Not only static content can be translated transparently for the end-user… Quite often product descriptions are kept in multiple languages in the data source.
    There are two methods of storing this information:
     
    • separate column for every language for example PRODUCT_EN, PRODUCT_DU, PRODUCT_FR
    • separate row for every language
    Whatever solution was chosen, macro functions enable the modeller to create the proper SQL at runtime, by using the language options set by the user.
     
    When there are multiple columns for every language, the modeller can specify that the column name retrieved at runtime is dependent of the user language. Sometimes the languages are not mapped correctly. Therefore a mapped value is chosen from the Language_lookup parameter map.
    • #'[NAMESPACE].[QUERY SUBJECT].[QUERY ITEM_'+ $Language_lookup{$runLocale}+ '] '#
    If the multilingual data is stored in rows, a filter can be added:
    • [NAMESPACE].[QUERY SUBJECT].[LANGUAGE] = #$Language_lookup{$runLocale}+ '] '#

    Enhance performance

    Aggregate tables are probably the single most cost effective measure in boosting reporting and datawarehouse performance. Unlike competing products Framework Manager does not have native functionality to facilitate aggregates. However nearly all major vendors offer functionality within the database to transparently rewrite queries to aggregate tables. In Oracle for example, this functionality is called query rewrite and materialized views, in DB2 these are called materialized query tables.

    Consequently, the Cognos Framework modeller will have to create part of the metadata in the database, such as specifying the aggregates, dimensions … By using a query rewrite functionality, the database will transparently rewrite the query to the aggregate or detail table, depending on what level of detail the user requested. The use of aggregates is highly recommended.

    To optimize the model performance, it is recommended to cleanup the runtime model folder by times and to only use 1 database connection for each physical database. By using ‘Minimized SQL' as a governor, extreme queries can still be executed, as Cognos minimizes the SQL generated.

    While developing the model, it can be annoying to query the full database while testing the model. For that purpose it is possible to use Design Mode filters.

    Reuse metadata

    Cognos 8 Framework Query Subjects can be directly re-used in Cognos 8 Transformer to generate OLAP cubes. This integration used to be done by using externalized query subjects, a feature that was replaced by this fully integrated method in version 8.3.

    For future versions, it would be beneficial if a user would be able to generate a physical cube directly from the DMR model in Framework Modeller.

    Security

    Part of the Cognos 8 security is already specified in Framework Manager. Framework Manager is used to control what user can access what object. It is highly recommended to grant access to all objects in the Consolidation View. Access can be restricted in the Presentation View.

    Row level security can also be put in place by creating embedded filters using security macro functions such as the LDAP username. This feature will restrict users to query data they are not allowed to. For example a district manager will only be able to query data of his district.

    Segments

    Segments are a very interesting feature to minimize maintenance. A segment is a new project that is linked by a shortcut to the main project. The master project has access to the entire model, including all segments. Suppose an organisation has a centralized data warehouse. The data warehouse team created a framework. The solution is very popular and other teams want to use part of the framework model or the framework model entirely in their environment, likely making minor changes such as adding new facts or dimensions.

    In stead of copying the framework for every team, a segment can be created for the other teams. This has a number of advantages:
    • when the centralized data warehouse changes, changes only need to propagated once to the Framework Model
    • the master project has access to the entire model, the central data warehouse team can analyse impact immediately
    • the centralized team can see how other teams are using the original framework
    In short, segmenting allows a durable solution for similar models used by different teams.

    Conclusion

    Cognos 8 Framework Manager manages the semantic layer of a Cognos 8 deployment. A modeller's main focus is creating a user-friendly, comprehensive model that provides in repeatable and predictable query results. Modeling a metadata layer requires an extensive business and underlying data source knowledge. To create a model that offers repeatable and predictable results, the model should always, physically or virtually be conceived as one or multiple star schemas. By structuring the framework in a layered approach, any downstream effects of database changes can be minimized. Cognos Framework Manager offers a well featured metadata modelling tool that allows a durable approach in modelling semantic layers.