Tabular models support data access through two modes: Cached mode and DirectQuery mode.
- MOLAP. This is the default and most frequently used storage mode. In this mode, when you process the cube, the source data is pulled from the relational store, the required aggregation is then performed within Analysis Services and finally the data is stored in the Analysis Services server in a compressed and optimized multidimensional format.
- ROLAP does not pull data from the underlying relational database source to the OLAP server but rather both the cube detail data and the aggregated data stay in the relational database source. In order to store the calculated aggregation the database server creates additional database objects (indexed views).
The DirectQuery Mode can be compared to the ROLAP mode; both do not store data and rely completely on the relational database to answer a query. ROLAP mode has very rarely been implemented because in many cases the performance is insufficient for end-users as soon as you get high volumes and in most scenarios you do not need real-time access to the data in the relational database. Pseudo real-time is sufficient and it can be achieved through the proactive caching mechanism or intermediate processes during the day. With the new column store index available in SQL Server 2012, a new possibility appears since the index column store can dramatically improve the performance of a Data Warehouse query on your relational database.
There are some limitations associated with the DirectQuery Mode:
- Only DAX is supported which means that it can only be used from Power View (i.e. you cannot browse it from an Excel Pivot Table).
- Time calculations are not supported.
- Calculated columns are not supported.
One last remark: there is also the possibility to choose a mixed mode Cached/DirectQuery. In this case data exists in both storage modes. There is one storage mode used by default to answer the queries but in the connection string you can specify which storage mode you want to use.
The vast majority of projects will probably use either MOLAP or Cached mode; the main difference is that MOLAP mode will build pre-aggregated data but store both the aggregated and atomic data on disk, while Cached mode will load all the atomic data in-memory in a column store format without computing any pre-aggregation. If you go for the Tabular model with Cached mode, the amount of available RAM becomes a constraint that needs to be taken into account. Think twice about this especially if you want to have a large number of different cubes running in parallel on the same server.
There is no simple rule to compute in advance the amount of memory needed to be able to load your entire cube because it depends on the compression of your data when stored in columns instead of rows. The rationale can be defined as follows : the lower the cardinality of your column (cardinality means the number of distinct values), the higher the compression rate of your data. This means that you should avoid including data like transaction identifiers, timestamps, etc. in your model to reduce the amount of memory required.
In this section, we are not going to cover the difference between the ROLAP and Direct Query mode since in both cases most of the work is transferred back to the relational database and not to the model itself.
As explained above, the Tabular in memory engine is fundamentally different from the Multidimensional MOLAP engine. The Tabular model reads data directly from the cache and takes advantage of the query acceleration resulting from the column store indexes, while the Multidimensional model is reading pre-aggregated data or atomic data from disk depending on the existing aggregations defined during the design of the cube.
- By default, the Tabular engine will give great performance without any special tuning.
- This does not mean that the Tabular model will be faster than the Multidimensional model. In fact, we could argue that for any existing aggregation the Multidimensional model is likely to give better performance especially if the aggregated data is already in cache. Contrary to the Tabular model, the Multidimensional model keeps the query results in cache and as such the more the cube is used, the better query performance you will get. The results of DAX queries are never saved within the cache which means that DAX queries will always take the same time to execute.
- The relational concept of the Tabular model will also give good performance results when querying data at the lowest granularity level. This is much more of an issue for the Multi-Dimensional model especially when one needs to cross-join some very big dimensions.
The table below summarizes the different scenarios that we can encounter:
Data Language Layer
The Multidimensional model uses MDX (Multidimensional Expression) as its data language. MDX is a very powerful language but requires a thorough understanding of multidimensional concepts and, therefore, is often perceived as rather complex and difficult to learn.
The Tabular model uses a new language introduced by PowerPivot called DAX (Data Analysis Expression). The syntax of DAX is very similar to the one of Excel formulas and DAX concepts are similar to relational database concepts and therefore easier to grasp for a developer without any experience of Multidimensional concepts. Even though using DAX is rather simple for basic computations (margin, ratio, etc.), it becomes more complex for advanced computations. You can (as with any language) produce the same result with different syntaxes, but each will have a very different performance behavior. The message here is that there is no free lunch, if you want to create some advanced BI applications, you will have to invest some time in learning DAX, even if it is easier than MDX.
Also, there are some specificities of the MDX script like SCOPE assignments (it gives the possibility to overwrite values in the multi-dimensional space) or definition of Named Set (very convenient to filter a report with the TOP 10 Customers for example) that cannot be done in DAX.
The Data Model Layer
Both Tabular and Multidimensional models work with the same concept where you create links between the underlying data set to define your relationships. One important difference is that you can only use one column to establish relationships between tables in a Tabular project whereas in Multi-Dimensional projects you can use multiple columns. This reinforces the need to enable a solid Data Warehouse with appropriate surrogate keys before you start using a Tabular project. Otherwise you will have to build a lot of concatenated key fields which:
- will be difficult to maintain,
- will impact your performance in a negative way and
- will require a lot of space (high cardinality character fields are not good candidates for high compression with the column store).
The Tabular project will seem easier for developers without Multidimensional experience because you do not need to define relationships between attributes. This step is mandatory in Multidimensional models in order to be able to define natural hierarchies and thereby optimize performance. Natural hierarchies are always processed into materialized hierarchies (i.e. the entire hierarchy is computed and optimized for querying). When faced with poor data quality, this can sometimes be very challenging.
Some limitations of Tabular compared to Multidimensional:
- You cannot define role-playing dimensions (this was especially useful for the Date dimension)
- You cannot define Many-to-Many Relationships (there are some workarounds to get the same results but you will have to use very complex DAX formulas)
- You cannot define a Parent-Child hierarchy
- Custom Rollups are not supported (especially useful when aggregating data following a chart of accounts)
- You cannot define Actions (Drill trough, Reporting, etc.)
- Write-back is not supported
Overall, a lot less properties are available in the Tabular model than in the Multidimensional model. Again this leads to the same conclusion, Tabular model is easier to define but it contains some limitations compared to Multidimensional model.
Another very important distinction between both models is that with the Tabular model you can import directly from an existing PowerPivot model. This means that you can easily transfer a personal BI application developed by a Business Analyst to the IT department. We understand the path proposed by Microsoft which is to create footbridges between Business and IT by allowing end-users to experiment with a new solution on their own via PowerPivot and to involve IT later on and only for solutions that have reached sufficient maturity. With this new paradigm, Microsoft presents its solution to BI’s current challenge: "How to get a controlled BI environment that is agile enough to react to continuous business changes?”
Every reporting solution (Microsoft Excel, SQL Server Reporting Services, etc.) that creates reports on top of the SSAS (SQL Server Analysis Services) Multidimensional model will be able to do the same using the Tabular model. They both use the same connection interface and they are both capable of interpreting MDX. So every reporting tool that can generate MDX will be able to use the Tabular model. The Tabular model engine is capable of translating MDX into DAX.
The Multidimensional model on the contrary is not capable of interpreting DAX languages. As a direct consequence, a reporting tool that would generate DAX queries will not be able to use a Multi-Dimensional project as a source. There is currently only one tool that generates DAX queries, which is Power View.
Power View provides intuitive ad-hoc reporting for end-users. They can easily create and interact with views of data from data models based on tabular models. Power View is a browser-based Silverlight application launched from SharePoint Server 2010. The simplicity of use and the capability of creating dynamic reports in a few minutes, which remain dynamical once exported to PowerPoint, will make Power View a very appealing tool for Business Analysts. It however requires Sharepoint 2010 to be deployed in the company.
A last remark, native DAX queries can sometimes be faster than MDX queries, which explains why under some circumstances the same report will be faster when executed from Power View instead of another reporting tool (i.e. Microsoft Excel).
There are no noticeable differences between the Tabular model and the Multidimensional model regarding the administration & management aspects. Both models contain partitioning, perspectives, dynamic security (one will be written in DAX and the other will be written in MDX), Management Views, Backups and AMO/XMLA API.
One difference is worth mentioning: the Tabular model does not support parallel processing of partitions which can have significant impact on the processing time.
The Tabular model looks very promising. Its ease of use and the ability to import a PowerPivot model is definitely something interesting to increase even further the agility of the BI environment within your company.
We can expect Microsoft to work further on the merger between the two models in future releases, but that has not been fully reached yet. This leaves us with the following question: "Which model to choose with SQL Server 2012” ?
Below we have highlighted the differences between the two models in terms of each layer and we can see that both products have advantages and disadvantages. To facilitate your choice we have summarized what we think are the key differentiators between both models:
- More performant than OLAP in majority of the case.
- Easier to develop than Multidimensional model.
- Technology for the future.
- Integration with Power Pivot.
- Mature Technology.
- Scalable Technology able to handle very large volume of data.
- Able to cope with advanced modeling/ computations requirements.
- New product so still enhancements to come.
- Limited to RAM available (= midsized project).
- Missing some advanced computations available with MDX.
- Cannot be used with Power View.
- No major innovations to expect in this product in the future.
- Higher complexity than Tabular.
Table 2: Pros and Cons Tabular Model vs. Multidimensional Model
And what is then our final recommendation? At this time, it is impossible to come up with a simple recommendation for one model regardless the type of project since on the one hand we have a very promising product which is still new and on the other hand we have a very mature technology for which no major innovations are to be expected in the future.
Consequently, we recommend the following:
- Do not migrate your existing multi-Dimensional models if your end-users are happy with it. There is a high probability that you will not be able to reproduce some existing functionalities which might be frustrating for your end-users. The only consideration that could lead to a rewrite in a Tabular model would be to be able to use Power View.
- For every new project, choose the Tabular model unless (1) you have a very high volume of data or (2) you have very complex business requirements (typically advanced financial reporting).