SAP Business Warehouse 7.3 : A step to in-memory datawarehousing ?
The newest version of SAP BW, 7.3, comes with a number of new infoproviders and features, seamingly optimized for the SAP HANA database.
SAP’s promise is an improvement in performance and scalability, improved integration with SAP BusinessObjects, more development options and simplified configuration and monitoring. The question which always pops up with a new version is the necessity and timing of an upgrade. One of the obvious reasons to perform an upgrade is the prerequisite to install SAP BusinessObjects Planning & Consolidation (BPC) version 10 on SAP BW. Without BPC as an upgrade driver, the answer becomes more complex.
In this insight, we will give an overview and discuss the different new infoproviders and features, and elaborate on performance tests executed on BW 7.3 environments. Integration with SAP Business Objects (Data Services) will not be discussed here. Additionally, new features/providers for BW Accelerator is not part of this discussion either, because BWA is a feature that will become obsolete in the future as HANA emerges.
Graphical modeling toolset
SAP BW 7.3 is shipped with a graphical data modeling tool set. In the overview given by SAP on BW 7.3, the goal of this toolset is to drastically reduce the manual work effort in creating dataflows. This proposition will be the subject of our discussion.
One of the first things that comes forward when you start creating a dataflow with the tool is that the graphics are similar to the 'display dataflow' function in earlier versions. The tool seems not to deliver an enhanced graphical interface.
The user has the option to start creating a dataflow from a template or create it from scratch.
When creating a dataflow from scratch, you have to drag in all the desired objects (datasources - DSO's - infosources - ...) , and connect them by dragging lines to each other - which will be your transformations and DTP's. The objects which are now displayed on the screen, are not created yet, they do not have any content and are in fact empty boxes. SAP calls them non-persistent objects. You can now choose to either create new objects for these containers or replace the non-persistent objects by already existing ones.
This process resembles design activities, and the design can now be directly entered in the system. However, there is no significant added value here, nor in the user interface as in the reduction of manual work effort.
Designing a dataflow in Microsoft Visio or Excel offers more possibilities and flexibility, and the reduction of manual effort by directly entering your design in the system is minimal. In the end, all the objects have to be created manually, together with info-objects, mappings, and ABAP.
The other option is to create a dataflow starting from a dataflow template.
SAP offers dataflow templates based on their layered, scalable architecture (LSA) principles, going from simple designs to complex dataflows for huge data volumes. This is shipped together with detailed descriptions for the use case attached to the template. After you deploy the template, the same steps need to be followed to implement the objects, as the template also incorporates non-persistent objects.
We can come to the same conclusion for the manual work reduction, it is neglectible. However, the templates should make it possible to force the BI developer to work using company standards for BW architecture (LSA or not). But in the end, this has more to do with the BI management enforcing architectural standards than it has something to do with available templates in the system .
The toolset also gives the developer the possibility to transport dataflows as a whole, whereas earlier versions forced you to group the objects manually. This might be an improvement for more complex dataflows, where DSO’s or masterdata not directly linked (only ABAP) can be grouped together in one dataflow; which creates more visibility in the system.
Given the different options and interfaces provided by the graphical modelling toolset, we can conclude that the tool is definitely not a giant step forward, nor it is creating huge value for IT or the business. As a result, this should not -on itself- be considered as a reason to upgrade to SAP BW 7.3.
Semantic Partitioned Objects
This is a new info-provider which groups together different physical infoproviders (DSO’s or InfoCubes) which are structurally identical but differ by a semantic split. The different infoproviders within the SPO contain the same dimensions but have data from a different region / costcenter / …
This split can be made also by means of complex logic imbedded in Business Add-ins (BADI). Of course, in earlier versions this could have been set up manually, but it would create an enormous manual effort.
SPO in a classic DB environment
This infoprovider makes a lot of sense when considering cases where one/multiple reports with a pre-defined structure need to be reported in every region/cost center/…
In that case a report runs only on one semantic partition, which drastically improves performance as the query should only hit one small infocube (achieved by so-called partition-pruning). An alternative to this are the physical partitions, but in SAP Business Warehouse these can only be set up on time characteristics, and this feature is database dependent.
When considering huge dataloads (f.e. global datawarehouses), SPO’s can be used to split the dataloads per country, to minimize the risk of the impact of failing dataloads of one country on another. This could also be achieved before BW 7.3, but considerable more manual effort was required.
SPO in a HANA environment
Of course, the remarks made for the classic DB also counts for the HANA environment but in a less manner. With HANA, the data resides in-memory and a data look-up is significantly faster than a classic database read. The absolute time gain will be smaller for the above scenario.
However, the use of SPO’s leads to parallel processing, so you optimize your resources for a HANA infrastructure which could use up to 80 parallell processing units. You will gain a lot of performance improvement for reports where you want to show summarized data from all regions/ cost centers / … The number of parallel processing units in a classic environment is much smaller, so the total gain for HANA will be higher for reports running over different regions / cost centers / … .
This new infoprovider combines historical data and (near) real-time data. In earlier versions, this was very hard to achieve in one infoprovider. This object should contain an infocube which contains the historical data and a virtual infocube or DSO with RDA. There is one transformation from the hybrid provider to the datasource, which is limited in complexity. When a query is executed from a hybridprovider, it reads the historic infocube and reads everything above the latest delta pointer for direct access.
Hybridprovider in a classic DB environment
In a classic DB, the virtualprovider is only useable for very small volumes of data. Knowing this, the only workeable option is to use the DSO with RDA within the hybridprovider. However, the restriction is here that the datasource used must be RDA-enabled, and there are almost no datasources which are RDA-enabled ! As a result, this new infoprovider will only be workable within very few scenario’s on a classic DB.
Hybridprovider in a HANA environment
Because of the restriction with RDA datasources, the virtualprovider will be the typical object to use within a HANA environment. The recommendation is that also your ERP system runs on HANA, as the virtualprovider will directly read from the ERP database to report real-time information. The same limitation exists for datasources connecting to virtualproviders : it needs to allow direct access.
Good news is that most finance and controlling datasources are supporting direct access. The typical business case can be derived from this : hybridproviders allow for financial closing reports where a combination of real-time data and historical data is very useful to speed up the the close cycles.
Transientproviders and Compositeproviders
Transient – and compositeproviders are specifically designed for real-time data reporting, but are omitting the SAP BW metadata concepts – no need for infoobjects – masterdata – infocube design - etc…
A transientprovider can directly access SAP ECC data, and the possibility exists to build these via the BW client in SAP ECC in order to use the BI tools directly on SAP ECC. It consists of a classic infoset or analytical index. The classic infoset is for a typical usage scenario in SAP ECC. For a more performant approach a SAP HANA model can be published to SAP BW, which generates an analytical index. A transientprovider can then be constructed on top of the analytical index. A compositeprovider can join different analytical indexes.
These functionalities seem very similar to what can be achieved with a BO universe, which can also directly connect to a SAP HANA model and combine them. This is why these providers are positioned as being prototyping instruments, in advance to a datawarehouse design.
One of the future interesting features will be the possibility of a direct connection to a BW datasource, which will provide a lot of added value to prototype BW, or enable very quick access to information with a minimal development effort. These types of providers are very interesting when considered within the overall market trend to a more agile approach to datawarehouse projects. In-memory computing will enable to start from prototypes and evolve to a datawarehouse design which better incorporates the business requirements.
This section will describe new specific interesting features in data loading and modelling.
- Hierarchy ETL : Up till now, we still had to use the old 3.x dataflows to load hierarchies. Finallly there is a new hierarchy datasource on which transformations and DTP’s can be built. Where in older versions complex ABAP was necessary to upload or change hierarchies in BW, now this is standardized in a transformation. This is definitely a step forward in terms of ETL functionality.
- The master data deletion function has been upgraded, to enable system admins to minimize runtime and let the deletion run in background without process chain.
- Dataflow generation wizard when an infocube is created : this is also a feature to promote prototyping in SAP BW. The user only needs to know the desired fields and datasource to generate a simple dataflow.
- Generic delta is now possible for Universal Data (UD) & Database Connect (DB) options, as well as for flat files.
- Navigation attributes as source fields : This extra feature for transformations could replace masterdata lookups. This is an interesting feature which increases the system visibility, but will be subject to our performance tests.
- Read DSO transformation rule : Where BW developers used to program ABAP to perform lookups in other DSO’s , now a standard rule is available to perform this lookup with the key fields of the DSO. As this is an interesting feature in terms of visibility in the system, we are also interested in the performance impact, so also this will be subject of our below described tests.
Some articles suggest that SAP BW 7.3 is a quantum leap when it comes to better data load features.
Let’s take this statement and perform some live-system tests. Most of the tests are performed on a classic DB environment : server with 8 Intel Xeon 2,9 GHz processors, 16 GB RAM , Windows Server Enterprise. We used the standard DSO 0FIGL_O10 as basis for our tests, where we uploaded 1.038.220 records. We also made use of SAP’s HANA demo environment to run some performance tests. The HANA environment runs on a server with a Quad 2,13 GHz Intel XeonEZ-4830 processor, 16 GB RAM, Windows Server Enterprise.
- in BW 7.3 the data activation is changed from single lookups to package fetch of the active table, resulting in faster activation and less locks on the lookup tables. As we did not have a perfect test environment on our hands (BW 7.0 and BW 7.3 running on same hardware) we could not perform a valid test. However, other lab tests have indicated that we could expect an increase of 20% to 40% in activation performance.
It is interesting however to note the difference in activation times between a HANA environment and a classic DB environment.
We noted following activation times when most of the SID’s were already generated on the classic DB :
Image 1: Activation classic DB BW7.3
Exactly the same amount of records were activated in the identical DSO on the HANA environment :
Image 2: Activation HANA BW7.3
The performance difference between the classic DB and HANA is huge, we are talking about an improvement with a factor 22.
- During the infocube load, BW 7.3 now makes use of mass processing during SID and DIM-ID determination phase. This option can be switched on/off by setting the RSADMIN parameter – RSDD_ENABLE_MASS_CUBE_WRITE . This will be subject to our tests.
The below figures show the results for a classic DB :
Image 3: No Mass Cube Write Classic DB BW7.3
Image 4: Mass Cube Write Classic DB BW7.3
If we calculate the ratio between the two duration differences (59 Min 20 sec and 45 Min 12 sec) we come to (3560 – 2712) /3560 ~ 24% performance improvement in our test environment.
Now we will look at the improvement in the HANA environment.
Image 5: No Mass Cube Write HANA - limited nr of records
We can directly see a huge improvement in the load performance due to HANA , again with a factor around 20. If we would like to come to a decent estimate for the ratio in HANA, we will re-do this test with a lot more records (12.830.893) and a DSO which includes a lot more infoobjects (129 as opposed to 20 in the DSO 0FIGL_O10). The results are presented below :
Image 6: No Mass Cube Write Hana - large volume
The load finished in 3 hours 42 min 23 sec = 13.343 seconds.
Below the results for the mass cube write on the HANA platform :
Image 7: Mass Cube Write Hana - large volume
click to enlarge
The load finished in 52 minutes 4 secounds = 3.124 seconds . Then we come to a ratio of (13343 – 3124) /3124 ~ 327% performance improvement, a huge difference with the ratio seen on the classic Db environment. The test confirms the statement that HANA provides a significant increase in the level of parallelization.
Now we will take a closer look at the performance of new data load features : navigational attributes and datastore lookup.
We perform the tests on our classic DB server. For this, we loaded the same number of records from one DSO to another, doing a lookup in the company code masterdata (containing 13.971 records) and a DSO containing the same data (+1M records).
Let’s first take a look at the results with a masterdata lookup using ABAP code – in a non-performant manner (doing a SELECT in the routine from the rule) and a peformant manner (using internal tables doing the SELECT in the start routine) :
Image 8: Masterdata Lookup SELECT in rule
Image 9: Masterdata Lookup , SELECT in start routine
We notice a very small difference here in both examples, also due to the low table size of masterdata – which is the normal case for this data type. Below the result using the navigational attribute as the source :
Image 10: Masterdata Lookup, Nav. Attr. as Source
We notice no considerable negative performance impact from using the navigational attributes, it is even likelily to be more performant.
For the DSO lookups, the non-performant lookup is reading the DSO directly in the rule with the table key. We use the table key as the ‘read DSO rule’ is also using the key. The performant code makes use of sorted internal tables because of the large table size. Let’s take a look at the results from the DSO lookup :
Image 11: DSO Lookup, SELECT in the rule
Image 12: DSO Lookup, SELECT In start routine to sorted table
We see a significant performance difference between the two coding examples. Below the result of using the new read DSO rule :
Image 13: DSO Lookup, DSO lookup rule
The new lookup rule seems to be performing well, comparing with the performant and non-performant code.
Where we have seen that the new graphical modelling toolset is not living up to its promises, BW 7.3 provides enough alternative reasons for upgrading your system, even if you choose to continue with a classic database supporting the datawarehouse.
BW 7.3 could deliver the necessary performance improvements to speed up large data loads and heavy reports, and enhance functionality to support complex dataflows.
If a company has chosen to migrate to HANA, an upgrade to 7.3 is required. Besides the minimal version requirement, 7.3 is shipped with extra features which take advantage from the in-memory database.
BW 7.3 provides more possibilities to engage in BW prototyping and direct reporting on SAP ECC, enabled by a performance increase generated by HANA and new BW 7.3 infoproviders which are HANA-optimized. The HANA-alignment strategy of SAP is very present in BW 7.3.
If Moore’s law ("computer technologies roughly double every 2 years”) continues to be true, the cost of in-memory databases will further decrease and all datawarehouses and transactional systems will be in-memory.
SAP BW 7.3 supports this evolution, and everybody wanting to benefit from this should consider a BW upgrade to version 7.3.