Performance of internal tables in SAP BW transformations

Introduction

ABAP… for many developers who put their first steps into SAP’s proprietary programming language it means an adventure of discoveries and refinement that can take a long long time. As it has for me (and still is), over the many years since which I am using it in my job as an SAP BW consultant.

Not everyone is holding on to the journey though; some people will take it up to the point where a working situation is good enough. Until, at a sudden point, the problem arises that time becomes critical.

As with each programming language, there are good ways to do things wrong, wrong ways to do things right, but there is no common approach that will fit to each case you will encounter in your business or technical environment.

This paper does not have the intention to define the best approach for you: it will show you some possibilities of good (or better?) ways to do things right, so you can evaluate and select the best option for your case.

Describing each aspect of ABAP would produce more of a book (series) rather than a paper, so I would like to limit the scope to one specific part: the use of internal tables in transformations in SAP BW.

Business case

Reading (but also writing) about programming code can become very abstract without some kind of reference or example. Therefore imagine the business we are ABAPing for is a car rental company "Dreamrent”, renting of course exotic sports cars but also trucks and motorhomes.

Performance of internal tables in SAP BW transformations

Internal tables

Internal tables resemble the array functionality found in other programming language: an in-memory series of records with a certain structure.

Three types of internal tables exist in ABAP:

  • Standard table
  • Sorted table: always keeps its records sorted according to the defined key
  • Hashed table: a special table with a hash administration that allows access time to be constant, irrespective of its size

SAP ECC vs SAP BW ?

Why is this paper specific to the SAP BW world ? There is no difference in the ABAP language itself, however the environment is different. The SAP ECC is transactional, meaning the data volumes processed within one program are typically limited, in contrast to SAP BW where large volumes of data are processed in bulk. It’s this repetitiveness what can make or break a correct handling of internal tables. One sub-optimal internal table access does not harm, but what if we need to access it ten million times?

Another way in which SAP BW is specific is the predefined framework on which we can rely when using internal tables. Table names for DSOs, InfoObjects etc. all have a fixed pattern.

Global and local areas are also different: whereas in the transactional SAP ECC you can write includes and have global variables access in these includes, in SAP BW you have to keep in mind that the transformation is (usually) going to be executed in parallel mode, with no real global area across all parallel executions.

The most useful place for internal tables in BW transformations is in the start and/or end routine, where all source or result package data is available as an internal table itself, on which you can process logic in one big step.

A second use of internal tables is in the individual transformation rules for target InfoObjects. An example in our business case could be a lookup of the tax category of a vehicle that is stored in DSO_VEH. In this case, the (relevant) lookup data would be filled in the internal table inside the start routine (so only once), and this internal table would be read in the transformation rule of the /BIC/ZTAXCAT InfoObject, for each record. It looks a bit like the master data lookup functionality, but you can use more complex logic for the derivation. Of course you need to choose the appropriate internal table type for your lookup data …

Taking this a step further, you could sort your data in the source package in such a way that you only need to read the internal table once for each sorted key when looping through the transformation rules.

This technique is completely different from using aselect statement with each execution of the transformation rule, which is a technique that could be fine for very small data sets but will result in an unpredictably high runtime for big data volumes.

The basics

Declaration

In older ABAP releases, there used to be an internal table header line (work area). This was a special line that needed no declaration by its own (it became available with the table itself) and always contained the last accessed line of that internal table. It was a confusing mechanism because the variable name to access the contents of the header line was the same as the table itself, plus it was difficult to keep track of what was actually in that line.

For backward compatibility reasons it is still supported, but you should refrain from using it and declare your own structure. This internal table definition would be useful for containing the red cars from the DSO of vehicles from our business case.

DATA: BEGIN OF LS_CARS,
/BIC/ZVEHICLE TYPE /BIC/OIZVEHICLE,
"unique id for each car
/BIC/ZBRAND TYPE /BIC/OIZBRAND,
/BIC/ZMODEL TYPE /BIC/OIZMODEL,
/BIC/ZCOLOR TYPE /BIC/OIZCOLOR,
/BIC/ZMSTART TYPE /BI0/OICALMONTH,
"month/year of entrance
END OF LS_CARS,
LT_REDCARS LIKE STANDARD TABLE OF LS_CARS.


When declaring sorted or hashed tables, you need to specify a key as well. For sorted tables this may be unique or not, hashed tables must be created with a unique key.

Filling the internal table

Internal tables can be filled with records one by one in a routine itself, or can be filled by select statements on the underlying database. We will concentrate on the latter.

A. Select into table

Instead of using selectwith corresponding endselect, while inserting the lines one by one into the table, use the into table addition. This will be a lot faster.

B. Minimize, both horizontally and vertically

Working with a larger internal table than necessary will have different negative effects: a lot more data needs to be transferred between database and application, more memory is required and afterwards more time is required to access the information in the internal table. Although it is tempting to write:

SELECT
*

FROM /BIC/ADSO_CARS00
INTO TABLE LT_REDCARS
WHERE /BIC/ZTYPE EQ ‘CAR’.


it is not ideal, because there are more fields in our database table than we have in the internal table structure. So the database will select all fields for all cars from the database table and will have to throw away the unnecessary fields. Better practice would be to select the exact information that you need, so no time is lost in the selection. Also, since we are only interested in the red cars, it is better to make this selection at the beginning in a where clause to keep the table small.

 

SELECT
/BIC/ZVEHICLE
/BIC/ZBRAND

/BIC/ZMODEL

/BIC/ZCOLOR

/BIC/ZMSTART

FROM /BIC/ADSO_CARS00
INTO TABLE LT_REDCARS
WHERE /BIC/ZTYPE EQ ‘CAR’
AND /BIC/ZCOLOR EQ ‘RED’.

C. Solid selection

The select … into table statement inserts the fields into the internal table in the order in which they were selected. If the field order in the database table would change, the data would end up in the wrong columns of the internal table. Therefore, use the corresponding fields of addition (which in turn requires the names of the fields to match between database table and internal table !).

 

SELECT
/BIC/ZVEHICLE
/BIC/ZBRAND

/BIC/ZMODEL

/BIC/ZCOLOR

/BIC/ZMSTART

FROM /BIC/ADSO_CARS00
INTO CORRESPONDING FIELDS OF TABLE LT_REDCARS
WHERE /BIC/ZTYPE EQ ‘CAR’
AND /BIC/ZCOLOR EQ ‘RED’.


 

D. Mind the database

Now that our database selection is optimal, we should ask ourselves: "Is it optimal?”. The answer could still be negative. Since the ABAP selectstatement is translated into native SQL (depending on the underlying database), it is always useful to consider index definitions. Database performance enhancements still work, even if they are not visible in the ABAP programming layer.

Cleaning up

Data processing in BW usually does a good job in garbage collection, i.e. freeing memory after programs have finished. However this shouldn’t keep you from taking the broom in your hands and clearing internal tables you no longer need. Some of the techniques below could call for significant temporary internal tables for getting around a problem. Refresh the internal table using empty brackets [] if you still need the structure further on or free it to release all the related memory.

REFRESH LT_REDCARS[].

FREE LT_REDCARS.


Advanced

To delta or not to delta ?

Loading procedures in BW can be done in a delta mode if the business logic allows for it. But before you can load into delta mode, you will have to initialize the delta by loading all data which is already available in the source. This means that during the initial load, your code will have to work through packages of (by default) 50.000 records, and afterwards through just one package of a couple of hundred records. Some of the design options you take could pay off better in one or the other scenario.

For all entries

The ABAP statement for all entries looks straightforward, but can have quite some side effects as I will explain. It is meant to select entries from a database table, but only those which correspond to values of a field in an internal table.

A. Purpose

To make it clearer, let’s take another example from our business case. Imagine you want to get the rental information about our red cars into an internal table. We already know the vehicles’ unique identifiers because they are in table LT_REDCAR.

A developer not knowing about the existence of the for all entries statement could have come up with the following valid but slow approach:

LOOP AT
LT_REDCAR INTO LS_REDCAR.
SELECT

/BIC/ZVEHICLE
/BIC/ZDATFROM

/BIC/ZDATTO

/BIC/ZCLIENT

/BIC/ZMILES

FROM/BIC/ADSO_RENT00
APPENDING CORRESPONDING FIELDS OF TABLELT_REDCAR_RENT
WHERE/BIC/ZVEHICLE EQ LS_REDCAR-/BIC/ZVEHICLE.
ENDLOOP.

As opposed to using for all entries:

SELECT
/BIC/ZVEHICLE
/BIC/ZDATFROM

/BIC/ZDATTO

/BIC/ZCLIENT

/BIC/ZMILES

FROM /BIC/ADSO_RENT00
INTO CORRESPONDING FIELDS OF TABLELT_REDCAR_RENT
FOR ALL ENTRIES IN LT_REDCAR
WHERE/BIC/ZVEHICLE EQ LT_REDCAR-/BIC/ZVEHICLE.

It may not look like a big difference, but the first solution will go back and forth to the database, returning the rows one by one, whereas the second one will create one SQL statement for the database to get all records in one roundtrip. Or will it ?

B. For no entries ?

In life, learning can go two ways:

  • you acquire a large amount of theoretical knowledge, without having a direct application for it, after which it fades into a fair amount of knowledge.
  • You acquire a large amount of practical knowledge, making mistakes in real situations and remembering not to make the same mistake again.

The truth will lie somewhere in between… that being said, the following easy mistake is to be considered when using for all entries:

It could happen that the source internal table (the red cars) is empty. You would thus expect no rental information to be loaded into the target internal table, right ? Wrong… for a bizarre reason, in my humble opinion, SAP decided that in such case the whole table of rentals would be fetched from the database… Beats me. Or has beaten me a couple of times, enough not to forget. And when you think: "Is it really this way ?” and start looking in the ABAP help, there it is: neatly describing that all rows will be thrown at your poor internal table. It’s this way L.

C. Behind the scenes

As I mentioned above, the ABAP programming layer is making abstraction of the underlying database platform in order to make it work on multiple platforms. So all SQL statements we write in ABAP language are in fact translated into the real database statements. This means that "normal” SQL elements like select, where and group by are translated to the corresponding database specific syntax.

The for all entrieshowever generates something else: a where clause with IN-operator, e.g. on a DB2 database:

SELECT
"/BIC/ZVEHICLE”
...
FROM"/BIC/ADSO_RENT00”
WHERE "/BIC/ZVEHICLE” IN (‘0000003’, ‘0000006’, ‘0000013’, ‘0000019’, ‘0000020’, ‘0000024’, ‘0000029’, ‘0000030’)

Our database is not going to have a problem with an IN-operator of just eight values, not even a hundred. But what if our business case company would grow globally and had tens of thousands of red cars ? Then the list would become very long… and a potential performance killer. From a certain number onwards, the system will eventually break down the list in several sequential SQL statements.

In the introduction I already mentioned that rules are hard to define, except the rule that you should check your options before tackling a situation.

D. Minimize, part 2

Our new internal table LT_REDCAR_RENT from above now contains the rental information on red cars. Imagine we would like to get the address information for the clients, and that we have a really large professional customer which rented many different red cars.

SELECT
STREET, 

HOUSE

...

FROM /BIC/ADSO_CLNT00
INTO CORRESPONDING FIELDS OF TABLELT_REDCAR_RENT_CLNT
FOR ALL ENTRIES IN LT_REDCAR_RENT
WHERE/BIC/ZCLIENT EQ LT_REDCAR_RENT-/BIC/ZCLIENT.


Because the client is going to appear multiple times in the source internal table (LT_REDCAR_RENT), the result could be that a lookup is made in the client database table several times for the same client, which the ABAP layer will have to filter out, which in turn costs time…

Saving some of this same can be done by creating a unique list of all clients appearing in the source internal table. This "list” will be… another internal table, however smaller and temporary if you clean up your internal tables as soon as you don’t need them anymore.

Tip: hashed tables are useful for this because we can use the special collect statement, avoiding the manual check for existing entries before adding them to that table.

E. Detour

Depending on the data you are working with, there can be other ways to overcome this issue:

  • In case your source internal table is the source package of the transformation, a smaller package size could help. Of course more packages will have to run, but there will be less overhead in the translation into SQL.
  • You could use select on a join of tables, which already combines the information for you at the database level.
  • When you realize you are going to take more than half of the database table, it could be quicker to:
    • Make a temporary hashed table where you collect the distinct keys of your source internal table.
    • Select all records from the database in an internal table.
    • Loop over the internal table, deleting the current line in case no match is found in the hashed table.
  • In order to avoid using up all the system capacity at once, you could implement reading the database table using a cursor, which would fetch the data in blocks but would access each part of the database table just once.

Find that line

Earlier on I mentioned the three internal table types which exist, here are some ideas on their usage.

A.Sorted table

A sorted table has the advantage that read statements automatically happen with a binary search. Sorted tables are useful when looking up records within date ranges (date-from and/or date-to in the internal table).

B. Hashed table

Hashed tables require a bit more overhead, but are preferred if you are going to read it by its full unique table key. The table is not suitable for loop statements with a where clause though. A hashed table can be useful for aggregating information: if the non-key fields are numeric, you can quickly summarize the information.

Divide and conquer

If we need two separate internal tables from the same database table (with some different fields), there are some options:

  • Reading the database table once per internal table A and B
  • Reading the database table only once in table C (containing all fields), copying it to A after which irrelevant rows are removed (delete… where), then copying C to B after which irrelevant rows are removed (delete… where), and freeing up memory taken by C

Of course the above approach is not applicable in all cases… but it’s an idea to keep.

Especially the delete … where combination is much faster than looping over an internal table and checking and deleting lines individually.

Team up, but not entirely

As if the ABAP language would not have enough functions and possibilities, the SQL statements also offer a series of performance optimization options. Making sure you consider every option available when writing the SQL, like the inner/outer/left join. Letting the database take a reasonable amount of the necessary processing can result in performance gain, because handling the data afterwards becomes easier. Except for sorting. Keep that part for your ABAP code, as it will be more efficient.

Aggregate functions can be useful as well, combined with other fields it will also need grouping by those fields, the havingclause…. Tip: if you check on the return code of an SQL that has just an aggregate function, it will always be successful.

Some SQL statements (like aggregate functions above) will bypass table buffering, which could be relevant or not to your system landscape.

Multiply, but with care

Sometimes you will encounter that multiple actions need to happen on your internal table. All actions like sorting or reading are not necessarily efficient on all types of internal tables, so sometimes you would like the internal table to have different types.

This is perfectly possible, and there is a good way to do this right: copy the table from one type to the other in one go:

LT_ITAB_HASHED[] = LT_ITAB_STANDARD[].


Copying the internal table in this way implies that the structure is identical and the key definition is respected.

Merry-stop-going-round

Working with internal tables as described above, using a structure and internal table, can sometimes mean a lot of extra processing if you need to alter many lines in the internal table. You would need a structure for reading the data, keep the index of the read line, then modify your field values in the structure and write the result back using the index. This is not even an option for hashed tables which do not allow index access.

Let’s put this into practice. Like our LT_REDCAR internal table from before, containing all red cars, we would like to have one which also contains the sum of mileages that were driven with it, LT_REDCAR_MILES. The mileage info can be found in the LT_REDCAR_RENT internal table from before.

LOOP AT
LT_REDCAR_MILESINTO LS_REDCAR_MILES.
LV-INDEX = SY-INDEX.
CLEARLV_COUNTER.
LOOPAT LT_REDCAR_RENT INTOLS_REDCAR_RENT
WHERE/BIC/ZVEHICLE = LS_REDCAR_MILES-/BIC/ZVEHICLE.
LV-COUNTER = LV-COUNTER + LS_REDCAR_RENT-/BIC/ZMILES.
ENDLOOP.
LS_REDCAR_MILES-/BIC/ZMILES = LV-COUNTER.
UPDATELT_REDCAR_MILES FROM LS_REDCAR_MILES INDEX LV-INDEX.
ENDLOOP.


There is a tool at our disposal to optimize such processing: field symbols. These are special variables which do not contain a value themselves, but point to something with a value instead. This pointer can be a whole work area or just one field (and other options not discussed here).

When using field symbols (and enhancing the summation part a bit) it can work more efficiently (the field symbol declaration example is listed first):

FIELD-SYMBOLS:

TYPE ANY,

TYPE ANY.

LOOP AT LT_REDCAR_MILES ASSIGNING
.
ASSIGN COMPONENT ‘/BIC/ZMILES’

OF STRUCTURE
TO
.
LOOPAT LT_REDCAR_RENT INTOLS_REDCAR_RENT
WHERE/BIC/ZVEHICLE =
-/BIC/ZVEHICLE.
ADDLS_REDCAR_RENT-/BIC/ZMILES TO
.
ENDLOOP.
ENDLOOP.


The code above shows how individual fields can still be accessed via the field symbol that points to a structure (the whereclause), and how direct access to a field can be done (add statement).

Tip: by using field symbols in a smart way, you could make your code even more dynamic, so that it can be used for e.g. a variable amount of fields in your structure.

Selection table

The selection table (or range) is a powerful built-in concept in ABAP. Although primarily being used for GUI screen development, there are other ways to use it. You can think of it as an internal table on its own for a certain data type, with the following fixed structure:

  • SIGN: I for include, E for exclude.
  • OPTION: the operators EQ, NE, GT, LT etc. and even CP and NP (patterns). In case the HIGH field is filled, BT and NB can be used for (not) between.
  • LOW: the value of the data type that will be used in a single selection criterion, or the lower limit for a BT or NB selection.
  • HIGH: the value of the data type that will be used as upper limit for the selection criterion BT or NB.

A selection table can be used right away in the where clause of a SQL statement, but also in a where clause for looping over an internal table.

Assume we want to select the car rentals that started in the first week of each month in Q1 of 2012, from DSO_RENT.

ANGES
S_DATE
FOR
/BI0/OICALDAY.

S_DATE-SIGN = 'I'.
S_DATE-OPTION = 'BT'.
S_DATE-LOW = '20120101'.
S_DATE-HIGH = '20120107'.
APPEND
S_DATE.

S_DATE-SIGN = 'I'.
S_DATE-OPTION = 'BT'.
S_DATE-LOW = '20120201'.
S_DATE-HIGH = '20120207'.
APPEND
S_DATE.

S_DATE-SIGN = 'I'.
S_DATE-OPTION = 'BT'.
S_DATE-LOW = '20120301'.
S_DATE-HIGH = '20120307'.
APPEND
S_DATE.

LOOP AT
LT_RENT
INTO
LS_RENT
WHERE
/BIC/ZDATFROM
IN
S_DATE.
...
ENDLOOP.


The in operator does the three selections in one go.

Tip: a where clause can get very dynamic in this way, if your conditions are maintained centrally.

Conclusion

Although it may not always be crystal clear which type of internal table or database select will deliver the best performance, hopefully this insight has been able to shed some light on the options you have writing ABAP code. When looking at and testing the different alternatives, it may also be useful reading another insight about how you can measure the performance of your code: Debugging ABAP routine Performance in SAP Business Warehouse.