Getting the most out of your Informatica PowerCenter 8 Environment

Informatica

Informatica is an independent company providing data-integration software and services. Founded in 1993, Informatica employs more than 1,400 people and has more than 3,000 customers worldwide, including 91 of the Fortune 100 companies.

PowerCenter 8 is Informatica's enterprise data integration platform that serves as the foundation for all data integration projects. PowerCenter 8 is a scalable, platform-independent product, capable of reading from and writing to virtually
any data source. The platform allows for processing both structured and unstructured data coming from relational, file-based, mainframe and even message queue data sources. Data can be processed in batch, in real time or by using change data capture (CDC). Dynamic partitioning allows data to be loaded in concurrent data streams. PowerCenter 8 supports high availability and grid computing, and has extensive failover options. Metadata is stored in a global metadata repository, enabling data lineage functionality.

Informatica PowerCenter 8 is currently distributed in three offerings. The Standard Edition offers a high-performance data integration server with a global metadata infrastructure and productive, visual tools for development and administration. The Real-time Edition extends those capabilities by enabling change data capture (CDC), dynamic partitioning and
extended support for scalable services. Finally, the Advanced Edition adds data profiling and reporting capabilities, extended metadata analysis and team-based development.
 

Tuning

Tuning a PowerCenter 8 ETL environment is not that straightforward. A chain is only as strong as its weakest link, and there are four crucial domains that require attention: the system, the network, the database and the PowerCenter 8 installation itself. It goes without saying that without a well-performing infrastructure, tuning the PowerCenter 8 environment will not make much of a difference.

As the first three domains fall within the realm of administrators, this article will only briefly touch on these subjects and will mainly focus
on the items available to developers within PowerCenter 8.

Tuning is an iterative process: at each iteration the largest bottleneck is removed, gradually improving performance. Bottlenecks can occur on the system, on the database (either source or target), or within the mapping or session run by the Integration Service. To identify bottlenecks, run test sessions, monitor system usage and gather advanced performance statistics while running. Examine the session log in detail, as it provides valuable information concerning session performance. From the perspective of a developer, search for performance problems in the following order:

  • source / target
  • mapping
  • session
  • system
Bear in mind that major changes were made to the underlying architecture between PowerCenter version 7 and PowerCenter version 8. If possible, use the latest available version of PowerCenter (currently 8.6).
 
If tuning the mapping and session still proves to be inadequate, the underlying system will need to be examined more closely. This extended examination needs to be done in close collaboration with the system administrators and database administrators (DBAs). They have several options to improve performance without invoking hardware changes, such as distributing database files over different disks, improving network bandwidth and lightening the server workload by moving other applications elsewhere. However, if none of this helps, only hardware upgrades will solve your performance problems.
 

Session logs

The PowerCenter 8 session log provides very detailed information that can be used to establish a baseline and to identify potential problems.

Particularly useful for the developer are the detailed thread statistics, which help in benchmarking your actions. The thread statistics show whether the bottlenecks occur while transforming data or while reading/writing. Always focus attention on the thread with the highest busy percentage first. For every thread, detailed information on the run and idle time is presented. The busy percentage is calculated as: (run time - idle time) / run time * 100.

Each session has a minimum of three threads:

  • reader thread
  • transformation thread
  • writer thread
An example:

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_X_T_CT_F_SITE_WK_ENROLL] has completed: Total Run Time = [31.984171] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_X_T_CT_F_SITE_WK_ENROLL] has completed: Total Run Time = [0.624996] secs, Total Idle Time = [0.453115] secs, Busy Percentage = [27.501083].
Thread [WRITER_1_*_1] created for [the write stage] of partition point [T_CT_F_SITE_WK_BSC] has completed: Total Run Time = [476.668825] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].

In this particular case the database is clearly the main bottleneck: reading and writing take up most of the execution time, while the actual transformations only use a very small amount of time. If a reader or writer thread is 100% busy, consider partitioning the session. This allows the mapping to open several connections to the database, each reading/writing data from/to a partition, thus improving read/write speed.

Severity Timestamp Node Thread Message Code Message
INFO 23/Dec/2008 09:02:22 node_etl02 MANAGER PETL_24031

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_T_CT_F_SITE_WK_BSC] has completed. The total run time was insufficient for any meaningful statistics.
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_T_CT_F_SITE_WK_BSC] has completed: Total Run Time = [22.765478] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
Thread [WRITER_1_*_1] created for [the write stage] of partition point [T_CT_F_SITE_WK_BSC] has completed: Total Run Time = [30.937302] secs, Total Idle Time = [20.345600] secs, Busy Percentage = [34.23602355].
 

In the example above, the transformation thread poses the largest bottleneck and needs to be dealt with first. The reader thread finished so quickly that no meaningful statistics were possible. The writer thread spends the majority of its time in the idle state, waiting for data to emerge from the transformation thread. Perhaps an unsorted aggregator is used, causing the Integration Service to sort all data before releasing any aggregated record?

The number of threads increases if the session reads from or writes to multiple sources or targets, if the mapping has multiple execution paths, or if partitioning is used.
 

Establishing a baseline

To be able to benchmark the increase or decrease in performance following an action, it is important to establish a baseline to compare with. It is good practice to log every iteration of the tuning process in detail. This log enables the developer to clearly identify the actions that enhanced or decreased performance, and serves as a reference for future tuning efforts. Never assume that an action will improve performance because the action is a best practice or worked before: always test and compare with hard figures. The thread statistics are used to build this log.

This log file could look like this:

Figure 1: An example log file
 

Optimally reading sources

Reading from sources breaks down into two distinct categories: reading relational sources and reading flat files. Sometimes, both source types are combined in a single mapping.
A homogeneous join is a join between relational sources that combine data from a single origin: for example, a number of Oracle tables being joined.
A heterogeneous join is a join between sources that combine data from different origins: for example, Oracle data joined with a flat file.
 
Whatever source you are trying to read, always try to limit the incoming data stream as much as possible. Place filters as early as possible in the mapping, preferably in the source qualifier. This ensures that only data needed by the Integration Service is picked up from the database and transported over the network. If you suspect the performance of reading relational data is not optimal, replace the relational source with a flat file source containing the same data. If there is a difference in performance, the path towards the source database should be investigated more closely: the execution plan of the query, database performance, network performance, network packet sizes, and so on.
 
When using homogeneous relational sources, use a single source qualifier with a user defined join instead of a joiner transformation. This forces the join to be executed in the database instead of on the PowerCenter 8 platform. If a joiner transformation is used instead, all data is first picked up from the database server, then transported to the PowerCenter 8 platform, sorted, and only as a last step joined by the Integration Service.
Consider pre-sorting the data in the database; this makes further sorting by the Integration Service for later aggregators and joiners unnecessary. Make sure the query executed on the database has a favourable execution plan. Use the explain plan facility (Oracle) to verify the query's execution plan and check that indexes are used optimally. Do not use synonyms or database links unless really needed, as these will slow down the data stream. A sketch of such a source qualifier override is shown below.
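As an illustration, a minimal sketch of what such a source qualifier SQL override could look like, assuming two hypothetical Oracle tables CUSTOMERS and ORDERS (table and column names are illustrative only):

-- Hypothetical SQL override in the source qualifier: join, filter and
-- pre-sort in the database instead of on the PowerCenter 8 platform.
SELECT c.customer_id,
       c.customer_name,
       o.order_id,
       o.order_amount
FROM   customers c,
       orders o
WHERE  o.customer_id = c.customer_id                        -- user defined join
  AND  o.order_date >= TO_DATE('01-01-2008', 'DD-MM-YYYY')  -- filter as early as possible
ORDER BY c.customer_id                                      -- pre-sort for downstream transformations

The same join and filter could also be entered in the source qualifier's User Defined Join and Source Filter properties instead of a full SQL override.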

In general it is good practice to always generate keys for primary key and foreign key fields. If no key is available or known, a dummy key should be used and an extra dummy record should be inserted into the reference table. This method improves join performance when using homogeneous joins. In general three dummy rows should be included (a sketch follows the list below):

  • 999999 Not applicable
  • 999998 Not available
  • 999997 Missing
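A minimal sketch of how these dummy records could be inserted, assuming a hypothetical reference table DIM_SITE (table and column names are illustrative only):

-- Dummy records in a hypothetical reference table, so that fact rows with
-- unknown or missing keys still join successfully.
INSERT INTO dim_site (site_key, site_name) VALUES (999999, 'Not applicable');
INSERT INTO dim_site (site_key, site_name) VALUES (999998, 'Not available');
INSERT INTO dim_site (site_key, site_name) VALUES (999997, 'Missing');
COMMIT;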
When using heterogeneous sources there is no alternative but to use a joiner transformation. To ease matters for the Integration Service, ensure that all relational sources are sorted and joined in advance in the source qualifier. Flat file sources need to be sorted before joining, using a sorter transformation. When joining the two sorted sources, check the sorted input property of the joiner transformation. The sorted input option allows the joiner transformation to start passing records to subsequent transformations as soon as the key value changes; normal behaviour would be to hold the data until all of it has been sorted and processed in the joiner transformation.
By matching the session property Line Sequential Buffer Length to the size of exactly one record, overhead is minimized. If possible, stage flat files in a staging table; joining and filtering can then be done in the database.
 

Optimally writing to targets

One of the most common performance issues in PowerCenter 8 is slow writing to target databases. This is usually caused by a lack of database or network performance. You can test for this behaviour by replacing the relational target with a flat file target: if performance increases considerably, it is clear something is wrong with the relational target.
Indexes are usually the root cause of slow target behaviour. In Oracle, every index on a table will decrease the performance of an insert statement by 25%. The more indexes are defined, the slower insert/update statements will be, because every time an update/insert statement is executed the indexes need to be updated as well. Try dropping the indexes on the target table. If this does not increase performance, the network is likely causing the problem.

In general avoid having too many targets in a single mapping. Increasing the commit interval will decrease the amount of session overhead. Three different commit types are available for targets:

  • target based commit: fastest
  • source based commit: in between
  • user defined commit: slowest; avoid user defined commits when not really necessary
There are two likely scenarios when writing to targets:

Session is only inserting data

PowerCenter 8 has two methods for inserting data into a relational target: normal or bulk loads. Normal loads generate DML statements. Bulk loads bypass the database log and are available for DB2, Sybase, Oracle (SQL*Loader) and Microsoft SQL Server. This loading method brings a considerable performance gain but has two drawbacks: recovery of the session is not possible because no rollback data is kept by the database, and the target table cannot have any indexes defined on it, so the indexes have to be dropped and recreated before and after the session. For every case you will have to weigh whether dropping and recreating the indexes while using a bulk load outperforms a classic insert statement with all indexes in place.
 
Remember to use a very large commit interval when using bulk loads with Oracle and Microsoft SQL Server to avoid unnecessary overhead. Dropping and recreating indexes can be done by using pre- and post-session tasks, as sketched below, or by calling a stored procedure within the mapping.
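For example, the drop and recreate could be handled with pre- and post-session SQL along these lines (index, table and column names are hypothetical; the CREATE INDEX statement must of course match the original index definition):

-- Pre-session SQL: drop the index before the bulk load.
DROP INDEX idx_t_ct_f_site_wk_bsc_01;

-- Post-session SQL: recreate the index after the load has finished.
CREATE INDEX idx_t_ct_f_site_wk_bsc_01
  ON t_ct_f_site_wk_bsc (site_key, week_key);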

Session is mainly updating a limited set of data

When the session is updating a set of records in a large table, the use of a primary key or unique index is absolutely necessary. Be sure to check the explain plan and verify proper index usage, as in the sketch below. Sometimes it is faster to keep only the unique indexes while loading data and to drop the non-unique indexes not needed by the session; these indexes can be recreated at the end of the session.
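In Oracle, the execution plan of the update statement can be checked with EXPLAIN PLAN, for example (target table, columns and key value are hypothetical):

-- Verify that the update uses the unique index on the key column.
EXPLAIN PLAN FOR
  UPDATE t_ct_f_site_wk_bsc
  SET    enroll_cnt = 0
  WHERE  site_wk_key = 123456;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);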
 

Clever mapping logic

Now that data is being read and written in the most optimal way, it is time to focus our attention on the actual mapping. The basic idea is simple: minimize the incoming data stream and create as little overhead as possible within the mapping. A first step in achieving this goal is to reduce the number of transformations to a minimum by reusing common logic. Perhaps the use of the same lookup in different pipes could be redesigned to use the lookup only once? By using clever caching strategies, a cache can be reused within the mapping or throughout the workflow. Especially with active transformations (transformations where the number of records is changed) the use of caching is extremely important. Active transformations that reduce the number of records should be placed as early as possible in the mapping.

Data type conversions between transformations in the mapping are costly. Be sure to check whether all explicit and implicit conversions really are necessary. When data from a source is passed directly to a target without any other processing, connect the source qualifier directly to the target without the use of other transformations.

Single pass reading allows multiple targets to be populated using data from the same source qualifier. Consider using single pass reading if there are multiple sessions using the same source: the existing mapping logic can be combined by using multiple pipelines. Common data manipulations for all pipelines should be done before splitting out the pipeline.

At times it is better not to create mappings at all: staging mappings could be replaced by snapshots or replication in the database. Databases are specialized in these types of data transfer and are in general far more efficient at processing them than passing the data through PowerCenter 8.
 

Transformation Mechanics

Every transformation has its specifics related to performance. In the section below the most important items are discussed.

A joiner transformation should be used to join heterogeneous data sources. Homogeneous sources should always be joined in the database by using the user defined join in the source qualifier transformation.

If not sorted at database level, always use a sorter transformation to sort the data before it enters the joiner transformation. Make sure the sorter transformation has sufficient cache to enable a 1-pass sort; not having sufficient cache will cause performance to plummet. The data could be sorted in the joiner itself, but there are three advantages to using the sorter transformation:

  • The use of sorted input enables the joiner transformation to start passing data to subsequent transformations before all data has passed through the joiner transformation. Consequently, the transformations following the joiner transformation start receiving data almost immediately and do not have to wait until all the data has been sorted and joined. This logic is only valid when the sources can be sorted in the database: for example, when joining SQL Server and Oracle, both sources can be sorted in the database, making additional sorting using sorters superfluous. When a sorter is needed, for example when joining Oracle and a flat file, the sorter will have to wait until all data is read from the flat file before it can pass records to the joiner transformation.
  • The sorting algorithm used in the sorter is faster than the algorithm used in joiners or aggregators.
  • The use of sorted input in the joiner transformation allows for a smaller cache size, leaving more memory for other transformations or sessions. Again, when a flat file is used, a sorter will be needed prior to the joiner transformation. Although the joiner transformation uses less cache, the sorter cache will need to be sufficiently large to enable sorting all input records.

Figure 2: Example of an unnecessary sort. The data should be sorted in advance at database level.
 
As outer joins are far more expensive than inner joins, try to avoid them as much as possible. The source containing fewer rows than the detail source should be designated as the master source. Join as early as possible in the pipeline, as this limits the number of pipes and decreases the amount of data being sent to other transformations.

Only use a filter transformation for non-relational sources; when using relational sources, filter in the source qualifier. Filter as early as possible in the data stream. Try filtering by using numeric conditions, as numeric matching is considerably faster than string matching. Avoid complex logic in the filter condition and be creative in rewriting complex expressions to the shortest possible length. When multiple filters are needed, consider using a router transformation, as this will simplify the mapping logic.

A lookup transformation is used to lookup values from another table. Clever use of lookup caches can make a huge difference in performance.

By default, lookup transformations are cached: the selected lookup fields from the lookup table are read into memory and a lookup cache file is built when the lookup is first called in the session. To minimize the usage of lookup cache, only retrieve the lookup ports that are really needed.

However, whether or not to cache a lookup really depends on the situation. An uncached lookup makes perfect sense if only a small percentage of the lookup rows will be used: for example, if we only need 200 rows from a table of 10,000,000 rows. In this particular case, building the lookup cache would require an extensive amount of time, and a direct select to the database for every lookup row will be much faster, on the condition that the lookup key in the database is indexed (see the sketch below).
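If the uncached route is chosen, make sure the column used in the lookup condition is indexed, for example (table and column names are illustrative only):

-- Index on the lookup key, so each uncached lookup query performs an
-- index lookup instead of a full table scan.
CREATE INDEX idx_dim_customer_src_id
  ON dim_customer (source_customer_id);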

Sometimes a lookup is used multiple times in an execution path of a mapping or workflow. Re-caching the lookup every time would be time consuming and unnecessary, as long as the lookup source table remains unchanged. The persistent lookup cache property was created to handle this type of situation: the lookup cache file is only refreshed the first time the lookup is called, and all following lookups reuse the persistent cache file. Using a persistent cache can improve performance considerably because the Integration Service builds the memory cache from the cache files instead of from the database.

Use a dynamic lookup cache when the lookup source table is a target in the mapping and is updated dynamically throughout the mapping. Normal lookup caches are static: records that are inserted during the session are not available to the lookup cache. When using a dynamic lookup cache, newly inserted or updated records are updated in the lookup cache immediately.

Ensure sufficient memory cache is available for the lookup. If not, the Integration Service will have to write to disk, slowing the session down.

By using the Additional Concurrent Pipelines property at session level, lookup caches will start building concurrently at the start of the session. Normal behaviour is that a lookup cache is only created when the lookup is called. Pre-building caches instead of building them on demand can increase total session performance considerably, but only when the pre-built lookups will definitely be used in the session. Again, the performance gain of setting this property depends on the particular situation.

An aggregator transformation is an active transformation, used to group and aggregate data. If the input is not sorted already, always use a sorter transformation in front of the aggregator transformation. As with the joiner transformation, the aggregator transformation will accumulate data until the dataset is complete and only start processing and sending output records from there on. When sorted input is used, the aggregator will process and send output records as soon as the first set of records is complete. This allows for much faster processing and smaller caches. Use as few functions and complicated nested conditions as possible, and especially avoid the use of complex expressions in the group by ports; if needed, use an expression transformation to build these expressions in advance. When using change data capture, incremental aggregation will enhance performance considerably.

Sometimes, simple aggregations can be done by using an expression transformation that uses variables. In certain cases this could be a valid alternative for an aggregation transformation.

Expression transformations are generally used to calculate variables. Try not to use complex nested conditions; use DECODE instead. Functions are more expensive than operators, so avoid using functions if the same can be achieved with operators. Implicit data type conversion is expensive, so convert data types as little as possible. Working with numbers is generally faster than working with strings. Be creative in rewriting complex expressions to the shortest possible length.

The use of a sequence generator transformation versus a database sequence depends on the load method of the target table. If using bulk loading, database sequences cannot be used. The sequence generator transformation can overcome this problem. Every row is given a unique sequence number. Typically a number of values are cached for performance reasons.

There is however a big catch. Unused sequence numbers at the end of the session are lost. The next time the session is run, the sequence generator will cache a new batch of numbers.

For example: a sequence generator caches 10,000 values and 10,000 rows are loaded using the cached values. At row 10,001, a new batch of sequence values is cached: from 10,001 to 20,000. However, the last row in the session is row 10,002. All values between 10,002 and 20,000 are lost, and the next time the session is run, the first inserted row will have a key of 20,001.

To avoid these gaps, use a sequence generator in combination with an unconnected lookup. First look up the latest key value in the target table, then use an expression that adds the value from the sequence generator to the key value that was just retrieved. The sequence generator should restart numbering at every run. There are some advantages to this approach:

  • a database sequence is emulated and bulk loads remain possible.
  • if rows are deleted, gaps between key values will be caused by deletes and not by caching issues.
  • the limits of a sequence will not be reached so quickly
As an added advantage, the use of this method will prevent migration problems with persistent values between repositories. This method is easy to implement and does not imply a performance penalty while running the session.
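The unconnected lookup that retrieves the latest key value could use a lookup SQL override similar to this sketch (target table and column names are hypothetical); the mapping expression then adds the sequence generator value, which restarts at 1 every run, to the retrieved maximum:

-- Lookup SQL override: fetch the highest key currently in the target table.
SELECT NVL(MAX(site_wk_key), 0) AS max_key
FROM   t_ct_f_site_wk_bsc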
 

Clean up error handling

Dealing with transformation errors in advance can save a lot of time while executing a session. By default, every error row is written into a bad file, a text-based file containing all error rows. On top of that, the error data is logged into the session log, which will explode in size if enough transformation errors occur. Both cause a lot of extra overhead and slow down a session considerably.
 
In general, it is better to capture all data quality issues that could cause transformation errors in advance and write flawed records into an error table.
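Such an error table could be as simple as this hypothetical sketch:

-- Hypothetical error table: flawed source records are routed here by the
-- mapping instead of causing transformation errors at run time.
CREATE TABLE etl_error_rows (
  session_name   VARCHAR2(100),
  source_row_id  NUMBER,
  error_code     VARCHAR2(20),
  error_message  VARCHAR2(4000),
  load_timestamp DATE DEFAULT SYSDATE
);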
 

Collecting Advanced Performance data

To really dig into a session and understand exactly what is happening within it, even more detailed performance data than is available in the session log can be captured. This is done by enabling the 'Collect performance data' option at session level.

This option allows the developer to see detailed transformation-based statistics in the Workflow Monitor while the session is running. When the session finishes, a performance file is written to the session log folder. For every source qualifier and target definition, performance details are provided, along with counters that show performance information about each transformation.

A number of counters are of particular interest to the developer:

  • errorrows: should always be zero; if errors occur, remove the cause
  • readfromdisk/writetodisk: indicates that not enough cache memory is available. Increase the cache size until this counter is no longer shown.

Figure 3: Session statistics while using Collect Performance Data
 

Memory Optimization

Memory plays an important role when the Integration Service is running sessions. Optimizing cache sizes can really make a huge difference in performance.

Buffer memory is used to hold source and target data while processing and is allocated when the session is initialized. The DTM buffer is used to create the internal data structures, and buffer blocks are used to bring data in and out of the Integration Service. Increasing the DTM buffer size increases the number of blocks available to the Integration Service. Ideally, a buffer block can hold about 100 rows at the same time.
You can configure the amount of buffer memory yourself, or you can configure the Integration Service to automatically calculate buffer settings at run time. Instead of calculating all values manually or by trial and error, run the session once on auto and retrieve the correct values from the session log:
Severity Timestamp Node Thread Message Code Message
INFO 12/30/2008 1:23:57 AM INFO8_ASPMIS003 MAPPING TM_6660 Total Buffer Pool size is 90000000 bytes and Block size is 65536 bytes.
 
The Integration Service uses index and data caches for the mostly active transformations: aggregator, joiner, sorter, lookup and rank.
Configuring the correct amount of cache is really necessary, as the Integration Service will read from and write to disk if the caches are not properly sized. The index cache should be about half the size of the data cache. Cache files should be stored on a fast drive and certainly not on a network share.
The easiest way of calculating the correct cache sizes is by keeping the defaults on auto and examining the session log. In the session log a line like the following is written for every lookup:
 
Severity Timestamp Node Thread Message Code Message

INFO 12/30/2008 1:26:11 AM INFO8_ASPMIS003 LKPDP_2:TRANSF_1_1 DBG_21641 LKP_VORIG_RECORD: Index cache size = [12000800], Data cache size = [24002560]

Copy these values into the session properties. Be sure to verify the performance counters to validate that no disk reads/writes are done.

Sorter transformations need special attention concerning cache sizes as well. If not enough cache is available, the sorter will need a multi-pass sort, dropping the session performance. If so, a warning is displayed in the session log:

TRANSF_1_1_1> SORT_40427 Sorter Transformation [srt_PRESTATIE] required 4-pass sort (1-pass temp I/O: 19578880 bytes). You may try to set the cache size to 27 MB or higher for 1-pass in-memory sort.

The maximum amount of memory used by transformation caches is set by two properties:

  • Maximum Memory Allowed for Automatic Memory Attributes
  • Maximum Percentage of Total Memory Allowed for Automatic Memory Attributes
The smaller of the two is used. When the value is 0, the automatic memory attributes are disabled. If this value is set too low, an error will occur if a lookup with a manually configured cache tries to allocate more memory than is available. Keep in mind that sessions can run in parallel: every session will try to allocate RAM.
 
Ensure plenty of RAM is available for the Integration Service, but do not assume that adding cache memory will always increase performance: at a certain point optimum performance is reached and adding further memory will not be beneficial.
 

Further session optimization

High precision

The high precision mode allows the use of decimals with a precision of up to 28 digits. Using this kind of precision results in a performance penalty when reading and writing data, so it is recommended to disable high precision when it is not really needed. When turned off, decimals are converted to doubles, which have a precision of up to 15 digits.
 

Concurrent sessions

Depending on the available hardware, sessions can be run concurrently instead of sequentially. The number of concurrent sessions can be set at Integration Service level; the default value is 10. Depending on the number of CPUs on the PowerCenter 8 server and on the source and target databases, this value can be increased or decreased. The next step is designing a workflow that launches a number of sessions concurrently. By trial and error an optimal setup can be achieved.

Session logging

The amount of detail in a session log is determined by the tracing level, which ranges from 'Terse' to 'Verbose Data'. For debugging or testing purposes, the 'Verbose Data' option will trace every row that passes through the mapping in the session log. At 'Terse', only initialization information, error messages and notifications of rejected data are logged. It is quite clear that the 'Verbose Data' option causes a severe performance penalty.

For lookups, use the 'Additional Concurrent Pipelines for Lookup Creation' property to start building lookup caches as soon as the session is initialized. By the time the lookups are needed in the session, the cache creation has hopefully already finished.
 

Partitioning

If a transformation thread is 100% busy, consider adding a partition point in the segment. Pipeline partitioning allows for parallel execution within a single session: the session will have multiple threads processing data concurrently. Processing data in pipeline partitions can improve performance, but only if enough CPUs are available; as a rule of thumb, 1.5 CPUs should be available per partition. Adding a partition point increases the number of pipeline stages. This means a transformation will logically be used a number of times, so remember to multiply the cache memory of transformations, sessions and so on by the number of partitions. Partitioning can be specified on sources, targets and the mapping transformations.

Using partitioning requires the 'Partition Option' in the PowerCenter 8 license.


Figure 4: The execution time of this mapping was reduced from 400 to 90 seconds using partitioning and advanced mapping tuning.
 
 
 
 
 
 

Pushdown Optimization

Pushdown optimization pushes the transformation processing down to the database level without extracting the data. This reduces the movement of data when source and target are in the same database instance. Possibly, more optimal database-specific processing can be used to enhance performance even further. The metadata and lineage, however, are kept in PowerCenter.
 
Three different options are possible:
  • Partial pushdown optimization to source: one or more transformations can be processed in the source
  • Partial pushdown optimization to target: one or more transformations can be processed in the target
  • Full pushdown optimization: all transformations can be processed in the database
A number of transformations are not supported for pushdown: XML, Rank, Router, Normalizer, Update Strategy and others. Pushdown optimization can be used with sessions with multiple partitions if the partition types are pass-through or key range partitioning. You can configure a session for pushdown optimization in the session properties. Use the Pushdown Optimization Viewer to examine which transformations can be pushed to the database. Using pushdown requires the 'Pushdown Optimization Option' in the PowerCenter 8 license.
 

Architecture

64-bit PowerCenter 8 versions allow better memory usage as the 2 GB limitation is removed. When PowerCenter 8 is run on a grid, the workflows can be configured to use resources efficiently and maximize scalability. Within a grid, tasks are distributed to nodes. To improve performance on a grid, the network bandwidth between the nodes is important, as a lot of data is transferred between them. Caches and any source or target files should always be stored on local disks for optimal performance. Of course, even 64-bit computing will not help if the system is not properly set up; make sure plenty of disk space is available on the PowerCenter 8 server.

For optimal performance, consider running the Integration Service in ASCII data movement mode when all sources and targets use 7- or 8-bit ASCII, as Unicode can take up to 16 bits per character.

The repository database should be located on the PowerCenter 8 machine. If not, the repository database should at least be physically separated from any target or source database; this prevents the same database machine from writing to a target while reading from the repository. Always use native connections over ODBC connections, as they are a lot faster. Maximize the use of parallel operations on the database: the use of parallelism will cut execution times considerably. Remove any other application from the PowerCenter 8 server apart from the repository database installation.

Increase the database network packet size to further improve performance; for Oracle this can be done in the listener.ora and tnsnames.ora files. Each database vendor has some specific options that can be beneficial for performance. For Oracle, the use of the IPC protocol instead of TCP can result in a performance gain by a factor of 2 to 6. Inter-process communication (IPC) removes the network layer between the client and the Oracle database server, and can only be used if the database resides on the same machine as the PowerCenter 8 server. Check the product documentation for further details.
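As an illustration only, hypothetical tnsnames.ora entries could look like this (names and values are examples, not recommendations; the listener must be configured with matching settings):

# Larger session data unit (SDU) for a database reached over TCP:
DWH_TCP =
  (DESCRIPTION =
    (SDU = 32767)
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbserver01)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = DWH))
  )

# IPC connection for a database residing on the PowerCenter 8 server itself:
DWH_IPC =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = IPC)(KEY = DWH))
    (CONNECT_DATA = (SERVICE_NAME = DWH))
  )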

By carefully monitoring the load on the source and target databases and on the PowerCenter 8 and database servers while running a session, potential bottlenecks at database or system level can be identified. Perhaps the database memory is insufficient? Perhaps too much swapping is occurring on the PowerCenter 8 server? Perhaps the CPUs are overloaded?
The tuning of servers and databases is just as important as delivering an optimized mapping and should not be ignored. Tuning a system for a data warehouse poses different challenges than tuning a system for an OLTP application. Try to involve DBAs and administrators as early as possible in this process so they fully understand the sensitivities involved with data warehousing.
 

Conclusion

Increased competition and tighter budgets force IT managers to look for alternatives that decrease data centre costs while providing more services to a growing and demanding user base. One way of accomplishing this is by reducing the load on existing servers through optimizing the execution of workflows. The reduced execution times of optimized sessions increase the amount of free space in an often crowded ETL load window.

Using a number of simple guidelines, sessions can be tuned to perform at their best. In typical situations, the total ETL time can be reduced by 10 to 40% without making any structural investments in hardware. Performance tuning of sessions, however, is not an exact science and often needs trial and error to find an optimum. Tuning sessions can be labour intensive, even tedious, as the tuning process tends to repeat itself as time passes.

Tuning is not a job done solely by the developer; tuning is a team effort. To ensure optimal performance, developers, administrators and DBAs need to be actively and continuously involved. Optimization and tuning are not something to be taken care of only after development has been completed. The guidelines above should be applied as soon as the project starts, to ensure optimal performance of the PowerCenter 8 system from the start to the end of the system lifecycle.