In today's fast-paced, data-driven world, businesses rely on seamless data integration to unlock the full potential of their systems and drive intelligent decision-making. Yet, integrating SAP data into modern data platforms presents a unique set of challenges, from handling complex data structures to ensuring real-time synchronisation and maintaining data governance. The release of recent SAP notes added a further layer of restrictions, impacting various tools that rely on Operational Data Provisioning (ODP). Hence, organisations currently leaning on ODP connectors should look for alternative solutions. In this blog, we'll explore these ingestion hurdles in depth, examining common pain points and uncovering effective solutions.
Key challenges of SAP Data Ingestion
Integrating SAP data into modern platforms comes with several key challenges that businesses must navigate to ensure a smooth ingestion process.
High Data volumes
SAP systems generate vast amounts of granular data across various business processes, leading to significant data volumes that require efficient handling. For example, financial transactions in SAP S/4HANA can produce billions of records, with detailed logs of invoices, payments, and reconciliations.
To manage this scale, businesses must adopt optimised file formats such as Parquet or Delta Lake, which enhance storage efficiency and query performance. Additionally, incremental loading strategies, such as delta replication and change data capture (CDC), are essential to avoid unnecessary full data loads and improve processing speed. Bear in mind that full loads place a heavy burden on the operational SAP database when extracting large volumes at once, which can slow down other workloads running in parallel, such as business transactions.
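To make this concrete, here is a minimal PySpark sketch of a watermark-based incremental extract, assuming a hypothetical SAP table exposed over JDBC with a change-timestamp column and a Delta Lake target; the connection string, table, and column names are illustrative only.

```python
# Minimal watermark-based incremental extract (illustrative names and paths)
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

TARGET_PATH = "abfss://bronze@mystorage.dfs.core.windows.net/sap/vbak"  # hypothetical Delta target
last_watermark = "2024-06-01 00:00:00"                                  # normally read from a control table

# Push the filter down to the source so only changed rows leave the SAP database
incremental_query = f"(SELECT * FROM VBAK WHERE CHANGED_AT > '{last_watermark}') AS delta_rows"

changes = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sap://sap-host:30015")   # hypothetical SAP HANA JDBC endpoint; driver must be on the classpath
    .option("dbtable", incremental_query)
    .option("user", "EXTRACT_USER")
    .option("password", "***")
    .load()
)

# Append only the changed rows to an open, columnar format instead of re-loading the full table
changes.write.format("delta").mode("append").save(TARGET_PATH)
```

The same pattern applies regardless of the ingestion tool: the less data leaves the SAP database per run, the smaller the impact on transactional workloads.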
Complex SAP Data structures
SAP ERP systems contain over 130,000 tables, covering diverse business processes such as finance, logistics, and HR, which results in a highly complex data model. While SAP provides standard documentation for core tables, many organisations have customised their SAP environments, leading to undocumented table structures that require manual exploration.
Furthermore, not all tables have a change data marker, which makes incremental data extraction even harder. As a consequence, businesses must rely on alternative methods such as timestamp-based filtering or full-table loads, which can be inefficient for large datasets.
Real-time data processing
In today’s data-driven environments, businesses increasingly require real-time access to SAP data for informed decision-making, predictive analytics, and operational efficiency. Whether monitoring inventory levels, tracking financial transactions, or optimising supply chain logistics, having up-to-the-minute data is critical for responsiveness and agility. However, enabling real-time data streaming can place a substantial load on SAP systems, potentially affecting database performance and system resources. Continuous data extraction processes can slow down transactional operations, impacting other critical workloads. To mitigate these risks, organisations must implement efficient data replication strategies, such as change data capture (CDC) or event-driven architectures, ensuring that real-time processing does not compromise core system functions.
SAP notes
SAP comes with strict licensing terms, which considerably limit how data can be accessed and used. SAP Note 3255746 prohibits the use of ODP Data Replication APIs by third-party applications for accessing on-premises or Cloud Private Edition sources. Previously, ODP was widely used for real-time data extraction from SAP ERP systems like SAP S/4HANA and ECC, enabling integration with external platforms such as Databricks, Azure Data Factory, and Qlik Replicate.
However, with this update, SAP has restricted the use of ODP RFC modules, stating that they are intended only for SAP-internal applications and may be modified without notice. This means that third-party ingestion tools relying on ODP APIs may lose their ability to extract data directly from SAP systems, forcing businesses to explore alternative integration methods. Take for instance Fivetran, which uses the ODP framework to perform the initial full loads, or Azure Data Factory, which offers an ODP-based CDC connector.
SAP has recommended using ODP via OData APIs instead, as they provide a stable and officially supported interface for data extraction.
SAP Note 2971304 outlines the limitations associated with HANA database log replication.
Tool providers are working on adapting their connectors to support SAP OData, though OData connectors may deliver slower performance than RFC-based methods.
Choosing the right tool for seamless SAP data integration
When selecting a tool for integrating SAP data, it's important to consider some key factors to ensure the solution aligns with the business and technical requirements.
Key factors to consider
- Number of sources it can connect to: A robust SAP data ingestion tool should support multiple data sources, ensuring connectivity beyond core SAP source systems. Some tools focus solely on SAP S/4HANA and BW/4HANA, while others extend to older systems like SAP ECC, as well as SAP C4C, SAP HANA, SuccessFactors, Ariba, and more.
- Supported target systems: The ability to integrate with cloud storage solutions (Azure Data Lake, AWS S3, Google Cloud) and modern data warehouses (Snowflake, Databricks, Microsoft Fabric) is crucial for scalability. Choosing a tool that supports diverse target systems ensures long-term flexibility for evolving data architectures.
- Types of SAP objects it can connect to: SAP data is structured across various objects, including tables, CDS views and extractors. Some integration tools work only with raw SAP tables, while others provide optimised connectors for CDS views or even extractors to simplify data extraction. Selecting a tool that supports a wide range of SAP objects ensures that businesses can extract data efficiently without complex workarounds.
- Ease of configuration: Some tools offer pre-built connectors and automation, reducing the need for custom scripting, while others require coding or custom logic to be set up. A low-code approach can significantly reduce setup time and maintenance efforts.
- Use of proprietary Change Data Capture (CDC) mechanism: Incremental data loading is essential to avoid full-table extractions and optimise performance. Some tools leverage SAP’s built-in CDC capabilities, while others implement proprietary replication methods. For example, SAP Datasphere uses Replication Flows, while Fivetran employs its own automated CDC approach to track changes efficiently.
- Impact on source systems: Data integration must preserve SAP operations, ensuring that essential systems remain stable and performant without any disruption.
Using SAP Datasphere as a Bridge for ingesting SAP Data into non-SAP platforms
SAP's recommended approach is to use SAP Datasphere replication flows to deliver SAP data directly into a data lake in Parquet format. In this scenario, SAP Datasphere is a valuable bridge between SAP and non-SAP platforms.
SAP Datasphere serves as a powerful integration tool that facilitates data ingestion from SAP systems, such as S/4HANA and BW/4HANA, into non-SAP platforms like Microsoft Fabric or Databricks. By leveraging SAP Datasphere’s replication capabilities, organisations can establish a structured data flow that ensures accurate and efficient data movement across heterogeneous environments.
The picture below shows how the replication flow functionality from SAP Datasphere can be used to replicate data from SAP S/4HANA directly into Azure Data Lake Gen 2. Once replicated into Azure, Databricks reads the Parquet files produced by the replication flows and loads them into Delta Live Tables. The data can then also serve transformation and machine learning (ML) use cases. In this scenario, data is passed through Datasphere without persisting anything in that system.
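To illustrate the Databricks side of this pattern, below is a minimal Delta Live Tables sketch that picks up the Parquet part files landed by a replication flow using Auto Loader; the storage path, table name, and audit column are assumptions, not part of SAP's or Databricks' delivered content.

```python
# Minimal Delta Live Tables sketch (runs inside a DLT pipeline, where `spark` is provided)
import dlt
from pyspark.sql import functions as F

LANDING_PATH = "abfss://landing@mystorage.dfs.core.windows.net/s4hana/matdoc/"  # hypothetical landing folder

@dlt.table(comment="Raw SAP material documents landed by the Datasphere replication flow")
def matdoc_raw():
    return (
        spark.readStream.format("cloudFiles")               # Auto Loader picks up new part files incrementally
        .option("cloudFiles.format", "parquet")
        .load(LANDING_PATH)
        .withColumn("_ingested_at", F.current_timestamp())  # simple audit column
    )
```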
To enable data transfer from SAP S/4HANA to Microsoft Fabric or Databricks, SAP Datasphere plays a crucial role in orchestrating the ingestion pipeline. The process begins with setting up SAP Datasphere to replicate data from S/4HANA to a designated target. It is also a prerequisite to establish the necessary connections for both the source and target of the replication flow before the flow can access and transfer data between the specified environments.
Within SAP Datasphere, the Replication Flow defines the parameters for extracting data from SAP S/4HANA. This includes specifying source objects like CDS views or ODP sources, identifying the target system, and structuring the data for efficient storage and retrieval. Supported targets include (amongst others) an AWS S3 bucket, Google Cloud Storage, Azure Data Lake Storage, or a Datasphere-managed data lake. In this article, we focus solely on ingesting SAP data into a non-SAP storage account.
Each target has its own specific configurations, such as different file formats for target object stores (e.g. CSV, Parquet, etc.).
When replicating SAP Data to cloud storage targets, several file types are created:
- SUCCESS File: Confirms that data replication was completed successfully. This file serves as a validation tool, eliminating the need for manual status checks on the replication flow.
- Part Files: A series of files is generated, each containing a portion of the replicated data. These files are split based on factors such as source size, structure, and update frequency to optimise data handling.
- Metadata Files: These files store internal replication details and are primarily used by the system. Their structure may change without prior notice, as they are intended for system-level operations.
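As an illustration of how a consumer can work with these files, the sketch below only processes a replication-flow output folder once the success marker is present, then loads the part files into a Delta table. The marker and part-file naming, the Databricks dbutils helper, and the paths are assumptions to adapt to your own landing-zone layout.

```python
# Batch sketch: validate the replication output via the success marker, then load the part files
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

FOLDER = "abfss://landing@mystorage.dfs.core.windows.net/s4hana/finance/"  # hypothetical landing folder

# dbutils is available on Databricks; elsewhere, list the folder with your storage SDK of choice
files = [f.name for f in dbutils.fs.ls(FOLDER)]

if any(name.startswith("_SUCCESS") for name in files):        # marker file name is an assumption
    part_df = (
        spark.read
        .option("pathGlobFilter", "part-*")                   # skip the system-internal metadata files
        .parquet(FOLDER)
    )
    part_df.write.format("delta").mode("append").save(FOLDER.rstrip("/") + "_delta")
else:
    print("Replication flow not finished yet, skipping this run")
```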
Additionally, built-in transformation features support data projection, mapping, and alignment with the target system in SAP HANA Cloud, enabling businesses to refine the data structure before it reaches downstream applications. At element61, we typically map data one-to-one from SAP, maintaining its original structure for consistency. Further data refinement and alignment occur in upper layers, where business logic, transformations, and enrichment processes optimise the data for reporting and analytics. This approach ensures flexibility, enabling organisations to adapt their data model while preserving the integrity of source information.
By utilising SAP Datasphere as an ingestion tool, enterprises can simplify SAP data integration, enhance accessibility across non-SAP ecosystems, and improve interoperability between diverse platforms like Databricks, Microsoft Fabric, and beyond. Following this idea, SAP Datasphere acts as a hub where SAP Data is captured and then fanned out, pushing a copy to an external data lake for further processing and analysis. This way, we ensure we leverage SAP's integration capabilities while taking advantage of a third-party storage and compute.
How are changes and deletion events handled
The Replication Flow provides flexibility in managing data transfers through initial full loads, delta loads, or scheduled periodic loads, optimising performance based on business needs.
When mapping source to target in the replication flow, three fields are automatically added:
- operation_flag: identifies the action performed on the source record: insert, update, or delete.
- recordstamp: the timestamp of the change.
- is_deleted: indicates whether a record was deleted in the source.
These three fields are automatically generated in the target structure and are filled by the system.
CDC in replication flows enables efficient data replication by capturing real-time updates or delta changes from source systems. The CDC capabilities depend on the type of source system and the provisioning method used. CDC is supported through the ODP framework. It comes with built-in delta mechanisms managing automatic updates after an initial load. For CDS views, annotations must be defined beforehand to enable this CDC mechanism.
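A hedged sketch of how these technical columns could drive the apply logic on the consuming side is shown below, using a Delta Lake merge; the business key (VBELN), the lowercase column spelling, and the table paths are assumptions.

```python
# Apply replication-flow change records to a Delta target using the technical columns above
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

changes = spark.read.format("delta").load("/landing/sap/vbak_changes")  # hypothetical change feed
target = DeltaTable.forPath(spark, "/silver/sap/vbak")                  # hypothetical target table

# Keep only the most recent change per business key, ordered by recordstamp
latest = (
    changes
    .withColumn("rn", F.row_number().over(
        Window.partitionBy("VBELN").orderBy(F.col("recordstamp").desc())))
    .filter("rn = 1")
    .drop("rn")
)

(
    target.alias("t")
    .merge(latest.alias("c"), "t.VBELN = c.VBELN")                      # VBELN as key is an assumption
    .whenMatchedDelete(condition="c.is_deleted = true")                 # remove rows deleted in the source
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll(condition="c.is_deleted = false")          # never insert already-deleted rows
    .execute()
)
```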
Premium Outbound Integration in SAP Datasphere: Optimising data transfer to non-SAP Platforms
As we have seen in the previous section, replication flows unlock great potential when it comes to integrating SAP data with cloud storage providers. But before defining a cloud storage provider as a replication flow target, admins must allocate capacity units to Premium Outbound Integration, enabling non-SAP targets.
SAP Datasphere’s Premium Outbound Integration provides data movement capabilities, enabling organisations to efficiently transfer large volumes of data from SAP systems to external platforms like Azure Data Lake Storage Gen2 or AWS S3. This feature ensures low latency and improved performance.
The Premium Outbound configuration is determined based on the volume of replicated data, with each block representing 20 GB of outbound data capacity. These blocks are allocated to the SAP Datasphere tenant based on the expected data volume.
Premium Outbound Integration is priced per the amount of data transferred outside of SAP Datasphere. The cost is calculated in Capacity Units (CU), which vary based on workload and services. SAP’s latest pricing updates have optimised the Premium Outbound feature, requiring fewer CUs per 20 GB block, making large-scale data replication more cost-effective.
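As a rough, illustrative sizing exercise (the capacity-unit rate below is a placeholder, not SAP's actual pricing; refer to your SAP contract for real figures), the number of blocks to allocate can be estimated as follows:

```python
# Back-of-the-envelope Premium Outbound sizing (placeholder CU rate, not an official SAP number)
import math

expected_monthly_volume_gb = 500   # assumed outbound replication volume
block_size_gb = 20                 # one Premium Outbound block covers 20 GB of outbound data
cu_per_block = 1                   # placeholder rate; look up the actual CU cost per block with SAP

blocks_needed = math.ceil(expected_monthly_volume_gb / block_size_gb)
capacity_units = blocks_needed * cu_per_block
print(f"{blocks_needed} blocks -> {capacity_units} capacity units to allocate")
```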
Organisations leveraging Premium Outbound integration gain connectivity to cloud storage and data lakes, ensuring efficient data processing and analytics in non-SAP environments.
Although SAP Datasphere's Replication Flows have become the recommended approach for extracting data from SAP sources, they come with certain limitations that must be considered before implementation. One key constraint involves the cost implications, which can quickly become significant depending on the volume and frequency of data transfers. For more details about pricing, please refer to SAP for a tailored quote. What's more, SAP limits the number of concurrent replication threads. This has raised concerns among customers who require simultaneous replication of many source objects, as it can impact performance in large-scale data integration scenarios.
How Fivetran simplifies integration with modern platforms
Fivetran is a SaaS platform focused on automated, zero-maintenance data ingestion. It provides pre-built connectors for a wide range of data sources, including SAP, simplifying data extraction without the need for manual coding or custom scripts. Originally designed to streamline complex ETL processes, Fivetran now emphasises ELT, making raw data quickly available for transformation downstream. It enables real-time data synchronisation and schema replication, ensuring that target systems always reflect the current state of source systems. As a plug-and-play solution, Fivetran helps organisations rapidly connect operational systems to modern analytics platforms such as Microsoft Fabric and Databricks. In this context, we’ll focus on its role in data ingestion, where it removes much of the operational overhead typically associated with building and maintaining pipelines.
Choosing the right deployment model
Fivetran offers three deployment models to suit different security, compliance, and infrastructure requirements:
- SaaS Deployment: This is the default and most common option, where Fivetran is fully hosted in its secure cloud environment. All data extraction, processing, and delivery are managed by Fivetran, providing a fully managed, low-maintenance experience. It's ideal for organisations seeking rapid deployment with minimal operational effort. However, it does not suit customers who require 1-minute data syncs, nor is the SaaS deployment relevant for SAP data ingestion.
- Hybrid Deployment: In this model, Fivetran’s control plane remains cloud-hosted, but data processing occurs within the customer’s environment, either on-premises or in a private cloud. This approach offers enhanced control over sensitive data while retaining the ease of management from a cloud-based interface. It’s well-suited for organisations with strict data governance or regulatory requirements.
- Self-Hosted Deployment (HVR): Using Fivetran’s HVR technology, this option enables customers to fully self-host the data pipeline components within their own network. All data extraction, processing, and delivery are performed locally, giving maximum control over data movement and security. This is particularly useful for highly regulated industries or environments with strict network restrictions and zero external connectivity.
Depending on the chosen deployment mode, some features and connectors may not be available. The SaaS deployment offers the broadest feature set and connector support. However, private or hybrid deployments may be necessary to meet specific compliance or infrastructure requirements. Both the hybrid and self-hosted deployments allow a 1-minute sync, making real-time, low-latency data movement possible. What's more, the deployment mode is closely tied to the pricing tier, as higher plans offer more flexibility and features. For example, the Business Critical pricing plan is a requirement for self-hosted HVR, ensuring organisations meet strict security requirements.
How do we use Fivetran to ingest SAP Data
Fivetran provides fully managed connectors to ingest data from a wide range of SAP systems, including SAP ECC, SAP S/4HANA, Concur, and SuccessFactors. The connector ecosystem continues to expand, with support for SAP Ariba and OData connectors expected in the near future.
Ingesting SAP data through Fivetran is a streamlined, no-code process using one of the available connectors. Configure the connector by specifying the destination schema name; Fivetran automatically handles schema creation and manages schema drift.
You provide credentials such as the SAP host, username, and password for Fivetran to authenticate.
By leveraging the OData API, Fivetran can extract data from CDS views and BW extractors. Note that these need to be exposed beforehand, using the dedicated SAP annotations. What's more, not all BW extractors seamlessly integrate with the OData API. The implementation usually depends on the specific design and configuration of each extractor.
With access to over 100,000 SAP ERP tables, you select the specific tables needed, for example, materials management tables related to inventory and procurement.
Users can visualise their data pipeline, from the connector to the output, with an in-platform data lineage graph.
Within the Fivetran UI, the schema tab allows you to manage your dataset by adding or blocking tables and columns, adjusting sync modes, or hashing PII fields for privacy. The ingested data is copied, not federated, into a storage account and made available to analytics platforms like Databricks or Microsoft Fabric. Depending on the destination platform, you can configure the format in which the data is written, such as Delta Lake or Parquet. This ensures efficient querying and processing of ingested data.
Automating change detection using Fivetran's CDC mechanism
The main goal of Fivetran's CDC mechanism is to efficiently sync only changed data by doing incremental loads depending on changes (inserts, updates and deletes). This saves time and resources, as it does not overload the operational source system. Fivetran reads the transaction logs, triggers or delta queues from SAP to identify the rows that have been inserted, updated or deleted. These changes are then applied to the destination tables. To manage soft deletes and incremental syncs, Fivetran adds metadata columns to track the state of the data.
| Column Name | Description |
| --- | --- |
| fivetran_synced | Timestamp or marker of the last time this row was synced by Fivetran. Helps track recency of data. |
| fivetran_deleted | Boolean flag (often true or false) indicating whether the row has been deleted in the source system. Fivetran uses soft deletes by marking this flag instead of physically deleting rows right away. This way, the history is preserved, allowing downstream reconciliation and auditing. It also prevents accidental data loss. |
| fivetran_id | Unique ID used by Fivetran to track each row internally. |
| fivetran_cursor | A value used to mark the position in the source system’s transaction log or delta queue for incremental sync. |
Fivetran can track archived records by adding a _fivetran_sap_archived column to differentiate archived records from deleted records.
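Downstream, these metadata columns are typically used to filter out soft-deleted or archived rows before building reporting tables. The sketch below shows one way to do that; the table paths and the exact column spelling (leading underscore or not, depending on your destination) are assumptions.

```python
# Filter out soft-deleted and archived rows from a Fivetran-loaded SAP table (illustrative paths)
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.format("delta").load("/bronze/sap/mara")          # hypothetical Fivetran destination table

current_rows = (
    raw
    .filter(F.col("_fivetran_deleted") == F.lit(False))            # drop rows soft-deleted in the source
    .filter(F.col("_fivetran_sap_archived") == F.lit(False))       # drop archived records, if the column is present
)

current_rows.write.format("delta").mode("overwrite").save("/silver/sap/mara_current")
```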
Once selected, the initial synchronisations begin, and Fivetran automatically sets up change data capture (CDC) for ongoing, incremental updates. You can configure sync frequency from every minute up to once every 24 hours.
Following the release of the SAP note restricting the use of the ODP framework for data ingestion into third-party tools, Fivetran extractors remain unaffected. The Fivetran ODP OData connector leverages the OData interface rather than ODP. The SAP ERP on HANA connector uses an ABAP add-on and does not rely on the ODP framework. Meanwhile, Fivetran HVR utilises database log-based replication. Note that for HANA databases, SAP does not support log-based replication.
Cost model
Fivetran follows a usage-based pricing model based on Monthly Active Rows (MAR), with a typical 1-year contract. The initial full load is free, and you are billed for OPEX only. This model offers flexibility, as you only pay for what you use, but it comes at the cost of unpredictable billing. As data volumes grow or fluctuate, your costs will increase, which can be difficult to manage without close monitoring. Pricing also varies depending on data source types, especially for complex or high-volume connectors like SAP. Fivetran is well-suited for organisations with fluctuating workloads or fast-growing data environments.
Qlik Talend's approach to scalable data ingestion
Qlik Talend Cloud is a modern, cloud-based data integration platform that enables seamless data ingestion from virtually any source, including all SAP systems. It stands out for its broad SAP integration capabilities, offering over 20 SAP-specific connectors. As a no-code solution, it allows users to connect to SAP environments without the need for custom development or complex configurations. The platform is highly scalable, supporting both small and enterprise-level data pipelines with ease. Its intuitive interface and prebuilt connectors simplify data extraction, transformation, and loading (ETL), accelerating time to insight.
Choosing the right setup: Qlik Replicate vs. Qlik Talend Cloud
Qlik Talend offers three deployment options for data ingestion:
- SaaS Deployment (Qlik Talend Cloud): A cloud-native data integration platform delivered as a fully managed SaaS solution. This setup is ideal for organisations looking for agile, cloud-first data operations, with sources accessed directly from the cloud environment.
- Hybrid Deployment (Qlik Talend Cloud): This deployment refers to an architecture where Qlik Cloud is combined with on-premises or private cloud data sources and infrastructure. This approach allows organisations to take advantage of Qlik Cloud’s scalability, AI features, and SaaS delivery, without fully migrating all data to the cloud.
- On-premises deployment (Qlik Replicate): Qlik Replicate is installed and managed on-premises, giving organisations full control over their infrastructure and data movements. This is ideal for organisations needing fast, secure and controlled data movement within or across networks.
How do we use Qlik to ingest SAP Data
Qlik provides a user-friendly platform for ingesting SAP data, designed for low-code users and data engineers. It comes with a complete drag & drop interface, allowing users to add new sources in only a few minutes and build end-to-end pipelines without coding, combining a low-code transformation flow designer with a pro-code GenAI SQL assistant.
Qlik Talend provides seamless connectivity to over 300 systems, including Salesforce, Microsoft SQL Server, IBM DB2, and Oracle, enabling it to handle even the most complex data environments with ease. Its robust SAP integration is particularly impressive, offering 20+ specialised SAP connectors to ensure smooth data exchanges across enterprise systems.
It provides flexible ingestion mechanisms, supporting both real-time and scheduled ingestion.
Automating change detection using Qlik's CDC mechanism
Qlik offers automated CDC capabilities to keep SAP pipelines up to date without requiring full loads. Depending on the source, Qlik offers both trigger-based and log-based CDC.
Trigger-based CDC is easy to manage, thorough, and low-touch. Log-based CDC, on the other hand, is faster than trigger-based CDC and has a smaller system footprint. When log-based CDC is selected, Qlik reads the changes directly from the SAP HANA logs. For trigger-based CDC, Qlik deploys triggers on the relevant SAP tables. These triggers record every insert, update or delete operation into a separate change table. Qlik then reads from that table on a regular basis and replicates those changes to the target system in (near) real-time. This way, there is no need to load the full data set, and the operational database is unburdened. This CDC process can be applied to both S/4HANA and SAP ECC. Note that for HANA databases, SAP does not support log-based replication.
Other Qlik SAP Connectors
Since the release of the SAP note restricting the use of the ODP framework for data ingestion into third-party tools, Qlik has released an ODP OData connector that leverages the OData interface rather than ODP. Through the OData connector, organisations can ingest data from CDS Views in real time. Qlik has tuned this endpoint to work well with CDC: while OData is generally around 30% slower than direct ODP, the Qlik OData connector achieves almost the same performance as ODP once it runs in CDC mode after the initial full load.
Lastly, Qlik also allows you to directly connect to the SAP Extractors. The extractor is the original method that SAP uses to move data into an SAP Business Warehouse (BW) system. We can call it through a custom-developed API that uses programs and functions delivered with a transport. For customers looking to reuse these same extractors, this can be a great way to migrate from your BW environment to your Modern Data Platform!
Cost model
Qlik uses a capacity-based pricing model, where you commit to a fixed amount of data volume per month, with contracts typically lasting 1 or 3 years. The initial full load (CAPEX) is free, and you only pay for ongoing operations (OPEX). As long as you stay within your allocated monthly volume, the price remains predictable and stable, making budgeting easier. This model suits organisations that value predictability and can estimate their data volume well.
Azure Data Factory
Azure Data Factory (ADF) is Microsoft's cloud-based, fully managed data integration service for orchestrating and automating data movement and transformation. For SAP, ADF offers native connectors that leverage both RFC and OData protocols, enabling companies to pull tables from SAP ECC or BW. ADF can write back to a wide variety of targets: Azure Blob Storage, Azure Data Lake Storage Gen 1/2, Cosmos DB, ... ADF supports self-hosted integration runtimes, enabling secure data movement between on-premises and cloud environments.
How to extract SAP data with ADF
ADF provides a low-code approach to the ingestion process. If your SAP systems are hosted on-premises or within a private network, you might need to set up a self-hosted Integration Runtime (IR) first to enable connectivity between your SAP source and Azure.
Once the IR is set up, a linked service should be created to define the connection information to your data sources and destinations.
Azure Data Factory provides several SAP connectors to support a wide variety of data extraction scenarios from SAP. Connectors include SAP Table, SAP ECC, SAP HANA and OData connectors; SAP BW Open Hub and ODP-based extractions are available as well. Select the relevant connector and enter details such as hostname, client number, user credentials, and so on.
Datasets should then be defined for the source and sink. As the source, we will typically specify the table, view or entity to be extracted from the SAP operational system, while Azure Data Lake Storage could serve as a sink. Don't forget to define the output format (CSV, Parquet or JSON, for instance) and set the destination path for storage.
Once those prerequisite steps are completed, a pipeline can be built, including a Copy Data activity between source and sink. Column mapping is done here.
Incremental loading can be set up to transfer only modified records using date/timestamp fields or another change pointer.
Automating change detection using ADF's CDC mechanism
Remember, SAP Note 3255746 explicitly prohibits the use of ODP APIs by third-party applications, which includes ADF. In short, ADF's CDC capabilities via ODP are no longer a viable option for extracting SAP data changes automatically.
Delta or incremental reads on SAP tables can be performed using subqueries that filter data based on specific conditions, typically a timestamp referring to a creation or change date. This allows ADF to load only the new or changed records, hence reducing the load on the SAP system. However, some tables simply don't have such a column to identify inserted or updated records.
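One common workaround is to keep a small watermark store outside SAP and build the incremental filter from it on each run. The sketch below illustrates the pattern in plain Python (not ADF expression syntax); the control-file location and the change-date column are assumptions.

```python
# High-watermark pattern for incremental loads (illustrative; adapt to a control table or ADF variables)
import json
from datetime import datetime, timezone
from pathlib import Path

CONTROL_FILE = Path("watermarks/vbak.json")          # hypothetical control store

def read_watermark() -> str:
    if CONTROL_FILE.exists():
        return json.loads(CONTROL_FILE.read_text())["last_loaded"]
    return "1900-01-01 00:00:00"                     # first run effectively falls back to a full load

def build_filter(watermark: str) -> str:
    # The change-date column is an assumption; many SAP tables lack one
    return f"CHANGED_AT > '{watermark}'"

def save_watermark(run_started_at: datetime) -> None:
    CONTROL_FILE.parent.mkdir(parents=True, exist_ok=True)
    CONTROL_FILE.write_text(json.dumps(
        {"last_loaded": run_started_at.strftime("%Y-%m-%d %H:%M:%S")}))

run_started = datetime.now(timezone.utc)
print("Source filter for this run:", build_filter(read_watermark()))
# ... run the copy with this filter, then persist the new watermark only after a successful load
save_watermark(run_started)
```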
Cost model
Azure Data Factory pricing is based on pipeline orchestration, activity execution, and data movement. You pay for each activity run, which includes tasks like copying data, executing data flows, or calling external services; each run (e.g., reading a table) incurs a separate charge. The integration runtime provides the compute needed to execute these activities. In addition, read and write operations (especially during data movement between sources and sinks) are priced according to the volume processed. Costs also include pipeline management operations such as creation, monitoring, and metadata activities, though these are usually minimal compared to execution and compute charges. The initial full load cost depends on several factors but is not fixed. It is usage-based and hard to predict upfront.
Which tool best fits your SAP data integration needs
When integrating SAP data into modern platforms, selecting the right tool is crucial for efficiency, scalability, and flexibility. SAP Datasphere, Fivetran, Qlik and Azure Data Factory each offer distinct advantages, catering to different business needs and technical requirements. While some tools prioritise real-time ingestion and automation, others focus on deep SAP integration and governance.
Start your SAP data integration journey now
This blog is part of a blog series about how companies can modernise their current SAP BW landscape by bringing it to the cloud.
- Modernising SAP BW Blog Series Part 1: From legacy to innovation: steps to migrate from SAP BW to a Modern Data Platform
- Modernizing SAP BW Blog Series Part 2: Unlocking SAP Data: overcoming challenges in Modern Data integration
- Modernising SAP BW Blog Series Part 3: Beyond BW: Building a scalable modernisation strategy with SAP Business Data Cloud
In this article and our previous articles, we have already outlined a solid base. However, details matter, and you may want expert advice on how to modernise your legacy BW system. Don't hesitate to reach out via info@element61.be. Let's shape the future of your data strategy together. We have SAP BW experts ready to help and guide you further!