How to handle large datasets in near real-time with Qlik Cloud Data Integration via CDC

Qlik Cloud Data Integration is a key component of the Qlik-powered Modern Data Platform, which offers end-to-end data integration and analytics capabilities so that businesses can accelerate their insights and make informed, timely, data-driven decisions. In this article, we guide you step by step through using Qlik Cloud Data Integration for data warehouse automation and CDC (change data capture) streaming. By the end, you will be able to analyze large datasets in near real-time.

In this step-by-step example, we stream our data from an Azure SQL database to Azure Databricks. Next, we create our data warehouse in Databricks with Qlik Cloud Data Integration. Lastly, we consume our data from Databricks in Qlik Sense (Qlik Cloud Analytics).

Figure: Qlik Cloud Data Integration architecture example

Step 1: Onboarding

Qlik Cloud Data Integration simplifies data onboarding by providing command and control through the Qlik Cloud Platform. It securely replicates data to your cloud data platform with a zero-footprint approach that keeps the impact on source systems low, and it supports both full loads and ongoing changes (CDC) for real-time data integration. It can also retain historical data (type 2, as in slowly changing dimension type 2), so businesses can access and analyze data as it was at different points in time. Unlike many other CDC ingestion tools, the onboarding step of Qlik Cloud Data Integration automatically maintains transactional consistency within and between tables.
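To make the CDC concepts above more concrete, here is a minimal Python sketch under stated assumptions; it is not how Qlik Cloud Data Integration works internally. It assumes a hypothetical change-event format (`ChangeEvent` with `tx_id`, `ts`, `op`, `key`, `row`) that arrives ordered by transaction, applies each source transaction as one unit (the idea behind transactional consistency), and keeps a type 2 history with illustrative `valid_from`, `valid_to`, and `is_current` columns. A full load would simply seed `current` and `history` before the first change is applied.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass
class ChangeEvent:
    tx_id: int            # source transaction id (hypothetical event format)
    ts: int               # commit timestamp of that transaction
    op: str               # "I" = insert, "U" = update, "D" = delete
    key: int              # primary key of the changed row
    row: Optional[dict]   # new column values; None for deletes


def apply_changes(events, current, history):
    """Apply CDC events to a current-state table and a type 2 history table.

    current: dict mapping key -> latest row
    history: list of row versions with valid_from / valid_to / is_current
    Events must be ordered by tx_id so each transaction forms one group.
    """
    for _tx_id, tx_events in groupby(events, key=lambda e: e.tx_id):
        # Stage the whole transaction on copies before publishing it.
        staged_current = dict(current)
        staged_history = [dict(version) for version in history]
        for e in tx_events:
            # Type 2: close the open version of this key instead of overwriting it.
            for version in staged_history:
                if version["key"] == e.key and version["is_current"]:
                    version["valid_to"] = e.ts
                    version["is_current"] = False
            if e.op == "D":
                staged_current.pop(e.key, None)
            else:
                staged_current[e.key] = e.row
                staged_history.append(
                    {"key": e.key, **e.row,
                     "valid_from": e.ts, "valid_to": None, "is_current": True}
                )
        # Publish only after every event of the transaction was staged; if an
        # error occurs mid-transaction, current and history stay unchanged.
        current.clear()
        current.update(staged_current)
        history[:] = staged_history
    return current, history


# Usage: two inserts in one transaction, then an update and a delete.
events = [
    ChangeEvent(tx_id=1, ts=1, op="I", key=1, row={"name": "a"}),
    ChangeEvent(tx_id=1, ts=1, op="I", key=2, row={"name": "x"}),
    ChangeEvent(tx_id=2, ts=2, op="U", key=1, row={"name": "b"}),
    ChangeEvent(tx_id=3, ts=3, op="D", key=2, row=None),
]
current, history = apply_changes(events, {}, [])
```

After these events, `current` holds only the latest version of key 1, while `history` keeps all three versions with their validity intervals, which is what lets you analyze data as it was at earlier points in time.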

In the following video, we create a data project and onboard data. During onboarding, we load data from the source (Azure SQL database) into the target (Azure Databricks).

Video: Onboarding

Data sources and targets

The number of supported data sources and targets is continuously growing (50+). Examples of data sources include Oracle, SAP HANA, IBM DB2 for iSeries, Salesforce, SAP Concur, NetSuite, Facebook Ads, and Google Ads. For the latest information, see the Qlik Cloud Data Integration help page.

Step 2: Data Transformation

With Qlik Cloud Data Integration, businesses can transform their data with little effort. A low-code interface lets users perform row-level transformations, data cleaning, and data modeling, while custom SQL covers more complex use cases, so users can manipulate and structure data according to their specific requirements. Because transformations are metadata-driven, they stay lightweight while users keep full control over their data.
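As a rough illustration of what a custom SQL transformation can look like, here is a small Python sketch that uses the standard-library `sqlite3` module as a local stand-in for Databricks. The tables `customers_raw` and `orders_raw`, their columns, and the views `customers_clean` and `customer_sales` are hypothetical; in Qlik Cloud Data Integration you would define comparable logic in a transformation rather than execute it yourself.

```python
import sqlite3

# In-memory database standing in for the cloud data platform (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_raw (id INTEGER, name TEXT, country TEXT);
INSERT INTO customers_raw VALUES
  (1, '  Alice ', 'nl'),
  (2, 'Bob', 'BE'),
  (3, NULL, 'de');
CREATE TABLE orders_raw (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO orders_raw VALUES (10, 1, 20.0), (11, 1, 5.5), (12, 2, 12.0);
""")

# Row-level transformation and data cleaning as custom SQL:
# trim names, upper-case country codes, drop rows without a name.
conn.execute("""
CREATE VIEW customers_clean AS
SELECT id, TRIM(name) AS name, UPPER(country) AS country
FROM customers_raw
WHERE name IS NOT NULL
""")

# A simple data-modeling step: total order amount per cleaned customer.
conn.execute("""
CREATE VIEW customer_sales AS
SELECT c.id, c.name, c.country, SUM(o.amount) AS total_amount
FROM customers_clean AS c
JOIN orders_raw AS o ON o.customer_id = c.id
GROUP BY c.id, c.name, c.country
""")

rows = conn.execute("SELECT * FROM customer_sales ORDER BY id").fetchall()
print(rows)  # [(1, 'Alice', 'NL', 25.5), (2, 'Bob', 'BE', 12.0)]
```

Defining the cleaning and modeling steps as views keeps them lightweight: nothing is copied, and the model always reflects the latest data that CDC has landed in the raw tables.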

In the previous video, we onboarded the data. The data is now present in Databricks and is continuously updated through change data capture. In this second step, we transform the data and create the final data model, which we will analyze with Qlik Cloud Analytics in the final step.

Video: Transformations

Integration with Third-Party Data

Qlik Cloud Data Integration also supports third-party data. Users can register data sources that already exist in the cloud data platform and transform them like any other data. This lets businesses consolidate their data and work from a unified view for analysis and decision-making.

Step 3: Consume your data with Qlik Cloud Analytics

Qlik Cloud Data Integration integrates with Qlik Cloud Analytics, providing businesses with a comprehensive data analytics solution. The platform supports real-time monitoring, on-demand data exploration, and hybrid analytics. Users can build machine learning models with Qlik AutoML, perform what-if analyses, and identify the key drivers of their business performance. The Qlik Cloud Platform also offers Qlik Application Automation, which enables reverse ETL, write-back, and alerting. Finally, Direct Query lets companies work with large datasets from various data sources, including Amazon Redshift, Azure Databricks, and Snowflake.

In the final video, we use Qlik Sense partial reloads and Qlik Application Automation to reload our Qlik Sense app frequently, giving us near-real-time insights into our database through Azure Databricks and Qlik Cloud Data Integration.
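The idea behind frequent partial reloads can be illustrated with a generic incremental-refresh pattern: instead of reloading everything, fetch only the rows that changed since the last refresh. The Python sketch below is that generic watermark pattern, not Qlik load-script syntax or an Application Automation API. It uses `sqlite3` as a stand-in for Databricks and assumes a hypothetical `sales` table with a `modified_at` column kept up to date by the CDC pipeline; `incremental_refresh` is likewise a hypothetical helper.

```python
import sqlite3


def incremental_refresh(conn, app_rows, watermark):
    """Fetch rows modified after `watermark` and upsert them by primary key.

    Returns the new watermark and the number of rows fetched.
    """
    changed = conn.execute(
        "SELECT id, amount, modified_at FROM sales "
        "WHERE modified_at > ? ORDER BY modified_at",
        (watermark,),
    ).fetchall()
    for row_id, amount, modified_at in changed:
        app_rows[row_id] = amount              # upsert into the app's in-memory data
        watermark = max(watermark, modified_at)
    return watermark, len(changed)


# Stand-in target table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, modified_at INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [(1, 10.0, 100), (2, 20.0, 101)])

app_rows, watermark = {}, 0
watermark, n_first = incremental_refresh(conn, app_rows, watermark)   # initial load

# Simulate CDC landing new changes in the target between two refreshes.
conn.execute("UPDATE sales SET amount = 25.0, modified_at = 102 WHERE id = 2")
conn.execute("INSERT INTO sales VALUES (3, 5.0, 103)")
watermark, n_second = incremental_refresh(conn, app_rows, watermark)  # changed rows only
```

Each refresh touches only the changed rows, which is what keeps frequent reloads cheap on large datasets. This sketch only handles inserts and updates; deletes would require the change feed to mark removed rows, for example with a soft-delete flag. In the setup described in this article, Qlik Application Automation plays the role of the scheduler that triggers each reload.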

Video: Consumption

Conclusion

Qlik Cloud Data Integration plays a vital role in accelerating business insights by providing scalability, real-time data integration, and an open architecture. As such, it empowers businesses to gain actionable insights and make informed, timely decisions for improved outcomes.