Customer intimacy through a Big Data Platform



De Lijn is the publicly owned company responsible for the preparation, planning and execution of all public transport by bus or tram in Flanders. It employs approximately 7,000 people.

Challenge: using data to improve communication

In the era of digitization, customers expect relevant and personalized communication. Therefore, De Lijn wants to personalize its communication towards its customers and use Artificial Intelligence and Machine Learning to make the communication proactive (read: predictive), relevant (read: targeted to only relevant customers) and smart.

The objective of this project was to design and set up a Big Data Platform that centralizes all relevant data about a customer/traveller. The platform had to meet the following requirements:

  • It needs to run in real time, as communication is only relevant if it arrives in real time.
  • It needs to cope with large volumes of data, including website and app behaviour and digital Mobib data.
  • It needs to handle AI in real time, as profiling will help us understand each visitor's needs and which communication is relevant to them.

Opportunity: personalized proactive communication to all travellers of De Lijn



Project approach

De Lijn worked with element61 to design and set up both the Big Data Platform and the first AI profiling. The project has two phases:

  • Phase 1: from functionalities to an architecture everybody supports
    The initial phase aimed at designing the Big Data Platform by translating all defined use-cases into a working functional architecture.
    AI use-cases included marketing use-cases such as customer churn, next-product-to-buy, stop-prediction and home-location prediction but also other use-cases including predictive maintenance.

Given that the use-cases are very broad and open, a build strategy was most relevant. Rather than going for an out-of-the-box Marketing tool, the customer platform would be a Big Data Platform built and run by De Lijn using open-source technologies focused on Python.

In a vendor selection, all options were compared, including Cloud (AWS, Google, Microsoft) and on-premise. The vendor selection focused on costs but also on knowledge and access to skills. In line with the Flanders Cloud Strategy, De Lijn opted for a Cloud set-up with maximum use of PaaS services. Coding is done with Python as the main language.

Phase 1 resulted in two conclusions:

  • Microsoft Azure as Cloud vendor, to benefit from the existing Microsoft investment and for access to skills (e.g. through partners such as element61)
  • A Modern Data Platform architecture, including a real-time stream, a batch stream and various data tools, among them Azure Databricks for AI and data discovery
  • Phase 2: implementation and delivery
    Towards the end of Q4 2018 the actual implementation of the platform started, with an agile mindset and driven by the use-cases. At the time of writing, the implementation is still ongoing with the support of element61.
    In the next section we outline what we are implementing and how customers of De Lijn benefit from this platform.

The solution: Big Data Platform running on Azure

The chosen Big Data Platform is aimed at Marketing use-cases but could, over time, serve other use-cases as well. The data focus was put on integrating these Marketing data sources:

[Figure: the Marketing data sources integrated into the platform]


In order to support personalized communication in real-time and with AI infusion, the following functional architecture was designed. 

  • First things first: the actual communications are sent through Selligent Marketing Cloud. The data platform calls its API to trigger a communication (mail or push notification), so the data platform needs to calculate both the message and the target audience.
  • The message is typically triggered by a real-time event, be it a bus delay, a customer hopping on, or a visit to the website. This real-time data is continuously absorbed in a real-time ingestion layer, where an event hub guarantees that all messages are received and handled.
  • Once received, the relevant data is filtered and, if needed, computed: this could be calculating the bus occupancy or profiling a traveller on a certain behaviour. The calculation results are stored in a real-time insights DB (NoSQL or SQL technology) and exposed to developers through a real-time API. The app and website can thus query the bus occupancy in real time.
  • In parallel, other data (incl. metadata) can be ingested into our Data Lake. This is typically data that isn't real-time yet is still relevant for profiling or AI training. In batch, this data can be cleaned and processed into, for example, cleaned data files or an updated churn model. The trained model or the data output of these batch analytics is stored back in the Data Lake or inserted directly into the SQL Warehouse for end-user dashboarding and data exploration.
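The real-time path described above (ingest an event, filter it, compute a value, expose it for querying) can be illustrated with a minimal Python sketch. This is not De Lijn's implementation: the `Event` schema, the `hop_off` event type and the in-memory `InsightsStore` are hypothetical stand-ins for the event hub payloads and the real-time insights DB.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A simplified real-time event (hypothetical schema)."""
    kind: str        # e.g. "hop_on", "hop_off", "page_view"
    vehicle_id: str  # bus or tram the event relates to ("" if none)


class InsightsStore:
    """In-memory stand-in for the real-time insights DB (assumption)."""

    def __init__(self):
        self._occupancy = defaultdict(int)

    def apply(self, event):
        # Filter: only boarding and alighting events affect occupancy;
        # everything else (e.g. page views) is ignored here.
        if event.kind == "hop_on":
            self._occupancy[event.vehicle_id] += 1
        elif event.kind == "hop_off":
            # Clamp at zero in case a hop-on was missed upstream.
            current = self._occupancy[event.vehicle_id]
            self._occupancy[event.vehicle_id] = max(0, current - 1)

    def occupancy(self, vehicle_id):
        """The value a real-time API could return to the app or website."""
        return self._occupancy.get(vehicle_id, 0)


def process_stream(events, store):
    """Apply every incoming event to the store, in arrival order."""
    for event in events:
        store.apply(event)
    return store
```

In a real set-up the loop in `process_stream` would be replaced by a consumer reading from the ingestion layer, and `InsightsStore` by the chosen NoSQL or SQL database.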

This functional set-up gives De Lijn the flexibility to tackle various Marketing use-cases and to deliver personalized communication based on real-time triggers.
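To make "calculating the message and the target audience" from a real-time trigger more concrete, here is a small Python sketch for a bus-delay trigger. It rests on assumptions: the `Delay` record, the `ride_probability` structure (traveller id mapped to a per-line probability, as a profiling model might produce) and the 0.5 threshold are all hypothetical, and the call to the communication tool's API is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delay:
    """A real-time delay trigger (hypothetical schema)."""
    line: str          # line identifier, e.g. "42"
    minutes_late: int


def select_audience(delay, ride_probability, threshold=0.5):
    """Return the ids of travellers likely to take the delayed line.

    ride_probability maps traveller id -> {line: probability}; this is
    an assumed shape for the output of a profiling model.
    """
    return sorted(
        traveller
        for traveller, per_line in ride_probability.items()
        if per_line.get(delay.line, 0.0) >= threshold
    )


def build_message(delay):
    """Compose the text of the proactive notification."""
    return f"Line {delay.line} is running about {delay.minutes_late} minutes late."
```

The threshold is the main design choice in such a sketch: a higher value keeps the communication targeted at only the relevant travellers, a lower value reaches more people at the cost of relevance.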


Why Microsoft Azure?

The Microsoft Azure Platform was chosen as the preferred Cloud platform because it best supported a hybrid mix with the on-premise data set-up, strengthened the existing partnership with Microsoft and leveraged in-house knowledge already in place.

A full-cloud strategy with PaaS services was chosen for the Big Data Platform. This was in line with our go-to-market priorities:

  • Time to market: Get a first version out as soon as possible. Don’t design and build the perfect solution.
  • Flexible: Ability to implement many different use cases. Design an architecture that is flexible enough to change over time.
  • Cost-effective: Allow for fast implementation of use cases, and low-cost setup & maintenance.
  • Governed & sustainable: Guarantee governance from the start to allow for sustainable growth.

The architecture currently running uses the Azure building blocks below to deliver the functional architecture (real-time & batch) as shown above:

[Figure: Azure building blocks of the real-time and batch architecture]



At the time of writing (May 2019), the data platform is fully set up and all data sources are connected. With a solid data platform in place, the objective is to soon ship the first personalized proactive messages, focused on delayed and cancelled buses, and to proactively notify users who are likely to take that bus. We'll keep you up to date (over Push Notification) on this project!

Want to know more?

  • Contact us to learn how to get started with your Big Data Platform for delivering Personalized Marketing Communication