Monitoring Data Quality in a Lakehouse with Great Expectations

Your role

As a Data Analyst or Data Scientist, you need to trust that the data you work with is correct, complete and of high quality. You want clean data for your analyses, reports and Machine Learning experiments. However, many organizations struggle with data quality issues such as missing data, missing values, changing column names and incorrect master data.
Great Expectations is a Python package for defining expectations about your data and monitoring data quality, so that data can be validated as it enters the Data & AI platform. Imagine having immediate access to data quality statistics, and receiving alerts whenever data quality expectations aren't met.


Your profile


In this traineeship, the student will:

  1. collaborate on embedding Great Expectations in a Databricks Lakehouse framework
  2. research and apply how best to leverage it to support Machine Learning and analytics reporting use cases in Databricks.

This traineeship combines the technical and the business perspectives on how to measure and monitor data quality.
The student will work with state-of-the-art technology and apply it to a real-life use case.

As a trainee, you are expected to:

  • Translate your academic knowledge into business solutions and gain your first hands-on experience;
  • Develop data solutions using Python and SQL in Azure and Databricks;
  • Show your documentation and reporting skills through presentations and demos;
  • Show creativity and out-of-the-box thinking in this challenging use case.

What we are looking for:

  • Character: customer-oriented, a team player, keen to create something new in international teams;
  • Working practice: analytical, structured and results-oriented;
  • Passionate about analytics, machine learning technology and applications, and eager to learn;
  • Enthusiastic about being part of a growing team and industry.

Language: working-level English

Interested in finding out more? Send us your profile and motivation at