Accelerating ML research: Building a Synthetic Data Generator

Your role

As a (researching) Data Scientist at element61, you often have a solid use case but not always the relevant data to test a new machine learning approach or a new analytics pipeline. To validate such use cases, you can use synthetic data that looks and feels almost identical to real data: data with similar distributions, examples, links across different tables, etc. as real-life data would have. The goal of this traineeship is to build a synthetic data generator that supports this process.

Your profile


In this traineeship, the student will further build a synthetic data generator: a generator that, given a specific dataset, produces a relevant, sizeable dataset fit for actually testing your development code.
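To make the idea concrete, here is a minimal sketch of what "given a dataset, generate a larger one" could look like. It is not element61's generator: the function names (`fit_columns`, `generate`), the sample rows and the modelling choices (a normal distribution per numeric column, observed frequencies per categorical column) are all assumptions made for illustration. It also treats columns independently, so it does not preserve correlations or links across tables, which a real generator would need to handle.

```python
import random
import statistics
from collections import Counter


def fit_columns(rows):
    """Learn a simple, independent model per column from a list of dict rows."""
    model = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            # Numeric column: summarise by mean and (population) standard deviation.
            model[col] = ("numeric", statistics.mean(values), statistics.pstdev(values))
        else:
            # Any other column: keep the observed category frequencies.
            counts = Counter(values)
            model[col] = ("categorical", list(counts.keys()), list(counts.values()))
    return model


def generate(model, n_rows, seed=None):
    """Sample n_rows synthetic rows from the per-column model."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_rows):
        row = {}
        for col, spec in model.items():
            if spec[0] == "numeric":
                _, mean, std = spec
                row[col] = rng.gauss(mean, std)
            else:
                _, categories, weights = spec
                row[col] = rng.choices(categories, weights=weights, k=1)[0]
        synthetic.append(row)
    return synthetic


# Hypothetical seed dataset, invented purely for this example.
seed_rows = [
    {"age": 34, "segment": "retail"},
    {"age": 45, "segment": "retail"},
    {"age": 29, "segment": "corporate"},
    {"age": 52, "segment": "retail"},
]

model = fit_columns(seed_rows)
synthetic_rows = generate(model, n_rows=1000, seed=42)
```

Four seed rows become a thousand synthetic ones with roughly the same age distribution and segment mix. Deciding when such a simple approach is good enough, and when it is not, is part of the traineeship.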

This traineeship combines the technical angle (i.e. building the generator) with the business & architectural angle (i.e. what should it do, when is it really useful to a data scientist, etc.).
The student will be able to work with state-of-the-art technology while applying it to a real-life use case.

As a trainee, you are expected to:

  • Translate your academic knowledge into business solutions & a first hands-on experience;
  • Do data development using Python and SQL in Azure and Databricks;
  • Show your documentation and reporting skills through presentations and demos;
  • Show creativity and out-of-the-box thinking in this challenging use case.

What we are looking for:

  • Character: customer-oriented, able to work in a team, keen to create something new in international teams;
  • Working Practice: analytical, structured and result-oriented;
  • Passionate about analytics, machine learning technology & applications and eager to learn;
  • Enthusiasm to be part of a growing team & industry.

Language: working-level English

Interested in finding out more? Send us your profile and motivation at