Scalable Machine Learning with Apache Spark

As a data scientist, you can use Apache Spark to run machine learning at scale. By distributing computation across a cluster, Spark can help shorten your development time, simplify your machine learning lifecycle and allow for faster hyperparameter tuning. Additionally, Databricks offers a familiar environment with notebooks, Python and SQL, where you can collaborate with colleagues, version your work and work directly on your enterprise data lake.

Course objective

In this course we don't focus on basic programming with Spark* but rather on how to use Spark for machine learning. The course highlights some of the key differences between Spark ML and single-node libraries such as scikit-learn. We'll cover in depth how you can use both to scale your machine learning development.

* Note: if you want to learn how to program with Spark, please take a look at our intro + advanced course on Apache Spark Programming with Databricks.


This is a 1-day training. We'll cover, at minimum, how to:

  • cleanse data, including imputing missing values, using Spark DataFrames (SQL and PySpark)
  • scale your familiar Python packages (scikit-learn, XGBoost) with Spark
  • use the Spark ML library
  • leverage user-defined functions (UDFs)
  • use Koalas

The day consists of 50% theory and 50% hands-on exercises (a Databricks environment is provided).

Note: this course deliberately does not cover MLflow, as it deserves a dedicated training day to cover all its features. If interested, please join our Using MLflow with Databricks course.


  • You are a data scientist interested in the cloud for data & analytics workloads
  • You have experience with Python and working knowledge of machine learning and data science
  • You are familiar with the concepts of Spark, e.g. from having followed the Apache Spark Programming with Databricks course


  • € 675 per day

Interested in knowing more?

For more information, please reach out and we can give you more details and practical information.

The full element61 training schedule (including when each training runs) can be found here.

For a complete overview of all courses, visit our academy page.