Course: Data Engineering with Databricks

Data engineering is a vital component of any data-driven organization. With the increasing volume and complexity of data, it has become essential to have a powerful and efficient platform to manage and process it. And this is where Databricks comes into play. Databricks is a cloud-based platform that provides data engineers with a unified analytics engine for big data and machine learning. It combines the power of Apache Spark with an easy-to-use interface, making it an indispensable tool in any data engineer's toolbox. 
As a data engineer, you want to know how to easily and quickly process large datasets, build and train machine learning models, and perform advanced analytics. 
This training is necessary for any data engineer looking to build a robust and scalable data infrastructure.

Course description

This training program is designed to provide participants with the knowledge and skills required for data engineering with Databricks on Azure. The program covers a wide range of topics and equips participants with the practical skills needed to excel in their field. We will explore the most efficient ways to use the Spark engine and distributed computing principles within the Databricks environment. Our main emphasis is on making full use of Databricks to answer the questions that come up in data engineering.

 

Duration & Agenda

This 3-day training covers end-to-end data development with Databricks.

 

Day 1 focuses on the Spark Core and the Databricks platform:

  • What are the Spark architectural components?
  • Databricks platform overview
  • DataFrame reader, writer, transformation, and aggregation
  • What is lazy execution?
  • Basic & complex types
  • Spark internals

Day 2 focuses on building Data Lakehouses:

  • Lakehouse architecture vs. traditional data warehouse
  • Medallion structure
  • Databricks SQL vs. Python development
  • Delta Lake: transaction log, Parquet
  • Unity Catalog
    • Security
    • Governance
    • Lineage
  • ADF + Databricks + ADLS, how does it come together?

Day 3 is an advanced course on optimizations:

  • Delta internals
  • Optimizations
  • What's coming in the future?
  • Data processing options
  • Streaming vs. batch (incl. Auto Loader)
  • Cache

After completing this course, you will have a thorough understanding of both basic and advanced optimization techniques, and you will be able to apply them to data engineering with Databricks on Azure.

Target audience

  • You are an (aspiring) BI professional with knowledge of data modelling & data-warehouse development. You know SQL or Python, and you have a notion of dimensional data concepts.

  • You are looking to know what's what in the Azure Cloud and get some practical tips (rather than reading online documentation).

    • Note: we recommend that all participants follow the Azure Fundamentals training before this Databricks training, as it gives a broad overview of Azure Cloud, cloud data analytics concepts & key resources.

Format

The training consists of plenary lectures combined with a hands-on lab environment. The course can be taught in either English or Dutch, and can also be delivered on-site at the customer's premises.

Cost

2.000 € per participant for 3 days

More information or registration? Contact academy@element61.be

For a complete overview of all courses, visit our academy page.