Azure Databricks Training

Level: Intermediate
Duration: 24h / 3 days
Date: Individually arranged
Price: Individually arranged
Azure Databricks is a big data service built on the Apache Spark platform that supports data preparation, exploration, and machine learning model training in the cloud. It is a data processing platform that offers scalability, performance, and ease of use, and it makes it easier for teams to coordinate work and share code.
What You Will Learn
- Fundamentals of the Azure Databricks platform.
- Data processing and preparation techniques.
- Data analysis using Databricks SQL.
- Using Apache Spark for data processing.
Who is this training for?
- Individuals who want to leverage data to optimize processes.
- Those who wish to deepen their understanding of Apache Spark.
- Individuals with basic knowledge of data analysis.
- Developers, Data Engineers, and Data Scientists.
Training Program
- What is the Databricks Lakehouse Platform
- Description of the Databricks Lakehouse Platform
- Origin of the Lakehouse data management paradigm
- Fundamental challenges in managing and using data
- Security features of the Databricks Lakehouse Platform
- Examples of organizations benefiting from Databricks Lakehouse
- What is Databricks SQL
- Fundamental concepts for using Databricks SQL effectively
- Tools and features for querying data and sharing insights
- Supporting data analysis workflows and business insight sharing
- What is Databricks Machine Learning
- Overview of Databricks Machine Learning
- Benefits for data science and machine learning teams
- Core components and functionalities
- Examples of real-world customer use cases
- Databricks Data Science and Data Engineering Workspace
- Overview of the workspace
- Assets provided by the workspace
- Example development workflow for querying and aggregating data (sketched below)
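As a concrete illustration of such a workflow, here is a minimal PySpark sketch of reading a file and aggregating it, assuming a Databricks notebook where a SparkSession is available; the path /data/orders.csv and the columns customer_id and amount are hypothetical placeholders, not part of the course materials.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# In a Databricks notebook, `spark` already exists; this makes the sketch standalone
spark = SparkSession.builder.getOrCreate()

# Read a CSV file into a DataFrame (path and columns are hypothetical)
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/data/orders.csv"))

# Aggregate: total amount per customer, largest first
(orders.groupBy("customer_id")
       .agg(F.sum("amount").alias("total_amount"))
       .orderBy(F.desc("total_amount"))
       .show(10))
```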
- Databricks Workspaces and Services
- Databricks architecture and services
- Data Science and Engineering Workspace
- Creating and managing interactive clusters
- Notebook basics
- Git versioning with Databricks Repos
- Using Databricks Repos
- Getting started with the Databricks platform
- Delta Lakehouse (example sketch below)
- What is Delta Lake
- Managing Delta tables
- Manipulating tables with Delta Lake
- Advanced Delta features
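A minimal sketch of creating and managing a Delta table, assuming a Databricks runtime where Delta Lake is the default table format; the customers table and its columns are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create a managed table; on Databricks it is stored in the Delta format by default
spark.sql("""
    CREATE TABLE IF NOT EXISTS customers (id BIGINT, name STRING, city STRING)
""")

# Every write produces a new version in the table's transaction log
spark.sql("INSERT INTO customers VALUES (1, 'Ada', 'Warsaw'), (2, 'Alan', 'Gdansk')")
spark.sql("UPDATE customers SET city = 'Krakow' WHERE id = 2")

# Inspect the table history and query an earlier version (time travel)
spark.sql("DESCRIBE HISTORY customers").show()
spark.sql("SELECT * FROM customers VERSION AS OF 0").show()
```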
- Relational Entities on Databricks (example sketch below)
- Databases and views
- Views and Common Table Expressions (CTEs)
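A short sketch of these relational entities in practice; the demo schema and the orders table are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS demo")
spark.sql("USE demo")
spark.sql("CREATE TABLE IF NOT EXISTS orders (customer_id BIGINT, amount DOUBLE, order_date DATE)")

# A view stores a query definition, not data; it is re-evaluated on each read
spark.sql("""
    CREATE OR REPLACE VIEW recent_orders AS
    SELECT * FROM orders WHERE order_date >= '2024-01-01'
""")

# A CTE names an intermediate result for the duration of a single query
spark.sql("""
    WITH totals AS (
        SELECT customer_id, SUM(amount) AS total
        FROM recent_orders
        GROUP BY customer_id
    )
    SELECT * FROM totals WHERE total > 1000
""").show()
```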
- ETL with Spark SQL (example sketch below)
- Querying files directly
- Providing read options for external data sources
- Creating Delta tables
- Writing to tables
- Cleaning data
- Advanced SQL transformations
- User-defined functions (UDFs)
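The patterns above can be sketched roughly as follows, assuming a recent Databricks runtime (SQL UDFs with CREATE FUNCTION ... RETURN require one); all paths, tables, and columns are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Query files in place, without registering a table first
spark.sql("SELECT * FROM json.`/data/raw/events/`").show(5)

# CTAS: create a Delta table directly from the result of a query
spark.sql("""
    CREATE OR REPLACE TABLE events_clean AS
    SELECT user_id, lower(event_type) AS event_type, event_time
    FROM json.`/data/raw/events/`
    WHERE user_id IS NOT NULL
""")

# A SQL user-defined function, callable from any subsequent query
spark.sql("""
    CREATE OR REPLACE FUNCTION mask_email(email STRING)
    RETURNS STRING
    RETURN concat(left(email, 2), '***@', substring_index(email, '@', -1))
""")
spark.sql("SELECT mask_email('jane.doe@example.com') AS masked").show()
```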
- Getting Started with Databricks SQL (example sketch below)
- Navigating Databricks SQL
- Unity Catalog on Databricks SQL
- Schemas, tables, and views
- Basic SQL operations
- Ingesting data
- Joins
- Delta commands in Databricks SQL
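A compact sketch of ingesting and joining data, assuming Databricks SQL syntax such as COPY INTO is available; the sales and customers tables and the input path are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# COPY INTO loads only files it has not seen before (idempotent ingestion)
spark.sql("CREATE TABLE IF NOT EXISTS sales (customer_id BIGINT, amount DOUBLE)")
spark.sql("""
    COPY INTO sales
    FROM '/data/incoming/sales/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
""")

# Join the ingested table to a dimension table and aggregate
spark.sql("""
    SELECT c.name, SUM(s.amount) AS revenue
    FROM sales s
    JOIN customers c ON s.customer_id = c.id
    GROUP BY c.name
    ORDER BY revenue DESC
""").show()
```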
- Presenting Data Visually
- Data visualization concepts
- Visualizations in Databricks SQL
- Dashboards
- Notifying stakeholders
- Apache Spark Programming – DataFrames (example sketch below)
- Databricks platform and ecosystem
- Spark SQL
- DataFrames and SparkSession
- Reader and writer APIs
- Data sources
- DataFrames, columns, and expressions
- Transformations, actions, and rows
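A brief sketch tying these pieces together, assuming an existing Parquet dataset at an illustrative path.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("/data/events/")            # reader API

# Transformations are lazy: this builds a query plan but runs nothing yet
active = (df.filter(F.col("status") == "active")    # column expression
            .select("user_id", "status", "ts")
            .withColumn("day", F.to_date("ts")))

print(active.count())                               # action: triggers execution

# Writer API: persist the result, replacing any previous output
active.write.mode("overwrite").parquet("/data/events_active/")
```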
- Apache Spark Programming – Transformations (example sketch below)
- Aggregations and aggregate functions
- Date and time processing
- Dates and timestamps
- Complex data types
- Additional functions
- UDFs and vectorized UDFs
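The sketch below touches aggregation, timestamp handling, and a vectorized (pandas) UDF. It assumes a runtime with PyArrow available, as Databricks runtimes are; the data is generated inline, and the 23% VAT rate is purely illustrative.

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("a", "2024-01-15 10:30:00", 120.0), ("b", "2024-02-16 09:00:00", 80.0)],
    ["user_id", "ts", "amount"],
)

# Date and time processing: parse strings and derive calendar parts
df = df.withColumn("ts", F.to_timestamp("ts")).withColumn("month", F.month("ts"))

# Aggregation with built-in functions
df.groupBy("month").agg(F.avg("amount").alias("avg_amount")).show()

# Vectorized UDF: processes whole pandas Series batches instead of single rows
@pandas_udf("double")
def add_vat(amount: pd.Series) -> pd.Series:
    return amount * 1.23  # illustrative 23% VAT rate

df.select(add_vat("amount").alias("gross_amount")).show()
```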
- Apache Spark Programming – Spark Internals (example sketch below)
- Spark architecture
- Spark cluster and execution model
- Shuffling and caching
- Query optimization
- Partitioning
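A small sketch of inspecting these mechanics yourself; the data is generated inline, so it runs on any Spark cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.range(1_000_000).withColumn("bucket", F.col("id") % 10)

# Grouping requires a shuffle: rows with the same key move to the same partition
agg = df.groupBy("bucket").count()
agg.explain()                        # physical plan shows the exchange (shuffle)

# Cache a DataFrame that will be reused so it is computed only once
df.cache()
df.count()                           # first action materializes the cache

print(df.rdd.getNumPartitions())     # current partition count
df = df.repartition(8, "bucket")     # redistribute by key into 8 partitions
```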
- Apache Spark Programming – Structured Streaming (example sketch below)
- Streaming fundamentals in Apache Spark
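As a taste of the topic, here is a minimal Structured Streaming sketch using Spark's built-in rate test source; the checkpoint path is an illustrative placeholder.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# The "rate" source emits (timestamp, value) rows continuously, for experiments
stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Windowed aggregation: count the rows arriving in each 10-second window
counts = stream.groupBy(F.window("timestamp", "10 seconds")).count()

# Print each updated result to the console; the checkpoint makes the query restartable
query = (counts.writeStream
               .outputMode("complete")
               .format("console")
               .option("checkpointLocation", "/tmp/checkpoints/rate_counts")
               .start())

query.awaitTermination(30)  # let the demo run for ~30 seconds
query.stop()
```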