Azure Databricks Training

Level: Intermediate
Duration: 24h / 3 days
Date: Individually arranged
Price: Individually arranged
Azure Databricks is a big data service built on the Apache Spark platform that supports data preparation, exploration, and machine learning model training in the cloud. It is a data processing platform that offers scalability, performance, and ease of use, and it makes it easier for teams to coordinate work and share code.
What You Will Learn
- Fundamentals of the Azure Databricks platform.
- Data processing and preparation techniques.
- Data analysis using Databricks SQL.
- Using Apache Spark for data processing.
Who is this training for?
- Individuals who want to leverage data to optimize processes.
- Those who wish to deepen their understanding of Apache Spark.
- Individuals with basic knowledge of data analysis.
- Developers, Data Engineers, and Data Scientists.
Training Program
- What is the Databricks Lakehouse Platform
- Description of the Databricks Lakehouse Platform
- Origin of the Lakehouse data management paradigm
- Fundamental challenges in managing and using data
- Security features of the Databricks Lakehouse Platform
- Examples of organizations benefiting from Databricks Lakehouse
- What is Databricks SQL
- Fundamental concepts for using Databricks SQL effectively
- Tools and features for querying data and sharing insights
- Supporting data analysis workflows and business insight sharing
- What is Databricks Machine Learning
- Overview of Databricks Machine Learning
- Benefits for data science and machine learning teams
- Core components and functionalities
- Examples of real-world customer use cases
- Databricks Data Science and Data Engineering Workspace
- Overview of the workspace
- Assets provided by the workspace
- Example development workflow for querying and aggregating data (sketched below)
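As a concrete illustration of such a workflow, here is a minimal PySpark sketch of reading a file and aggregating it, assuming a Databricks notebook where a SparkSession is available; the path /data/orders.csv and the columns customer_id and amount are hypothetical placeholders, not part of the course materials.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# In a Databricks notebook, `spark` already exists; this makes the sketch standalone
spark = SparkSession.builder.getOrCreate()

# Read a CSV file into a DataFrame (path and columns are hypothetical)
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/data/orders.csv"))

# Aggregate: total amount per customer, largest first
(orders.groupBy("customer_id")
       .agg(F.sum("amount").alias("total_amount"))
       .orderBy(F.desc("total_amount"))
       .show(10))
```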
- Databricks Workspaces and Services
- Databricks architecture and services
- Data Science and Engineering Workspace
- Creating and managing interactive clusters
- Notebook basics
- Git versioning with Databricks Repos
- Using Databricks Repos
- Getting started with the Databricks platform
- Delta Lakehouse (example sketch below)
- What is Delta Lake
- Managing Delta tables
- Manipulating tables with Delta Lake
- Advanced Delta features
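A minimal sketch of creating and managing a Delta table, assuming a Databricks runtime where Delta Lake is the default table format; the customers table and its columns are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create a managed table; on Databricks it is stored in the Delta format by default
spark.sql("""
    CREATE TABLE IF NOT EXISTS customers (id BIGINT, name STRING, city STRING)
""")

# Every write produces a new version in the table's transaction log
spark.sql("INSERT INTO customers VALUES (1, 'Ada', 'Warsaw'), (2, 'Alan', 'Gdansk')")
spark.sql("UPDATE customers SET city = 'Krakow' WHERE id = 2")

# Inspect the table history and query an earlier version (time travel)
spark.sql("DESCRIBE HISTORY customers").show()
spark.sql("SELECT * FROM customers VERSION AS OF 0").show()
```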
- Relational Entities on Databricks (example sketch below)
- Databases and views
- Views and Common Table Expressions (CTEs)
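A short sketch of these relational entities in practice; the demo schema and the orders table are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS demo")
spark.sql("USE demo")
spark.sql("CREATE TABLE IF NOT EXISTS orders (customer_id BIGINT, amount DOUBLE, order_date DATE)")

# A view stores a query definition, not data; it is re-evaluated on each read
spark.sql("""
    CREATE OR REPLACE VIEW recent_orders AS
    SELECT * FROM orders WHERE order_date >= '2024-01-01'
""")

# A CTE names an intermediate result for the duration of a single query
spark.sql("""
    WITH totals AS (
        SELECT customer_id, SUM(amount) AS total
        FROM recent_orders
        GROUP BY customer_id
    )
    SELECT * FROM totals WHERE total > 1000
""").show()
```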
- ETL with Spark SQL (example sketch below)
- Querying files directly
- Providing read options for external data sources
- Creating Delta tables
- Writing to tables
- Cleaning data
- Advanced SQL transformations
- User-defined functions (UDFs)
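The patterns above can be sketched roughly as follows, assuming a recent Databricks runtime (SQL UDFs with CREATE FUNCTION ... RETURN require one); all paths, tables, and columns are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Query files in place, without registering a table first
spark.sql("SELECT * FROM json.`/data/raw/events/`").show(5)

# CTAS: create a Delta table directly from the result of a query
spark.sql("""
    CREATE OR REPLACE TABLE events_clean AS
    SELECT user_id, lower(event_type) AS event_type, event_time
    FROM json.`/data/raw/events/`
    WHERE user_id IS NOT NULL
""")

# A SQL user-defined function, callable from any subsequent query
spark.sql("""
    CREATE OR REPLACE FUNCTION mask_email(email STRING)
    RETURNS STRING
    RETURN concat(left(email, 2), '***@', substring_index(email, '@', -1))
""")
spark.sql("SELECT mask_email('jane.doe@example.com') AS masked").show()
```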
- Getting Started with Databricks SQL (example sketch below)
- Navigating Databricks SQL
- Unity Catalog on Databricks SQL
- Schemas, tables, and views
- Basic SQL operations
- Ingesting data
- Joins
- Delta commands in Databricks SQL
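A compact sketch of ingesting and joining data, assuming Databricks SQL syntax such as COPY INTO is available; the sales and customers tables and the input path are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# COPY INTO loads only files it has not seen before (idempotent ingestion)
spark.sql("CREATE TABLE IF NOT EXISTS sales (customer_id BIGINT, amount DOUBLE)")
spark.sql("""
    COPY INTO sales
    FROM '/data/incoming/sales/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
""")

# Join the ingested table to a dimension table and aggregate
spark.sql("""
    SELECT c.name, SUM(s.amount) AS revenue
    FROM sales s
    JOIN customers c ON s.customer_id = c.id
    GROUP BY c.name
    ORDER BY revenue DESC
""").show()
```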
- Presenting Data Visually
- Data visualization concepts
- Visualizations in Databricks SQL
- Dashboards
- Notifying stakeholders
- Apache Spark Programming – DataFrames (example sketch below)
- Databricks platform and ecosystem
- Spark SQL
- DataFrames and SparkSession
- Reader and writer APIs
- Data sources
- DataFrames, columns, and expressions
- Transformations, actions, and rows
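A brief sketch tying these pieces together, assuming an existing Parquet dataset at an illustrative path.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("/data/events/")            # reader API

# Transformations are lazy: this builds a query plan but runs nothing yet
active = (df.filter(F.col("status") == "active")    # column expression
            .select("user_id", "status", "ts")
            .withColumn("day", F.to_date("ts")))

print(active.count())                               # action: triggers execution

# Writer API: persist the result, replacing any previous output
active.write.mode("overwrite").parquet("/data/events_active/")
```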
- Apache Spark Programming – Transformations (example sketch below)
- Aggregations and aggregate functions
- Date and time processing
- Dates and timestamps
- Complex data types
- Additional functions
- UDFs and vectorized UDFs
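The sketch below touches aggregation, timestamp handling, and a vectorized (pandas) UDF. It assumes a runtime with PyArrow available, as Databricks runtimes are; the data is generated inline, and the 23% VAT rate is purely illustrative.

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("a", "2024-01-15 10:30:00", 120.0), ("b", "2024-02-16 09:00:00", 80.0)],
    ["user_id", "ts", "amount"],
)

# Date and time processing: parse strings and derive calendar parts
df = df.withColumn("ts", F.to_timestamp("ts")).withColumn("month", F.month("ts"))

# Aggregation with built-in functions
df.groupBy("month").agg(F.avg("amount").alias("avg_amount")).show()

# Vectorized UDF: processes whole pandas Series batches instead of single rows
@pandas_udf("double")
def add_vat(amount: pd.Series) -> pd.Series:
    return amount * 1.23  # illustrative 23% VAT rate

df.select(add_vat("amount").alias("gross_amount")).show()
```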
- Apache Spark Programming – Spark Internals (example sketch below)
- Spark architecture
- Spark cluster and execution model
- Shuffling and caching
- Query optimization
- Partitioning
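A small sketch of inspecting these mechanics yourself; the data is generated inline, so it runs on any Spark cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.range(1_000_000).withColumn("bucket", F.col("id") % 10)

# Grouping requires a shuffle: rows with the same key move to the same partition
agg = df.groupBy("bucket").count()
agg.explain()                        # physical plan shows the exchange (shuffle)

# Cache a DataFrame that will be reused so it is computed only once
df.cache()
df.count()                           # first action materializes the cache

print(df.rdd.getNumPartitions())     # current partition count
df = df.repartition(8, "bucket")     # redistribute by key into 8 partitions
```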
- Apache Spark Programming – Structured Streaming (example sketch below)
- Streaming fundamentals in Apache Spark
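As a taste of the topic, here is a minimal Structured Streaming sketch using Spark's built-in rate test source; the checkpoint path is an illustrative placeholder.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# The "rate" source emits (timestamp, value) rows continuously, for experiments
stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Windowed aggregation: count the rows arriving in each 10-second window
counts = stream.groupBy(F.window("timestamp", "10 seconds")).count()

# Print each updated result to the console; the checkpoint makes the query restartable
query = (counts.writeStream
               .outputMode("complete")
               .format("console")
               .option("checkpointLocation", "/tmp/checkpoints/rate_counts")
               .start())

query.awaitTermination(30)  # let the demo run for ~30 seconds
query.stop()
```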