Azure Databricks Training
Level: Intermediate
Duration: 24h / 3 days
Date: Individually arranged
Price: Individually arranged
Azure Databricks is a big data service built on Apache Spark for preparing and exploring data and for building and training models in the cloud. The platform offers scalability, performance, and ease of use, and it makes it easier for teams to coordinate their work and share code.
What You Will Learn
- Fundamentals of the Azure Databricks platform.
- Data processing and preparation techniques.
- Data analysis using Databricks SQL.
- Utilization of Apache Spark for data processing.
Who is this training for?
- Individuals who want to leverage data to optimize processes.
- Those who wish to deepen their understanding of Apache Spark.
- Individuals with basic knowledge of data analysis.
- Developers, Data Engineers, and Data Scientists.
Training Program
What is the Databricks Lakehouse Platform?
- Describe what the Databricks Lakehouse Platform is.
- Explain the origin of the Lakehouse data management paradigm.
- Outline fundamental challenges related to managing and using data.
- Describe security features of the Databricks Lakehouse Platform.
- Provide examples of organizations that have benefited from using the Databricks Lakehouse Platform.
What is Databricks SQL?
- Summarize fundamental concepts for using Databricks SQL effectively.
- Identify tools and features in Databricks SQL for querying data and sharing insights.
- Explain how Databricks SQL supports data analysis workflows that allow users to extract and share business insights.
What is Databricks Machine Learning?
- Give a basic overview of Databricks Machine Learning.
- Identify how using Databricks Machine Learning benefits data science and machine learning teams.
- Summarize the fundamental components and functionalities of Databricks Machine Learning.
- Provide examples of successful use cases of Databricks Machine Learning by real Databricks customers.
What is the Databricks Data Science and Engineering Workspace?
- Give a basic overview of the Databricks Data Science and Engineering Workspace.
- Identify assets provided by the workspace.
- Describe a simple development workflow that queries and aggregates data.
Databricks Workspaces and Services
- Databricks Architecture and Services.
- Data Science and Engineering Workspace.
- Create and Manage Interactive Clusters.
- Notebook Basics.
- Git Versioning with Databricks Repos.
- Using Databricks Repos.
- Getting Started with the Databricks Platform.
Delta Lakehouse
- What is Delta Lake.
- Managing Delta Tables.
- Manipulating Tables with Delta Lake.
- Advanced Delta.
Relational Entities on Databricks
- Databases and Views.
- Views and CTEs.
ETL with Spark SQL
- Query Files Directly.
- Providing Options.
- Creating Delta Tables.
- Writing to Tables.
- Cleaning Data.
- Advanced SQL Transformations.
- UDFs.
Getting Started with Databricks SQL
- Navigating Databricks SQL.
- Unity Catalog on Databricks SQL.
- Schemas, Tables, and Views on Databricks SQL.
Basic SQL on Databricks SQL
- Ingesting Data for Databricks SQL.
- Joins.
- Delta Commands in Databricks SQL.
Presenting Data Visually
- Data Visualization.
- Data Visualizations on Databricks SQL.
- Dashboards on Databricks SQL.
- Notifying Stakeholders.
Apache Spark Programming – DataFrames
- Databricks Platform.
- Databricks Ecosystem.
- Spark SQL.
- DataFrames.
- SparkSession.
- Reader and Writer.
- Data Sources.
- DataFrame and Column.
- Column and Expression.
- Transformations, Actions, and Rows.
Apache Spark Programming – Transformations
- Aggregation.
- Aggregation Functions.
- Datetimes.
- Dates and Timestamps.
- Complex Types.
- Additional Functions.
- UDFs.
- UDFs and Vectorized Functions.
Apache Spark Programming – Spark Internals
- Spark Architecture.
- Spark Cluster, Spark Execution.
- Shuffling and Caching.
- Query Optimization.
- Partitioning.
Apache Spark Programming – Structured Streaming
- Apache Spark Programming.
- Streaming.