Azure Databricks Training

Level

Intermediate

Duration

24h / 3 days

Date

Individually arranged

Price

Individually arranged

Azure Databricks is a cloud big data service built on Apache Spark for processing and exploring data and for building and training models. The platform is designed for scalability, performance, and ease of use, and it makes it easier for teams to coordinate their work and share code.

What You Will Learn

  • Fundamentals of the Azure Databricks platform.
  • Data processing and preparation techniques.
  • Data analysis using Databricks SQL.
  • Using Apache Spark for data processing.

Who is this training for?

  • Individuals who want to leverage data to optimize processes.
  • Those who wish to deepen their understanding of Apache Spark.
  • Individuals with basic knowledge of data analysis.
  • Developers, Data Engineers, and Data Scientists.

Training Program

What is the Databricks Lakehouse Platform

  • Describe what the Databricks Lakehouse Platform is.
  • Explain the origin of the Lakehouse data management paradigm.
  • Outline fundamental challenges related to managing and using data.
  • Describe security features of the Databricks Lakehouse Platform.
  • Provide examples of organizations that have benefited from using the Databricks Lakehouse Platform.

What is Databricks SQL

  • Summarize fundamental concepts for using Databricks SQL effectively.
  • Identify tools and features in Databricks SQL for querying data and sharing insights.
  • Explain how Databricks SQL supports data analysis workflows that allow users to extract and share business insights.

What is Databricks Machine Learning

  • Give a basic overview of Databricks Machine Learning.
  • Identify how using Databricks Machine Learning benefits data science and machine learning teams.
  • Summarize the fundamental components and functionalities of Databricks Machine Learning.
  • Provide examples of successful use cases of Databricks Machine Learning by real Databricks customers.

What is the Databricks Data Science and Engineering Workspace

  • Give a basic overview of the Databricks Data Science and Engineering Workspace.
  • Identify assets provided by the workspace.
  • Describe a simple development workflow that queries and aggregates data.

Databricks Workspaces and Services

  • Databricks Architecture and Services.
  • Data Science and Engineering Workspace.
  • Create and Manage Interactive Clusters.
  • Notebook Basics.
  • Git Versioning with Databricks Repos.
  • Using Databricks Repos.
  • Getting Started with the Databricks Platform.

Delta Lakehouse

  • What is Delta Lake.
  • Managing Delta Tables.
  • Manipulating Tables with Delta Lake.
  • Advanced Delta.

Relational Entities on Databricks

  • Databases and Views.
  • Views and CTEs.

ETL with Spark SQL

  • Query Files Directly.
  • Providing Options.
  • Creating Delta Tables.
  • Writing to Tables.
  • Cleaning Data.
  • Advanced SQL Transformations.
  • UDFs.

Getting Started with Databricks SQL

  • Navigating Databricks SQL.
  • Unity Catalog on Databricks SQL.
  • Schemas, Tables, and Views on Databricks SQL.

Basic SQL on Databricks SQL

  • Ingesting Data for Databricks SQL.
  • Joins.
  • Delta Commands in Databricks SQL.

Presenting Data Visually

  • Data Visualization.
  • Data Visualizations on Databricks SQL.
  • Dashboards on Databricks SQL.
  • Notifying Stakeholders.

Apache Spark Programming – DataFrames

  • Databricks Platform.
  • Databricks Ecosystem.
  • Spark SQL.
  • DataFrames.
  • SparkSession.
  • Reader and Writer.
  • Data Sources.
  • DataFrame and Column.
  • Column and Expression.
  • Transformations, Actions, and Rows.

Apache Spark Programming – Transformations

  • Aggregation.
  • Aggregation Functions.
  • Datetimes.
  • Dates and Timestamps.
  • Complex Types.
  • Additional Functions.
  • UDFs.
  • UDFs and Vectorized Functions.

Apache Spark Programming – Spark Internals

  • Spark Architecture.
  • Spark Cluster, Spark Execution.
  • Shuffling and Caching.
  • Query Optimization.
  • Partitioning.

Apache Spark Programming – Structured Streaming

  • Apache Spark Programming.
  • Streaming.

Contact us

We will organize training tailored to your needs.

Przemysław Wołosz

Key Account Manager

przemyslaw.wolosz@infoShareAcademy.com

The controller of your personal data is InfoShare Academy Sp. z o.o. with its registered office in Gdańsk, al. Grunwaldzka 427B, 80-309 Gdańsk, KRS: 0000531749, NIP: 5842742213. Personal data are processed in accordance with the information clause.