Big Data Platform Design Using Apache Tools
Level: Intermediate
Duration: 24h / 3 days
Date: Individually arranged
Price: Individually arranged
The Big Data Platform Design Using Apache Tools training is a practical, 3-day workshop in which participants learn modern methods of building scalable and efficient Big Data platforms. The program is built around popular open-source Apache tools such as Apache Hadoop, Spark, Kafka, NiFi, Flink, Iceberg, and Airflow. The course covers the theoretical foundations of platform architecture and, above all, provides practical skills in designing, implementing, and managing complex analytical systems. The training combines 80% practice with 20% theory, so participants quickly build the competencies needed to work with large data volumes in production environments.
What You Will Learn
- Design and implement data pipelines for batch and stream processing
- Understand the principles of building modern, scalable Big Data architecture using Apache tools
- Gain skills in configuring and managing systems like Hadoop, Kafka, NiFi, Spark, and Flink
- Master techniques for managing metadata, data lineage, and automating workflows
- Learn best deployment practices and methods for optimizing and monitoring Big Data platforms
Who is this training for?
- IT specialists, Big Data architects, and data engineers aiming to design modern, scalable Big Data platforms
- DevOps engineers and administrators responsible for deploying and managing Hadoop/Spark/Kafka infrastructure
- Data analysts and engineers who want to understand Apache architectures and tools for data processing and analysis
- Individuals planning to extend existing solutions or start new Big Data projects
Training Program
Day 1: Fundamentals of Big Data Architecture and Apache Tools
Module 1: Introduction to Big Data Architecture
- Basic concepts and layers of Big Data architecture: data, processing, management, analysis
- Architecture models: Data Lake, Lambda, Kappa, Data Lakehouse
- Design criteria: data type, scalability, batch vs. stream processing
- Overview of data processing methods: batch vs. stream
Module 2: Apache Hadoop and HDFS
- HDFS architecture: NameNode and DataNode roles
- Batch processing with MapReduce – basics and use cases
- Administration and monitoring of Hadoop clusters
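To make the MapReduce model from this module concrete, below is a minimal, locally runnable Python sketch of the map → shuffle → reduce flow for a word count; the sample input is invented, and on a real cluster the same mapper/reducer logic would typically be submitted to YARN (for example via Hadoop Streaming).

```python
# Local simulation of the MapReduce word-count pattern:
# map -> shuffle/sort by key -> reduce.
from collections import defaultdict

def map_phase(lines):
    # Emit (word, 1) pairs, exactly what a streaming mapper would print.
    for line in lines:
        for word in line.strip().split():
            yield word, 1

def shuffle(pairs):
    # Group values by key; on a cluster Hadoop performs this step
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_phase(grouped):
    # Sum the counts collected for each word.
    for word, counts in grouped:
        yield word, sum(counts)

if __name__ == "__main__":
    sample = ["big data on hadoop", "spark and hadoop", "big data pipelines"]
    for word, count in sorted(reduce_phase(shuffle(map_phase(sample)))):
        print(word, count)
```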
Module 3: Basics of Python Programming in the Context of Big Data
- Functional programming concepts and Python vs. Java comparison
- Python elements for data processing: DataFrames, lambdas, comprehensions, map, filter
- Practical exercises: simple data processing and integration with Big Data tools (e.g. PySpark)
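A small, self-contained sketch of the Python constructs listed in this module (lambdas, map, filter, comprehensions); the sample records are invented for illustration, and the same functional style carries over directly to PySpark transformations.

```python
# Functional-style processing of a few sample records using
# lambdas, map, filter and a list comprehension (standard library only).
records = [
    {"user": "alice", "bytes": 1200},
    {"user": "bob", "bytes": 0},
    {"user": "carol", "bytes": 5400},
]

# Keep only non-empty transfers, then convert bytes to kilobytes.
active = filter(lambda r: r["bytes"] > 0, records)
kilobytes = list(map(lambda r: (r["user"], r["bytes"] / 1024), active))

# The same pipeline expressed as a single comprehension.
kilobytes_v2 = [(r["user"], r["bytes"] / 1024) for r in records if r["bytes"] > 0]

print(kilobytes)
print(kilobytes_v2)
```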
Day 2: Data Processing and Integration Tools
Module 4: Streaming and Queues – Apache Kafka and Apache NiFi
- Apache Kafka architecture: producers, consumers, partitions, replication
- Apache NiFi: managing data flows and integrating sources and sinks
- Practical exercises: creating and monitoring data flows
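As a rough illustration of the Kafka concepts above, the sketch below sends one message and reads it back with the kafka-python client; the client library, broker address, and topic name are assumptions for a local test setup rather than the prescribed course environment.

```python
# Minimal Kafka produce/consume round trip using the kafka-python client.
# Broker address and topic name are placeholders for a local test broker.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", key=b"sensor-1", value=b'{"temp": 21.5}')
producer.flush()  # make sure the message actually reaches the broker

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # read the topic from the beginning
    consumer_timeout_ms=5000,      # stop iterating once no new messages arrive
)
for message in consumer:
    print(message.partition, message.offset, message.key, message.value)
```

In practice the producer and consumer would run as separate services; they are combined here only to keep the example self-contained.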
Module 5: Real-Time and Batch Data Analysis – Apache Spark and Flink
- Spark architecture: RDD, DataFrame, Spark SQL
- Flink: stream processing, time windows, state management
- Designing batch and streaming jobs, job optimization, and the Catalyst optimizer
- Integration with Apache Hadoop and application deployment
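A minimal PySpark sketch touching the three Spark APIs named in this module (RDD, DataFrame, Spark SQL); the input path and column names are placeholders, and a local Spark installation is assumed.

```python
# Minimal PySpark batch job using the DataFrame, Spark SQL and RDD APIs.
# The input path and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("platform-design-demo").getOrCreate()

# DataFrame API: read a CSV with a header row and aggregate it.
df = spark.read.option("header", True).csv("/data/events.csv")
daily = df.groupBy("event_date").agg(F.count("*").alias("events"))

# Spark SQL: the same aggregation expressed as a query on a temp view.
df.createOrReplaceTempView("events")
daily_sql = spark.sql(
    "SELECT event_date, COUNT(*) AS events FROM events GROUP BY event_date"
)

# RDD API: drop down a level for record-by-record custom logic.
first_dates = df.rdd.map(lambda row: row["event_date"]).take(5)

daily.show()
daily_sql.show()
print(first_dates)
spark.stop()
```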
Day 3 (Optional): Data Storage, Workflow Management, and Governance
Module 6: Data and Metadata Management
- Apache Iceberg: scalable table format, ACID support, query optimization
- Apache Atlas: metadata management, governance, data lineage
- Apache Druid: architecture, indexing, real-time and batch analytics
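Iceberg tables are normally created and queried through an engine such as Spark. The sketch below assumes a Spark session with an Iceberg catalog already registered under the name `demo` (a configuration detail that varies by platform); the catalog, schema, and table names are placeholders.

```python
# Creating and querying an Apache Iceberg table through Spark SQL.
# Assumes Spark was started with an Iceberg catalog registered as "demo"
# (e.g. spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog);
# catalog, schema and table names below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-demo").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.lake.events (
        event_id   BIGINT,
        event_date DATE,
        payload    STRING
    ) USING iceberg
    PARTITIONED BY (event_date)
""")

spark.sql("INSERT INTO demo.lake.events VALUES (1, DATE '2024-01-01', 'hello')")

# Iceberg exposes table metadata (snapshots, history) as queryable tables.
spark.sql("SELECT snapshot_id, committed_at FROM demo.lake.events.snapshots").show()
```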
Module 7: Automation and Orchestration – Apache Airflow and CI/CD
- Designing workflows and managing dependencies with Airflow
- Implementing data pipelines and automating processing
- Integration with CI/CD tools and production environments
- Defining DAGs and working with tasks in Python and Bash
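A minimal Airflow DAG illustrating one Bash task and one Python task with a simple dependency, assuming a recent Airflow 2.x installation; the DAG id, schedule, and task logic are placeholders.

```python
# Minimal Airflow DAG with one Bash task and one Python task.
# DAG id, schedule and the extract/transform logic are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def transform():
    # Placeholder for the real transformation step of the pipeline.
    print("transforming staged data")

with DAG(
    dag_id="demo_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="echo 'pulling raw data into staging'",
    )
    transform_task = PythonOperator(
        task_id="transform",
        python_callable=transform,
    )
    extract >> transform_task  # run the Bash step before the Python step
```

Dropped into the Airflow DAGs folder, this file is picked up by the scheduler, and the `>>` operator expresses the task dependency.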