Apache Flink is an open source platform for distributed stream and batch data processing.
Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program optimization. This 5-day course covers the technical aspects that developers, architects and DevOps engineers need to know to install, administer, develop against, manage and monitor the capabilities of this 4th-generation, distributed data-flow Apache Software Foundation (ASF) project.
By 4th generation it is meant that Flink is not just the culmination of the ideas and functions for data flow that developers have had to assemble from predecessor Apache projects such as Spark and Kafka; it is a much more powerful and performant complement or successor to both of those projects. Flink, in fact, changes the very meaning of “data flow” and of “infinite” versus “finite” data sources.
In addition, Flink treats batch and micro-batch processing as special cases of true streaming. For software engineers who use imperative or functional languages, Flink supports Python, Java and Scala APIs. For developers who prefer to work with tabular data and SQL, Flink provides an entirely SQL-based interface. As with the Flink programming APIs, Flink SQL can be used for batch, micro-batch and pure streaming processing. Flink allows the same programming paradigm to be used for data flow and data analysis across finite data sets, infinite data sets, heterogeneous data sets, batch, micro-batch and streaming data.
This course will present all essential concepts, libraries and techniques, in a complete hands-on environment, for understanding, creating and supporting Flink and Flink-ecosystem-based applications.
PREREQUISITES
Development experience with Linux, Java and Hadoop is a prerequisite. Knowledge of or experience with implementing EAI/EII patterns is assumed. Experience with a distributed data flow project such as NiFi is helpful. Experience with, or comprehensive conceptual knowledge of, Spark and/or Kafka is helpful. It is suggested that a student new to Hadoop first take the DFHz course “Advanced Hadoop.” A student not familiar with EAI/EII patterns is referred to http://www.enterpriseintegrationpatterns.com/
TARGET AUDIENCE
We expect the audience for this class to fall into two groups of software engineers. First, Java or Scala software engineers with minimal knowledge of Spark and Kafka who must quickly build rigorous, extensible, enterprise-level applications that rely on a distributed data flow topology.
Second, software engineers who have worked with Java, the Spark API and the Kafka API and who want to understand how Flink’s functionality and performance complement or supersede what Spark and Kafka offer. Companies like Alibaba, Capital One, Ericsson, Netflix and Uber consider Spark and Kafka to be 3rd generation and Flink 4th generation in their capabilities.
FORMAT
50% Lecture, 50% Lab
DURATION
This is a 5-day class when taught on-site as instructor-led training (ILT) or via WebEx as virtual instructor-led training (VILT). It is also offered on a per-module basis for online self-enablement via our LMS, Brane.
AGENDA SUMMARY
Day 1: Introduction to Flink concepts, ecosystem, use cases
Day 2: Application development with Flink
Day 3: Extending Flink into the Flink ecosystem
Day 4: DevOps, installation options, deployment and monitoring
Day 5: Performance enhancement practices with Flink and Flink ecosystem