NiFi (short for NiagaraFiles) was created by the NSA as a big-data automation system. The NSA released NiFi as open-source software in 2014, and it entered the Apache Incubator as Apache NiFi. Apache NiFi graduated from incubator status and became a Top-Level Project of the Apache Software Foundation on July 20, 2015.
Apache NiFi enables visual programming for the creation of scalable directed graphs for data routing, transformation, system mediation logic and data provenance.
This course introduces the student to the high-level capabilities of Apache NiFi and a subproject of NiFi, MiNiFi.
Discover how to exploit the following capabilities of NiFi and MiNiFi:
- The Web-based user interface of NiFi.
  - The interface allows for a seamless experience between design, control, feedback, and monitoring.
- The simple, yet highly configurable editors that allow the student to incorporate the behaviors necessary for:
  - Loss-tolerant vs. guaranteed delivery
  - Low latency vs. high throughput
  - Dynamic prioritization
  - Flow modification at runtime
  - Back pressure
- The comprehensive treatment of data provenance.
  - Each NiFi processor can be made to contribute to the comprehensive tracking of a dataflow from beginning to end. Data provenance refers to records of the inputs, entities, systems, and processes that influence data of interest, providing a historical record of the data and its origins.
- The NiFi extension points.
  - The student will learn how to build their own processors for rapid development and effective testing.
- The ability to place existing and new security processes on a NiFi data flow.
  - While the class will not cover existing security frameworks in detail (such as SSL, SSH, HTTPS, encrypted content, etc.), the student will be shown how multi-tenant authorization and internal authorization/policy management can be incorporated into a NiFi data flow.
- MiNiFi, a subproject of Apache NiFi, is a complementary data collection approach that supplements the core tenets of NiFi in dataflow management by focusing on the collection of data at the source of its creation. Each student will be provided a Raspberry Pi to see MiNiFi from the perspective of an agent acting at, or directly adjacent to, source sensors, systems, or servers. By the end of the course the student will understand the specific capabilities of MiNiFi:
  - Small size and low resource consumption
  - Central management of agents
  - Generation of data provenance (full chain of custody of information)
  - Integration with NiFi for follow-on dataflow management
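As a rough illustration of the "chain of custody" idea behind data provenance described above, the following minimal Python sketch records one event each time a step touches a piece of data. It does not use the NiFi API: the names `FlowItem`, `ProvenanceEvent`, `ingest`, and `run_step`, the event labels, and the sample sensor reading are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProvenanceEvent:
    processor: str   # which step touched the data
    event_type: str  # illustrative label, e.g. "CREATE" or "CONTENT_MODIFIED"
    detail: str      # human-readable note about what happened


@dataclass
class FlowItem:
    content: str
    lineage: List[ProvenanceEvent] = field(default_factory=list)


def ingest(source: str, content: str) -> FlowItem:
    """Create an item and record where it came from."""
    item = FlowItem(content)
    item.lineage.append(ProvenanceEvent(source, "CREATE", f"received {content!r}"))
    return item


def run_step(item: FlowItem, name: str, transform: Callable[[str], str]) -> FlowItem:
    """Apply a transformation and append a matching provenance event."""
    before = item.content
    item.content = transform(before)
    item.lineage.append(
        ProvenanceEvent(name, "CONTENT_MODIFIED", f"{before!r} -> {item.content!r}")
    )
    return item


# Hypothetical flow: a reading arrives from a sensor and passes through two steps.
item = ingest("sensor-01", " 21.5c ")
item = run_step(item, "trim", str.strip)
item = run_step(item, "uppercase", str.upper)

for event in item.lineage:
    print(event.processor, event.event_type, event.detail)
```

Reading `item.lineage` from first to last entry reconstructs the history of the data from its origin through every step that modified it, which is the kind of record the provenance bullets above refer to.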
PREREQUISITES
Development experience with Linux, Java, and Hadoop is a prerequisite. Knowledge of, or experience with, implementing EAI/EII patterns is assumed. It is suggested that a student new to Hadoop first take the DFHz course “Advanced Hadoop.” A student not familiar with EAI/EII patterns might wish to visit http://www.enterpriseintegrationpatterns.com/
TARGET AUDIENCE
This course is intended for data infrastructure architects, data engineers, data analysts, and individuals involved with data governance and data provenance who need to automate their existing or greenfield data flow topologies and data infrastructures.
FORMAT
50% Lecture, 50% Hands-on Labs
DURATION
This is a 5-day class when taught on-site as ILT or via WebEx as VILT. It is also offered on a per-module basis for online self-enablement via our LMS, Brane.
AGENDA SUMMARY
Day 1: Introduction to NiFi
Day 2: Introduction to MiNiFi
Day 3: NiFi and Data Provenance Practices
Day 4: Integration of NiFi with Security frameworks and practices
Day 5: Extending NiFi