https://dfheinz.com/byos/
ASF Service Stack Hadoop would not have become synonymous with “Big Data” had it not been for the pioneering work and marketing efforts of companies such as MapR, Cloudera and Hortonworks. Each of these organizations made the concurrent use of a number of ASF distributed, component-based services accessible by bundling those services into a deployable stack with a centralized management component. Today, almost a decade since the commercial introduction of Hadoop, many organizations are using and managing the services of Hadoop along with those of other ASF components such as Pig, Hive, Sqoop, Mahout, Flume and more.
Metron is an amalgamation and augmentation of several open-source ASF projects that provides a centralized management capability for security monitoring and analysis for the identification and disposition of any level of a cyberthreat. Metron provides capabilities for log aggregation, full packet capture indexing, storage, advanced behavioral analytics and data enrichment. Metron is a single platform that applies the most current threat-intelligence information to security telemetry.
https://dfheinz.com/flink/
Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program optimization. This 5-day course covers every technical aspect a developer, architect or DevOps individual will need to install, administer, develop against, manage and monitor every capability of this 4th-generation, distributed data-flow Apache Software Foundation (ASF) project.
https://dfheinz.com/hadoop/
The Apache Hadoop software library is a framework that allows for the distributed storage and analysis of large data sets across clusters of computers, supporting simple-to-complex programming models. Designed to scale up from single servers to thousands of machines, each offering local computation and storage, the library manages failures at the application layer rather than relying on hardware to deliver high availability. This enables highly-available services to be delivered seamlessly on top of computer clusters.
https://dfheinz.com/blockchainbasics/
This course will present a non-technical introduction to blockchain fundamentals. This course fills the gap between purely technical description and implementation of blockchain and the ability to ascertain the economic impact the application of blockchain may have in the future.
A technical introduction to the blockchain data structure, independent of any particular implementation such as its use in the Bitcoin network. An understanding of the blockchain data structure will clarify the access and storage trade-offs it imposes in any use case.
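The data structure itself is small: each block carries a hash of its predecessor, so altering any historical record invalidates every later link. A minimal sketch in plain Python (the field names and record strings here are illustrative, not from any particular blockchain implementation):

```python
import hashlib

def block_hash(index, prev_hash, data):
    """Hash a block's fields; changing any field changes the hash."""
    payload = f"{index}|{prev_hash}|{data}".encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, data):
    """Link a new block to the current tip of the chain."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    chain.append({"index": index, "prev_hash": prev_hash,
                  "data": data, "hash": block_hash(index, prev_hash, data)})

def is_valid(chain):
    """Recompute every hash and verify each back-link to the prior block."""
    for i, b in enumerate(chain):
        if b["hash"] != block_hash(b["index"], b["prev_hash"], b["data"]):
            return False
        if i > 0 and b["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
for record in ["genesis", "tx: alice->bob 5", "tx: bob->carol 2"]:
    append_block(chain, record)
```

Tampering with any block's data breaks validation for the whole chain, which is the storage property the course examines.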
Apache Kafka is a multi-purpose distributed streaming platform that offers a fully-functional SQL interface. Kafka can be used to build streaming data pipelines that reliably move data between systems or applications, and to build streaming applications that transform, analyze or react to streams of data. It offers simple, fast data transport from one data system to another and, by processing records one at a time, avoids the micro-batching pains often experienced by Spark Streaming users.
Apache NiFi enables visual programming for the creation of scalable directed graphs for data routing, transformation, system mediation logic and data provenance. This course is intended for data infrastructure architects, data engineers, data analysts and individuals involved with data governance and data provenance who need to automate the requirements and capabilities of their existing or greenfield data flow topologies and data infrastructures.
With its power and simplicity Python has become the scripting language of choice for many large organizations, including Nissan, Google and IBM. Apache Zeppelin is an open-source, web-based notebook and integrated development environment (IDE) that enables data-driven, interactive data analytics and collaborative documents with Python, SQL, Scala, R, Java and other programming languages and frameworks.
Organizations leading social movements, Twitter and Foursquare among them, were early champions of Scala and currently run Scala applications in production. In this course, students construct elegant class hierarchies for maximum code reuse and extensibility, implement their behavior using higher-order functions, and cover everything in between.
The ASF Hive project gained popularity quickly because it allowed data users to view the distributed, poly-structured data in HDFS as SQL datasets (tables) and to perform transformational operations on those views with SQL queries. Hive has undergone significant rewrites to satisfy data users who are familiar with SQL and who demand the capabilities and performance of a relational database, even though the data in HDFS is stored as distributed, poly-structured, possibly replicated pieces (blocks) across a network.
Machine learning algorithms are used to devise complex models and algorithms that lend themselves to prediction. In commercial use, this is known as “predictive analytics.” Analytical models allow data scientists to “produce reliable, repeatable decisions and results” and uncover “hidden insights” through learning from historical relationships and trends in the data. This course instructs the student in key concepts and fundamental practices of machine learning (through lecture and labs using the Scikit-Learn libraries) that are relevant to the activities of a data scientist.
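The fit-then-predict cycle at the heart of predictive analytics can be shown without any library at all. A minimal sketch using closed-form ordinary least squares in plain Python (the course's labs use Scikit-Learn; this stdlib-only version just illustrates learning a model from historical data and using it for prediction):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ a*x + b, via the closed-form solution."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

def predict(model, x):
    """Apply the learned coefficients to a new observation."""
    a, b = model
    return a * x + b

# "Historical" observations following y = 2x + 1, so the fit recovers it exactly.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Scikit-Learn's estimators follow the same two-step shape (`fit`, then `predict`), just generalized to many features and many model families.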
This course provides the gateway to becoming a data scientist. The entry point is knowledge of the statistical techniques data scientists collectively refer to as exploratory data analysis, or EDA. Students are exposed to more than 30 essential statistical concepts every data scientist needs.
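A flavor of the EDA mindset, using only Python's standard library (the sample values are hypothetical; comparing summary statistics like this is one of the basic techniques the course covers):

```python
import statistics

# Hypothetical sample: service response times in milliseconds.
sample = [12, 15, 11, 48, 13, 14, 12, 95, 13, 14]

summary = {
    "mean": statistics.mean(sample),      # pulled upward by outliers
    "median": statistics.median(sample),  # robust to outliers
    "stdev": statistics.stdev(sample),    # sample standard deviation
}

# A mean far above the median suggests right skew: a few large outliers
# dominate, which would mislead anyone reporting only the average.
skewed_right = summary["mean"] > summary["median"]
```

Noticing this gap before modeling is exactly the kind of exploratory check EDA formalizes.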
This course will augment a network engineer’s hardware and network knowledge to transform the individual into a highly-sought, hybrid hardware-component-aware/software-component-aware data flow architect, engineer or analyst. This 4-day class shows network engineers how to work with software engineers to merge the intelligent behavior of the ASF components with the data flow speed and volume of network hardware components.