The course introduces Apache Spark, its architecture, and its execution model. It includes a short introduction to the functional programming language Scala, covering basic syntax, case classes, and the collection APIs. You’ll learn how to work with Apache Spark’s core programming abstraction, the Resilient Distributed Dataset (RDD), through its APIs for data processing, and how to build Spark applications with Scala. In addition to batch and iterative data processing, Apache Spark also supports stream processing, which is increasingly important for companies that need to extract business insights in near real time. The second half of the course covers Spark’s stream processing capability and how to develop streaming applications with it.
By the end of the course, you’ll have a good foundation in the Scala language and a strong understanding of Apache Spark’s architecture, execution model, and programming model. You’ll also be able to manipulate RDDs through Apache Spark’s API and develop Spark applications in Scala for batch, interactive, and stream processing. Prior object-oriented programming experience is expected, as this course offers only a short introduction to Scala.
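To give a flavor of the Scala covered in the first half, here is a minimal sketch of a case class together with common collection operations such as `map`, `filter`, and `groupBy`. All names in it (`Reading`, `temps`) are illustrative, not taken from the course materials; the same functional style carries over directly to the RDD API later on.

```scala
// A case class: an immutable record with structural equality, pattern
// matching support, and an auto-generated companion. Names are illustrative.
case class Reading(sensor: String, celsius: Double)

object CollectionsDemo {
  def main(args: Array[String]): Unit = {
    val temps = List(Reading("a", 20.5), Reading("b", 31.0), Reading("a", 25.0))

    // filter keeps matching elements; map transforms each element.
    val hotSensors = temps.filter(_.celsius > 24).map(_.sensor)
    println(hotSensors)

    // groupBy + per-group aggregation, a pattern that mirrors
    // pair-RDD operations like reduceByKey in Spark.
    val avgBySensor = temps.groupBy(_.sensor).map { case (s, rs) =>
      s -> rs.map(_.celsius).sum / rs.size
    }
    println(avgBySensor)
  }
}
```

The same chained, side-effect-free transformations shown here are the mental model Spark’s RDD API builds on.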
- Big data processing ecosystem
- Introduction to Apache Spark architecture and execution model
- Introduction to Scala programming language
- Apache Spark programming model with RDD
- Data processing with Apache Spark RDD Scala APIs
- How to develop Apache Spark applications with Scala
- Introduction to stream processing with Apache Spark
- How to develop stream processing applications with Apache Spark
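The RDD topics above can be sketched with the classic word-count application. This is a hedged, illustrative example, not course material: the object name `WordCount` and the input file name are hypothetical, and running it requires the `spark-core` dependency on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical word-count application illustrating RDD transformations
// and actions. Requires spark-core; "input.txt" is a placeholder path.
object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val counts = sc.textFile("input.txt")    // RDD[String]: one line per element
      .flatMap(_.split("\\s+"))              // split each line into words
      .map(word => (word, 1))                // pair RDD of (word, 1)
      .reduceByKey(_ + _)                    // sum the counts per word

    counts.collect().foreach(println)        // action: triggers execution
    sc.stop()
  }
}
```

Transformations such as `flatMap`, `map`, and `reduceByKey` are evaluated lazily; only the `collect()` action triggers the distributed computation, a point the course’s coverage of the execution model makes concrete.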
Skills Needed: Programming experience with Java is required. Knowledge of Hadoop is recommended.