Apache Spark with Scala, Introduction

Apache Spark is a unified data analytics engine that supports batch, interactive, iterative, streaming, and graph processing use cases. Its combination of elegant application programming interfaces (APIs) and a fast, in-memory, general-purpose cluster computing system makes it an attractive option for companies with a wide range of data processing needs. Apache Spark is written in Scala, and its APIs are available in several programming languages, including Scala, Java, and Python. This course focuses on the Scala API. Scala is a language that combines object-oriented and functional programming.

In this foundational course, you will explore Apache Spark, its architecture, and its execution model. We'll start with a short introduction to Scala, covering its basic syntax, case classes, and the collections API. You'll learn how to process large amounts of data using DataFrames, Apache Spark's programming model for structured data, which provides simple yet powerful APIs. In addition to batch and iterative data processing, Apache Spark supports stream processing, which enables companies to extract useful business insights in near real time.
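As a small taste of the Scala topics listed above (case classes and the collections API), here is a minimal sketch that groups and filters an in-memory list using only the Scala standard library. The `Trip` case class, its fields, and the sample values are hypothetical and are not taken from the course materials.

```scala
// Hypothetical record type, used only for illustration.
// Case classes provide structural equality, a readable toString, and copy().
final case class Trip(city: String, distanceKm: Double)

object CollectionsDemo {
  // A small, immutable sample dataset (made-up values)
  val trips: List[Trip] = List(
    Trip("San Jose", 12.5),
    Trip("Oakland", 40.0),
    Trip("San Jose", 7.5)
  )

  // Total distance per city: groupBy builds a Map[String, List[Trip]],
  // then each group's distances are summed
  val totalByCity: Map[String, Double] =
    trips.groupBy(_.city).map { case (city, ts) => city -> ts.map(_.distanceKm).sum }

  // Keep only longer trips; filter returns a new List and leaves the original unchanged
  val longTrips: List[Trip] = trips.filter(_.distanceKm > 10.0)

  def main(args: Array[String]): Unit = {
    totalByCity.toSeq.sortBy(_._1).foreach { case (city, km) => println(s"$city: $km") }
    println(longTrips)
    // copy() creates a modified instance without mutating the original
    println(trips.head.copy(distanceKm = 3.0))
  }
}
```

The same group-and-aggregate idea carries over to Spark, where it is usually expressed on a DataFrame with `groupBy` followed by an aggregation, rather than on a local collection.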

The second half of the course covers Apache Spark's stream processing capabilities and how to develop streaming applications with Apache Spark. We will also briefly cover machine learning and how MLlib, Apache Spark's machine learning library, makes practical machine learning scalable and easy.

By the end of the course, you'll have a good foundation in the Scala language and a strong understanding of Apache Spark's architecture, execution model, and programming model. You'll be able to manipulate DataFrames through Apache Spark's APIs and develop Apache Spark applications in Scala for batch, interactive, and stream processing. You will also learn fundamental machine learning concepts and be able to use the MLlib library to build simple machine learning applications.

Learning Outcomes:
At the conclusion of the course, you should be able to:

  • Describe Apache Spark's architecture, execution model, and programming model
  • Process data using the Apache Spark DataFrame APIs
  • Build batch and streaming data processing applications
  • Build small to medium-sized applications in the Scala programming language

Topics include:

  • Introduction to Apache Spark architecture and execution model
  • Introduction to the Scala programming language
  • Apache Spark programming model with DataFrames
  • Data processing with the Apache Spark DataFrame Scala APIs
  • How to develop Apache Spark applications with Scala
  • Introduction to stream processing with Apache Spark
  • Development of stream processing applications with Apache Spark
  • Machine learning and developing machine learning applications with the MLlib library

Skills Needed: Programming experience with Java is required. Knowledge of Hadoop is recommended.

Contact Us
Speak to a student services representative.

Call (408) 861-3860