
Apache Spark with Scala, Introduction | DBDA.X400

Apache Spark is a unified data analytics engine that supports batch, interactive, iterative, streaming, and graph processing use cases. The combination of elegant application programming interfaces (APIs) and a fast, general-purpose, in-memory cluster computing system makes it an attractive option for a wide range of data processing needs. Apache Spark is written in Scala, and its APIs are available in several languages, including Scala, Java, Python, and R. This course focuses on the Scala API. Scala is a language that combines functional and object-oriented programming.

In this foundational course you will explore Apache Spark, its architecture, and its execution model. We’ll start with a short introduction to Scala: its basic syntax, case classes, and collection APIs. You’ll learn how to process large amounts of data using DataFrames, Apache Spark’s programming model for structured data processing, which provides simple yet powerful APIs. In addition to batch and iterative data processing, Apache Spark also supports stream processing, which enables companies to extract useful business insights in near real time.
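The Scala features named above, case classes and the collection API, can be illustrated with a short, self-contained sketch. It is a minimal example under stated assumptions: it uses only the Scala standard library (no Spark), and the `Sale` case class, its fields, and the sample values are hypothetical, chosen only for illustration. A similar functional style, passing functions to operations such as `filter`, `map`, and `groupBy`, also appears in Spark's Scala API, although Spark runs those operations on distributed data rather than on an in-memory `List`.

```scala
// Minimal sketch using only the Scala standard library (no Spark).
// Hypothetical data model: the Sale case class and sample values are illustrative only.

// A case class provides immutable fields, structural equality, and a
// companion apply method, so instances are created without `new`.
case class Sale(region: String, product: String, amount: Double)

val sales: List[Sale] = List(
  Sale("West", "laptop", 1200.0),
  Sale("East", "phone", 800.0),
  Sale("West", "phone", 650.0),
  Sale("East", "server", 1100.0),
  Sale("West", "tablet", 400.0)
)

// filter and map are higher-order functions: each takes a function as an
// argument and returns a new collection, leaving `sales` unchanged.
val bigSaleProducts: List[String] =
  sales.filter(_.amount >= 1000.0).map(_.product)

// groupBy builds a Map from region to the sales in that region;
// each group is then reduced to a total amount.
val totalByRegion: Map[String, Double] =
  sales.groupBy(_.region).map { case (region, rows) =>
    region -> rows.map(_.amount).sum
  }

// Pattern matching destructures a case class by its fields.
def describe(sale: Sale): String = sale match {
  case Sale(region, product, amount) if amount >= 1000.0 =>
    s"large $product sale in $region"
  case Sale(region, product, _) =>
    s"$product sale in $region"
}

println(bigSaleProducts)        // prints List(laptop, server)
println(totalByRegion("West"))  // prints 2250.0
println(describe(sales.head))   // prints large laptop sale in West
```

Because `sales` is immutable, each step produces a new value instead of modifying the original list, which is the same style of data transformation the course builds on when it moves to DataFrames.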

The second half of the course covers Apache Spark’s stream processing capabilities and how to develop streaming applications with it. We will also briefly cover machine learning and how the Apache Spark MLlib component makes practical machine learning scalable and easy.

By the end of the course, you’ll have a good foundation in the Scala language and a strong understanding of Apache Spark’s architecture, execution model, and programming model. You’ll be able to manipulate DataFrames through Apache Spark’s API and develop Apache Spark applications in Scala for batch, interactive, and stream processing. You will also learn fundamental machine learning concepts and be able to use the MLlib library to build simple machine learning applications.

Learning Outcomes:
At the conclusion of the course, you should be able to:

  • Describe Apache Spark’s architecture, execution model, and programming model
  • Process data using Apache Spark’s DataFrame API
  • Build batch and streaming data processing applications using Apache Spark
  • Build small to medium-sized applications in the Scala programming language

Skills Needed: Programming experience with Java is required. Knowledge of Hadoop is recommended.

Have a question about this course?
Speak to a student services representative.
Call (408) 861-3860


Sections Open for Enrollment:

Start / End Date: 10-05-2021 to 12-07-2021
Units: 3.0
Cost: $980
Instructor: Hien C Luu



Date             Start Time  End Time   Meeting Type  Location
Tue, 10-05-2021  6:30 p.m.   9:30 p.m.  Live-Online   REMOTE
Tue, 10-12-2021  6:30 p.m.   9:30 p.m.  Live-Online   REMOTE
Tue, 10-19-2021  6:30 p.m.   9:30 p.m.  Live-Online   REMOTE
Tue, 10-26-2021  6:30 p.m.   9:30 p.m.  Live-Online   REMOTE
Tue, 11-02-2021  6:30 p.m.   9:30 p.m.  Live-Online   REMOTE
Tue, 11-09-2021  6:30 p.m.   9:30 p.m.  Live-Online   REMOTE
Tue, 11-16-2021  6:30 p.m.   9:30 p.m.  Live-Online   REMOTE
Tue, 11-23-2021  6:30 p.m.   9:30 p.m.  Live-Online   REMOTE
Tue, 11-30-2021  6:30 p.m.   9:30 p.m.  Live-Online   REMOTE
Tue, 12-07-2021  6:30 p.m.   9:30 p.m.  Live-Online   REMOTE