Course

Apache Spark with Scala, Introduction | DBDA.X400


Apache Spark is a unified data analytics engine that supports batch, interactive, iterative, streaming, and graph processing use cases. Its combination of elegant application programming interfaces (APIs) and a fast, in-memory, general-purpose cluster computing system makes it an attractive option for companies with varied data processing needs. Spark itself is written in Scala, and its APIs are available in several programming languages, including Scala, Java, and Python. This course focuses on the Scala API; Scala is a language with strong support for functional programming.

In this foundational course, you will explore Apache Spark, its architecture, and its execution model. We’ll start with a short introduction to Scala, covering its basic syntax, case classes, and collection APIs. You’ll learn how to process large amounts of data using DataFrames, Apache Spark’s structured data processing programming model, which provides simple, powerful APIs. In addition to batch and iterative data processing, Apache Spark supports stream processing, which enables companies to extract useful business insights in near real time.
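As a rough illustration of the Scala topics mentioned above (case classes and the collection APIs), here is a minimal sketch in plain Scala with no Spark dependency. It is not taken from the course materials; the `Reading` type, the `CollectionSketch` object, and the sample values are hypothetical names invented for this example.

```scala
// Hypothetical record type, invented for this illustration.
final case class Reading(sensor: String, celsius: Double)

object CollectionSketch {
  // Average temperature per sensor, using only standard collection methods.
  def averageBySensor(readings: Seq[Reading]): Map[String, Double] =
    readings
      .groupBy(_.sensor) // Map from sensor name to its readings
      .map { case (sensor, rs) => sensor -> rs.map(_.celsius).sum / rs.size }

  def main(args: Array[String]): Unit = {
    val readings = Seq(
      Reading("a", 20.0),
      Reading("a", 22.0),
      Reading("b", 18.0)
    )

    // Case classes provide structural equality and a copy method.
    val adjusted = readings.head.copy(celsius = 21.0)
    println(adjusted == Reading("a", 21.0))

    // Print the per-sensor averages in a stable order.
    averageBySensor(readings).toSeq.sortBy(_._1).foreach {
      case (sensor, avg) => println(s"$sensor: $avg")
    }
  }
}
```

The group-then-aggregate pattern shown here on an in-memory `Seq` is similar in spirit to the grouping and aggregation operations that DataFrames offer for distributed data.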

The second half of the course covers Apache Spark’s stream processing capabilities and how to develop streaming applications with them. We will also briefly cover machine learning and how Apache Spark’s MLlib component makes practical machine learning scalable and easy.

By the end of the course, you’ll have a good foundation in the Scala language and a strong understanding of Apache Spark’s architecture, execution model, and programming model. You’ll be able to manipulate DataFrames through Apache Spark’s API and develop Apache Spark applications in Scala for batch, interactive, and stream processing. You will also learn fundamental machine learning concepts and be able to use the MLlib library to build simple machine learning applications.


Learning Outcomes:
At the conclusion of the course, you should be able to:

  • Describe Apache Spark’s architecture, execution model, and programming model
  • Perform data processing using the Apache Spark DataFrame APIs
  • Build batch and streaming data processing applications using Apache Spark
  • Build small to medium applications using the Scala programming language

Skills Needed: Programming experience with Java is required. Knowledge of Hadoop is recommended.

Have a question about this course?
Speak to a student services representative.
Call (408) 861-3860
ENROLL EARLY!
  • Save your seat and help us confirm course scheduling. Enroll at least seven days before your course starts.
  • ACCESSING CANVAS: Learn more about accessing your course on Canvas in our FAQ section.
Email: extension@ucsc.edu