Apache Spark with Scala, Introduction
Apache Spark is a unified data analytics engine that supports batch, interactive, iterative, streaming, and graph processing use cases. Its combination of elegant application programming interfaces (APIs) and a fast, in-memory, general-purpose cluster computing system makes it an attractive option for companies with a wide range of data processing needs. Apache Spark is written in Scala, and its APIs are available in several programming languages, including Scala, Java, and Python. This course focuses on the Scala API. Scala is a language that combines object-oriented and functional programming.
In this foundational course, you will explore Apache Spark, its architecture, and its execution model. We’ll start with a short introduction to Scala: its basic syntax, case classes, and collection APIs. You’ll learn how to process large amounts of data using DataFrames, Apache Spark’s structured data processing programming model, which provides simple, powerful APIs. In addition to batch and iterative data processing, Apache Spark also supports stream processing, which enables companies to extract useful business insights in near real time.
The second half of the course covers Apache Spark’s stream processing capabilities and how to develop streaming applications with them. We will also briefly cover machine learning and how Apache Spark’s MLlib component makes practical machine learning scalable and easy.
By the end of the course, you’ll have a good foundation in the Scala language and a strong understanding of Apache Spark’s architecture, execution model, and programming model. You’ll be able to manipulate DataFrames through Apache Spark’s API and develop Apache Spark applications in Scala for batch, interactive, and stream processing. You will also learn fundamental machine learning concepts and be able to use the MLlib library to build simple machine learning applications.
- Introduction to Apache Spark architecture and execution model
- Introduction to the Scala programming language
- Apache Spark programming model with DataFrame
- Data processing with Apache Spark DataFrame Scala APIs
- How to develop Apache Spark applications with Scala
- Introduction to stream processing with Apache Spark
- Development of stream processing applications with Apache Spark
- Machine learning and developing machine learning applications with the MLlib library
Skills Needed: Programming experience with Java is required. Knowledge of Hadoop is recommended.