Apache Spark with Scala, Introduction | DBDA.X400
Apache Spark is a unified data analytics engine that supports batch, interactive, iterative, streaming, and graph processing use cases. The combination of elegant application programming interfaces (APIs) and a fast, in-memory, general-purpose cluster computing system makes it an attractive option for companies to leverage for a wide range of data processing needs. Apache Spark is written in Scala, and its APIs are available in three programming languages: Scala, Java, and Python. This course, however, focuses on the API in Scala, a functional programming language.
In this foundational course you will explore Apache Spark, its architecture, and its execution model. We’ll start with a short introduction to Scala: its basic syntax, case classes, and collection APIs. You’ll learn how to process large amounts of data using DataFrame, Apache Spark’s structured data processing programming model, which provides simple, powerful APIs. In addition to batch and iterative data processing, Apache Spark also supports stream processing, which enables companies to extract useful business insights in near real time.
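To give a flavor of the Scala topics mentioned above, here is a minimal sketch of a case class combined with the collection API; the `Employee` data and field names are illustrative, not part of the course materials:

```scala
// Illustrative sketch: a case class modeling a row of data,
// plus collection operations (groupBy, map) of the kind the
// course introduces before moving on to Spark's DataFrame.

case class Employee(name: String, dept: String, salary: Double)

object ScalaBasics {
  val employees: List[Employee] = List(
    Employee("Ana", "eng", 120000),
    Employee("Bo", "eng", 110000),
    Employee("Cy", "sales", 90000)
  )

  // Group by department and compute the average salary per group.
  def avgSalaryByDept(es: List[Employee]): Map[String, Double] =
    es.groupBy(_.dept).map { case (dept, members) =>
      dept -> members.map(_.salary).sum / members.size
    }
}
```

The same group-and-aggregate pattern carries over almost directly to DataFrame operations such as `groupBy` and `avg` in Spark.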
The second half of the course covers Apache Spark’s stream processing capabilities and how to develop streaming applications. We will also briefly cover machine learning and how the Apache Spark MLlib component makes practical machine learning scalable and easy.
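As a taste of the streaming material, below is a sketch of the canonical Structured Streaming word count. It assumes a Spark runtime and a text source on `localhost:9999` (e.g. `nc -lk 9999`), so it is illustrative rather than something to run standalone:

```scala
import org.apache.spark.sql.SparkSession

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("StreamingWordCount")
      .master("local[*]") // local mode for experimentation
      .getOrCreate()
    import spark.implicits._

    // Read a stream of lines from a socket (assumed: localhost:9999).
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // Split lines into words and maintain a running count per word.
    val counts = lines.as[String]
      .flatMap(_.split("\\s+"))
      .groupBy("value")
      .count()

    // Print the full updated counts to the console after each micro-batch.
    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

Note how the streaming query uses the same DataFrame operations as batch processing; this unified batch/streaming model is a central theme of the course.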
By the end of the course, you’ll have a good foundation in the Scala language and a strong understanding of Apache Spark’s architecture, execution model, and programming model. You’ll be able to manipulate DataFrames through Apache Spark’s APIs and develop Apache Spark applications in Scala for batch, interactive, and stream processing. You will also learn fundamental machine learning concepts and be able to leverage the MLlib library to build simple machine learning applications.
At the conclusion of the course, you should be able to:
- Describe Apache Spark’s architecture, execution model, and programming model
- Perform data processing using Apache Spark’s DataFrame APIs
- Build batch and streaming data processing applications using Apache Spark
- Build small to medium-sized applications using the Scala programming language
Skills Needed: Programming experience with Java is required. Knowledge of Hadoop is recommended.