Data Engineering with Hadoop | DBDA.X424
The first half of the course gives an overview of the MapReduce and Spark frameworks. You will learn how to write MapReduce and Spark jobs and how to optimize data processing applications. The second half of the course covers SQL-based tools for Big Data, using Hive to build ETL jobs. The course also covers the fundamentals of HBase, a NoSQL database, and Kafka, a distributed event streaming platform.
The course consists of interactive lectures, hands-on labs in class, and take-home practice exercises. Upon completion of this course, you will have a strong understanding of the tools used to build Big Data applications with MapReduce, Spark, and Hive.
Learning Outcomes
At the conclusion of the course, you should be able to:
- Describe the role Hadoop plays in the analysis of big data
- Discuss the inner workings of Hadoop's computing framework, including MapReduce processing and Hadoop's file system (HDFS)
- Develop MapReduce applications, with exposure to both traditional MapReduce (MR2) and Spark
- Use Hive and NoSQL databases for data analysis
- Leverage the Hadoop ecosystem to become productive in analyzing data
Topics Include
- Big Data applications architecture
- Understanding the Hadoop Distributed File System (HDFS)
- How the MapReduce framework works
- Introduction to HBase (Hadoop NoSQL database)
- Introduction to Apache Kafka
- Developing MapReduce applications
- Introduction to Spark and SparkSQL
- Developing Spark/SparkSQL applications
- Managing tables and query development in Hive
- Introduction to data pipelines
Note(s): This course uses the EMR Hadoop distribution. Students are required to have a computer with a 64-bit CPU and a minimum of 8 GB of memory.
Skills Needed: Basic SQL skills and the ability to create simple programs in a modern programming language are required. An understanding of databases and of parallel or distributed computing is helpful.
- Save Your Seat: Help us confirm course scheduling. Enroll at least seven days before your course starts.
- Accessing Canvas: Learn more about gaining access to your course on Canvas in our FAQ section.
- Accessibility and Accommodation: For accessibility questions or to request an accommodation, please visit Access for Students with Disabilities or email the Extension registrar.
- Finance Your Education: Here are ways to pay for your education.
Sections Open for Enrollment:
Schedule
Date | Start Time | End Time | Meeting Type | Location |
---|---|---|---|---|
Sat, 06-17-2023 | 9:00 a.m. | 12:00 p.m. | Live-Online | REMOTE |
Sat, 06-24-2023 | 9:00 a.m. | 12:00 p.m. | Live-Online | REMOTE |
Sat, 07-08-2023 | 9:00 a.m. | 12:00 p.m. | Live-Online | REMOTE |
Sat, 07-15-2023 | 9:00 a.m. | 12:00 p.m. | Live-Online | REMOTE |
Sat, 07-22-2023 | 9:00 a.m. | 12:00 p.m. | Live-Online | REMOTE |
Sat, 07-29-2023 | 9:00 a.m. | 12:00 p.m. | Live-Online | REMOTE |
Sat, 08-05-2023 | 9:00 a.m. | 12:00 p.m. | Live-Online | REMOTE |
Sat, 08-12-2023 | 9:00 a.m. | 12:00 p.m. | Live-Online | REMOTE |
Sat, 08-19-2023 | 9:00 a.m. | 12:00 p.m. | Live-Online | REMOTE |
Sat, 08-26-2023 | 9:00 a.m. | 12:00 p.m. | Live-Online | REMOTE |