Method: Flexible
Term: WINTER
Units: 3.0 QUARTER UNITS
Cost: $910

Course Description

Please note: This course was previously offered under the title “Big Data, Introduction.”


In the era of big data and compute-intensive analytics, the ability to write high-performance Python code is essential. This course is designed for learners with basic Python knowledge who want to handle large volumes of data efficiently and optimize their workflows. We will explore how to make Python performant, moving beyond basic pandas use, by introducing tools, techniques, and tradeoffs for improving execution speed, memory use, and scalability.

You will learn strategies such as vectorization, avoiding unnecessary loops, leveraging data structures like NumPy arrays, and using multithreading/multiprocessing. We will also explore distributed computing with PySpark and Dask, and introduce Polars as a cutting-edge alternative to pandas. These skills will be placed in the broader context of big data frameworks and architectures, including Apache Spark, Apache Kafka, and modern NoSQL databases like MongoDB and Cassandra. GPU optimization techniques will also be discussed at an introductory level.
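
As a small illustration of the vectorization strategy mentioned above, the sketch below compares a plain Python loop with a NumPy equivalent. It is not taken from the course materials; the function names and the sample data are assumptions chosen for the example.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Baseline: one interpreter-level iteration per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Vectorized: the multiply-and-add runs inside NumPy's compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


# Hypothetical sample data, used only for this illustration.
data = list(range(1_000))

loop_result = sum_of_squares_loop(data)
vectorized_result = sum_of_squares_vectorized(data)
```

Both functions return the same value; on large inputs the vectorized version is typically much faster because the per-element work no longer passes through the Python interpreter.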

The final project will integrate these concepts into the design of a high-performance data processing pipeline, giving you hands-on experience with tools and methods to analyze large datasets efficiently.

Topics

  • Introduction to performance optimization in Python for data analytics
  • Tradeoffs in compute time, memory, latency, and scalability
  • Vectorization and avoiding inefficient loops
  • Working with NumPy arrays and alternative data structures
  • Multithreading and multiprocessing in Python
  • Distributed computing with PySpark and Dask
  • Introduction to Polars for high-speed data processing
  • Apache Kafka for real-time data streams
  • NoSQL databases: MongoDB and Cassandra
  • GPU acceleration for Python workloads
  • Designing a high-performance data pipeline
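
To give a flavor of the multithreading topic listed above, here is a minimal sketch using the standard library's thread pool. It is an illustration under stated assumptions, not course material: `fetch_record` is a hypothetical I/O-bound task, and the short sleep stands in for a network or disk wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_record(record_id):
    """Hypothetical I/O-bound task; the sleep simulates waiting on I/O."""
    time.sleep(0.05)
    return record_id * 2


def fetch_all(record_ids, max_workers=8):
    """Run fetch_record concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_record, record_ids))


results = fetch_all(range(8))
```

Threads help here because the simulated work spends its time waiting rather than computing. CPU-bound work is usually a better fit for multiprocessing.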

Prerequisites / Skills Needed

Basic Python programming knowledge and familiarity with Python data analysis libraries such as pandas, or completion of a course such as DBDA.X420 - “Python for Data Analysis.”

 

Flexible: Attend in person or via Zoom at scheduled times.

Schedule

Date             Start Time  End Time  Meeting Type  Location
Fri, 01-09-2026  6:30pm      9:30pm    Flexible      SANTA CLARA / REMOTE
Fri, 01-16-2026  6:30pm      9:30pm    Flexible      SANTA CLARA / REMOTE
Fri, 01-23-2026  6:30pm      9:30pm    Flexible      SANTA CLARA / REMOTE
Fri, 01-30-2026  6:30pm      9:30pm    Flexible      SANTA CLARA / REMOTE
Fri, 02-06-2026  6:30pm      9:30pm    Flexible      SANTA CLARA / REMOTE
Fri, 02-13-2026  6:30pm      9:30pm    Flexible      SANTA CLARA / REMOTE
Fri, 02-20-2026  6:30pm      9:30pm    Flexible      SANTA CLARA / REMOTE
Fri, 02-27-2026  6:30pm      9:30pm    Flexible      SANTA CLARA / REMOTE
Fri, 03-06-2026  6:30pm      9:30pm    Flexible      SANTA CLARA / REMOTE
Fri, 03-13-2026  6:30pm      9:30pm    Flexible      SANTA CLARA / REMOTE
 

10/22/25: Instructor TBA.

This class meets simultaneously in a classroom and remotely via Zoom. Students are expected to attend and participate in the course, either in person or remotely, on the days and at the times specified on the course schedule. Students attending remotely are also strongly encouraged to have their cameras on to get the most out of the remote learning experience. Students attending the class in person are expected to bring a laptop to each class meeting.


You will be granted access in Canvas to your course site and course materials approximately 24 hours prior to the published start date of the course.
