Deep Reinforcement Learning | AISV.802

This advanced course starts with a quick review of some deep learning architectures followed by an introduction to fundamental concepts of reinforcement learning (RL) that we illustrate with concrete examples. Next, we’ll explore the Bellman equation, policies, models, Q-learning, the SARSA algorithm, and temporal difference (TD) learning.
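As a rough illustration of how Q-learning and SARSA relate to temporal difference learning, the two tabular update rules can be sketched as follows. This is a minimal sketch, not course material: the function names, the dictionary-based Q-table, and the default values for `alpha` (learning rate) and `gamma` (discount factor) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Off-policy TD update: bootstrap from the best action in the next state."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy TD update: bootstrap from the action actually taken next."""
    td_error = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


# Hypothetical two-action problem; unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
Q[(1, 0)] = 1.0
Q[(1, 1)] = 2.0
q_learning_update(Q, s=0, a=0, r=1.0, s_next=1, actions=[0, 1])
```

The only difference between the two rules is the bootstrap target: Q-learning uses the maximum over next actions, while SARSA uses the value of the next action the agent actually selects.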

In this deep reinforcement learning (DRL) course, you will learn how to solve common tasks in RL, including some well-known simulations, such as CartPole, MountainCar, and FrozenLake. You will be introduced to concepts such as clipping regions and policy gradients, as well as an extensive collection of algorithms, including DQN, prioritized experience replay, DDQN, D4PG, A2C, PPO, TRPO, DDPG, and SAC.
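For readers unfamiliar with clipping, the sketch below assumes that "clipping regions" refers to the clipped surrogate objective used by PPO; the course description does not say so explicitly. The function name and the default `eps=0.2` are illustrative assumptions, and the code handles a single sample rather than a batch.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample PPO-style clipped objective (to be maximized).

    ratio     -- new_policy_prob / old_policy_prob for the action taken
    advantage -- estimated advantage of that action
    eps       -- half-width of the clipping region around 1.0 (assumed value)
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Taking the minimum keeps the objective a pessimistic bound, so the
    # policy gains nothing by moving the ratio far outside [1 - eps, 1 + eps].
    return min(ratio * advantage, clipped_ratio * advantage)


# A large ratio with a positive advantage is capped at (1 + eps) * advantage.
capped = clipped_surrogate(ratio=1.5, advantage=1.0)
```

In practice this quantity is averaged over a batch of sampled transitions and combined with value and entropy terms, which this sketch omits.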

Later, the course introduces additional algorithms, such as ACER and ACKTR, as well as DRL libraries, such as Google Dopamine and TensorFlow Agents (TF-Agents). Almost all code samples are written in TensorFlow 2 with Keras, with a limited number in PyTorch. The development of many DRL algorithms has improved results in diverse areas, such as natural language processing and robotics. In addition, DRL-based systems represent the state of the art in Go as well as in highly sophisticated multiplayer games (including StarCraft and Dota).

Topics Include:

  • Deep learning architectures
  • Markov decision processes
  • Reinforcement and deep reinforcement learning
  • Policy gradients and various algorithms
  • Proximal policy optimization
  • Various actor/critic algorithms
  • Deep RL libraries

Learning Outcomes:

At the conclusion of the course, the student should be able to:

  • Describe Q-learning, models, and policies
  • Define the purpose of the Bellman equation
  • Discuss the advantages/disadvantages of reinforcement learning
  • Explain how the epsilon-greedy algorithm differs from a pure greedy algorithm
  • Discuss how deep learning enhances reinforcement learning
  • Describe GANs and how they pertain to autonomous vehicles
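
To make the epsilon-greedy outcome above concrete, here is a minimal sketch of the two action-selection rules. The function names, the list-of-floats representation of action values, and the default `epsilon=0.1` are assumptions for illustration only.

```python
import random


def greedy(q_values):
    """Always pick the action with the highest estimated value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a uniformly random action (explore);
    otherwise fall back to the greedy choice (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values)


# Hypothetical action-value estimates for a three-action problem.
q = [0.1, 0.9, 0.3]
always_best = greedy(q)
usually_best = epsilon_greedy(q, epsilon=0.1, rng=random.Random(0))
```

A pure greedy agent never tries actions whose current estimates are low, so a poor early estimate can lock it out of a better action; the random exploration step is what lets epsilon-greedy correct such estimates.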

Prerequisites: Please note that this course covers advanced topics, and students are expected to have completed one of the prerequisite courses or have equivalent experience.

Have a question about this course?
Speak to a student services representative.
Call (408) 861-3860
  • Save your seat and help us confirm course scheduling. Enroll at least seven days before your course starts.
  • ACCESSING CANVAS—Learn more about accessing your course on Canvas in our FAQ section.
Sections Open for Enrollment:

Open Sections and Schedule
Start / End Date            Units      Cost    Instructor
01-26-2022 to 03-30-2022    3.0 CEUs   $1020   Ajay K Baranwal

Date             Start Time   End Time    Meeting Type   Location
Wed, 01-26-2022  6:30 p.m.    9:30 p.m.   Live-Online    REMOTE
Wed, 02-02-2022  6:30 p.m.    9:30 p.m.   Live-Online    REMOTE
Wed, 02-09-2022  6:30 p.m.    9:30 p.m.   Live-Online    REMOTE
Wed, 02-16-2022  6:30 p.m.    9:30 p.m.   Live-Online    REMOTE
Wed, 02-23-2022  6:30 p.m.    9:30 p.m.   Live-Online    REMOTE
Wed, 03-02-2022  6:30 p.m.    9:30 p.m.   Live-Online    REMOTE
Wed, 03-09-2022  6:30 p.m.    9:30 p.m.   Live-Online    REMOTE
Wed, 03-16-2022  6:30 p.m.    9:30 p.m.   Live-Online    REMOTE
Wed, 03-23-2022  6:30 p.m.    9:30 p.m.   Live-Online    REMOTE
Wed, 03-30-2022  6:30 p.m.    9:30 p.m.   Live-Online    REMOTE