Deep Reinforcement Learning - Course | UCSC Silicon Valley Extension

Deep Reinforcement Learning | AISV.X403

Reinforcement Learning from Human Feedback (RLHF) is a critical component of ChatGPT, used to improve the reward earned by the generated text. This course introduces RLHF and shows how ChatGPT leverages Proximal Policy Optimization (PPO), a policy gradient-based reinforcement learning algorithm, so that students can build a ChatGPT-like system. In this advanced AI course, students get hands-on experience with a variety of reinforcement learning (RL) and deep reinforcement learning (DRL) tools used to teach machines to make human-like decisions based on observation and interpretation of their surrounding environments. DRL algorithms have driven major advances in games such as Go and in highly sophisticated multi-player games such as StarCraft and Dota, as well as in control systems, natural language processing, self-driving cars, and robotics.
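
To make the PPO reference concrete, here is a minimal sketch of PPO's clipped surrogate objective in plain Python. The function name `ppo_clip_objective`, the sample inputs, and the default `clip_eps=0.2` are illustrative assumptions, not course material, and the sketch leaves out everything else a full RLHF pipeline needs, such as a reward model, a value function, and a KL penalty.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of samples.

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    clip_eps:   clipping range (hypothetical default chosen for this sketch)
    """
    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical batch: one over-eager update and one mild update.
    print(ppo_clip_objective([1.5, 1.0], [1.0, 2.0]))
```

Taking the minimum removes the incentive to push the probability ratio far outside the clipping range in a single update, which is the property that keeps PPO updates conservative.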

After a quick review of deep learning building blocks and of RL and DRL fundamentals, we will dive into promising DRL algorithms, illustrating them with concrete examples and simulation environments. Students will learn to solve common RL tasks in well-known simulation environments such as CartPole, MountainCar, and MuJoCo.

You will learn Markov decision process (MDP) formulation and an extensive collection of DRL algorithms: deep Q-learning (DQN, DDQN, PER), policy gradient methods (A2C, A3C, TRPO, PPO, ACER, ACKTR, SAC), deterministic policy gradient methods (DPG, DDPG, TD3), and inverse reinforcement learning. To implement these DRL algorithms, students will code in Python 3 using OpenAI Gym, tf.keras (TensorFlow 2), and TensorFlow Agents (TF-Agents). We will also review other popular DRL libraries, such as Google Dopamine, Keras-RL, and Facebook Horizon.
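
As a rough illustration of what formulating an MDP involves, the sketch below encodes a tiny two-state MDP as a Python dictionary and solves it with value iteration, which repeatedly applies the Bellman optimality backup. The states, actions, rewards, and the names `TRANSITIONS` and `value_iteration` are hypothetical and chosen only for this example. It uses the standard library only, not the course's Gym or TF-Agents tooling.

```python
# Hypothetical MDP: TRANSITIONS[state][action] is a list of
# (probability, next_state, reward) outcomes.
TRANSITIONS = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Return approximate optimal state values for a finite MDP."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            # Bellman optimality backup: best expected return over actions.
            best = max(
                sum(p * (reward + gamma * values[nxt]) for p, nxt, reward in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:  # stop once no state value changed noticeably
            return values


if __name__ == "__main__":
    print(value_iteration(TRANSITIONS))
```

In this toy problem the best plan is to move to state 1 and stay there, so the value of state 1 approaches 2 / (1 - 0.9) = 20 and the value of state 0 approaches 1 + 0.9 * 20 = 19.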

Learning Outcomes
At the conclusion of the course, you should be able to:

Formulate an MDP
Describe value functions, models, and policies
Define the purpose of the Bellman equation
Discuss the advantages and disadvantages of RL
Explain how the epsilon-greedy algorithm differs from a pure greedy algorithm
Explain the difference between model-based and model-free RL
Discuss how DL enhances RL
Discuss and implement value-based and policy-based RL
Use and create RL environments with OpenAI Gym and TF-Agents
Apply learned RL algorithms to popular simulators and a lightweight ChatGPT-like system
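
To illustrate the epsilon-greedy outcome listed above, here is a minimal sketch using only the Python standard library. The function names `greedy_action` and `epsilon_greedy_action` and the sample value estimates are hypothetical. A pure greedy rule always exploits the current best estimate, while epsilon-greedy explores a uniformly random action with probability epsilon.

```python
import random


def greedy_action(q_values):
    """Return the index of the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda action: q_values[action])


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        # Exploration: any action, including the greedy one, may be drawn.
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


if __name__ == "__main__":
    q = [0.1, 0.5, 0.2]  # hypothetical action-value estimates
    rng = random.Random(0)
    picks = [epsilon_greedy_action(q, 0.2, rng) for _ in range(1000)]
    print("greedy pick:", greedy_action(q))
    print("share of greedy picks with epsilon=0.2:", picks.count(1) / len(picks))
```

Setting epsilon to 0 recovers the pure greedy rule, and setting it to 1 gives uniformly random behavior. A common variation, not shown here, decays epsilon over the course of training.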

Topics Include

Deep learning building blocks
Markov decision processes
Reinforcement and deep reinforcement learning
Value-based, model-based, and model-free algorithms
Policy gradient-based algorithms
Proximal policy optimization
Various actor-critic algorithms
Deep RL libraries
Term project

Note: This course includes a term project related to ChatGPT.

Have a question about this course?

Speak to a student services representative.
(408) 861-3860
FAQ


This course is related to the following programs:

Certificate Program in Artificial Intelligence Application Development

Artificial Intelligence

Prerequisite(s):

Deep Learning and Artificial Intelligence

Estimated Cost: TBD


Speak to a student services representative.

(408) 861-3860

extension@ucsc.edu
