Deep Reinforcement Learning

Skills you will gain

Master MDP Formulation: Understand and implement Markov decision processes, value functions, policies, and the Bellman equation for RL problem-solving.
Leverage Modern DRL Algorithms: Apply deep Q-learning (DQN, DDQN), policy gradient methods (A2C, A3C, PPO), GRPO for reasoning, and inverse reinforcement learning to complex tasks.
Hands-on RL Coding: Use Python 3, Gymnasium, tf.keras, and popular DRL libraries such as Google Dopamine, Keras-RL, Hugging Face TRL, and Facebook Horizon to build RL models.
Enhance RL with Deep Learning: Explore how deep neural networks improve reinforcement learning techniques and enable human-like decision-making.
Apply RLHF to Real-World Systems: Implement Reinforcement Learning from Human Feedback (RLHF) with PPO to understand how modern LLMs such as GPT are trained, and build a lightweight ChatGPT-like system.

Course Description

Reinforcement Learning from Human Feedback (RLHF) is a critical component of modern LLMs such as the GPT models behind ChatGPT, where it is used to steer generated text toward outputs that earn higher reward. This course introduces students to deep reinforcement learning, RLHF, and the way the GPT family of LLMs leverages PPO, a policy gradient-based reinforcement learning algorithm; students then apply these ideas to build a ChatGPT-like system. As an advanced AI course, it gives students hands-on experience with a variety of reinforcement learning (RL) and deep reinforcement learning (DRL) tools used to teach machines to make human-like decisions based on observation and interpretation of their surrounding environments. The course also examines how DRL algorithms have advanced the state of the art in games such as Go and in highly sophisticated multi-player games such as StarCraft and Dota, as well as in control systems, natural language, self-driving cars, and robotics.

After a review of deep learning building blocks and of RL and DRL fundamentals, students explore promising DRL algorithms through concrete examples and simulation environments. Students learn to solve standard RL tasks, including well-known simulations such as CartPole, MountainCar, and MuJoCo.

Students examine Markov decision process (MDP) formulation and an extensive collection of DRL algorithms: deep Q-learning (DQN, DDQN), policy gradient methods (A2C, A3C, PPO), policy gradient methods for reasoning (GRPO), and inverse reinforcement learning. To implement these DRL algorithms, students will code in Python 3 using Gymnasium environments and tf.keras. The course also reviews other popular DRL libraries, such as Google Dopamine, Keras-RL, and Facebook Horizon.
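As a rough illustration of what MDP formulation and the Bellman equation involve, the sketch below runs value iteration on a tiny two-state MDP. The states, actions, rewards, and discount factor are all invented for this example and are not taken from the course materials; the code uses only the Python standard library rather than Gymnasium or tf.keras.

```python
# Hypothetical two-state MDP, invented purely for illustration.
# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(1.0, "high", -1.0)],
    },
    "high": {
        "wait": [(1.0, "high", 1.0)],
        "work": [(0.5, "high", 3.0), (0.5, "low", 3.0)],
    },
}
GAMMA = 0.9  # assumed discount factor


def q_value(values, state, action, gamma=GAMMA):
    """Expected reward plus discounted value of the next state."""
    return sum(
        p * (reward + gamma * values[next_state])
        for p, next_state, reward in TRANSITIONS[state][action]
    )


def value_iteration(gamma=GAMMA, tol=1e-8, max_iters=10_000):
    """Repeat the Bellman optimality backup until the values stop changing."""
    values = {s: 0.0 for s in TRANSITIONS}
    for _ in range(max_iters):
        new_values = {
            s: max(q_value(values, s, a, gamma) for a in TRANSITIONS[s])
            for s in TRANSITIONS
        }
        delta = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if delta < tol:
            break
    return values


def greedy_policy(values, gamma=GAMMA):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    return {
        s: max(TRANSITIONS[s], key=lambda a: q_value(values, s, a, gamma))
        for s in TRANSITIONS
    }
```

Calling `greedy_policy(value_iteration())` returns one action per state. In this toy problem the transition model is known, so no learning from samples is needed; model-free methods such as DQN estimate similar quantities from interaction instead.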

Learning Outcomes
At the conclusion of the course, you should be able to:

Formulate an MDP
Describe value functions, models, and policies
Define the purpose of the Bellman equation
Discuss the advantages and disadvantages of RL
Explain how the epsilon-greedy algorithm differs from a pure greedy algorithm
Explain the difference between model-based and model-free RL
Discuss how DL/DNN enhances RL
Discuss and implement value-based and policy-based RL
Use and create RL environments with Gymnasium and other frameworks, such as Hugging Face's TRL
Apply learned RL algorithms to popular simulators and a lightweight ChatGPT-like system
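One of the outcomes above, the difference between epsilon-greedy and pure greedy action selection, can be sketched in a few lines. This is a minimal illustration under assumed names (`greedy_action`, `epsilon_greedy_action`, and the list of action-value estimates are hypothetical), not code from the course.

```python
import random


def greedy_action(q_values):
    """Pure greedy: always pick the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon explore uniformly at random; otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


# Example: three actions with made-up value estimates.
estimates = [0.1, 0.5, 0.2]
rng = random.Random(0)
picks = [epsilon_greedy_action(estimates, epsilon=0.1, rng=rng) for _ in range(1000)]
```

With `epsilon=0` the two functions behave identically. A small positive epsilon occasionally tries actions whose current estimates look worse, which lets an agent correct value estimates that started out wrong.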

Topics Include

Deep learning building blocks
Markov decision processes
Reinforcement and deep reinforcement learning
Deep RL libraries
Value-based, model-based, model-free algorithms
Policy gradients-based algorithms
Proximal policy optimization (PPO)
Various actor/critic algorithms
Group Relative Policy Optimization (GRPO), a PPO variant used for reasoning
Inverse Reinforcement Learning
Reinforcement Learning from Human Feedback (RLHF)
Term project
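For readers curious how two of the topics above relate, the sketch below shows the per-sample PPO clipped surrogate and a GRPO-style group-relative advantage. It is a simplified illustration: the function names are hypothetical, the clip range of 0.2 is a commonly used default rather than a course setting, and the use of the population standard deviation is an assumption (implementations differ on this detail).

```python
import math
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its own group.

    `rewards` holds the scores of several sampled responses to one prompt.
    Using the population standard deviation is an assumption of this sketch.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def ppo_clipped_objective(new_logp, old_logp, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate, a quantity to be maximized."""
    ratio = math.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Taking the minimum removes any incentive to push the ratio past the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

In standard PPO the advantage usually comes from a learned value function (a critic); the group-relative version replaces that critic with statistics computed over several responses sampled for the same prompt.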

Note: For this course there will be a term project related to fine-tuning LLMs.

*This course may be applied to a certificate only if you are currently declared in a program.

Prerequisites / Skills Needed

Prerequisites:

AISV.X401: Deep Learning and Artificial Intelligence

This course applies to these programs:

Artificial Intelligence

Artificial Intelligence Application Development
