Skills you will gain
- Master MDP Formulation: Understand and implement Markov Decision Processes for RL problem-solving.
- Leverage DRL Algorithms: Apply deep Q-learning, policy gradients, and advanced algorithms like PPO and SAC.
- Hands-on RL Coding: Use Python, OpenAI Gym, TensorFlow-Agents, and other DRL libraries to build RL models.
- Enhance RL with Deep Learning: Explore how deep learning improves reinforcement learning techniques.
- Solve Real-World RL Problems: Apply RL algorithms to popular simulators and build ChatGPT-like systems.
Course Description
Reinforcement Learning from Human Feedback (RLHF) is a critical component of modern LLMs such as the GPT models behind ChatGPT, where it is used to improve the reward earned by generated text. This course introduces students to deep reinforcement learning, RLHF, and how ChatGPT's GPT family of LLMs leverages PPO, a policy gradient-based reinforcement learning algorithm, so that students can build a ChatGPT-like system. As an advanced AI course, it gives students hands-on experience with a variety of reinforcement learning (RL) and deep reinforcement learning (DRL) tools used to teach machines to make human-like decisions based on observation and interpretation of their surrounding environments. The course also examines how DRL algorithms have advanced the state of the art in games such as Go, in highly sophisticated multi-player games such as StarCraft and Dota, and in control systems, natural language, self-driving cars, and robotics.
After a review of deep learning building blocks and of RL and DRL fundamentals, students explore promising DRL algorithms through concrete examples and simulation environments. Students learn to solve standard RL tasks, including well-known simulations such as CartPole, MountainCar, and MuJoCo.
Students examine Markov decision process (MDP) formulation and an extensive collection of DRL algorithms: deep Q-learning (DQN, DDQN), policy gradient methods (A2C, A3C, PPO), reasoning-oriented policy gradient methods (GRPO), and inverse reinforcement learning. To implement these DRL algorithms, students code in Python 3 using Gymnasium environments and tf.keras. The course also reviews other popular DRL libraries, such as Google Dopamine, Keras-RL, and Facebook Horizon.
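As a rough illustration of what "formulating an MDP" means in practice, the sketch below defines a tiny two-state MDP and solves it with value iteration. The states, actions, rewards, and discount factor are all hypothetical, invented only for this example; they are not taken from the course materials, and the code uses only the Python standard library rather than Gymnasium or tf.keras.

```python
# Hypothetical two-state MDP, invented purely for illustration.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(1.0, "high", -1.0)],
    },
    "high": {
        "wait": [(1.0, "high", 1.0)],
        "work": [(0.5, "high", 3.0), (0.5, "low", 3.0)],
    },
}
GAMMA = 0.9  # discount factor (assumed value)


def q_value(values, state, action, gamma=GAMMA):
    """Expected return of taking `action` in `state`, then following `values`."""
    return sum(
        prob * (reward + gamma * values[next_state])
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def value_iteration(tol=1e-8, max_iters=10_000):
    """Repeat the Bellman optimality backup until values stop changing."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(max_iters):
        new_values = {
            state: max(q_value(values, state, action) for action in TRANSITIONS[state])
            for state in TRANSITIONS
        }
        delta = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if delta < tol:
            break
    return values


def greedy_policy(values):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    return {
        state: max(TRANSITIONS[state], key=lambda action: q_value(values, state, action))
        for state in TRANSITIONS
    }


if __name__ == "__main__":
    optimal_values = value_iteration()
    print(optimal_values)
    print(greedy_policy(optimal_values))
```

Because the transition model is known here, this is a planning example; the model-free algorithms listed above (DQN, PPO, and so on) address the case where the agent must learn from sampled experience instead.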
Learning Outcomes
At the conclusion of the course, you should be able to
- Formulate an MDP
- Describe value functions, models, and policies
- Define the purpose of the Bellman equation
- Discuss the advantages and disadvantages of RL
- Explain how the epsilon-greedy algorithm differs from a pure greedy algorithm
- Explain the difference between model-based and model-free RL
- Discuss how DL/DNN enhances RL
- Discuss and implement value-based and policy-based RL
- Use and create RL environments with Gymnasium and other frameworks, such as Hugging Face's TRL
- Apply learned RL algorithms to popular simulators and a lightweight ChatGPT-like system
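To make the epsilon-greedy outcome above concrete, here is a minimal sketch of the two action-selection rules. The function names and the list-of-floats representation of action values are assumptions chosen for this example, not part of any course library.

```python
import random


def greedy_action(q_values):
    """Always pick the action with the highest estimated value (no exploration)."""
    return max(range(len(q_values)), key=lambda action: q_values[action])


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action; otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


if __name__ == "__main__":
    estimates = [0.1, 0.9, 0.3]  # hypothetical action-value estimates
    print(greedy_action(estimates))
    print([epsilon_greedy_action(estimates, epsilon=0.2) for _ in range(10)])
```

A pure greedy agent can lock onto an action whose early estimate happened to be high; the occasional random choice lets epsilon-greedy keep sampling other actions so their estimates can be corrected.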
Topics Include
- Deep learning building blocks
- Markov decision processes
- Reinforcement and deep reinforcement learning
- Deep RL libraries
- Value-based, model-based, model-free algorithms
- Policy gradient-based algorithms
- Proximal policy optimization (PPO)
- Various actor/critic algorithms
- Group Relative Policy Optimization (GRPO) - Reasoning in PPO
- Inverse Reinforcement Learning
- Reinforcement Learning from Human Feedback (RLHF)
- Term project
Note: For this course there will be a term project related to fine-tuning LLMs.
Prerequisites / Skills Needed
Prerequisites:
- AISV.X401: Deep Learning and Artificial Intelligence
Live-Online: Attend via Zoom at scheduled times.
| Date | Start Time | End Time | Meeting Type | Location |
|---|---|---|---|---|
| Thu, 06-18-2026 | 6:00pm | 9:00pm | Live-Online | REMOTE |
| Thu, 06-25-2026 | 6:00pm | 9:00pm | Live-Online | REMOTE |
| Thu, 07-02-2026 | 6:00pm | 9:00pm | Live-Online | REMOTE |
| Thu, 07-09-2026 | 6:00pm | 9:00pm | Live-Online | REMOTE |
| Thu, 07-16-2026 | 6:00pm | 9:00pm | Live-Online | REMOTE |
| Thu, 07-23-2026 | 6:00pm | 9:00pm | Live-Online | REMOTE |
| Thu, 07-30-2026 | 6:00pm | 9:00pm | Live-Online | REMOTE |
| Thu, 08-06-2026 | 6:00pm | 9:00pm | Live-Online | REMOTE |
| Thu, 08-13-2026 | 6:00pm | 9:00pm | Live-Online | REMOTE |
| Thu, 08-20-2026 | 6:00pm | 9:00pm | Live-Online | REMOTE |
Due to the advanced nature of this course, students must complete the "Deep Learning and Artificial Intelligence" course, or have prior instructor approval to register. Please inquire with any questions.
This class is offered in an online synchronous format. Students are expected to log in to this course via Canvas at the start time of each scheduled meeting and to participate via Zoom for the duration of the meeting.
You will be granted access in Canvas to your course site and course materials approximately 24 hours prior to the published start date of the course.
Required Tools & Materials: Students are required to bring a laptop with Python 3 installed.
Recommended Text(s): Reinforcement Learning, second edition, Authors: Richard S. Sutton, Andrew G. Barto, Publisher: MIT Press, Publication Date: 2018-11-13, ISBN: 9780262352703
This course applies to these programs: