Units
3.0 QUARTER UNITS

Course Description

Reinforcement Learning from Human Feedback (RLHF) is a critical component of ChatGPT: it fine-tunes the model so that generated text earns higher reward under a model of human preferences. This course introduces students to RLHF and to how PPO, a policy gradient-based reinforcement learning algorithm, is used to build a ChatGPT-like system. As an advanced AI course, it gives students hands-on experience with a variety of reinforcement learning (RL) and deep reinforcement learning (DRL) tools used to teach machines to make human-like decisions based on observation and interpretation of their surrounding environments. A growing family of DRL algorithms has driven major advances in games such as Go and in highly sophisticated multiplayer games such as StarCraft and Dota, as well as in control systems, natural language, self-driving cars, and robotics.
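
For readers new to PPO, the following is a minimal sketch of its clipped surrogate objective for a single sample, written in plain Python. The function name `ppo_clipped_objective` and the default clip range of 0.2 are illustrative assumptions, not part of the course materials.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate objective (a quantity to maximize).

    logp_new / logp_old: log-probability of the taken action under the
    current policy and under the policy that collected the data.
    advantage: advantage estimate for that action.
    clip_eps: clip range; 0.2 is an assumed, commonly used default.
    """
    # Probability ratio between the new and old policies.
    ratio = math.exp(logp_new - logp_old)
    # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage, the objective stops growing once the ratio exceeds 1 + clip_eps, which discourages overly large policy updates. With a negative advantage, the unclipped term is kept when it is worse, so the penalty is not hidden by clipping.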

After a quick review of deep learning building blocks and of RL and DRL fundamentals, we will dive into promising DRL algorithms, illustrating them with concrete examples and simulation environments. Students will learn to solve common RL tasks, including well-known simulations such as CartPole and MountainCar and MuJoCo-based environments.

You will learn Markov decision process (MDP) formulation and an extensive collection of DRL algorithms: deep Q-learning (DQN, DDQN, PER), policy gradient methods (A2C, A3C, TRPO, PPO, ACER, ACKTR, SAC), deterministic policy gradient methods (DPG, DDPG, TD3), and inverse reinforcement learning. To implement these DRL algorithms, students will code in Python 3 using OpenAI Gym, tf2.keras, and TensorFlow-Agents. We will also review other popular DRL libraries, such as Google Dopamine, Keras-RL, and Facebook Horizon.
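
As a small illustration of an MDP and of the Q-learning update that DQN builds on, here is a sketch of tabular Q-learning on a toy five-state chain. The environment, the function names (`step`, `q_learning`), and all hyperparameters are hypothetical choices made for this example; the course itself uses OpenAI Gym environments and neural-network function approximators.

```python
import random

N_STATES = 5      # states 0..4; reaching state 4 ends the episode
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition; reward 1.0 only on reaching the last state."""
    if action == 0:
        nxt = max(0, state - 1)
    else:
        nxt = min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit, breaking ties at random so early episodes move.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Bootstrapped target: r + gamma * max_a' Q(s', a').
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = q_learning()
# Greedy action in each non-terminal state.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

After training, the greedy policy should choose "right" (action 1) in every non-terminal state, since that is the shortest path to the only reward. Deep Q-learning keeps the same update target but replaces the table with a neural network.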

Topics

  • Deep learning building blocks
  • Markov decision processes
  • Reinforcement and deep reinforcement learning
  • Value-based, model-based, and model-free algorithms
  • Policy gradient-based algorithms
  • Proximal policy optimization
  • Various actor/critic algorithms
  • Deep RL libraries
  • Term project

Note: This course includes a term project related to ChatGPT.
