Deep Reinforcement Learning

Skills you will gain

Master MDP Formulation: Understand and implement Markov Decision Processes for RL problem-solving.
Leverage DRL Algorithms: Apply deep Q-learning, policy gradients, and advanced algorithms like PPO and SAC.
Hands-on RL Coding: Use Python, OpenAI Gym, TensorFlow-Agents, and other DRL libraries to build RL models.
Enhance RL with Deep Learning: Explore how deep learning improves reinforcement learning techniques.
Solve Real-World RL Problems: Apply RL algorithms to popular simulators and build ChatGPT-like systems.

Course Description

Reinforcement Learning from Human Feedback (RLHF) is a critical component of ChatGPT, used to steer generated text toward higher reward. This course introduces RLHF and shows how PPO, a policy gradient-based reinforcement learning algorithm, is used to build a ChatGPT-like system. As an advanced AI course, it gives students hands-on experience with a variety of reinforcement learning (RL) and deep reinforcement learning (DRL) tools used to teach machines to make human-like decisions by observing and interpreting their environments. DRL algorithms have driven major advances in games such as Go and in sophisticated multiplayer games such as StarCraft and Dota, as well as in control systems, natural language, self-driving cars, and robotics.
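
As a rough illustration of the PPO algorithm mentioned above, the following sketch computes PPO's clipped surrogate objective for a batch of samples. The function name ppo_clip_objective, the plain-list inputs, and the default epsilon of 0.2 are illustrative assumptions, not course material.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch.

    ratios:     new-policy probability / old-policy probability, per sample
    advantages: advantage estimate per sample
    epsilon:    clip range (0.2 is an assumed, commonly used default)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - epsilon, 1 + epsilon].
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped * advantage)
    return total / len(ratios)


if __name__ == "__main__":
    # A large ratio with a positive advantage gains nothing beyond the clip range.
    print(ppo_clip_objective([1.5], [1.0]))
```

Taking the minimum of the two terms removes the incentive to move the new policy far from the old one in a single update, which is the core idea behind PPO's stability.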

After a quick review of deep learning building blocks and of RL and DRL fundamentals, we will dive into promising DRL algorithms, illustrating them with concrete examples and simulation environments. Students will learn to solve common RL tasks, including well-known simulations such as CartPole, MountainCar, and MuJoCo.
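
To give a feel for how an agent interacts with a simulation environment, here is a minimal sketch of tabular Q-learning on a toy task. CorridorEnv is a hypothetical environment invented for this example; its reset/step methods are only loosely modeled on the Gym convention, whose exact return signature varies between library versions. The hyperparameters are assumptions as well.

```python
import random


class CorridorEnv:
    """Hypothetical 1-D corridor: start in cell 0, reward 1.0 on reaching the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def q_learning(env, episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Learn a Q-table from experience gathered by a uniform random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Q-learning is off-policy, so random exploration is sufficient here.
            action = rng.randrange(2)
            next_state, reward, done = env.step(action)
            # Bootstrap from the best next action unless the episode has ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for each state."""
    return [row.index(max(row)) for row in q]


if __name__ == "__main__":
    q_table = q_learning(CorridorEnv())
    print(greedy_policy(q_table)[:4])  # greedy actions for the non-terminal cells
```

Environments such as CartPole have continuous observations, so a table like this no longer fits; that gap is what motivates replacing the table with a neural network in deep Q-learning.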

You will learn Markov decision process (MDP) formulation and an extensive collection of DRL algorithms: deep Q-learning (DQN, DDQN, PER), policy gradient methods (A2C, A3C, TRPO, PPO, ACER, ACKTR, SAC), deterministic policy gradient methods (DPG, DDPG, TD3), and inverse reinforcement learning. To implement these DRL algorithms, students will code in Python 3 using OpenAI Gym, tf.keras (TensorFlow 2), and TensorFlow-Agents. We will also review other popular DRL libraries, such as Google Dopamine, Keras-RL, and Facebook Horizon.
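
As a small illustration of MDP formulation, the sketch below writes a two-state MDP as a transition table and solves it with value iteration. The states, actions, probabilities, and rewards are all invented for this example, and value iteration is used here only because it is compact; it is not one of the deep algorithms listed above.

```python
# Hypothetical MDP: TRANSITIONS[state][action] = [(probability, next_state, reward), ...]
TRANSITIONS = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(1.0, "high", -1.0)],
    },
    "high": {
        "wait": [(1.0, "high", 1.0)],
        "work": [(0.5, "high", 3.0), (0.5, "low", 3.0)],
    },
}


def q_value(outcomes, values, gamma):
    """Expected return of one action: sum over outcomes of p * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality update until values change by less than tol."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            best = max(q_value(outcomes, values, gamma) for outcomes in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    # Read off the greedy policy with respect to the converged values.
    policy = {
        state: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for state, actions in transitions.items()
    }
    return values, policy


if __name__ == "__main__":
    state_values, best_actions = value_iteration(TRANSITIONS)
    print(best_actions)
```

Writing the problem as states, actions, transition probabilities, and rewards is the formulation step; model-free methods such as DQN or PPO target the same objective without being handed the transition table.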

Topics

Deep learning building blocks
Markov decision processes
Reinforcement and deep reinforcement learning
Value-based, model-based, model-free algorithms
Policy gradient-based algorithms
Proximal policy optimization
Various actor-critic algorithms
Deep RL libraries
Term project

Note: This course includes a term project related to ChatGPT.

Prerequisites / Skills Needed

Prerequisites:

AISV.X401: Deep Learning and Artificial Intelligence

Ajay Baranwal

Mar. 30 - Jun. 8, Monday, 6:30pm - 9:30pm

Live-Online

AISV.X403.(3)

Schedule

Date	Start Time	End Time	Meeting Type	Location
Mon, 03-30-2026	6:30pm	9:30pm	Live-Online	REMOTE
Mon, 04-06-2026	6:30pm	9:30pm	Live-Online	REMOTE
Mon, 04-13-2026	6:30pm	9:30pm	Live-Online	REMOTE
Mon, 04-20-2026	6:30pm	9:30pm	Live-Online	REMOTE
Mon, 04-27-2026	6:30pm	9:30pm	Live-Online	REMOTE
Mon, 05-04-2026	6:30pm	9:30pm	Live-Online	REMOTE
Mon, 05-11-2026	6:30pm	9:30pm	Live-Online	REMOTE
Mon, 05-18-2026	6:30pm	9:30pm	Live-Online	REMOTE
Mon, 06-01-2026	6:30pm	9:30pm	Live-Online	REMOTE
Mon, 06-08-2026	6:30pm	9:30pm	Live-Online	REMOTE

Due to the advanced nature of this course, students must complete the "Deep Learning and Artificial Intelligence" course, or have prior instructor approval to register. Please inquire with any questions.

This class is offered in an online synchronous format. Students are expected to log in to this course via Canvas at the start time of each scheduled meeting and to participate via Zoom for the duration of the meeting.

No meeting on May 25, 2026.

Students are required to have computers with Python 3 installed.

You will be granted access in Canvas to your course site and course materials approximately 24 hours prior to the published start date of the course.

Recommended Text(s): Reinforcement Learning, second edition, by Richard S. Sutton and Andrew G. Barto. Publisher: MIT Press. Publication Date: 2018-11-13. ISBN: 9780262352703

This course applies to these programs:

Artificial Intelligence

Artificial Intelligence Application Development
