This article walks through how to get started quickly with OpenAI Gym, a platform for training RL agents. Later, we will use Gym to test intelligent agents implemented with TensorFlow.

To fully install OpenAI Gym and be able to use it in a notebook environment like Google Colaboratory, we need to install a set of dependencies:

  • xvfb: an X11 display server that will let us render Gym environments in a notebook
  • gym[atari]: the Gym environments for Arcade games
  • atari-py: an interface to the Arcade Learning Environment; we will use it to load Atari game ROMs into Gym
  • gym-notebook-wrapper: a rendering helper that we will use to display OpenAI Gym games in a notebook
    Note: atari-py was deprecated and replaced with ale-py. However, we can still use it.
%%capture
%%bash

apt install -y xvfb
pip install "gym[atari]"
pip install gym-notebook-wrapper
pip install atari-py
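
Since atari-py is deprecated, a newer Gym (0.21 or later) can install everything through ale-py instead. The cell below is a hedged alternative, not the setup used in the rest of this article; the accept-rom-license extra relies on AutoROM to download and install the game ROMs automatically, so the manual ROM import step further down would not be needed:

%%capture
%%bash

pip install "gym[atari,accept-rom-license]"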

After installation, we can check that Gym was installed properly by listing the names of the first ten registered environments, sorted alphabetically:

from gym import envs
env_names = [spec.id for spec in envs.registry.all()]
for name in sorted(env_names[:10]):
    print(name)
CartPole-v0
CartPole-v1
Copy-v0
DuplicatedInput-v0
MountainCar-v0
MountainCarContinuous-v0
RepeatCopy-v0
Reverse-v0
ReversedAddition-v0
ReversedAddition3-v0
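
Any of these names can be passed to gym.make to instantiate the corresponding environment. As a quick sketch (using CartPole-v0 from the list above), we can create an environment and inspect its observation and action spaces:

import gym

# Instantiate one of the listed environments and inspect its spaces
env = gym.make('CartPole-v0')
print(env.observation_space)  # e.g. Box(4,): cart position/velocity, pole angle/velocity
print(env.action_space)       # e.g. Discrete(2): push the cart left or right
env.close()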

Next, we need to install the Atari Arcade ROMs so that we can load those games into Gym.

  1. We download the Roms.rar file that contains the games
  2. We extract and import the ROMs to make them accessible to Gym
%%capture
%%bash

curl -O http://www.atarimania.com/roms/Roms.rar
mkdir -p roms
yes | unrar e Roms.rar roms/
python -m atari_py.import_roms roms/
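
To verify that the ROMs were imported correctly, a quick sanity check (this snippet is only an illustration, not part of the original setup) is to build one of the Atari environments:

import gym

# If the ROM import succeeded, this should not raise an error
env = gym.make('Alien-v4')
print(env.action_space)       # e.g. Discrete(18): the full Atari joystick action set
print(env.observation_space)  # e.g. Box(210, 160, 3): raw RGB screen pixels
env.close()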

Now we are ready to play with Gym using one of the available games (e.g. Alien-v4). We will start the display server, then repeatedly execute a randomly sampled action for our agent and check the result. If the agent dies, we start a new episode.

%%bash

rm -rf game/*
mkdir -p game

import gnwrapper
import gym

# Start the display server
env = gnwrapper.Monitor(gym.make('Alien-v4'), directory="./game")

o = env.reset()

# Take 1000 actions by randomly sampling from the action space
for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    if done:
        env.reset()

# Display the recorded episode videos
env.display()
'openaigym.video.0.1166.video000000.mp4'
'openaigym.video.0.1166.video000001.mp4'

Notice that there is more than one displayed video. This is because when the episode finishes (i.e. the agent dies) we reset the environment with env.reset() to start a new episode; each displayed video corresponds to one episode of the game.
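
If you want to see how many episodes the random agent went through, and the score it collected in each one, a small variation of the loop above can track this. The reward accumulation below is an illustrative sketch, not part of the original script:

import gym

env = gym.make('Alien-v4')
env.reset()

episode, total_reward = 0, 0.0
for _ in range(1000):
    observation, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
    if done:  # one finished episode corresponds to one recorded video above
        episode += 1
        print(f"Episode {episode} ended with total reward {total_reward}")
        total_reward = 0.0
        env.reset()
env.close()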

The following explains the variables returned by env.step(action) in the previous script (a short inspection sketch follows the list):

  • observation (Object): Observation returned by the environment. The object could be the RGB pixel data from the screen/camera, RAM contents, joint angles and joint velocities of a robot, and so on, depending on the environment.
  • reward (Float): Reward for the previous action that was sent to the environment. The range of the Float value varies with each environment, but irrespective of the environment, a higher reward is always better and the goal of the agent should be to maximize the total reward.
  • done (Boolean): Indicates whether the environment is going to be reset in the next step. When the Boolean value is true, it most likely means that the episode has ended (due to loss of life of the agent, timeout, or some other episode termination criteria).
  • info (Dict): Some additional information that can optionally be sent out by an environment as a dictionary of arbitrary key-value pairs. The agent we develop should not rely on any of the information in this dictionary for taking action. It may be used (if available) for debugging purposes.
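
To make these values concrete, here is a small inspection sketch that prints each of the four values for a single step of CartPole-v0 (chosen only because its observation is compact enough to print):

import gym

env = gym.make('CartPole-v0')
observation = env.reset()

# Take one random action and inspect everything env.step returns
observation, reward, done, info = env.step(env.action_space.sample())

print(observation)  # a 4-element array: cart position/velocity, pole angle/velocity
print(reward)       # 1.0 for every step the pole stays upright
print(done)         # True once the pole falls over or the episode times out
print(info)         # usually an empty dict for CartPole
env.close()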

Here are some links I found useful:

  • Run and Render OpenAI Gym on Google Colab (Gym-Notebook-Wrapper) - link
  • T81-558: Applications of Deep Neural Networks - link

I hope you enjoyed this article. Feel free to leave a comment or reach out on Twitter @bachiirc.