This package synthesizes permissive shields for controllers of various RL environments. It is based on *Shielded Reinforcement Learning for Hybrid Systems*, which also has a Julia implementation.
Instantiate an environment (some pre-built environments are supplied in `pyshield/envs.py`). Then instantiate a shield with the environment, the granularity of the discretization, and the number of supporting points to sample per axis. Synthesizing the shield is done via a call to `shield.make_shield(verbosity=1)`; you can set `verbosity=0` if you don't want any status output:
```python
from pyshield.models import Shield
from pyshield.envs import RandomWalk

# The parameter `unlucky` enforces worst-case stochasticity on the environment
env = RandomWalk(obs_low=[0, 0], obs_high=[1.2, 1.2], unlucky=True)
shield = Shield(env, 0.005, samples_per_axis=4)
shield.make_shield(verbosity=1)
```

After this is done, `shield.safe_actions` stores information on which actions are safe in which partitions.
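The exact layout of `safe_actions` isn't specified here, so the following is only a hypothetical sketch of how a synthesized shield might be used to restrict an agent at training time; the grid-index lookup (and the `granularity` argument) is an assumption for illustration, not a documented part of the package:

```python
import random

def shielded_action(shield, env, s, preferred, granularity=0.005):
    """Return `preferred` if the shield allows it in the partition containing
    state `s`, otherwise fall back to a random safe action.

    ASSUMPTION: `shield.safe_actions` can be indexed by a per-axis grid index
    derived from the discretization granularity; the real structure may differ.
    """
    low = env.observation_space.low
    idx = tuple(int((x - l) / granularity) for x, l in zip(s, low))
    allowed = shield.safe_actions[idx]  # assumed lookup
    return preferred if preferred in allowed else random.choice(list(allowed))
```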
If the environment is two-dimensional, you can draw the shield by providing the names of the actions, the axis labels, and a colormap:
```python
# the names of action 0 and action 1
action_names = ['slow', 'fast']
# the colors of the partitions depending on the allowed actions
cmap = { '()': 'r', '(slow)': 'y', '(fast)': 'g', '(slow, fast)': 'w' }
# labels of the x and y axes
labels = ('x', 't')
shield.draw(cmap, axis_labels=labels, actions=action_names)
```

For the example above, the output should look like this:
Currently, two environments are supplied in `pyshield/envs.py`. You can use your own environments if they implement the following API (largely inherited from Gymnasium); a minimal sketch of such an environment follows the list:

- They have an attribute `observation_space`, which should be a `Box` space with non-infinite bounds.
- They have an attribute `action_space`, which should be a `Discrete` space.
- They have a function `is_safe(s)`, which takes a state `s` and returns `True` if `s` is a safe state and `False` otherwise.
- They have a function `allowed_actions(s)`, which takes a state `s` and returns a list of allowed actions in this state (this can just be all actions for any value of `s`).
- They have a function `step_from(s, a)`, which takes a state `s` and an action `a` and returns a tuple `(next_s, reward, terminated)` that results from performing a single step from `s` by taking action `a`.
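As an illustration, here is a minimal sketch of a custom environment satisfying this API. The class name, bounds, safety predicate, and dynamics are invented for the example; only the four members listed above are required:

```python
import numpy as np
from gymnasium.spaces import Box, Discrete

class MyEnv:
    """Toy 2-D environment implementing the API expected by pyshield.
    The dynamics below are made up for illustration purposes."""

    def __init__(self):
        # Bounded observation space, as required (no infinite bounds)
        self.observation_space = Box(low=np.array([0.0, 0.0]),
                                     high=np.array([1.2, 1.2]))
        # Two discrete actions: 0 = slow, 1 = fast
        self.action_space = Discrete(2)

    def is_safe(self, s):
        # Example safety predicate: the first coordinate must stay below 1.0
        return s[0] < 1.0

    def allowed_actions(self, s):
        # Allowing every action in every state is permitted by the API
        return [0, 1]

    def step_from(self, s, a):
        # Toy dynamics: action 1 advances x faster than action 0
        x, t = s
        next_s = (x + (0.1 if a == 1 else 0.05), t + 0.05)
        reward = -1.0
        terminated = t >= 1.2
        return next_s, reward, terminated
```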
