=Q-Learning and the Hose Transport Application: Videos=

From Grupo de Inteligencia Computacional (GIC)
A set of robots is attached to a hose modeled as a line segment between them. Agents are first trained using Q-Learning not to stretch the hose beyond its nominal maximum length and not to collide with each other. The robot carrying the tip of the hose must reach the goal, which is represented as a green dot.
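As a hedged illustration of this training setup (not the group's actual code), a tabular Q-Learning update with an epsilon-greedy policy might look as follows; the reward values, hyperparameters, and function names are all assumptions:

```python
import random

# Hypothetical reward terms; the values actually used in the experiments are not given.
GOAL_REWARD = 1.0       # tip of the hose reaches the green dot
STRETCH_PENALTY = -1.0  # hose stretched beyond its nominal maximum length
COLLISION_PENALTY = -1.0

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

def q_update(Q, state, action, reward, next_state, actions):
    """One tabular Q-Learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

def epsilon_greedy(Q, state, actions):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```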
 
 
==Round-Robin Cooperative Multi-Agent Q-Learning (2011)==
 
In this case, the state is fully observable but the agents do not coordinate explicitly. The reward signal is shared by both agents. Instead of standard Q-Learning, we use our Round-Robin Cooperative Multi-Agent Q-Learning variation of the algorithm, which forces the agents to take actions one by one. After 1,000 episodes for each configuration, the system reached an optimal policy for all configurations.
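A minimal sketch of the round-robin idea: agents act strictly one at a time on a shared state, each learning from the same shared reward signal. The `env.step`, `select_action`, and `learn` interfaces here are assumptions for illustration, not the group's actual implementation:

```python
def round_robin_episode(env, agents, max_steps=100):
    """Run one episode in which agents take actions one by one (round-robin)
    instead of simultaneously. `env.step(agent_id, action)` is a hypothetical
    interface returning (next_state, shared_reward, done); the reward is the
    same for every agent."""
    state = env.reset()
    for step in range(max_steps):
        agent = agents[step % len(agents)]        # round-robin turn order
        action = agent.select_action(state)
        next_state, shared_reward, done = env.step(agent.agent_id, action)
        # Only the acting agent updates, but the reward signal is shared.
        agent.learn(state, action, shared_reward, next_state)
        state = next_state
        if done:
            break
    return state
```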
 
 
 
 
==Modular Multi-Agent Reinforcement Learning approach to L-MCRS systems (2010-2011)==


In this section, agents are only aware of their neighbors' positions and no coordination mechanism is used. Results are from our simulations with 6 physically-linked robots using a Modular Reinforcement Learning system.


===Local goals===


From an initial position of the hose, the agents must reach a final configuration (green). Each of the robots has its own local goal.


====Successful episodes====


*Episode #10,000: [[media:Episode10000.avi]]
*Episode #10,007: [[media:Episode10007.avi]]


====Failed episodes====


*Episode #10,003: [[media:Episode10003.avi]]


===Team goal===
 
All agents share the reward signal: they all receive positive reinforcement when the tip of the hose is carried to the goal.


The robot farthest from the source of the hose (the center of the grid) must reach the goal, represented as a green dot; all robots are attached to the hose, which is represented as blue segments.
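A hedged sketch of how such a shared team reward could be computed each step; the reward values and the exact goal test are illustrative assumptions, not the ones used in the experiments:

```python
def team_reward(tip_position, goal_position, hose_overstretched, collision):
    """Every agent receives the same scalar reward: positive only when the
    tip of the hose reaches the goal, negative on constraint violations
    (over-stretched hose or a collision), zero otherwise.
    Values are illustrative."""
    if hose_overstretched or collision:
        return -1.0
    if tip_position == goal_position:
        return 1.0
    return 0.0
```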


====Successful episodes====


*Episode #80,001: [[media:Episode80001.avi]]
*Episode #80,010: [[media:Episode80010.avi]]


====Failed episodes====


*Episode #80,004: [[media:Episode80004.avi]]

Revision of 19:05, 24 May 2011


==Consensus-based approach to L-MCRS systems==

These are some examples of real-life experiments on the hose transportation problem. Robot detection and control software runs on a PC. Red dots represent the references (where the robots "should be") and green dots the postures given by the camera (where they "are"). Commands are sent to the robots using radio transceivers:

===A) Non-Linked Robots===

No physical links are used and the robots perform relatively well. Due to communication errors, delays, servo inaccuracies and the nature of PI controllers, the robots oscillate around the path.
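The oscillation mentioned above is typical of pure PI position control under delay: the integral term keeps pushing after the error changes sign, so the robot overshoots the reference. A minimal discrete PI controller sketch (gains and sample time are illustrative assumptions, not the values used in the experiments):

```python
class PIController:
    """Discrete PI controller: u[k] = Kp * e[k] + Ki * sum(e) * dt.
    With integral action plus transport and communication delays, the
    output tends to overshoot and oscillate around the reference."""

    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0  # accumulated error

    def update(self, reference, measurement):
        error = reference - measurement
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral
```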

===B) Linked Robots===

Steering behaves worse because the physical link introduces traction effects into the system. For the same reason, it takes the robots longer to catch the references.

*B.2 Max. tangential speed for the last robot was limited (50%). References move at full speed. [[media:2010.5.run5.avi]]

The last robot is forced to move slower than the rest and, because of this, the robots are not able to catch the references. The error spreads through the system.

*B.3 Max. tangential speed for the last robot was limited (50%). References move at 75% speed. [[media:2010.5.run6.avi]]

The last robot is again forced to move at half speed and the references move at 75% speed, yet the robots are still not able to follow the path acceptably.

*B.4 Max. tangential speed for the last robot was limited (50%). References move at 50% speed. [[media:2010.5.run7.avi]]

The last robot runs at half speed and the references also move at half speed, showing that when all the robots move at least as fast as the references, the overall system behaves better, regardless of the maximum speed differences between the robots. Near the end of the path, the traction forces between robots exceed the forces applied by the robots, and they cannot steer correctly.

One interesting property of physically-linked multicomponent robotic systems is fault tolerance. In this run, the last robot remains switched off and the robots still follow the path acceptably well, although the switched-off robot makes following the path harder for the rest.

This time the references move more slowly, allowing the robots to catch them sooner. The last robot is switched off, which makes the rest behave worse.