Guide(s): Dr. Henny Admoni, Dr. Aaron Steinfeld
Team Size: 1
Time Period: January'19 onwards
Robots operating in human environments must be careful when executing their manipulation skills. This requires robots to reason about the repercussions of their actions on other objects in the environment. Humans can visually inspect their surroundings and gain a physical intuition about how likely it is that a particular object can be safely manipulated (i.e., removed without disrupting the rest of the scene). Existing work has shown that deep convolutional neural networks can learn intuitive physics from images generated in simulation and determine the stability of a scene. In this paper, we extend these physics intuition models to the task of assessing safe object extraction by conditioning the visual input on specific objects in the scene during training. We further explore methods for aggregating multiple views of a scene to increase the model’s accuracy on scenes that contain either a large number of objects or unstructured object arrangements. Our results on a simulated object extraction task show that, with our proposed method, physics intuition models can be used to accurately inform a robot of which objects can be safely extracted and from which direction to extract them.
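The multi-view aggregation idea can be illustrated with a minimal sketch: each camera view yields a model-predicted probability that extracting the target object is safe, and the per-view scores are combined into a single scene-level estimate. The function name, the aggregation rules shown (mean and worst-case), and the example probabilities are all illustrative assumptions, not the exact method from the paper.

```python
# Hypothetical sketch of multi-view aggregation for safe-extraction scores.
# Assumes each view has already been scored by a physics intuition model;
# only the aggregation step is shown.

def aggregate_views(view_probs, method="mean"):
    """Combine per-view safety probabilities into one scene-level score."""
    if not view_probs:
        raise ValueError("need at least one view")
    if method == "mean":   # average across views
        return sum(view_probs) / len(view_probs)
    if method == "min":    # conservative: the worst view dominates
        return min(view_probs)
    raise ValueError(f"unknown aggregation method: {method}")

# Example: a cluttered scene scored from three camera angles.
probs = [0.9, 0.7, 0.4]
mean_score = aggregate_views(probs, "mean")
worst_score = aggregate_views(probs, "min")
```

A conservative rule such as `min` may suit safety-critical extraction, since a single view revealing instability should veto the action; averaging trades that caution for robustness to a noisy view.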