
Observation learning

Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineering.

In this paper, we propose a meta-learning framework for achieving such capability, which we call one-shot imitation learning. Specifically, we consider the setting where there is a very large set of tasks, and each task has many instantiations. For example, one task could be to stack all blocks on a table into a single tower, another could be to place all blocks on the table into two-block towers, and so on. In each case, different instances of the task consist of different sets of blocks with different initial states. At training time, our algorithm is presented with pairs of demonstrations for a subset of all tasks. A neural net is trained that takes as input one demonstration and the current state (which initially is the initial state of the other demonstration of the pair) and outputs an action, with the goal that the resulting sequence of states and actions matches the second demonstration as closely as possible.
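To make the setup concrete, here is a minimal sketch of a demonstration-conditioned policy trained by behavioral cloning. It is not the paper's architecture (which conditions on the demonstration with soft attention); the `DemoConditionedPolicy` class, the GRU encoder, and all dimensions and data are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DemoConditionedPolicy(nn.Module):
    """Maps (demonstration, current state) -> action logits."""
    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super().__init__()
        # Encode the demonstration (a sequence of states) into a fixed-size context.
        self.demo_encoder = nn.GRU(state_dim, hidden_dim, batch_first=True)
        # Combine the demo context with the current state to predict an action.
        self.policy_head = nn.Sequential(
            nn.Linear(hidden_dim + state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, demo_states, current_state):
        # demo_states: (batch, demo_len, state_dim); current_state: (batch, state_dim)
        _, h = self.demo_encoder(demo_states)   # h: (1, batch, hidden_dim)
        context = h.squeeze(0)                  # (batch, hidden_dim)
        return self.policy_head(torch.cat([context, current_state], dim=-1))

# Training step: clone the actions of the paired (second) demonstration,
# conditioned on the first demonstration of the same task. Data is random here.
policy = DemoConditionedPolicy(state_dim=16, action_dim=7)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
demo = torch.randn(32, 50, 16)        # first demonstration of each task instance
states = torch.randn(32, 16)          # states visited in the second demonstration
actions = torch.randint(0, 7, (32,))  # expert actions from the second demonstration
loss = nn.functional.cross_entropy(policy(demo, states), actions)
opt.zero_grad(); loss.backward(); opt.step()
```

The key design point survives the simplification: because the task is specified only through the input demonstration, a single trained network can be pointed at a new task at test time by feeding it one demonstration of that task.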

# Observation learning: how to

In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Reproducing existing work and accurately judging the improvements offered by novel methods is vital to maintaining this rapid progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results difficult to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state of the art are meaningful. In this paper, we investigate challenges posed by reproducibility, proper experimental techniques, and reporting procedures. We illustrate the variability in reported metrics and results when comparing against common baselines, and suggest guidelines to make future results in deep RL more reproducible. We aim to spur discussion about how to ensure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.
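As a sketch of one such significance check, the snippet below compares two algorithms by their final returns across several random seeds using Welch's t-test, one common way to ask whether a reported improvement exceeds seed-to-seed variance. The return values are invented placeholders, and the choice of test and the number of seeds are assumptions for illustration, not a prescription from the paper.

```python
import numpy as np
from scipy import stats

# Final average returns from five independent training runs (seeds) per algorithm.
# Placeholder numbers; in practice these come from your own experiments.
returns_a = np.array([3125.0, 2870.4, 3340.2, 2995.8, 3210.1])
returns_b = np.array([2980.3, 3105.7, 2890.5, 3050.9, 2940.2])

# Report mean and standard deviation across seeds, not a single best run.
print(f"A: {returns_a.mean():.1f} +/- {returns_a.std(ddof=1):.1f}")
print(f"B: {returns_b.mean():.1f} +/- {returns_b.std(ddof=1):.1f}")

# Welch's t-test (no equal-variance assumption): is the gap in means
# large relative to the variance across seeds?
t_stat, p_value = stats.ttest_ind(returns_a, returns_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # large p -> difference may just be noise
```

With only a handful of seeds and high variance, such a test will often fail to find significance, which is exactly the point the paper makes about interpreting small reported improvements.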
