The Google team has developed a technique for predicting which machine learning models will perform best before they are deployed. In a newly published blog post and paper, "Off-Policy Evaluation via Off-Policy Classification," a team of Google AI researchers proposes off-policy classification, or OPC, which evaluates the performance of AI-driven agents by treating evaluation as a classification problem. The researchers say their approach, which builds on reinforcement learning (a training technique that employs rewards to drive software policies toward goals), works with image inputs and scales to tasks including vision-based robotic grasping.
Alex Irpan, a software engineer on Google's robotics team, explained that in fully off-policy reinforcement learning, an agent learns entirely from older data. With fully off-policy RL, one can train several models on the same fixed dataset collected by previous agents and then select the best one. The team noted, however, that while off-policy RL enables training an AI model on old data from, say, a robot, it does not enable evaluating the model without actually running it. Off-policy classification (OPC) addresses this by assuming the task at hand involves little or no randomness in how states change, and by assuming the agent either succeeds or fails at the end of each experimental trial.
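Those two assumptions are what let evaluation be cast as classification: every state-action pair from a trial that ended in success can be labeled positive, every pair from a failed trial negative, and a candidate Q-function can be scored by how well its values separate the two classes. The sketch below illustrates that framing; the function names and the simplified best-threshold accuracy are illustrative assumptions, not the paper's exact weighting.

```python
import numpy as np

def opc_score(q_values, labels):
    """Illustrative OPC-style score (a sketch, not the paper's exact
    formula): treat Q(s, a) as a classifier score and measure how well
    it separates transitions from successful trials (label 1) from
    transitions from failed trials (label 0)."""
    q_values = np.asarray(q_values, dtype=float)
    labels = np.asarray(labels)
    # Q-functions have arbitrary scale, so sweep candidate thresholds
    # and keep the one that classifies the logged data best.
    thresholds = np.append(np.unique(q_values), q_values.max() + 1.0)
    best = 0.0
    for b in thresholds:
        predictions = (q_values >= b).astype(int)
        best = max(best, (predictions == labels).mean())
    return best
```

Because the score is computed entirely from logged data and the Q-function's own outputs, it can rank many trained models against the same fixed dataset without running any of them in the real world.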
The binary nature of the second assumption is what allows classification labels to be assigned. The team trained machine learning policies in simulation using fully off-policy reinforcement learning, then evaluated them using off-policy scores computed from previous real-world data. On a grasping task, they report that one variant of OPC in particular, SoftOPC, performed best at predicting final success rates (a sketch of the soft-scoring idea appears below) across 15 models of varying robustness, seven of which were trained purely in simulation. The research team intends to explore tasks that are noisier and non-binary, as arise in real-world RL problems.
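Where the hard-threshold score above makes a binary call per transition, a soft variant can work with the raw Q-values directly. One plausible reading of that idea is sketched here: score a model by how much larger its average Q-value is on transitions from successful trials than on the dataset as a whole. This difference-of-means form is an assumption for illustration; the paper defines SoftOPC precisely.

```python
import numpy as np

def soft_opc_score(q_values, labels):
    """Illustrative threshold-free variant (an assumed simplification
    of the SoftOPC idea): a higher score means the Q-function assigns
    systematically larger values to state-action pairs drawn from
    successful trials. Assumes at least one positive label."""
    q_values = np.asarray(q_values, dtype=float)
    labels = np.asarray(labels)
    return q_values[labels == 1].mean() - q_values.mean()
```

In either form, model selection reduces to computing the score for every candidate Q-function on the same logged real-world data and keeping the highest scorer, matching the train-many-then-select workflow described above.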