I'm watching a Stanford video that explains Deep Reinforcement Learning.
My initial impression is that they are just explaining reinforcement learning plus some critic techniques that correct a method that is already learning.
There are some links suggesting that I am somewhat on the right track.
Here are examples showing that such techniques are quite old: the initial research dates back at least to 199X. See here for a list of publications.
The examples I found on the Internet, mostly from Stanford, are related to Computer Vision. In that area all tasks are highly parallelizable (see gaming as the original driver for GPUs, and now image recognition tasks).
Neural Networks (NNs) are very easy to parallelize on a GPU, so I understand why they are used in Deep Learning, but that seems too simple to be a breakthrough. To me, an NN itself looks like a generalized polynomial.
This is what my common sense tells me. But since this is not my strong area, I don't trust my instincts here; I suspect there are errors in my logic due to a lack of knowledge. Maybe the breakthrough is that they take well-known pieces that worked moderately well before, glue them together, and suddenly the synergy effect plus GPU magic makes things work.
Another possibility is that I should look at other people's lectures rather than the Stanford video.
So my question is: is the following understanding correct?
We have two agents that learn using a reinforcement loop. One agent acts as an actor, which performs actions. The second acts as a critic. All reward functions and policy functions could be represented as NNs.
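To make the actor/critic split concrete, here is a minimal sketch of one-step actor-critic under several assumptions. The toy task (`step`, one state, two actions) and all names are hypothetical. Plain tables stand in for the neural networks, and in this common textbook formulation the critic learns a state-value estimate V(s), not a reward function. In deep RL the actor's preferences and the critic's value would each be produced by an NN instead.

```python
import math
import random

# Hypothetical toy task: a single state, two actions.
# Action 1 always pays reward 1.0, action 0 pays 0.0; each episode ends after one step.
def step(action: int) -> float:
    return 1.0 if action == 1 else 0.0

def softmax(prefs):
    # Subtract the max preference for numerical stability.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def train(episodes=2000, actor_lr=0.1, critic_lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]   # actor: action preferences (stand-in for a policy network)
    value = 0.0          # critic: estimate of V(s) (stand-in for a value network)
    for _ in range(episodes):
        probs = softmax(theta)
        action = 0 if rng.random() < probs[0] else 1
        reward = step(action)
        # The episode terminates, so the TD target is just the reward (no V(s') term).
        td_error = reward - value
        # Critic update: move V(s) toward the observed return.
        value += critic_lr * td_error
        # Actor update: policy-gradient step scaled by the critic's TD error.
        # For a softmax policy, d log pi(a) / d theta_b = 1[b == a] - pi(b).
        for b in range(len(theta)):
            grad_log = (1.0 if b == action else 0.0) - probs[b]
            theta[b] += actor_lr * td_error * grad_log
    return softmax(theta), value

if __name__ == "__main__":
    probs, value = train()
    print(f"P(action=1) = {probs[1]:.3f}, V(s) = {value:.3f}")
```

The critic never chooses actions. It only judges how much better or worse an outcome was than expected (the TD error), and the actor uses that signal instead of the raw reward to adjust its policy.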