c*******v Posts: 2599 | 1 Last time, someone from OpenAI said that their Dota work is genetic algorithm + RL.
“At the beginning it is worth noting that OpenAI’s artificial intelligence
learns to play with itself. All the strategies noted by the researchers are
the result of many hours of sessions, during which two independent
instances are fighting each other. One of them is still learning and the
other one is blocked. When the learning bot achieves an advantage, it is
cloned and the researchers continue the process. The genetic algorithms work
underneath all the time, which on the basis of the results achieved
determine which behaviours bring the intended effect, and which are
meaningless and translate into a failure. In the following video, OpenAI
employees present strategies that their artificial intelligence used when
playing with real Dota 2 players”
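The loop described in the quote (one instance keeps learning, the other stays frozen, and the learner is cloned once it gains an advantage) can be sketched roughly as below. This is only a toy illustration, not OpenAI's code: the scalar "skill", the logistic win model, the fixed per-game improvement, and the names `win_probability` and `self_play` are all assumptions made up for the example.

```python
import math
import random


def win_probability(learner_skill, opponent_skill):
    """Logistic win model: a toy stand-in for actually playing a match."""
    return 1.0 / (1.0 + math.exp(opponent_skill - learner_skill))


def self_play(rounds=2000, window=50, threshold=0.6, step=0.05, seed=0):
    """Frozen-opponent self-play with cloning (hypothetical toy setup)."""
    rng = random.Random(seed)
    learner = 0.0  # skill of the instance that keeps learning
    frozen = 0.0   # skill of the frozen ("blocked") opponent
    clones = 0     # how many times the learner was cloned
    recent = []    # outcomes in the current window (1 = learner won)
    for _ in range(rounds):
        won = rng.random() < win_probability(learner, frozen)
        recent.append(1 if won else 0)
        # Assumed learning rule: the learner improves a little every game.
        learner += step
        if len(recent) == window:
            # If the learner has a clear advantage, clone it as the
            # new frozen opponent and keep training against the clone.
            if sum(recent) / window >= threshold:
                frozen = learner
                clones += 1
            recent = []
    return learner, frozen, clones


if __name__ == "__main__":
    learner, frozen, clones = self_play()
    print(f"learner={learner:.2f} frozen={frozen:.2f} clones={clones}")
```

The frozen opponent only ever changes by being overwritten with a copy of the learner, so the learner always faces a past version of itself.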
Has DeepMind's approach been reported yet? | C*****l Posts: 1 | 2 https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/
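It is on DeepMind's own site, but without much implementation detail. The compute cost is enormous: they used a league of
different agents, with 16 TPUs behind each agent. This version may still have flaws.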
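| C*****l Posts: 1 | 3 DeepMind posted a short paper on arXiv that mentions an outer loop using a Lamarckian algorithm, which
amounts to a genetic-algorithm style training process. Each agent itself is trained with backpropagation (BP), and agents
pick up the winners' network weights and hyperparameters. There are few other details, and the specifics of the neural network are not disclosed. |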
|
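A minimal sketch of the kind of outer loop described in post 3: each agent trains by gradient steps, and the weaker agents then inherit the winners' weights and hyperparameters. This is not DeepMind's implementation. The quadratic toy loss, the single scalar weight `theta`, the learning-rate perturbation factors, the clamp at 0.5, and the function names are all assumptions chosen to keep the example small.

```python
import random

TARGET = 3.0  # optimum of the toy objective (assumed for illustration)


def loss(theta):
    """Toy loss standing in for 'how badly the agent plays'."""
    return (theta - TARGET) ** 2


def gradient_step(agent):
    """Inner loop: one gradient-descent step, standing in for backprop."""
    grad = 2.0 * (agent["theta"] - TARGET)
    agent["theta"] -= agent["lr"] * grad


def train_population(size=8, generations=20, inner_steps=5, seed=0):
    """Lamarckian outer loop: losers inherit winners' learned weights."""
    rng = random.Random(seed)
    population = [
        {"theta": rng.uniform(-5.0, 5.0), "lr": 10 ** rng.uniform(-3, -1)}
        for _ in range(size)
    ]
    for _ in range(generations):
        # Each agent learns on its own via gradient steps.
        for agent in population:
            for _ in range(inner_steps):
                gradient_step(agent)
        # Rank agents, best (lowest loss) first.
        population.sort(key=lambda a: loss(a["theta"]))
        half = size // 2
        # The bottom half copies weights and hyperparameters from the
        # top half, then perturbs the copied hyperparameter slightly.
        for loser, winner in zip(population[half:], population[:half]):
            loser["theta"] = winner["theta"]
            perturbed = winner["lr"] * rng.choice([0.8, 1.2])
            loser["lr"] = min(0.5, perturbed)  # keep the toy problem stable
    return population


if __name__ == "__main__":
    best = train_population()[0]
    print(f"best theta={best['theta']:.4f} lr={best['lr']:.4f}")
```

What makes this Lamarckian rather than a plain genetic algorithm is that the weights acquired during each agent's own training are passed on directly, instead of only the hyperparameters or an initial configuration.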