a***m Posts: 5037 | 1 http://hunch.net/?p=3692542
Congratulations are in order for the folks at Google Deepmind who have
mastered Go.
However, some of the discussion around this seems like giddy overstatement.
Wired says “Machines have conquered the last games” and Slashdot says “We know
now that we don’t need any big new breakthroughs to get to true AI”. The
truth is nowhere close.
For Go itself, it’s been well-known for a decade that Monte Carlo tree
search (i.e. valuation by assuming randomized playout) is unusually
effective in Go. Given this, it’s unclear whether the AlphaGo algorithm
extends to other board games where MCTS does not work so well. Maybe? It
will be interesting to see.
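
To make "valuation by assuming randomized playout" concrete, here is a
minimal sketch of flat Monte Carlo move evaluation. It is not AlphaGo and
not a full tree search. The game is an assumption chosen for brevity: a toy
Nim variant where players alternately take 1 to 3 stones and whoever takes
the last stone wins. The names random_playout, playout_value and best_move
are hypothetical.

```python
import random

def random_playout(stones, to_move, rng):
    """Play uniformly random legal moves until the pile is empty.

    Returns the winner (0 or 1): the player who takes the last stone.
    Requires stones > 0.
    """
    player = to_move
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player
        player = 1 - player

def playout_value(stones, to_move, player, n_playouts, rng):
    """Estimate how often `player` wins from this position under random play."""
    if stones == 0:
        # The previous mover took the last stone and has already won.
        return 1.0 if (1 - to_move) == player else 0.0
    wins = sum(random_playout(stones, to_move, rng) == player
               for _ in range(n_playouts))
    return wins / n_playouts

def best_move(stones, n_playouts=2000, seed=0):
    """Score each legal move for player 0 by playouts; return (move, scores)."""
    rng = random.Random(seed)
    scores = {}
    for take in range(1, min(3, stones) + 1):
        # After player 0 takes `take` stones, player 1 is to move.
        scores[take] = playout_value(stones - take, to_move=1, player=0,
                                     n_playouts=n_playouts, rng=rng)
    return max(scores, key=scores.get), scores
```

With 3 stones left, best_move(3) picks taking all 3, since that move wins
outright and needs no playout at all. Whether random playouts are a good
proxy for position value depends heavily on the game, which is the open
question for games other than Go.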
Delving into existing computer games, the Atari results (see figure 3) are
very fun but obviously unimpressive on about ¼ of the games. My
hypothesis for why is that their solution does only local (epsilon-greedy
style) exploration rather than global exploration, so they can only learn
policies that either address very short credit assignment problems or are
greedily accessible. Global exploration strategies are known to
result in exponentially more efficient strategies in general for
deterministic decision processes (1993), Markov Decision Processes (1998), and
for MDPs without modeling (2006).
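
The gap between local and global exploration can be illustrated on a toy
problem. The sketch below is not the algorithm from any of the papers
mentioned above, and not the Atari agent. It assumes a deterministic chain
of n states where ADVANCE moves one step right and RESET returns to the
start, with no reward before the goal. Under that assumption all value
estimates stay tied, so local exploration with random tie-breaking is
modeled as a uniform random walk. The global explorer is a simple
stand-in that always plans, through the transitions it has seen so far, to
the nearest state with an untried action. All names are hypothetical.

```python
import random
from collections import deque

RESET, ADVANCE = 0, 1

def step(state, action):
    """Deterministic chain: ADVANCE moves right, RESET returns to state 0."""
    return state + 1 if action == ADVANCE else 0

def random_walk_steps(n, rng, max_steps=1_000_000):
    """Local exploration with no reward signal: every action is a coin flip."""
    state = 0
    for t in range(1, max_steps + 1):
        state = step(state, rng.choice((RESET, ADVANCE)))
        if state == n:
            return t
    return max_steps

def plan_to_untried(model, start):
    """Breadth-first search over the learned model.

    Returns a list of actions whose last element is an untried action.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        s, path = queue.popleft()
        for a in (RESET, ADVANCE):
            if (s, a) not in model:
                return path + [a]
        for a in (RESET, ADVANCE):
            nxt = model[(s, a)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [a]))
    raise RuntimeError("everything explored without reaching the goal")

def directed_steps(n):
    """Global exploration: keep heading for the nearest untried action."""
    model = {}  # (state, action) -> next state, filled in from experience
    state, t = 0, 0
    while state != n:
        for action in plan_to_untried(model, state):
            nxt = step(state, action)
            model[(state, action)] = nxt
            state = nxt
            t += 1
            if state == n:
                break
    return t
```

On this chain the directed explorer needs n(n+3)/2 steps, so
directed_steps(12) returns 90, while the random walk needs 2^(n+1) - 2
steps in expectation (8190 for n = 12). The quadratic versus exponential
gap is specific to this toy setup, and the explorer relies on a table of
visited (state, action) pairs.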
These strategies are not used because they are based on
tabular learning rather than function fitting. That’s why I shifted to
Contextual Bandit research after the 2006 paper. We’ve learned quite a bit
there, enough to start tackling a Contextual Deterministic Decision Process,
but that solution is still far from practical. Addressing global
exploration effectively is only one of the significant challenges between
what is well known now and what needs to be addressed for what I would
consider a real AI.
This is generally understood by people working on these techniques but seems
to be getting lost in translation to public news reports. That’s dangerous
because it leads to disappointment. The field will be better off without an
overpromise/bust cycle so I would encourage people to keep and inform a
balanced view of successes and their extent. Mastering Go is a great
accomplishment, but it is quite far from everything.