Looks like they've finally cracked GTO for 6-max.
The core of Pluribus's strategy was computed via self-play, in which the AI plays against copies of itself, without any human gameplay data used as input. The AI starts from scratch by playing randomly and gradually improves as it determines which actions, and which probability distribution over those actions, lead to better outcomes against earlier versions of its strategy. The version of self-play used in Pluribus is an improved variant of the iterative Monte Carlo CFR (MCCFR) algorithm.
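For anyone who hasn't seen how "start random, improve against earlier versions of yourself" works in practice, here is a toy sketch. To be clear, this is not Pluribus's algorithm: it is plain regret matching (the building block CFR is based on) run in self-play on rock-paper-scissors, with exact expected values instead of Monte Carlo sampling. The game, the names `PAYOFF`, `regret_matching` and `self_play`, and the initial bias toward rock are all my own assumptions for illustration.

```python
# Payoff for a player in rock-paper-scissors: PAYOFF[a][b] is the reward
# for playing action a against action b (0=rock, 1=paper, 2=scissors).
# The game is symmetric, so both players can use the same table.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
N = 3


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0.0:
        return [1.0 / N] * N  # no positive regret yet: play uniformly
    return [p / total for p in positive]


def self_play(iterations=50_000):
    # Assumption for the demo: player 0 starts biased toward rock,
    # otherwise both players begin at the equilibrium and nothing happens.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected value of each pure action against the opponent's mix.
            action_values = [
                sum(PAYOFF[a][b] * opponent[b] for b in range(N))
                for a in range(N)
            ]
            current_value = sum(
                strategies[player][a] * action_values[a] for a in range(N)
            )
            for a in range(N):
                # Regret: how much better action a would have done.
                regrets[player][a] += action_values[a] - current_value
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy over all iterations is what approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, avg in enumerate(self_play()):
        print(player, [round(p, 3) for p in avg])
```

The strategy on any single iteration keeps cycling; it is the average over iterations that drifts toward the even rock/paper/scissors mix. Real poker solvers apply the same regret update at every decision point of a far larger game tree, which is where the sampling in MCCFR comes in.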
5 AIs + 1 human
This experiment was conducted with Ferguson, Elias, and Linus Loeliger. Loeliger is considered by many to be the best player in the world at six-player no-limit Hold’em cash games. Each human played 5,000 hands of poker with five copies of Pluribus at the table. Pluribus does not adapt its strategy to its opponents, so intentional collusion among the bots was not an issue. In aggregate, the humans lost by 2.3 bb/100. Elias was down 4.0 bb/100 (standard error of 2.2 bb/100), Ferguson was down 2.5 bb/100 (standard error of 2.0 bb/100), and Loeliger was down 0.5 bb/100 (standard error of 1.0 bb/100).
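A quick way to read those numbers is to see how many standard errors each result sits from break-even. This is a rough sketch that assumes a normal approximation and a two-sided 95% interval (z = 1.96). The win rates and standard errors are the ones quoted above; the names `results` and `summarize` are just mine.

```python
# (win rate in bb/100, standard error in bb/100) as quoted in the post.
results = {
    "Elias": (-4.0, 2.2),
    "Ferguson": (-2.5, 2.0),
    "Loeliger": (-0.5, 1.0),
}


def summarize(win_rate, std_err, z=1.96):
    """Distance from break-even in standard errors, plus a normal-approximation CI."""
    return {
        "z_score": win_rate / std_err,
        "ci": (win_rate - z * std_err, win_rate + z * std_err),
    }


if __name__ == "__main__":
    for name, (win_rate, std_err) in results.items():
        s = summarize(win_rate, std_err)
        low, high = s["ci"]
        print(f"{name}: z = {s['z_score']:.2f}, 95% CI = ({low:.2f}, {high:.2f}) bb/100")
```

Under that approximation, each of the three individual intervals includes zero, so no single player's result is distinguishable from break-even on its own; Loeliger's is only half a standard error below it.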
The humans' loss rates are a bit puzzling, though.
You'd think a single human at a table with 5 bots ought to be losing more than that.
Same goes for 5 humans + 1 bot: the bot is only winning 5 bb/100.
Meanwhile, back in 2012-2014, locked bot strategies were crushing Zoom 200-500 at 10-12 EV bb/100.
But this part is really upsetting:
"Pluribus also uses new, faster self-play algorithms for games with hidden information. Combined, these advances made it possible to train Pluribus using very little processing power and memory — the equivalent of less than $150 worth of cloud computing resources. "
The fact that they managed to compute, for $150 of cloud time, a 6-max bot that beats Linus by 0.5 bb/100 is just sad.
Will they cut the variance by much, or is it suicide without a pool either way?