Codes and Notes
“Success is not final, failure is not fatal: it is the courage to continue that counts.”
Basic:
Basic Algorithms in Multi-Armed Bandit / Monte Carlo in MDP / Policy Iteration and Value Iteration with Dynamic Programming / Temporal Difference (TD): SARSA and Q-learning / Dyna-Q / REINFORCE-1 / REINFORCE-2 / Actor-Critic (AC) - 1 / AC - 2
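The two TD methods listed above differ only in the bootstrap target: SARSA (on-policy) uses the action actually taken in the next state, while Q-learning (off-policy) uses the greedy action there. The sketch below shows both tabular updates in plain Python; it is a minimal illustration, not the repo's own code, and the function names, the `defaultdict` Q-table keyed by `(state, action)`, and the integer action indices are assumptions made for this example.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, n_actions, epsilon):
    """Random action with probability epsilon, else the greedy one (ties -> lowest index)."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    values = [Q[(state, a)] for a in range(n_actions)]
    return values.index(max(values))

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done):
    # On-policy: bootstrap from the action a_next actually chosen in s_next.
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, n_actions, alpha, gamma, done):
    # Off-policy: bootstrap from the best action in s_next, whatever is taken next.
    best_next = max(Q[(s_next, b)] for b in range(n_actions))
    target = r if done else r + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Usage: two actions; in state 1, action 1 looks better than action 0.
Q = defaultdict(float)
Q[(1, 0)], Q[(1, 1)] = 1.0, 3.0
q_learning_update(Q, 0, 0, 1.0, 1, 2, alpha=0.5, gamma=0.9, done=False)
# target = 1 + 0.9 * 3 = 3.7, so Q[(0, 0)] becomes about 1.85
```

If the behaviour policy later picks action 0 in state 1, SARSA would instead bootstrap from `Q[(1, 0)] = 1.0`, giving a target of 1.9; that gap is exactly the on-policy/off-policy distinction.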
Advanced:
DQN / Double DQN / Dueling DQN / h-DQN / DDPG / A3C / TRPO / PPO / TD3 / Soft AC
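As a small illustration of how Double DQN changes the DQN target, the sketch below computes both targets for a single transition from plain lists of next-state Q-values. It assumes `next_q_online` and `next_q_target` come from the online and target networks respectively; the function names are hypothetical, and a batched PyTorch version would typically use `argmax` and `gather` on tensors instead of Python lists.

```python
def dqn_target(reward, next_q_target, gamma, done):
    # DQN: the target network both selects and evaluates the next action via max.
    return reward if done else reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma, done):
    # Double DQN: the online network selects the action, the target network
    # evaluates it, which reduces the overestimation bias of the max operator.
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]

# Usage: the online net prefers action 1, the target net rates action 2 highest.
online, target = [1.0, 2.0, 0.5], [1.5, 0.8, 3.0]
y_dqn = dqn_target(1.0, target, 0.99, done=False)                   # about 1 + 0.99 * 3.0
y_ddqn = double_dqn_target(1.0, online, target, 0.99, done=False)   # about 1 + 0.99 * 0.8
```

Decoupling selection from evaluation means a single noisy overestimate in the target network no longer automatically becomes the bootstrap value.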
Multi-Agent RL:
Environments:
OpenAI Gym / MPE / PySC2 / GRF / MA MuJoCo
Software:
PyTorch (Best!) / Keras (suitable for machine-learning beginners) / TensorFlow 2 (for hard-core TensorFlow 1 users and those who dislike PyTorch)
Reinforcement Learning Notes
"He who refuses to do arithmetic is doomed to talk nonsense."
--John McCarthy--
"If multi-agent learning is the answer, what is the question?"
--Yoav Shoham, Rob Powers & Trond Grenager--