Results for the inverted pendulum analysis using MCTS

Two sets of results are available: one for the single pendulum and one for the double pendulum.

Single pendulum
Double pendulum

Rewards used in the investigations

\begin{equation} R=1 \label{eq: reward 1} \end{equation}

\begin{equation} R = 1 - w\frac{1}{n}\sum_{i=1}^n \left( \frac{|\theta_i|}{\theta_{\max}} \right)^{p_{\theta}} - (1-w)\left( \frac{|x|}{x_{\max}} \right)^{p_x} \label{eq: reward polynomial angle} \end{equation}

\begin{equation} R = 1 - w \left(\frac{ | y^{e} | }{y^{e}_{\max}}\right)^{p_y} - (1-w)\left(\frac{ | x^{e} | }{x_{\max}}\right)^{p_x} \label{eq: reward polynomial tip} \end{equation}
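As a minimal sketch, the two polynomial rewards above could be written in Python as follows. The function and argument names are illustrative and not taken from the paper's code. The limits `theta_max`, `x_max` and `y_e_max` must be supplied by the caller, since their values are not stated here. Where the tables below leave an exponent blank (because the corresponding weight is zero), this sketch simply defaults it to 1.

```python
from typing import Sequence


def polynomial_reward_angle(thetas: Sequence[float], x: float,
                            theta_max: float, x_max: float, w: float,
                            p_theta: float = 1.0, p_x: float = 1.0) -> float:
    """Polynomial reward based on the pole angles theta_i and the cart position x."""
    # Mean normalised angle penalty over the n poles.
    angle_term = sum((abs(t) / theta_max) ** p_theta for t in thetas) / len(thetas)
    # Normalised cart-position penalty.
    cart_term = (abs(x) / x_max) ** p_x
    return 1.0 - w * angle_term - (1.0 - w) * cart_term


def polynomial_reward_tip(y_e: float, x_e: float,
                          y_e_max: float, x_max: float, w: float,
                          p_y: float = 1.0, p_x: float = 1.0) -> float:
    """Polynomial reward based on the tip coordinates (x^e, y^e)."""
    vertical_term = (abs(y_e) / y_e_max) ** p_y
    horizontal_term = (abs(x_e) / x_max) ** p_x
    return 1.0 - w * vertical_term - (1.0 - w) * horizontal_term


if __name__ == "__main__":
    # Hypothetical limits and state, chosen only for illustration.
    print(polynomial_reward_angle([0.1], 1.2, theta_max=0.2, x_max=2.4, w=0.75))
```

Both functions return 1 when every normalised deviation is zero and decrease as the state approaches its limits, which matches the form of the two equations.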

\begin{equation} R = w \frac{1}{n}\sum_{i=1}^n \exp{\left[ -\left(\frac{\theta_i}{q_{\theta}\theta_{\max}}\right)^2 \right]} + (1-w) \exp{\left[ -\left(\frac{x}{q_{x}x_{\max}}\right)^2 \right]} \label{eq: exponential reward angle} \end{equation}

\begin{equation} R = w \exp{\left[ -\left(\frac{y^{e}}{q_{y}y^{e}_{\max}}\right)^2 \right]} + (1-w) \exp{\left[ -\left(\frac{x^{e}}{q_{x}x_{\max}}\right)^2 \right]} \label{eq: exponential reward tip} \end{equation}
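The two exponential rewards can be sketched in the same way. Again, the function and argument names are assumptions made for illustration, and the limits `theta_max`, `x_max` and `y_e_max` are left to the caller. The scale factors `q_theta`, `q_y` and `q_x` default to 1 here only so that a term with zero weight needs no explicit value.

```python
import math
from typing import Sequence


def exponential_reward_angle(thetas: Sequence[float], x: float,
                             theta_max: float, x_max: float, w: float,
                             q_theta: float = 1.0, q_x: float = 1.0) -> float:
    """Gaussian-shaped reward on the pole angles theta_i and the cart position x."""
    # Mean of exp(-(theta_i / (q_theta * theta_max))^2) over the n poles.
    angle_term = sum(math.exp(-(t / (q_theta * theta_max)) ** 2)
                     for t in thetas) / len(thetas)
    cart_term = math.exp(-(x / (q_x * x_max)) ** 2)
    return w * angle_term + (1.0 - w) * cart_term


def exponential_reward_tip(y_e: float, x_e: float,
                           y_e_max: float, x_max: float, w: float,
                           q_y: float = 1.0, q_x: float = 1.0) -> float:
    """Gaussian-shaped reward on the tip coordinates (x^e, y^e)."""
    vertical_term = math.exp(-(y_e / (q_y * y_e_max)) ** 2)
    horizontal_term = math.exp(-(x_e / (q_x * x_max)) ** 2)
    return w * vertical_term + (1.0 - w) * horizontal_term


if __name__ == "__main__":
    # Hypothetical limits and state, chosen only for illustration.
    print(exponential_reward_angle([0.05], 0.0, theta_max=0.2, x_max=2.4,
                                   w=1.0, q_theta=0.25))
```

A smaller scale factor narrows the bell around zero: with `w = 1`, the reward falls to exp(-1) once the angle reaches `q_theta * theta_max`.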

Results for the single pendulum

The default parameter ranges are γ = 0.5:0.05:1.0 (from 0.5 to 1.0 in steps of 0.05) and Cₚ = [0, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024].
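The range notation can be expanded into an explicit grid as in the sketch below. It assumes that `start:step:stop` includes both end points and that every γ value is paired with every Cₚ value. The source does not state how the combinations were run, so the full Cartesian product is an assumption, and `inclusive_range` is a hypothetical helper.

```python
from itertools import product


def inclusive_range(start: float, step: float, stop: float) -> list:
    """Expand start:step:stop into a list that includes both end points."""
    n_steps = int(round((stop - start) / step))
    # Round each value to suppress floating-point noise such as 0.6000000000000001.
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


gammas = inclusive_range(0.5, 0.05, 1.0)
cp_values = [0, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

# Assumed: every gamma is combined with every C_p.
grid = list(product(gammas, cp_values))

print(len(gammas), len(cp_values), len(grid))  # prints: 11 11 121
```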

ID	Name	Steps [-]	Equation	\(w\)	\(p_\theta\) or \(p_y\)	\(p_x\)	\(q_\theta\) or \(q_y\)	\(q_x\)
A	Constant	200	\eqref{eq: reward 1}	–	–	–	–	–
B	Constant	500	\eqref{eq: reward 1}	–	–	–	–	–
C	Linear to \(x\)	200	\eqref{eq: reward polynomial angle}	0	–	1	–	–
D	Linear to \(\theta_1\)	200	\eqref{eq: reward polynomial angle}	1	1	–	–	–
E	Linear to \(\theta_1\) and \(x\) I	200	\eqref{eq: reward polynomial angle}	0.5	1	1	–	–
F	Linear to \(\theta_1\) and \(x\) II	200	\eqref{eq: reward polynomial angle}	0.75	1	1	–	–
G	Linear to \(\theta_1\) and \(x\) III	200	\eqref{eq: reward polynomial angle}	0.25	1	1	–	–
H	Linear to \(\theta_1\) and \(x\) II	500	\eqref{eq: reward polynomial angle}	0.75	1	1	–	–
I	Quadratic to \(\theta_1\)	200	\eqref{eq: reward polynomial angle}	1	2	–	–	–
J	Quadratic to \(x\)	200	\eqref{eq: reward polynomial angle}	0	–	2	–	–
K	Quadratic to \(\theta_1\) and \(x\)	200	\eqref{eq: reward polynomial angle}	0.75	2	2	–	–
L	Square root to \(\theta_1\)	200	\eqref{eq: reward polynomial angle}	1	0.5	–	–	–
M	Polynomial to \(\theta_1\) and \(x\)	200	\eqref{eq: reward polynomial angle}	0.75	2	6	–	–
N	Linear to \(y^e\) and \(x^e\) I	200	\eqref{eq: reward polynomial tip}	0.5	1	1	–	–
O	Linear to \(y^e\) and \(x^e\) II	200	\eqref{eq: reward polynomial tip}	0.75	1	1	–	–
P	Exponential to \(\theta_1\) I	200	\eqref{eq: exponential reward angle}	1	–	–	5/12	–
Q	Exponential to \(\theta_1\) II	200	\eqref{eq: exponential reward angle}	1	–	–	0.25	–
R	Exponential to \(\theta_1\) III	200	\eqref{eq: exponential reward angle}	1	–	–	2/3	–
S	Exponential to \(\theta_1\) IV	200	\eqref{eq: exponential reward angle}	1	–	–	1/6	–
T	Exponential to \(\theta_1\) V	200	\eqref{eq: exponential reward angle}	1	–	–	1/12	–
U	Exponential to \(\theta_1\) II	500	\eqref{eq: exponential reward angle}	1	–	–	0.25	–
V	Exponential to \(\theta_1\) and \(x\) I	200	\eqref{eq: exponential reward angle}	0.75	–	–	0.25	0.25
W	Exponential to \(\theta_1\) and \(x\) II	200	\eqref{eq: exponential reward angle}	0.75	–	–	0.25	0.2
X	Exponential to \(\theta_1\) and \(x\) III	200	\eqref{eq: exponential reward angle}	0.75	–	–	0.25	0.3
Y	Exponential to \(\theta_1\) and \(x\) IV	200	\eqref{eq: exponential reward angle}	0.85	–	–	0.25	0.25
Z	Exponential to \(\theta_1\) and \(x\) V	200	\eqref{eq: exponential reward angle}	0.75	–	–	0.25	0.1
AA	Exponential to \(\theta_1\) and \(x\) VI	200	\eqref{eq: exponential reward angle}	0.75	–	–	0.25	0.4
AB	Exponential to \(\theta_1\) and \(x\) IV	500	\eqref{eq: exponential reward angle}	0.85	–	–	0.25	0.25
AC	Exponential to \(\theta_1\) and \(x\) VII	500	\eqref{eq: exponential reward angle}	0.95	–	–	0.25	0.25
AD	Exponential to \(y^e\) I	200	\eqref{eq: exponential reward tip}	1	–	–	0.25	–
AE	Exponential to \(y^e\) II	200	\eqref{eq: exponential reward tip}	1	–	–	0.1	–
AF	Exponential to \(y^e\) III	200	\eqref{eq: exponential reward tip}	1	–	–	0.4	–
AG	Exponential to \(y^e\) II	500	\eqref{eq: exponential reward tip}	1	–	–	0.1	–

Results for the double pendulum

The default parameter ranges are γ = 0.7:0.05:1.0 (from 0.7 to 1.0 in steps of 0.05) and Cₚ = [0, 2, 4, 8, 16, 32, 64, 128, 256].

ID	Name	Steps [-]	Equation	\(w\)	\(q_\theta\) or \(q_y\)	\(q_x\)
A	Constant	200	\eqref{eq: reward 1}	–	–	–
B	Constant	500	\eqref{eq: reward 1}	–	–	–
C	Exponential to \(\theta_1\) I	200	\eqref{eq: exponential reward angle}	1	0.25	–
D	Exponential to \(\theta_1\) II	200	\eqref{eq: exponential reward angle}	1	0.1	–
E	Exponential to \(\theta_1\) III	200	\eqref{eq: exponential reward angle}	1	0.4	–
F	Exponential to \(\theta_1\) IV	200	\eqref{eq: exponential reward angle}	1	0.55	–
G	Exponential to \(\theta_1\) V	200	\eqref{eq: exponential reward angle}	1	0.7	–
H	Exponential to \(\theta_1\) VI	200	\eqref{eq: exponential reward angle}	1	0.85	–
I	Exponential to \(\theta_1\) V	500	\eqref{eq: exponential reward angle}	1	0.7	–
J	Exponential to \(\theta_1\) and \(x\) I	200	\eqref{eq: exponential reward angle}	0.75	0.25	0.25
K	Exponential to \(\theta_1\) and \(x\) II	200	\eqref{eq: exponential reward angle}	0.85	0.7	0.7
L	Exponential to \(\theta_1\) and \(x\) III	200	\eqref{eq: exponential reward angle}	0.7	0.7	0.7
M	Exponential to \(\theta_1\) and \(x\) IV	200	\eqref{eq: exponential reward angle}	0.85	0.7	0.4
N	Exponential to \(\theta_1\) and \(x\) II	500	\eqref{eq: exponential reward angle}	0.85	0.7	0.7
O	Exponential to \(\theta_1\) and \(x\) III	500	\eqref{eq: exponential reward angle}	0.7	0.7	0.7
P	Exponential to \(\theta_1\) and \(x\) IV	500	\eqref{eq: exponential reward angle}	0.85	0.7	0.4
Q	Exponential to \(y^e\) I	200	\eqref{eq: exponential reward tip}	1	0.25	–
R	Exponential to \(y^e\) II	200	\eqref{eq: exponential reward tip}	1	0.1	–
S	Exponential to \(y^e\) III	200	\eqref{eq: exponential reward tip}	1	0.4	–
T	Exponential to \(y^e\) I	500	\eqref{eq: exponential reward tip}	1	0.25	–
U	Exponential to \(y^e\) II	500	\eqref{eq: exponential reward tip}	1	0.1	–
V	Exponential to \(y^e\) III	500	\eqref{eq: exponential reward tip}	1	0.4	–
W	Exponential to \(y^e\) and \(x^e\) I	200	\eqref{eq: exponential reward tip}	0.95	0.25	0.25
X	Exponential to \(y^e\) and \(x^e\) II	200	\eqref{eq: exponential reward tip}	0.8	0.25	0.25
Y	Exponential to \(y^e\) and \(x^e\) I	500	\eqref{eq: exponential reward tip}	0.95	0.25	0.25
Z	Exponential to \(y^e\) and \(x^e\) II	500	\eqref{eq: exponential reward tip}	0.8	0.25	0.25

Summary of MCTS results

Complementary results for the paper on control of an n-pole upright pendulum on a cart using the Monte Carlo Tree Search (MCTS) algorithm.
