Results for the inverted pendulum analysis using MCTS

Results are now available for both the single and the double pendulum.

Rewards used in the investigations

\begin{equation} R=1 \label{eq: reward 1} \end{equation}

\begin{equation} R = 1 - w\frac{1}{n}\sum_{i=1}^n \left( \frac{|\theta_i|}{\theta_{\max}} \right)^{p_{\theta}} - (1-w)\left( \frac{|x|}{x_{\max}} \right)^{p_x} \label{eq: reward polynomial angle} \end{equation}

\begin{equation} R = 1 - w \left(\frac{ | y^{e} | }{y^{e}_{\max}}\right)^{p_y} - (1-w)\left(\frac{ | x^{e} | }{x_{\max}}\right)^{p_x} \label{eq: reward polynomial tip} \end{equation}

\begin{equation} R = w \frac{1}{n}\sum_{i=1}^n \exp{\left[ -\left(\frac{\theta_i}{q_{\theta}\theta_{\max}}\right)^2 \right]} + (1-w) \exp{\left[ -\left(\frac{x}{q_{x}x_{\max}}\right)^2 \right]} \label{eq: exponential reward angle} \end{equation}

\begin{equation} R = w \exp{\left[ -\left(\frac{y^{e}}{q_{y}y^{e}_{\max}}\right)^2 \right]} + (1-w) \exp{\left[ -\left(\frac{x^{e}}{q_{x}x_{\max}}\right)^2 \right]} \label{eq: exponential reward tip} \end{equation}
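The four non-constant rewards above can be sketched in Python as follows. This is only an illustrative transcription of the equations, not the code used to produce the results: the function and argument names are hypothetical, the tip quantities \(y^e\) and \(x^e\) are taken as plain inputs because their computation is not given here, and the numeric limits in the usage lines are placeholder values.

```python
import math
from typing import Sequence


def reward_polynomial_angle(theta: Sequence[float], x: float, theta_max: float,
                            x_max: float, w: float, p_theta: float, p_x: float) -> float:
    """Polynomial reward on the link angles theta_i and the cart position x."""
    # Mean over all n links of (|theta_i| / theta_max) ** p_theta
    angle_term = sum((abs(t) / theta_max) ** p_theta for t in theta) / len(theta)
    cart_term = (abs(x) / x_max) ** p_x
    return 1.0 - w * angle_term - (1.0 - w) * cart_term


def reward_polynomial_tip(y_e: float, x_e: float, y_e_max: float,
                          x_max: float, w: float, p_y: float, p_x: float) -> float:
    """Polynomial reward on the tip quantities y^e and x^e."""
    return (1.0
            - w * (abs(y_e) / y_e_max) ** p_y
            - (1.0 - w) * (abs(x_e) / x_max) ** p_x)


def reward_exponential_angle(theta: Sequence[float], x: float, theta_max: float,
                             x_max: float, w: float, q_theta: float, q_x: float) -> float:
    """Exponential (Gaussian-shaped) reward on the link angles and the cart position."""
    angle_term = sum(math.exp(-(t / (q_theta * theta_max)) ** 2) for t in theta) / len(theta)
    cart_term = math.exp(-(x / (q_x * x_max)) ** 2)
    return w * angle_term + (1.0 - w) * cart_term


def reward_exponential_tip(y_e: float, x_e: float, y_e_max: float,
                           x_max: float, w: float, q_y: float, q_x: float) -> float:
    """Exponential (Gaussian-shaped) reward on the tip quantities y^e and x^e."""
    return (w * math.exp(-(y_e / (q_y * y_e_max)) ** 2)
            + (1.0 - w) * math.exp(-(x_e / (q_x * x_max)) ** 2))


# Placeholder limits, chosen only for illustration (not taken from the results).
THETA_MAX = 0.2
X_MAX = 2.4

# A perfectly upright, centred single pendulum scores the maximum reward of 1.
upright = reward_exponential_angle([0.0], 0.0, THETA_MAX, X_MAX,
                                   w=0.75, q_theta=0.25, q_x=0.25)
```

Setting `w=1` drops the cart term entirely and `w=0` drops the angle (or tip height) term, which is how the single-variable configurations in the tables are obtained from the same formulas.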

Results for the single pendulum

The default parameter ranges are γ = 0.5:0.05:1.0 and Cₚ = [0, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024].
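The default sweep can be enumerated with a short Python sketch. It assumes that `0.5:0.05:1.0` is a start:step:stop range with both endpoints included (as in Julia or MATLAB) and that every γ value is combined with every Cₚ value; the helper name `inclusive_range` is hypothetical.

```python
from itertools import product


def inclusive_range(start: float, step: float, stop: float) -> list:
    """Expand start:step:stop into a list that includes both endpoints."""
    n_steps = int(round((stop - start) / step))
    # Round each value to suppress floating-point noise such as 0.7000000000000001
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


gammas = inclusive_range(0.5, 0.05, 1.0)            # 0.5, 0.55, ..., 1.0
c_p_values = [0] + [2 ** k for k in range(1, 11)]   # 0, 2, 4, ..., 1024

# Full grid, assuming every gamma is paired with every C_p
grid = list(product(gammas, c_p_values))
```

Under these assumptions the sweep has 11 values of γ and 11 values of Cₚ, so 121 parameter combinations per reward configuration.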

| ID | Name | Steps [-] | Eq. | \(w\) | \(p_\theta\) or \(p_y\) | \(p_x\) | \(q_\theta\) or \(q_y\) | \(q_x\) |
|---|---|---|---|---|---|---|---|---|
| A | Constant | 200 | \eqref{eq: reward 1} | | | | | |
| B | Constant | 500 | \eqref{eq: reward 1} | | | | | |
| C | Linear to \(x\) | 200 | \eqref{eq: reward polynomial angle} | 0 | | 1 | | |
| D | Linear to \(\theta_1\) | 200 | \eqref{eq: reward polynomial angle} | 1 | 1 | | | |
| E | Linear to \(\theta_1\) and \(x\) I | 200 | \eqref{eq: reward polynomial angle} | 0.5 | 1 | 1 | | |
| F | Linear to \(\theta_1\) and \(x\) II | 200 | \eqref{eq: reward polynomial angle} | 0.75 | 1 | 1 | | |
| G | Linear to \(\theta_1\) and \(x\) III | 200 | \eqref{eq: reward polynomial angle} | 0.25 | 1 | 1 | | |
| H | Linear to \(\theta_1\) and \(x\) II | 500 | \eqref{eq: reward polynomial angle} | 0.75 | 1 | 1 | | |
| I | Quadratic to \(\theta_1\) | 200 | \eqref{eq: reward polynomial angle} | 1 | 2 | | | |
| J | Quadratic to \(x\) | 200 | \eqref{eq: reward polynomial angle} | 0 | | 2 | | |
| K | Quadratic to \(\theta_1\) and \(x\) | 200 | \eqref{eq: reward polynomial angle} | 0.75 | 2 | 2 | | |
| L | Square root to \(\theta_1\) | 200 | \eqref{eq: reward polynomial angle} | 1 | 0.5 | | | |
| M | Polynomial to \(\theta_1\) and \(x\) | 200 | \eqref{eq: reward polynomial angle} | 0.75 | 2 | 6 | | |
| N | Linear to \(y^e\) and \(x^e\) I | 200 | \eqref{eq: reward polynomial tip} | 0.5 | 1 | 1 | | |
| O | Linear to \(y^e\) and \(x^e\) II | 200 | \eqref{eq: reward polynomial tip} | 0.75 | 1 | 1 | | |
| P | Exponential to \(\theta_1\) I | 200 | \eqref{eq: exponential reward angle} | 1 | | | 5/12 | |
| Q | Exponential to \(\theta_1\) II | 200 | \eqref{eq: exponential reward angle} | 1 | | | 0.25 | |
| R | Exponential to \(\theta_1\) III | 200 | \eqref{eq: exponential reward angle} | 1 | | | 2/3 | |
| S | Exponential to \(\theta_1\) IV | 200 | \eqref{eq: exponential reward angle} | 1 | | | 1/6 | |
| T | Exponential to \(\theta_1\) V | 200 | \eqref{eq: exponential reward angle} | 1 | | | 1/12 | |
| U | Exponential to \(\theta_1\) II | 500 | \eqref{eq: exponential reward angle} | 1 | | | 0.25 | |
| V | Exponential to \(\theta_1\) and \(x\) I | 200 | \eqref{eq: exponential reward angle} | 0.75 | | | 0.25 | 0.25 |
| W | Exponential to \(\theta_1\) and \(x\) II | 200 | \eqref{eq: exponential reward angle} | 0.75 | | | 0.25 | 0.2 |
| X | Exponential to \(\theta_1\) and \(x\) III | 200 | \eqref{eq: exponential reward angle} | 0.75 | | | 0.25 | 0.3 |
| Y | Exponential to \(\theta_1\) and \(x\) IV | 200 | \eqref{eq: exponential reward angle} | 0.85 | | | 0.25 | 0.25 |
| Z | Exponential to \(\theta_1\) and \(x\) V | 200 | \eqref{eq: exponential reward angle} | 0.75 | | | 0.25 | 0.1 |
| AA | Exponential to \(\theta_1\) and \(x\) VI | 200 | \eqref{eq: exponential reward angle} | 0.75 | | | 0.25 | 0.4 |
| AB | Exponential to \(\theta_1\) and \(x\) IV | 500 | \eqref{eq: exponential reward angle} | 0.85 | | | 0.25 | 0.25 |
| AC | Exponential to \(\theta_1\) and \(x\) VII | 500 | \eqref{eq: exponential reward angle} | 0.95 | | | 0.25 | 0.25 |
| AD | Exponential to \(y^e\) I | 200 | \eqref{eq: exponential reward tip} | 1 | | | 0.25 | |
| AE | Exponential to \(y^e\) II | 200 | \eqref{eq: exponential reward tip} | 1 | | | 0.1 | |
| AF | Exponential to \(y^e\) III | 200 | \eqref{eq: exponential reward tip} | 1 | | | 0.4 | |
| AG | Exponential to \(y^e\) II | 500 | \eqref{eq: exponential reward tip} | 1 | | | 0.1 | |
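
As a worked reading of one table row, take configuration P (exponential reward on \(\theta_1\), \(w = 1\), \(q_\theta = 5/12\)). Assuming \(n = 1\) for the single pendulum, the weight \(w = 1\) removes the cart term from \eqref{eq: exponential reward angle}, which leaves

\begin{equation*} R = \exp{\left[ -\left(\frac{\theta_1}{\tfrac{5}{12}\theta_{\max}}\right)^2 \right]}, \qquad R\big|_{\theta_1 = \theta_{\max}/2} = e^{-1.44} \approx 0.24, \qquad R\big|_{\theta_1 = \theta_{\max}} = e^{-5.76} \approx 0.003. \end{equation*}

A smaller \(q_\theta\) narrows the peak around the upright position, so the reward falls off faster as the angle grows.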

Results for the double pendulum

The default parameter ranges are γ = 0.7:0.05:1.0 and Cₚ = [0, 2, 4, 8, 16, 32, 64, 128, 256].

| ID | Name | Steps [-] | Eq. | \(w\) | \(q_\theta\) or \(q_y\) | \(q_x\) |
|---|---|---|---|---|---|---|
| A | Constant | 200 | \eqref{eq: reward 1} | | | |
| B | Constant | 500 | \eqref{eq: reward 1} | | | |
| C | Exponential to \(\theta_1\) I | 200 | \eqref{eq: exponential reward angle} | 1 | 0.25 | |
| D | Exponential to \(\theta_1\) II | 200 | \eqref{eq: exponential reward angle} | 1 | 0.1 | |
| E | Exponential to \(\theta_1\) III | 200 | \eqref{eq: exponential reward angle} | 1 | 0.4 | |
| F | Exponential to \(\theta_1\) IV | 200 | \eqref{eq: exponential reward angle} | 1 | 0.55 | |
| G | Exponential to \(\theta_1\) V | 200 | \eqref{eq: exponential reward angle} | 1 | 0.7 | |
| H | Exponential to \(\theta_1\) VI | 200 | \eqref{eq: exponential reward angle} | 1 | 0.85 | |
| I | Exponential to \(\theta_1\) V | 500 | \eqref{eq: exponential reward angle} | 1 | 0.7 | |
| J | Exponential to \(\theta_1\) and \(x\) I | 200 | \eqref{eq: exponential reward angle} | 0.75 | 0.25 | 0.25 |
| K | Exponential to \(\theta_1\) and \(x\) II | 200 | \eqref{eq: exponential reward angle} | 0.85 | 0.7 | 0.7 |
| L | Exponential to \(\theta_1\) and \(x\) III | 200 | \eqref{eq: exponential reward angle} | 0.7 | 0.7 | 0.7 |
| M | Exponential to \(\theta_1\) and \(x\) IV | 200 | \eqref{eq: exponential reward angle} | 0.85 | 0.7 | 0.4 |
| N | Exponential to \(\theta_1\) and \(x\) II | 500 | \eqref{eq: exponential reward angle} | 0.85 | 0.7 | 0.7 |
| O | Exponential to \(\theta_1\) and \(x\) III | 500 | \eqref{eq: exponential reward angle} | 0.7 | 0.7 | 0.7 |
| P | Exponential to \(\theta_1\) and \(x\) IV | 500 | \eqref{eq: exponential reward angle} | 0.85 | 0.7 | 0.4 |
| Q | Exponential to \(y^e\) I | 200 | \eqref{eq: exponential reward tip} | 1 | 0.25 | |
| R | Exponential to \(y^e\) II | 200 | \eqref{eq: exponential reward tip} | 1 | 0.1 | |
| S | Exponential to \(y^e\) III | 200 | \eqref{eq: exponential reward tip} | 1 | 0.4 | |
| T | Exponential to \(y^e\) I | 500 | \eqref{eq: exponential reward tip} | 1 | 0.25 | |
| U | Exponential to \(y^e\) II | 500 | \eqref{eq: exponential reward tip} | 1 | 0.1 | |
| V | Exponential to \(y^e\) III | 500 | \eqref{eq: exponential reward tip} | 1 | 0.4 | |
| W | Exponential to \(y^e\) and \(x^e\) I | 200 | \eqref{eq: exponential reward tip} | 0.95 | 0.25 | 0.25 |
| X | Exponential to \(y^e\) and \(x^e\) II | 200 | \eqref{eq: exponential reward tip} | 0.8 | 0.25 | 0.25 |
| Y | Exponential to \(y^e\) and \(x^e\) I | 500 | \eqref{eq: exponential reward tip} | 0.95 | 0.25 | 0.25 |
| Z | Exponential to \(y^e\) and \(x^e\) II | 500 | \eqref{eq: exponential reward tip} | 0.8 | 0.25 | 0.25 |