Results for the inverted pendulum analysis using MCTS
Two sets of results are available: one for the single pendulum and one for the double pendulum.
Rewards used in the investigations
\begin{equation} R=1 \label{eq: reward 1} \end{equation}
\begin{equation} R = 1 - w\frac{1}{n}\sum_{i=1}^n \left( \frac{|\theta_i|}{\theta_{\max}} \right)^{p_{\theta}} - (1-w)\left( \frac{|x|}{x_{\max}} \right)^{p_x} \label{eq: reward polynomial angle} \end{equation}
\begin{equation} R = 1 - w \left(\frac{ | y^{e} | }{y^{e}_{\max}}\right)^{p_y} - (1-w)\left(\frac{ | x^{e} | }{x_{\max}}\right)^{p_x} \label{eq: reward polynomial tip} \end{equation}
\begin{equation} R = w \frac{1}{n}\sum_{i=1}^n \exp{\left[ -\left(\frac{\theta_i}{q_{\theta}\theta_{\max}}\right)^2 \right]} + (1-w) \exp{\left[ -\left(\frac{x}{q_{x}x_{\max}}\right)^2 \right]} \label{eq: exponential reward angle} \end{equation}
\begin{equation} R = w \exp{\left[ -\left(\frac{y^{e}}{q_{y}y^{e}_{\max}}\right)^2 \right]} + (1-w) \exp{\left[ -\left(\frac{x^{e}}{q_{x}x_{\max}}\right)^2 \right]} \label{eq: exponential reward tip} \end{equation}
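
As a minimal sketch, the reward definitions above could be written in Python as follows. This is not the code used to produce the results. The function names, the argument order, the placeholder defaults of 1.0 and the numeric limits in the usage lines are all assumptions made for illustration. The tip quantities \(y^e\) and \(x^e\) are passed in as given values, since how they are computed from the state is not stated here.

```python
import math


def polynomial_reward_angle(thetas, x, theta_max, x_max, w, p_theta=1.0, p_x=1.0):
    """Polynomial reward on the link angles theta_i and the position x.

    The defaults of 1.0 are placeholders; they only matter when the
    corresponding term has a non-zero weight.
    """
    angle_penalty = sum((abs(t) / theta_max) ** p_theta for t in thetas) / len(thetas)
    x_penalty = (abs(x) / x_max) ** p_x
    return 1.0 - w * angle_penalty - (1.0 - w) * x_penalty


def polynomial_reward_tip(y_e, x_e, y_e_max, x_max, w, p_y=1.0, p_x=1.0):
    """Polynomial reward on the tip quantities y^e and x^e."""
    y_penalty = (abs(y_e) / y_e_max) ** p_y
    x_penalty = (abs(x_e) / x_max) ** p_x
    return 1.0 - w * y_penalty - (1.0 - w) * x_penalty


def exponential_reward_angle(thetas, x, theta_max, x_max, w, q_theta=1.0, q_x=1.0):
    """Exponential (Gaussian-shaped) reward on the link angles and the position x."""
    angle_term = sum(
        math.exp(-((t / (q_theta * theta_max)) ** 2)) for t in thetas
    ) / len(thetas)
    x_term = math.exp(-((x / (q_x * x_max)) ** 2))
    return w * angle_term + (1.0 - w) * x_term


def exponential_reward_tip(y_e, x_e, y_e_max, x_max, w, q_y=1.0, q_x=1.0):
    """Exponential (Gaussian-shaped) reward on the tip quantities y^e and x^e."""
    y_term = math.exp(-((y_e / (q_y * y_e_max)) ** 2))
    x_term = math.exp(-((x_e / (q_x * x_max)) ** 2))
    return w * y_term + (1.0 - w) * x_term


# Hypothetical limits, chosen only to make the example concrete.
THETA_MAX = 0.2  # rad (assumed)
X_MAX = 2.4      # m (assumed)

# Single pendulum (n = 1) with w = 0.75 and linear exponents.
r_poly = polynomial_reward_angle([0.05], 0.3, THETA_MAX, X_MAX, w=0.75)

# Double pendulum (n = 2) with w = 0.85 and q_theta = q_x = 0.7.
r_exp = exponential_reward_angle(
    [0.05, -0.02], 0.3, THETA_MAX, X_MAX, w=0.85, q_theta=0.7, q_x=0.7
)
```

All four shaped rewards equal 1 when every deviation is zero, which matches the constant reward \(R=1\) at the upright, centred state. The polynomial forms fall to 0 when every deviation reaches its limit, while the exponential forms stay strictly positive.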
Results for the single pendulum
The default parameter ranges are γ = 0.5:0.05:1.0 (start:step:stop) and Cₚ = [0, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024].
ID | Name | Steps [-] | Equation | \(w\) | \(p_\theta\) or \(p_y\) | \(p_x\) | \(q_\theta\) or \(q_y\) | \(q_x\) |
---|---|---|---|---|---|---|---|---|
A | Constant | 200 | \eqref{eq: reward 1} | – | – | – | – | – |
B | Constant | 500 | \eqref{eq: reward 1} | – | – | – | – | – |
C | Linear to \(x\) | 200 | \eqref{eq: reward polynomial angle} | 0 | – | 1 | – | – |
D | Linear to \(\theta_1\) | 200 | \eqref{eq: reward polynomial angle} | 1 | 1 | – | – | – |
E | Linear to \(\theta_1\) and \(x\) I | 200 | \eqref{eq: reward polynomial angle} | 0.5 | 1 | 1 | – | – |
F | Linear to \(\theta_1\) and \(x\) II | 200 | \eqref{eq: reward polynomial angle} | 0.75 | 1 | 1 | – | – |
G | Linear to \(\theta_1\) and \(x\) III | 200 | \eqref{eq: reward polynomial angle} | 0.25 | 1 | 1 | – | – |
H | Linear to \(\theta_1\) and \(x\) II | 500 | \eqref{eq: reward polynomial angle} | 0.75 | 1 | 1 | – | – |
I | Quadratic to \(\theta_1\) | 200 | \eqref{eq: reward polynomial angle} | 1 | 2 | – | – | – |
J | Quadratic to \(x\) | 200 | \eqref{eq: reward polynomial angle} | 0 | – | 2 | – | – |
K | Quadratic to \(\theta_1\) and \(x\) | 200 | \eqref{eq: reward polynomial angle} | 0.75 | 2 | 2 | – | – |
L | Square root to \(\theta_1\) | 200 | \eqref{eq: reward polynomial angle} | 1 | 0.5 | – | – | – |
M | Polynomial to \(\theta_1\) and \(x\) | 200 | \eqref{eq: reward polynomial angle} | 0.75 | 2 | 6 | – | – |
N | Linear to \(y^e\) and \(x^e\) I | 200 | \eqref{eq: reward polynomial tip} | 0.5 | 1 | 1 | – | – |
O | Linear to \(y^e\) and \(x^e\) II | 200 | \eqref{eq: reward polynomial tip} | 0.75 | 1 | 1 | – | – |
P | Exponential to \(\theta_1\) I | 200 | \eqref{eq: exponential reward angle} | 1 | – | – | 5/12 | – |
Q | Exponential to \(\theta_1\) II | 200 | \eqref{eq: exponential reward angle} | 1 | – | – | 0.25 | – |
R | Exponential to \(\theta_1\) III | 200 | \eqref{eq: exponential reward angle} | 1 | – | – | 2/3 | – |
S | Exponential to \(\theta_1\) IV | 200 | \eqref{eq: exponential reward angle} | 1 | – | – | 1/6 | – |
T | Exponential to \(\theta_1\) V | 200 | \eqref{eq: exponential reward angle} | 1 | – | – | 1/12 | – |
U | Exponential to \(\theta_1\) II | 500 | \eqref{eq: exponential reward angle} | 1 | – | – | 0.25 | – |
V | Exponential to \(\theta_1\) and \(x\) I | 200 | \eqref{eq: exponential reward angle} | 0.75 | – | – | 0.25 | 0.25 |
W | Exponential to \(\theta_1\) and \(x\) II | 200 | \eqref{eq: exponential reward angle} | 0.75 | – | – | 0.25 | 0.2 |
X | Exponential to \(\theta_1\) and \(x\) III | 200 | \eqref{eq: exponential reward angle} | 0.75 | – | – | 0.25 | 0.3 |
Y | Exponential to \(\theta_1\) and \(x\) IV | 200 | \eqref{eq: exponential reward angle} | 0.85 | – | – | 0.25 | 0.25 |
Z | Exponential to \(\theta_1\) and \(x\) V | 200 | \eqref{eq: exponential reward angle} | 0.75 | – | – | 0.25 | 0.1 |
AA | Exponential to \(\theta_1\) and \(x\) VI | 200 | \eqref{eq: exponential reward angle} | 0.75 | – | – | 0.25 | 0.4 |
AB | Exponential to \(\theta_1\) and \(x\) IV | 500 | \eqref{eq: exponential reward angle} | 0.85 | – | – | 0.25 | 0.25 |
AC | Exponential to \(\theta_1\) and \(x\) VII | 500 | \eqref{eq: exponential reward angle} | 0.95 | – | – | 0.25 | 0.25 |
AD | Exponential to \(y^e\) I | 200 | \eqref{eq: exponential reward tip} | 1 | – | – | 0.25 | – |
AE | Exponential to \(y^e\) II | 200 | \eqref{eq: exponential reward tip} | 1 | – | – | 0.1 | – |
AF | Exponential to \(y^e\) III | 200 | \eqref{eq: exponential reward tip} | 1 | – | – | 0.4 | – |
AG | Exponential to \(y^e\) II | 500 | \eqref{eq: exponential reward tip} | 1 | – | – | 0.1 | – |
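
To get a feel for the \(q_\theta\) column in cases P to T, it helps to note that a single exponential term \(\exp[-(\theta/(q_\theta\theta_{\max}))^2]\) falls to \(1/e\) exactly when \(|\theta| = q_\theta\theta_{\max}\), so a smaller \(q_\theta\) gives a narrower reward peak. The sketch below evaluates that term at a fixed normalized angle for each of the five cases. The helper name `exp_term` and the probe angle of \(0.25\,\theta_{\max}\) are assumptions chosen for illustration.

```python
import math


def exp_term(ratio, q):
    """One exponential reward term, with ratio = theta / theta_max."""
    return math.exp(-((ratio / q) ** 2))


# q_theta values of cases P to T from the table above (w = 1 in all five).
q_by_case = {"P": 5 / 12, "Q": 0.25, "R": 2 / 3, "S": 1 / 6, "T": 1 / 12}

# Reward at a hypothetical probe angle of 0.25 * theta_max for each case.
probe = 0.25
reward_at_probe = {case: exp_term(probe, q) for case, q in q_by_case.items()}

# Print the cases from the widest to the narrowest reward peak.
for case, q in sorted(q_by_case.items(), key=lambda item: -item[1]):
    print(f"{case}: q = {q:.4f}, reward at probe = {reward_at_probe[case]:.4f}")
```

At this probe angle, case Q sits exactly at its \(1/e\) point, cases P and R are still rewarded more generously, and cases S and T are already well down the tail.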
Results for the double pendulum
The default parameter ranges are γ = 0.7:0.05:1.0 (start:step:stop) and Cₚ = [0, 2, 4, 8, 16, 32, 64, 128, 256].
ID | Name | Steps [-] | Equation | \(w\) | \(q_\theta\) or \(q_y\) | \(q_x\) |
---|---|---|---|---|---|---|
A | Constant | 200 | \eqref{eq: reward 1} | – | – | – |
B | Constant | 500 | \eqref{eq: reward 1} | – | – | – |
C | Exponential to \(\theta_1\) I | 200 | \eqref{eq: exponential reward angle} | 1 | 0.25 | – |
D | Exponential to \(\theta_1\) II | 200 | \eqref{eq: exponential reward angle} | 1 | 0.1 | – |
E | Exponential to \(\theta_1\) III | 200 | \eqref{eq: exponential reward angle} | 1 | 0.4 | – |
F | Exponential to \(\theta_1\) IV | 200 | \eqref{eq: exponential reward angle} | 1 | 0.55 | – |
G | Exponential to \(\theta_1\) V | 200 | \eqref{eq: exponential reward angle} | 1 | 0.7 | – |
H | Exponential to \(\theta_1\) VI | 200 | \eqref{eq: exponential reward angle} | 1 | 0.85 | – |
I | Exponential to \(\theta_1\) V | 500 | \eqref{eq: exponential reward angle} | 1 | 0.7 | – |
J | Exponential to \(\theta_1\) and \(x\) I | 200 | \eqref{eq: exponential reward angle} | 0.75 | 0.25 | 0.25 |
K | Exponential to \(\theta_1\) and \(x\) II | 200 | \eqref{eq: exponential reward angle} | 0.85 | 0.7 | 0.7 |
L | Exponential to \(\theta_1\) and \(x\) III | 200 | \eqref{eq: exponential reward angle} | 0.7 | 0.7 | 0.7 |
M | Exponential to \(\theta_1\) and \(x\) IV | 200 | \eqref{eq: exponential reward angle} | 0.85 | 0.7 | 0.4 |
N | Exponential to \(\theta_1\) and \(x\) II | 500 | \eqref{eq: exponential reward angle} | 0.85 | 0.7 | 0.7 |
O | Exponential to \(\theta_1\) and \(x\) III | 500 | \eqref{eq: exponential reward angle} | 0.7 | 0.7 | 0.7 |
P | Exponential to \(\theta_1\) and \(x\) IV | 500 | \eqref{eq: exponential reward angle} | 0.85 | 0.7 | 0.4 |
Q | Exponential to \(y^e\) I | 200 | \eqref{eq: exponential reward tip} | 1 | 0.25 | – |
R | Exponential to \(y^e\) II | 200 | \eqref{eq: exponential reward tip} | 1 | 0.1 | – |
S | Exponential to \(y^e\) III | 200 | \eqref{eq: exponential reward tip} | 1 | 0.4 | – |
T | Exponential to \(y^e\) I | 500 | \eqref{eq: exponential reward tip} | 1 | 0.25 | – |
U | Exponential to \(y^e\) II | 500 | \eqref{eq: exponential reward tip} | 1 | 0.1 | – |
V | Exponential to \(y^e\) III | 500 | \eqref{eq: exponential reward tip} | 1 | 0.4 | – |
W | Exponential to \(y^e\) and \(x^e\) I | 200 | \eqref{eq: exponential reward tip} | 0.95 | 0.25 | 0.25 |
X | Exponential to \(y^e\) and \(x^e\) II | 200 | \eqref{eq: exponential reward tip} | 0.8 | 0.25 | 0.25 |
Y | Exponential to \(y^e\) and \(x^e\) I | 500 | \eqref{eq: exponential reward tip} | 0.95 | 0.25 | 0.25 |
Z | Exponential to \(y^e\) and \(x^e\) II | 500 | \eqref{eq: exponential reward tip} | 0.8 | 0.25 | 0.25 |
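
The default parameter ranges quoted for the two systems can be enumerated with a short script. This is a sketch under two assumptions: that `start:step:stop` denotes an inclusive range (as in Julia or MATLAB), and that every γ is paired with every Cₚ in a full grid. The helper `inclusive_range` and the variable names are hypothetical.

```python
from itertools import product


def inclusive_range(start, step, stop):
    """Return start, start + step, ..., stop, mimicking a start:step:stop range."""
    n = round((stop - start) / step) + 1
    # Build each value from an integer index to avoid accumulating float error.
    return [round(start + i * step, 10) for i in range(n)]


# Single pendulum defaults.
single_gammas = inclusive_range(0.5, 0.05, 1.0)
single_cps = [0, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

# Double pendulum defaults.
double_gammas = inclusive_range(0.7, 0.05, 1.0)
double_cps = [0, 2, 4, 8, 16, 32, 64, 128, 256]

# Full (gamma, C_p) grids, assuming every combination is evaluated.
single_grid = list(product(single_gammas, single_cps))
double_grid = list(product(double_gammas, double_cps))

print(len(single_grid), len(double_grid))
```

Under these assumptions, each single pendulum case covers 11 × 11 = 121 parameter pairs and each double pendulum case covers 7 × 9 = 63.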