Predicting Long-Term Skeletal Motions by a Spatio-Temporal Hierarchical Recurrent Network
Project maintained by p0werHu | ZaneCode6574
Introduction
The primary goal of skeletal motion prediction is to generate future motion by observing a sequence of 3D skeletons. A key challenge is that a motion can often be performed in several different ways, each with its own configuration of poses and their spatio-temporal dependencies. As a result, in long-term prediction the predicted poses often converge to motionless poses or to motions that do not look human. This leads us to define a hierarchical recurrent network model that explicitly characterizes these internal configurations of poses and their local and global spatio-temporal dependencies. The model introduces a latent vector variable from the Lie algebra to represent spatial and temporal relations simultaneously. Furthermore, a structured stacked-LSTM decoder is devised to decode the predicted poses, with a new loss function defined to estimate the quantized weight of each body part in a pose. Empirical evaluations on benchmark datasets (H3.6M and Mouse) suggest that our approach significantly outperforms state-of-the-art methods on both short-term and long-term motion prediction.
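Lie-algebra pose parameterizations of the kind the abstract mentions build on representing each joint rotation as a 3-vector in so(3) (axis-angle), which maps to and from a rotation matrix in SO(3) through the exponential and logarithm maps. The snippet below is a minimal pure-Python sketch of those two maps using Rodrigues' formula. It is a standard textbook construction, not the authors' latent variable or code, and the helper names (`hat`, `matmul`, `exp_so3`, `log_so3`) are hypothetical.

```python
import math

# Illustrative sketch only: standard so(3) <-> SO(3) maps, not the authors' implementation.

def hat(w):
    """Skew-symmetric matrix of a 3-vector w, so that hat(w) @ v == cross(w, v)."""
    x, y, z = w
    return [[0.0, -z, y],
            [z, 0.0, -x],
            [-y, x, 0.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def exp_so3(w):
    """Exponential map so(3) -> SO(3): R = I + sin(t) K + (1 - cos(t)) K^2 (Rodrigues)."""
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-12:
        return identity  # zero rotation
    K = hat([c / theta for c in w])  # skew matrix of the unit rotation axis
    K2 = matmul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[identity[i][j] + s * K[i][j] + c * K2[i][j] for j in range(3)] for i in range(3)]

def log_so3(R):
    """Logarithm map SO(3) -> so(3) as an axis-angle vector (valid for angles below pi)."""
    trace = R[0][0] + R[1][1] + R[2][2]
    theta = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))
    if theta < 1e-12:
        return [0.0, 0.0, 0.0]
    # R - R^T = 2 sin(theta) K, so the off-diagonal differences recover the axis.
    scale = theta / (2.0 * math.sin(theta))
    return [scale * (R[2][1] - R[1][2]),
            scale * (R[0][2] - R[2][0]),
            scale * (R[1][0] - R[0][1])]

if __name__ == "__main__":
    w = [0.3, -0.2, 0.5]  # hypothetical joint rotation in axis-angle form
    print(log_so3(exp_so3(w)))  # round trip recovers w up to floating-point error
```

Working in so(3) gives an unconstrained 3-vector per joint that a network can regress directly, while `exp_so3` always returns a valid rotation matrix.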
Visualization of long-term prediction
Walking (video)
Posing (video)
Eating (video)
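Results on short-term and long-term prediction
Entries are mean angle errors at each prediction horizon (lower is better). A dash (-) means no result is available for that method and action.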
H3.6M
greeting

| Methods | 80ms | 160ms | 320ms | 400ms | 560ms | 640ms | 720ms | 1000ms |
|---|---|---|---|---|---|---|---|---|
| ERD | 1.15 | 1.32 | 1.58 | 1.69 | 1.91 | 1.92 | 1.94 | 2.01 |
| LSTM-3LR | 0.92 | 1.12 | 1.39 | 1.51 | 1.76 | 1.76 | 1.81 | 1.91 |
| SRNN | 0.74 | 1.07 | 1.48 | 1.67 | 2.14 | 2.11 | 2.19 | 2.42 |
| Res-GRU | 0.57 | 0.92 | 1.28 | 1.44 | 1.75 | 1.76 | 1.82 | 1.95 |
| Zero-velocity | 0.54 | 0.89 | 1.30 | 1.49 | 1.76 | 1.74 | 1.77 | 1.80 |
| MHU | 0.54 | 0.87 | 1.27 | 1.45 | 1.75 | 1.71 | 1.74 | 1.87 |
| HMR | 0.55 | 0.91 | 1.27 | 1.41 | 1.66 | 1.65 | 1.69 | 1.72 |
| Ours | 0.54 | 0.86 | 1.23 | 1.37 | 1.59 | 1.55 | 1.60 | 1.66 |
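walking

| Methods | 80ms | 160ms | 320ms | 400ms | 560ms | 640ms | 720ms | 1000ms |
|---|---|---|---|---|---|---|---|---|
| ERD | 1.06 | 1.12 | 1.22 | 1.26 | 1.31 | 1.34 | 1.41 | 1.51 |
| LSTM-3LR | 0.88 | 0.95 | 1.02 | 1.05 | 1.10 | 1.12 | 1.14 | 1.21 |
| SRNN | 0.64 | 0.83 | 1.08 | 1.22 | 1.46 | 1.51 | 1.55 | 1.58 |
| Res-GRU | 0.34 | 0.55 | 0.77 | 0.87 | 1.07 | 1.14 | 1.23 | 1.35 |
| Zero-velocity | 0.39 | 0.68 | 0.99 | 1.15 | 1.35 | 1.37 | 1.37 | 1.32 |
| MHU | 0.32 | 0.53 | 0.69 | 0.77 | 0.90 | 0.94 | 0.97 | 1.06 |
| HMR | 0.36 | 0.55 | 0.79 | 0.85 | 0.95 | 0.98 | 1.04 | 1.11 |
| Ours | 0.30 | 0.42 | 0.68 | 0.76 | 0.85 | 0.89 | 0.94 | 0.98 |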
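posing

| Methods | 80ms | 160ms | 320ms | 400ms | 560ms | 640ms | 720ms | 1000ms |
|---|---|---|---|---|---|---|---|---|
| ERD | 1.35 | 1.41 | 1.69 | 1.86 | 2.06 | 2.12 | 2.18 | 2.57 |
| LSTM-3LR | 1.22 | 1.25 | 1.54 | 1.71 | 1.93 | 2.01 | 2.09 | 2.73 |
| SRNN | 0.96 | 1.14 | 1.70 | 2.04 | 2.48 | 2.47 | 2.69 | 3.50 |
| Res-GRU | 0.40 | 0.74 | 1.39 | 1.66 | 1.98 | 2.12 | 2.23 | 2.67 |
| Zero-velocity | 0.28 | 0.57 | 1.13 | 1.37 | 1.81 | 2.14 | 2.23 | 2.78 |
| MHU | 0.33 | 0.64 | 1.22 | 1.47 | 1.82 | 2.11 | 2.17 | 2.51 |
| HMR | 0.24 | 0.51 | 1.06 | 1.31 | 1.64 | 1.80 | 1.94 | 2.49 |
| Ours | 0.23 | 0.49 | 1.06 | 1.30 | 1.63 | 1.84 | 1.99 | 2.58 |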
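purchases

| Methods | 80ms | 160ms | 320ms | 400ms | 560ms | 640ms | 720ms | 1000ms |
|---|---|---|---|---|---|---|---|---|
| ERD | 1.16 | 1.30 | 1.49 | 1.52 | 1.81 | 1.86 | 1.85 | 2.34 |
| LSTM-3LR | 1.03 | 1.13 | 1.35 | 1.42 | 1.81 | 1.88 | 1.84 | 2.30 |
| SRNN | 0.69 | 1.09 | 1.48 | 1.67 | 1.92 | 1.99 | 1.91 | 2.48 |
| Res-GRU | 0.54 | 0.79 | 1.10 | 1.20 | 1.61 | 1.69 | 1.71 | 2.16 |
| Zero-velocity | 0.62 | 0.88 | 1.19 | 1.27 | 1.64 | 1.68 | 1.62 | 2.45 |
| MHU | - | - | - | - | - | - | - | - |
| HMR | 0.51 | 0.78 | 1.05 | 1.15 | 1.60 | 1.67 | 1.61 | 2.11 |
| Ours | 0.51 | 0.78 | 1.04 | 1.11 | 1.49 | 1.54 | 1.49 | 2.11 |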
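directions

| Methods | 80ms | 160ms | 320ms | 400ms |
|---|---|---|---|---|
| ERD | 1.06 | 1.17 | 1.19 | 1.24 |
| LSTM-3LR | 0.97 | 1.07 | 1.18 | 1.25 |
| SRNN | 0.67 | 0.93 | 1.02 | 1.16 |
| Res-GRU | 0.48 | 0.69 | 0.90 | 1.03 |
| Zero-velocity | 0.39 | 0.59 | 0.79 | 0.89 |
| MHU | - | - | - | - |
| HMR | 0.41 | 0.60 | 0.82 | 0.91 |
| Ours | 0.41 | 0.58 | 0.78 | 0.86 |

discussion

| Methods | 80ms | 160ms | 320ms | 400ms |
|---|---|---|---|---|
| ERD | 0.99 | 1.07 | 1.22 | 1.26 |
| LSTM-3LR | 0.93 | 1.06 | 1.23 | 1.28 |
| SRNN | 0.81 | 1.03 | 1.43 | 1.57 |
| Res-GRU | 0.34 | 0.62 | 0.94 | 1.03 |
| Zero-velocity | 0.31 | 0.67 | 0.94 | 1.04 |
| MHU | 0.31 | 0.66 | 0.93 | 1.00 |
| HMR | 0.30 | 0.58 | 0.84 | 0.92 |
| Ours | 0.29 | 0.55 | 0.82 | 0.89 |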
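eating

| Methods | 80ms | 160ms | 320ms | 400ms |
|---|---|---|---|---|
| ERD | 0.97 | 1.04 | 1.10 | 1.20 |
| LSTM-3LR | 0.84 | 0.90 | 1.00 | 1.11 |
| SRNN | 0.65 | 0.81 | 1.02 | 1.13 |
| Res-GRU | 0.29 | 0.46 | 0.69 | 0.86 |
| Zero-velocity | 0.27 | 0.48 | 0.73 | 0.86 |
| MHU | - | - | - | - |
| HMR | 0.22 | 0.37 | 0.60 | 0.75 |
| Ours | 0.21 | 0.34 | 0.56 | 0.73 |

phoning

| Methods | 80ms | 160ms | 320ms | 400ms |
|---|---|---|---|---|
| ERD | 1.18 | 1.27 | 1.50 | 1.57 |
| LSTM-3LR | 1.13 | 1.22 | 1.45 | 1.53 |
| SRNN | 0.92 | 1.37 | 1.82 | 1.97 |
| Res-GRU | 0.52 | 0.81 | 1.22 | 1.37 |
| Zero-velocity | 0.64 | 1.21 | 1.65 | 1.83 |
| MHU | - | - | - | - |
| HMR | 0.51 | 0.80 | 1.19 | 1.29 |
| Ours | 0.51 | 0.79 | 1.15 | 1.26 |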
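sitting

| Methods | 80ms | 160ms | 320ms | 400ms |
|---|---|---|---|---|
| ERD | 1.45 | 1.56 | 1.78 | 1.89 |
| LSTM-3LR | 1.38 | 1.51 | 1.74 | 1.84 |
| SRNN | 1.05 | 1.37 | 1.92 | 2.18 |
| Res-GRU | 0.49 | 0.76 | 1.25 | 1.48 |
| Zero-velocity | 0.40 | 0.63 | 1.02 | 1.18 |
| MHU | - | - | - | - |
| HMR | 0.40 | 0.64 | 1.06 | 1.22 |
| Ours | 0.40 | 0.63 | 1.03 | 1.17 |

sittingdown

| Methods | 80ms | 160ms | 320ms | 400ms |
|---|---|---|---|---|
| ERD | 1.91 | 2.05 | 2.26 | 2.36 |
| LSTM-3LR | 1.73 | 1.87 | 2.05 | 2.13 |
| SRNN | 1.08 | 1.48 | 2.03 | 2.23 |
| Res-GRU | 0.52 | 0.87 | 1.38 | 1.59 |
| Zero-velocity | 0.39 | 0.74 | 1.07 | 1.19 |
| MHU | - | - | - | - |
| HMR | 0.40 | 0.73 | 1.10 | 1.25 |
| Ours | 0.39 | 0.74 | 1.10 | 1.25 |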
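smoking

| Methods | 80ms | 160ms | 320ms | 400ms |
|---|---|---|---|---|
| ERD | 1.12 | 1.22 | 1.41 | 1.47 |
| LSTM-3LR | 1.12 | 1.23 | 1.42 | 1.47 |
| SRNN | 0.66 | 0.94 | 1.33 | 1.52 |
| Res-GRU | 0.37 | 0.65 | 1.02 | 1.14 |
| Zero-velocity | 0.26 | 0.48 | 0.97 | 0.95 |
| MHU | - | - | - | - |
| HMR | 0.27 | 0.52 | 0.88 | 0.97 |
| Ours | 0.26 | 0.50 | 0.85 | 0.93 |

takingphoto

| Methods | 80ms | 160ms | 320ms | 400ms |
|---|---|---|---|---|
| ERD | 0.99 | 1.11 | 1.25 | 1.32 |
| LSTM-3LR | 0.87 | 0.99 | 1.22 | 1.36 |
| SRNN | 0.63 | 0.83 | 1.21 | 1.35 |
| Res-GRU | 0.34 | 0.65 | 0.98 | 1.14 |
| Zero-velocity | 0.25 | 0.51 | 0.79 | 0.92 |
| MHU | 0.27 | 0.54 | 0.84 | 0.96 |
| HMR | 0.24 | 0.52 | 0.86 | 1.01 |
| Ours | 0.24 | 0.50 | 0.78 | 0.93 |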
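waiting

| Methods | 80ms | 160ms | 320ms | 400ms |
|---|---|---|---|---|
| ERD | 1.07 | 1.19 | 1.46 | 1.58 |
| LSTM-3LR | 0.97 | 1.12 | 1.46 | 1.61 |
| SRNN | 0.59 | 0.81 | 1.28 | 1.44 |
| Res-GRU | 0.37 | 0.67 | 1.14 | 1.36 |
| Zero-velocity | 0.34 | 0.67 | 1.22 | 1.47 |
| MHU | 0.56 | 0.88 | 1.21 | 1.37 |
| HMR | 0.32 | 0.64 | 1.15 | 1.35 |
| Ours | 0.32 | 0.62 | 1.14 | 1.37 |

walkingdog

| Methods | 80ms | 160ms | 320ms | 400ms |
|---|---|---|---|---|
| ERD | 1.20 | 1.31 | 1.53 | 1.64 |
| LSTM-3LR | 1.07 | 1.23 | 1.44 | 1.53 |
| SRNN | 0.67 | 1.01 | 1.70 | 1.87 |
| Res-GRU | 0.55 | 0.87 | 1.25 | 1.43 |
| Zero-velocity | 0.60 | 0.98 | 1.36 | 1.50 |
| MHU | - | - | - | - |
| HMR | 0.57 | 0.88 | 1.26 | 1.41 |
| Ours | 0.55 | 0.83 | 1.17 | 1.34 |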
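walkingtogether

| Methods | 80ms | 160ms | 320ms | 400ms |
|---|---|---|---|---|
| ERD | 1.01 | 1.11 | 1.22 | 1.27 |
| LSTM-3LR | 0.92 | 1.05 | 1.16 | 1.18 |
| SRNN | 0.64 | 0.86 | 1.18 | 1.28 |
| Res-GRU | 0.31 | 0.58 | 0.83 | 0.92 |
| Zero-velocity | 0.33 | 0.66 | 0.94 | 0.99 |
| MHU | - | - | - | - |
| HMR | 0.30 | 0.54 | 0.78 | 0.84 |
| Ours | 0.27 | 0.46 | 0.66 | 0.72 |

average

| Methods | 80ms | 160ms | 320ms | 400ms |
|---|---|---|---|---|
| ERD | 1.18 | 1.28 | 1.46 | 1.54 |
| LSTM-3LR | 1.07 | 1.18 | 1.38 | 1.47 |
| SRNN | 0.76 | 1.04 | 1.45 | 1.62 |
| Res-GRU | 0.43 | 0.71 | 1.08 | 1.23 |
| Zero-velocity | 0.40 | 0.71 | 1.07 | 1.21 |
| MHU | 0.39 | 0.69 | 1.03 | 1.17 |
| HMR | 0.37 | 0.64 | 0.98 | 1.11 |
| Ours | 0.36 | 0.61 | 0.93 | 1.07 |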
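The `average` column is consistent with an unweighted mean of the 15 per-action H3.6M results. As a quick check, assuming that definition, the snippet below recomputes it from the two-decimal `Ours` entries transcribed from the tables above. Because the inputs are already rounded, agreement is only expected to within about 0.01: the 320 ms mean of the rounded entries comes out near 0.937 against a reported 0.93. The variable names are illustrative.

```python
# "Ours" entries transcribed from the H3.6M tables (80 / 160 / 320 / 400 ms).
# Assumption: the reported "average" is the unweighted mean over these 15 actions.
ours = {
    "greeting":        [0.54, 0.86, 1.23, 1.37],
    "walking":         [0.30, 0.42, 0.68, 0.76],
    "posing":          [0.23, 0.49, 1.06, 1.30],
    "purchases":       [0.51, 0.78, 1.04, 1.11],
    "directions":      [0.41, 0.58, 0.78, 0.86],
    "discussion":      [0.29, 0.55, 0.82, 0.89],
    "eating":          [0.21, 0.34, 0.56, 0.73],
    "phoning":         [0.51, 0.79, 1.15, 1.26],
    "sitting":         [0.40, 0.63, 1.03, 1.17],
    "sittingdown":     [0.39, 0.74, 1.10, 1.25],
    "smoking":         [0.26, 0.50, 0.85, 0.93],
    "takingphoto":     [0.24, 0.50, 0.78, 0.93],
    "waiting":         [0.32, 0.62, 1.14, 1.37],
    "walkingdog":      [0.55, 0.83, 1.17, 1.34],
    "walkingtogether": [0.27, 0.46, 0.66, 0.72],
}
reported_average = [0.36, 0.61, 0.93, 1.07]
horizons = ["80ms", "160ms", "320ms", "400ms"]

# Mean over actions, separately for each horizon.
n_actions = len(ours)
computed = [sum(errs[h] for errs in ours.values()) / n_actions for h in range(len(horizons))]

for name, c, r in zip(horizons, computed, reported_average):
    print(f"{name}: mean of rounded entries = {c:.3f}, reported = {r:.2f}")
```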
Mouse
| Methods | 80ms | 160ms | 320ms | 400ms | 560ms | 640ms | 720ms | 1000ms |
|---|---|---|---|---|---|---|---|---|
| ERD | 0.50 | 0.48 | 0.63 | 0.69 | 0.72 | 0.68 | 0.69 | 0.81 |
| LSTM-3LR | 0.53 | 0.49 | 0.66 | 0.68 | 0.67 | 0.62 | 0.70 | 0.75 |
| Res-GRU | 0.41 | 0.47 | 0.62 | 0.69 | 0.70 | 0.64 | 0.70 | 0.70 |
| Zero-velocity | 0.40 | 0.53 | 0.73 | 0.95 | 1.03 | 0.94 | 1.07 | 1.13 |
| HMR | 0.42 | 0.44 | 0.64 | 0.71 | 0.73 | 0.71 | 0.73 | 0.72 |
| Ours | 0.41 | 0.43 | 0.53 | 0.52 | 0.57 | 0.50 | 0.67 | 0.72 |
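The tables compare methods by error at fixed horizons, and the Zero-velocity row is the baseline that simply freezes the last observed pose, which is exactly the "motionless" failure mode described in the introduction. This page does not spell out the evaluation protocol, so the sketch below assumes the convention common in this line of work: the Euclidean distance between predicted and ground-truth Euler-angle vectors, averaged over test sequences, with predicted frames 40 ms apart (25 fps). All function names and the toy data are hypothetical, and protocol details such as which joints are excluded are omitted.

```python
import math

# Assumption: 25 fps, i.e. 40 ms between consecutive predicted frames.
FRAME_MS = 40

def horizon_to_index(ms):
    """Map a horizon such as 80 ms to a 0-based index into the predicted frames."""
    return ms // FRAME_MS - 1

def zero_velocity_forecast(observed, n_future):
    """Zero-velocity baseline: repeat the last observed pose for every future frame."""
    last = list(observed[-1])
    return [list(last) for _ in range(n_future)]

def angle_error(pred_pose, true_pose):
    """Euclidean distance between two poses given as flat lists of Euler angles (radians)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_pose, true_pose)))

def mean_error_per_frame(pred_seqs, true_seqs):
    """Average the per-frame angle error over a batch of sequences."""
    n_frames = len(true_seqs[0])
    return [
        sum(angle_error(p[f], t[f]) for p, t in zip(pred_seqs, true_seqs)) / len(true_seqs)
        for f in range(n_frames)
    ]

if __name__ == "__main__":
    # Hypothetical toy data: a 3-angle "pose" rotating steadily about one axis.
    observed = [[0.1 * k, 0.0, 0.0] for k in range(10)]
    future = [[0.1 * k, 0.0, 0.0] for k in range(10, 35)]  # 25 frames = 1000 ms
    pred = zero_velocity_forecast(observed, len(future))
    errs = mean_error_per_frame([pred], [future])
    for ms in (80, 160, 320, 400, 1000):
        print(ms, "ms:", round(errs[horizon_to_index(ms)], 3))
```

On steadily moving input the zero-velocity error grows linearly with the horizon, which is why a model must capture actual dynamics to beat it at long horizons.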