Predicting Long-Term Skeletal Motions by a Spatio-Temporal Hierarchical Recurrent Network


Project maintained by p0werHu | ZaneCode6574

Introduction

The primary goal of skeletal motion prediction is to generate future motion by observing a sequence of 3D skeletons. A key challenge is that a motion can often be performed in several different ways, each with its own configuration of poses and spatio-temporal dependencies. As a result, in long-term prediction the predicted poses often converge to motionless poses or drift into non-human-like motions. This leads us to define a hierarchical recurrent network model that explicitly characterizes these internal configurations of poses and their local and global spatio-temporal dependencies. The model introduces a latent vector variable from the Lie algebra to represent spatial and temporal relations simultaneously. Furthermore, a structured stacked LSTM-based decoder is devised to decode the predicted pose, with a new loss function defined to estimate the quantized weight of each body part in the pose. Empirical evaluations on benchmark datasets suggest that our approach significantly outperforms state-of-the-art methods on both short-term and long-term motion prediction.
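The page does not say how poses are mapped into the Lie algebra. A common parametrization in skeleton-based motion work writes each joint rotation as a vector in so(3) (an axis-angle vector), linked to the rotation matrix by the exponential and logarithm maps. The minimal sketch below shows only those two maps in plain Python; the helper names (`hat`, `matmul`, `exp_so3`, `log_so3`) are hypothetical, and this is not the authors' latent variable or network.

```python
import math

# Hypothetical helpers illustrating the so(3) <-> SO(3) maps; not the project's code.


def hat(w):
    """Skew-symmetric 3x3 matrix of an so(3) vector w = (wx, wy, wz)."""
    wx, wy, wz = w
    return [[0.0, -wz, wy],
            [wz, 0.0, -wx],
            [-wy, wx, 0.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def exp_so3(w):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula.

    The direction of w is the rotation axis; its norm is the angle in radians.
    """
    theta = math.sqrt(sum(c * c for c in w))
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta < 1e-12:
        return identity
    k = hat([c / theta for c in w])  # skew matrix of the unit axis
    k2 = matmul(k, k)
    s, v = math.sin(theta), 1.0 - math.cos(theta)
    # R = I + sin(theta) K + (1 - cos(theta)) K^2
    return [[identity[i][j] + s * k[i][j] + v * k2[i][j] for j in range(3)]
            for i in range(3)]


def log_so3(r):
    """Logarithm map SO(3) -> so(3) for rotation angles in [0, pi).

    Angles at or very close to pi need a separate branch, omitted here.
    """
    cos_theta = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    if theta < 1e-12:
        return [0.0, 0.0, 0.0]
    f = theta / (2.0 * math.sin(theta))
    return [f * (r[2][1] - r[1][2]),
            f * (r[0][2] - r[2][0]),
            f * (r[1][0] - r[0][1])]


# Usage: an arbitrary joint rotation survives the round trip.
w = [0.1, -0.4, 0.3]
print([round(c, 6) for c in log_so3(exp_so3(w))])
```

Because so(3) is a vector space, per-joint vectors like `w` can be concatenated, added, or fed to a recurrent network directly, which is one reason Lie-algebra parametrizations are popular for skeletal motion.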

Visualization of long-term prediction

  

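Walking

[Figure: long-term prediction example, Walking]

Posing

[Figure: long-term prediction example, Posing]

Eating

[Figure: long-term prediction example, Eating]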

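Results on short-term and long-term prediction

Each entry is the prediction error of a method at the given horizon (lower is better); "-" indicates that no result is reported for that method.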

H3.6M

greeting
Methods 80ms 160ms 320ms 400ms 560ms 640ms 720ms 1000ms
ERD 1.15 1.32 1.58 1.69 1.91 1.92 1.94 2.01
LSTM-3LR 0.92 1.12 1.39 1.51 1.76 1.76 1.81 1.91
SRNN 0.74 1.07 1.48 1.67 2.14 2.11 2.19 2.42
Res-GRU 0.57 0.92 1.28 1.44 1.75 1.76 1.82 1.95
Zero-velocity 0.54 0.89 1.30 1.49 1.76 1.74 1.77 1.80
MHU 0.54 0.87 1.27 1.45 1.75 1.71 1.74 1.87
HMR 0.55 0.91 1.27 1.41 1.66 1.65 1.69 1.72
Ours 0.54 0.86 1.23 1.37 1.59 1.55 1.60 1.66
walking
Methods 80ms 160ms 320ms 400ms 560ms 640ms 720ms 1000ms
ERD 1.06 1.12 1.22 1.26 1.31 1.34 1.41 1.51
LSTM-3LR 0.88 0.95 1.02 1.05 1.10 1.12 1.14 1.21
SRNN 0.64 0.83 1.08 1.22 1.46 1.51 1.55 1.58
Res-GRU 0.34 0.55 0.77 0.87 1.07 1.14 1.23 1.35
Zero-velocity 0.39 0.68 0.99 1.15 1.35 1.37 1.37 1.32
MHU 0.32 0.53 0.69 0.77 0.90 0.94 0.97 1.06
HMR 0.36 0.55 0.79 0.85 0.95 0.98 1.04 1.11
Ours 0.30 0.42 0.68 0.76 0.85 0.89 0.94 0.98
posing
Methods 80ms 160ms 320ms 400ms 560ms 640ms 720ms 1000ms
ERD 1.35 1.41 1.69 1.86 2.06 2.12 2.18 2.57
LSTM-3LR 1.22 1.25 1.54 1.71 1.93 2.01 2.09 2.73
SRNN 0.96 1.14 1.70 2.04 2.48 2.47 2.69 3.50
Res-GRU 0.40 0.74 1.39 1.66 1.98 2.12 2.23 2.67
Zero-velocity 0.28 0.57 1.13 1.37 1.81 2.14 2.23 2.78
MHU 0.33 0.64 1.22 1.47 1.82 2.11 2.17 2.51
HMR 0.24 0.51 1.06 1.31 1.64 1.80 1.94 2.49
Ours 0.23 0.49 1.06 1.30 1.63 1.84 1.99 2.58
purchases
Methods 80ms 160ms 320ms 400ms 560ms 640ms 720ms 1000ms
ERD 1.16 1.30 1.49 1.52 1.81 1.86 1.85 2.34
LSTM-3LR 1.03 1.13 1.35 1.42 1.81 1.88 1.84 2.30
SRNN 0.69 1.09 1.48 1.67 1.92 1.99 1.91 2.48
Res-GRU 0.54 0.79 1.10 1.20 1.61 1.69 1.71 2.16
Zero-velocity 0.62 0.88 1.19 1.27 1.64 1.68 1.62 2.45
MHU - - - - - - - -
HMR 0.51 0.78 1.05 1.15 1.60 1.67 1.61 2.11
Ours 0.51 0.78 1.04 1.11 1.49 1.54 1.49 2.11
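To read the long-horizon columns above more concretely, the sketch below computes the relative change in 1000 ms error of "Ours" against HMR for the four actions reported up to 1000 ms. It assumes, as the tables suggest, that lower values are better; the dictionary and function names are illustrative only.

```python
# 1000 ms errors of HMR and "Ours", copied from the four long-horizon tables above.
ERRORS_1000MS = {
    "greeting": {"HMR": 1.72, "Ours": 1.66},
    "walking": {"HMR": 1.11, "Ours": 0.98},
    "posing": {"HMR": 2.49, "Ours": 2.58},
    "purchases": {"HMR": 2.11, "Ours": 2.11},
}


def relative_reduction(baseline, ours):
    """Fraction of the baseline error removed; negative if 'ours' is worse."""
    return (baseline - ours) / baseline


for action, row in ERRORS_1000MS.items():
    change = relative_reduction(row["HMR"], row["Ours"])
    print(f"{action:<10} {100 * change:+.1f}%")
```

On these numbers the 1000 ms error is lower than HMR's for greeting and walking, equal for purchases, and higher for posing.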
directions | discussion
Methods 80ms 160ms 320ms 400ms | 80ms 160ms 320ms 400ms
ERD 1.06 1.17 1.19 1.24 | 0.99 1.07 1.22 1.26
LSTM-3LR 0.97 1.07 1.18 1.25 | 0.93 1.06 1.23 1.28
SRNN 0.67 0.93 1.02 1.16 | 0.81 1.03 1.43 1.57
Res-GRU 0.48 0.69 0.90 1.03 | 0.34 0.62 0.94 1.03
Zero-velocity 0.39 0.59 0.79 0.89 | 0.31 0.67 0.94 1.04
MHU - - - - | 0.31 0.66 0.93 1.00
HMR 0.41 0.60 0.82 0.91 | 0.30 0.58 0.84 0.92
Ours 0.41 0.58 0.78 0.86 | 0.29 0.55 0.82 0.89
eating | phoning
Methods 80ms 160ms 320ms 400ms | 80ms 160ms 320ms 400ms
ERD 0.97 1.04 1.10 1.20 | 1.18 1.27 1.50 1.57
LSTM-3LR 0.84 0.90 1.00 1.11 | 1.13 1.22 1.45 1.53
SRNN 0.65 0.81 1.02 1.13 | 0.92 1.37 1.82 1.97
Res-GRU 0.29 0.46 0.69 0.86 | 0.52 0.81 1.22 1.37
Zero-velocity 0.27 0.48 0.73 0.86 | 0.64 1.21 1.65 1.83
MHU - - - - | - - - -
HMR 0.22 0.37 0.60 0.75 | 0.51 0.80 1.19 1.29
Ours 0.21 0.34 0.56 0.73 | 0.51 0.79 1.15 1.26
sitting | sittingdown
Methods 80ms 160ms 320ms 400ms | 80ms 160ms 320ms 400ms
ERD 1.45 1.56 1.78 1.89 | 1.91 2.05 2.26 2.36
LSTM-3LR 1.38 1.51 1.74 1.84 | 1.73 1.87 2.05 2.13
SRNN 1.05 1.37 1.92 2.18 | 1.08 1.48 2.03 2.23
Res-GRU 0.49 0.76 1.25 1.48 | 0.52 0.87 1.38 1.59
Zero-velocity 0.40 0.63 1.02 1.18 | 0.39 0.74 1.07 1.19
MHU - - - - | - - - -
HMR 0.40 0.64 1.06 1.22 | 0.40 0.73 1.10 1.25
Ours 0.40 0.63 1.03 1.17 | 0.39 0.74 1.10 1.25
smoking | takingphoto
Methods 80ms 160ms 320ms 400ms | 80ms 160ms 320ms 400ms
ERD 1.12 1.22 1.41 1.47 | 0.99 1.11 1.25 1.32
LSTM-3LR 1.12 1.23 1.42 1.47 | 0.87 0.99 1.22 1.36
SRNN 0.66 0.94 1.33 1.52 | 0.63 0.83 1.21 1.35
Res-GRU 0.37 0.65 1.02 1.14 | 0.34 0.65 0.98 1.14
Zero-velocity 0.26 0.48 0.97 0.95 | 0.25 0.51 0.79 0.92
MHU - - - - | 0.27 0.54 0.84 0.96
HMR 0.27 0.52 0.88 0.97 | 0.24 0.52 0.86 1.01
Ours 0.26 0.50 0.85 0.93 | 0.24 0.50 0.78 0.93
waiting | walkingdog
Methods 80ms 160ms 320ms 400ms | 80ms 160ms 320ms 400ms
ERD 1.07 1.19 1.46 1.58 | 1.20 1.31 1.53 1.64
LSTM-3LR 0.97 1.12 1.46 1.61 | 1.07 1.23 1.44 1.53
SRNN 0.59 0.81 1.28 1.44 | 0.67 1.01 1.70 1.87
Res-GRU 0.37 0.67 1.14 1.36 | 0.55 0.87 1.25 1.43
Zero-velocity 0.34 0.67 1.22 1.47 | 0.60 0.98 1.36 1.50
MHU 0.56 0.88 1.21 1.37 | - - - -
HMR 0.32 0.64 1.15 1.35 | 0.57 0.88 1.26 1.41
Ours 0.32 0.62 1.14 1.37 | 0.55 0.83 1.17 1.34
walkingtogether | average
Methods 80ms 160ms 320ms 400ms | 80ms 160ms 320ms 400ms
ERD 1.01 1.11 1.22 1.27 | 1.18 1.28 1.46 1.54
LSTM-3LR 0.92 1.05 1.16 1.18 | 1.07 1.18 1.38 1.47
SRNN 0.64 0.86 1.18 1.28 | 0.76 1.04 1.45 1.62
Res-GRU 0.31 0.58 0.83 0.92 | 0.43 0.71 1.08 1.23
Zero-velocity 0.33 0.66 0.94 0.99 | 0.40 0.71 1.07 1.21
MHU - - - - | 0.39 0.69 1.03 1.17
HMR 0.30 0.54 0.78 0.84 | 0.37 0.64 0.98 1.11
Ours 0.27 0.46 0.66 0.72 | 0.36 0.61 0.93 1.07
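The page does not state how the "average" columns are computed. Assuming they are unweighted means over the 15 H3.6M actions, the sketch below recomputes the "Ours" average from the per-action rows above. Because every per-action entry is already rounded to two decimals, the recomputed values can differ slightly from the reported ones; the names `OURS`, `REPORTED_AVERAGE`, and `column_means` are illustrative.

```python
# "Ours" errors at 80/160/320/400 ms for the 15 H3.6M actions, copied from the tables above.
OURS = {
    "walking": [0.30, 0.42, 0.68, 0.76],
    "eating": [0.21, 0.34, 0.56, 0.73],
    "smoking": [0.26, 0.50, 0.85, 0.93],
    "discussion": [0.29, 0.55, 0.82, 0.89],
    "directions": [0.41, 0.58, 0.78, 0.86],
    "greeting": [0.54, 0.86, 1.23, 1.37],
    "phoning": [0.51, 0.79, 1.15, 1.26],
    "posing": [0.23, 0.49, 1.06, 1.30],
    "purchases": [0.51, 0.78, 1.04, 1.11],
    "sitting": [0.40, 0.63, 1.03, 1.17],
    "sittingdown": [0.39, 0.74, 1.10, 1.25],
    "takingphoto": [0.24, 0.50, 0.78, 0.93],
    "waiting": [0.32, 0.62, 1.14, 1.37],
    "walkingdog": [0.55, 0.83, 1.17, 1.34],
    "walkingtogether": [0.27, 0.46, 0.66, 0.72],
}
REPORTED_AVERAGE = [0.36, 0.61, 0.93, 1.07]
HORIZONS = ["80ms", "160ms", "320ms", "400ms"]


def column_means(rows):
    """Unweighted mean of each column (horizon) across all rows (actions)."""
    rows = list(rows)
    return [sum(col) / len(rows) for col in zip(*rows)]


means = column_means(OURS.values())
for horizon, mean, reported in zip(HORIZONS, means, REPORTED_AVERAGE):
    print(f"{horizon:>6}: recomputed {mean:.3f}, reported {reported:.2f}")
```

With these values each recomputed mean falls within 0.01 of the reported average (for example, 0.362 at 80 ms against the reported 0.36), which is consistent with the unweighted-mean assumption.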

Mouse

mouse
Methods 80ms 160ms 320ms 400ms 560ms 640ms 720ms 1000ms
ERD 0.50 0.48 0.63 0.69 0.72 0.68 0.69 0.81
LSTM-3LR 0.53 0.49 0.66 0.68 0.67 0.62 0.70 0.75
Res-GRU 0.41 0.47 0.62 0.69 0.70 0.64 0.70 0.70
Zero-velocity 0.40 0.53 0.73 0.95 1.03 0.94 1.07 1.13
HMR 0.42 0.44 0.64 0.71 0.73 0.71 0.73 0.72
Ours 0.41 0.43 0.53 0.52 0.57 0.50 0.67 0.72
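For the Mouse table, the sketch below picks the method with the lowest error at each horizon, again assuming lower is better and breaking ties in favor of the method listed first. The names `MOUSE` and `best_per_horizon` are illustrative.

```python
HORIZONS = ["80ms", "160ms", "320ms", "400ms", "560ms", "640ms", "720ms", "1000ms"]

# Mouse errors copied from the table above, in the table's row order.
MOUSE = {
    "ERD": [0.50, 0.48, 0.63, 0.69, 0.72, 0.68, 0.69, 0.81],
    "LSTM-3LR": [0.53, 0.49, 0.66, 0.68, 0.67, 0.62, 0.70, 0.75],
    "Res-GRU": [0.41, 0.47, 0.62, 0.69, 0.70, 0.64, 0.70, 0.70],
    "Zero-velocity": [0.40, 0.53, 0.73, 0.95, 1.03, 0.94, 1.07, 1.13],
    "HMR": [0.42, 0.44, 0.64, 0.71, 0.73, 0.71, 0.73, 0.72],
    "Ours": [0.41, 0.43, 0.53, 0.52, 0.57, 0.50, 0.67, 0.72],
}


def best_per_horizon(table):
    """For each horizon index, return the method with the lowest error.

    min() keeps the first minimal key, so ties go to the earlier row.
    """
    n = len(next(iter(table.values())))
    return [min(table, key=lambda name: table[name][i]) for i in range(n)]


for i, (horizon, name) in enumerate(zip(HORIZONS, best_per_horizon(MOUSE))):
    print(f"{horizon:>6}: {name} ({MOUSE[name][i]:.2f})")
```

On these values "Ours" has the lowest error from 160 ms to 720 ms, while Zero-velocity (0.40) is lowest at 80 ms and Res-GRU (0.70) at 1000 ms.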