Improving gesture recognition through spatial focus of attention
Gestures are a common form of human communication and are important for human-computer interfaces (HCI). Most recent approaches to gesture recognition use deep learning within multi-channel architectures. We show that when spatial attention is focused on the hands, gesture recognition improves significantly, particularly when the channels are fused using a sparse network. We propose an architecture (FOANet) that divides processing among four modalities (RGB, depth, RGB flow, and depth flow) and three spatial focus-of-attention regions (global, left hand, and right hand). The resulting 12 channels ...
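As a rough illustration of the channel layout described in the abstract, the 12 channels can be read as the Cartesian product of the four modalities and the three focus regions. The sketch below only enumerates those pairs. The identifier names (`MODALITIES`, `FOCUS_REGIONS`, `enumerate_channels`) are hypothetical and not taken from FOANet, and the sketch does not attempt to model the per-channel networks or the sparse fusion step.

```python
from itertools import product

# Modality and region names follow the abstract; the identifiers
# themselves are illustrative assumptions, not FOANet's own naming.
MODALITIES = ("rgb", "depth", "rgb_flow", "depth_flow")
FOCUS_REGIONS = ("global", "left_hand", "right_hand")


def enumerate_channels(modalities=MODALITIES, regions=FOCUS_REGIONS):
    """Return every (modality, region) pair, one per processing channel."""
    return list(product(modalities, regions))


channels = enumerate_channels()
print(len(channels))  # 12
```

Each pair, such as `("depth_flow", "left_hand")`, stands for one channel whose output would be passed on to the fusion stage.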