Abstract
We present Hand ArticuLated Occupancy (HALO), a novel representation of articulated hands that combines the advantages of 3D keypoints and neural implicit surfaces and can be used in end-to-end trainable architectures. Unlike existing statistical parametric hand models (e.g. MANO), HALO is interpretable: it takes a 3D joint skeleton defined in Euclidean space directly as input and produces a neural occupancy volume representing the posed hand surface. The key benefits of HALO are that (1) it requires only 3D keypoints as input, which offer accuracy benefits and are easier for neural networks to learn than a set of latent hand-model parameters; (2) it naturally provides a differentiable volumetric occupancy representation of the posed hand; and (3) it can be trained end-to-end, allowing the formulation of losses on the hand surface that benefit the learning of 3D keypoints. We demonstrate the applicability of the HALO model to the task of conditional generation of hands that grasp 3D objects. In this setting, the differentiable nature of HALO is shown to improve the quality of the synthesized hands in terms of both physical plausibility and user preference.
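To make the idea of a surface loss that feeds back into the keypoints concrete, the following is a minimal sketch and not the HALO network itself. It assumes a toy stand-in for the learned occupancy: each bone is treated as a capsule, and occupancy is a sigmoid of the distance to the nearest bone. The names `toy_occupancy`, `penetration_loss`, and `numerical_grad`, the capsule radius, and the sharpness value are all hypothetical choices made for illustration. Finite differences stand in for the automatic differentiation that an end-to-end trainable model would use.

```python
import numpy as np

def point_to_segment_distance(points, a, b):
    """Distance from each query point (N, 3) to the segment a-b (each of shape (3,))."""
    ab = b - a
    # Projection parameter along the bone, clamped to the segment ends.
    t = np.clip((points - a) @ ab / (ab @ ab + 1e-12), 0.0, 1.0)
    closest = a + t[:, None] * ab
    return np.linalg.norm(points - closest, axis=1)

def toy_occupancy(points, keypoints, bones, radius=0.01, sharpness=500.0):
    """Soft occupancy in (0, 1): near 1 inside a capsule around any bone, near 0 far away.

    This is an assumed hand-crafted stand-in for a learned occupancy network.
    Coordinates are assumed to be in metres.
    """
    dists = np.stack(
        [point_to_segment_distance(points, keypoints[i], keypoints[j]) for i, j in bones],
        axis=0,
    )
    nearest = dists.min(axis=0)
    return 1.0 / (1.0 + np.exp(-sharpness * (radius - nearest)))

def penetration_loss(keypoints, bones, object_points):
    """Mean hand occupancy at object surface points; high values mean interpenetration."""
    return toy_occupancy(object_points, keypoints, bones).mean()

def numerical_grad(loss_fn, keypoints, eps=1e-6):
    """Central finite-difference gradient of a scalar loss with respect to the keypoints."""
    grad = np.zeros_like(keypoints)
    for idx in np.ndindex(keypoints.shape):
        step = np.zeros_like(keypoints)
        step[idx] = eps
        grad[idx] = (loss_fn(keypoints + step) - loss_fn(keypoints - step)) / (2 * eps)
    return grad

# A single straight finger: three keypoints joined by two bones.
keypoints = np.array([[0.0, 0.0, 0.0], [0.03, 0.0, 0.0], [0.06, 0.0, 0.0]])
bones = [(0, 1), (1, 2)]
# One object surface point that lies inside the finger's capsule.
object_points = np.array([[0.045, 0.005, 0.0]])

loss_fn = lambda k: penetration_loss(k, bones, object_points)
grad = numerical_grad(loss_fn, keypoints)

# One gradient step moves the keypoints so that the finger retreats from the object point.
updated = keypoints - 1e-5 * grad
print("loss before:", loss_fn(keypoints), "loss after:", loss_fn(updated))
```

In this sketch the loss is defined on the occupancy field, yet its gradient is expressed directly in keypoint coordinates, which is the property that lets a surface-level objective supervise a keypoint predictor.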