Kundu, JN and Ganeshan, A and Rahul, MV and Prakash, A and Venkatesh Babu, R (2018) iSPA-Net: Iterative semantic pose alignment network. In: 26th ACM Multimedia conference, MM 2018, 22-26 October 2018, Seoul, pp. 967-975.
PDF: ACM_ACMMC_MM_967-975_2018.pdf (Published Version, 3MB; restricted to registered users only)
PDF: supplementary_ACM_ACMMC_MM_967-975_2018.pdf (Published Supplemental Material, 1MB)
Abstract
Understanding and extracting 3D information about objects from monocular 2D images is a fundamental problem in computer vision. In 3D object pose estimation, recent data-driven approaches based on deep neural networks suffer from a scarcity of real images with 3D keypoint and pose annotations. Drawing inspiration from human cognition, where annotators use a 3D CAD model as a structural reference to obtain ground-truth viewpoints for real images, we propose an iterative Semantic Pose Alignment Network, called iSPA-Net. Our approach exploits semantic 3D structural regularity to solve fine-grained pose estimation by predicting the viewpoint difference between a given pair of images. Such an image-comparison-based approach also alleviates the problem of data scarcity and hence makes the proposed approach more scalable to novel object categories with minimal annotation. The fine-grained object pose estimator is further aided by correspondence between learned spatial descriptors of the input image pair. The proposed pose alignment framework can refine its initial pose estimate over consecutive iterations by using an online rendering setup together with a non-uniform bin classification of the pose difference. This enables iSPA-Net to achieve state-of-the-art performance on various real-image viewpoint estimation datasets. Further, we demonstrate the effectiveness of the approach in multiple applications. First, we show results for active object viewpoint localization, which captures images from a similar pose using only a single image as the pose reference. Second, we demonstrate the ability of the learned semantic correspondence to perform unsupervised part-segmentation transfer using only a single part-annotated 3D template model per object class. To encourage reproducible research, we have released the code for our proposed algorithm.
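The abstract describes a refinement loop: predict a pose difference, apply it, re-render, and repeat. A minimal Python sketch can illustrate why non-uniform bins suit this loop. It is not the authors' released code. It handles only a single azimuth angle. The bin edges in `EDGES` are hypothetical values, chosen to be narrow near zero and wide for large differences. The names `wrap`, `diff_to_bin` and `refine` are illustrative. The sketch also does not render a CAD model or run a network. Instead, a hypothetical oracle classifies the true difference into a bin, so the example shows only how coarse jumps can be followed by fine corrections.

```python
import bisect

# HYPOTHETICAL non-uniform bin edges (degrees) for the azimuth difference:
# narrow bins near zero for fine corrections, wide bins for coarse jumps.
# These values are illustrative assumptions, not taken from the paper.
EDGES = [-180, -90, -45, -20, -10, -4, -1, 1, 4, 10, 20, 45, 90, 180]
# Each bin is represented by its midpoint; the central bin (-1, 1) maps to 0.
CENTERS = [(lo + hi) / 2 for lo, hi in zip(EDGES[:-1], EDGES[1:])]


def wrap(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def diff_to_bin(delta):
    """Map a pose difference (degrees) to the index of its non-uniform bin."""
    delta = wrap(delta)
    # bisect_right gives the insertion point; a value equal to an edge
    # falls into the bin that starts at that edge.
    idx = bisect.bisect_right(EDGES, delta) - 1
    # Clamp defensively so the index always refers to a valid bin.
    return min(max(idx, 0), len(CENTERS) - 1)


def refine(initial, target, iterations=5):
    """Iteratively refine an azimuth estimate and return every intermediate value.

    A real system would render the CAD model at `estimate` and let a network
    classify the difference to the input image. Here a HYPOTHETICAL oracle
    classifies the true difference into a bin instead.
    """
    estimate = initial
    history = [estimate]
    for _ in range(iterations):
        bin_idx = diff_to_bin(target - estimate)  # stand-in for the network
        estimate = wrap(estimate + CENTERS[bin_idx])
        history.append(estimate)
    return history
```

For example, `refine(0.0, 37.0)` first jumps by 32.5°. It then overshoots to 39.5° and settles at 37.0° once the remaining difference falls into the central bin, whose center is 0°. Narrow bins near zero bound the final error, while wide outer bins keep the number of classes small.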
Item Type: Conference Paper
Publication: MM 2018 - Proceedings of the 2018 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc.
Additional Information: The copyright for this article belongs to the Association for Computing Machinery.
Keywords: Alignment; Computer aided design; Deep neural networks; Iterative methods; Semantic Web; Semantics; Multiple applications; Network-based approach; Pose estimation; Pose-invariant representation; Reproducible research; Semantic correspondence; State-of-the-art performance; Structural regularity; Image enhancement
Department/Centre: Division of Interdisciplinary Sciences > Computational and Data Sciences; Others
Date Deposited: 03 Aug 2022 06:29
Last Modified: 03 Aug 2022 06:29
URI: https://eprints.iisc.ac.in/id/eprint/75203