ePrints@IISc

Text2Place: Affordance-Aware Text Guided Human Placement

Parihar, R and Gupta, H and VS, S and Babu, RV (2025) Text2Place: Affordance-Aware Text Guided Human Placement. In: 18th European Conference on Computer Vision, ECCV 2024, 29 September 2024 through 4 October 2024, Milan, pp. 57-77.

Full text not available from this repository.
Official URL: https://doi.org/10.1007/978-3-031-72646-0_4

Abstract

For a given scene, humans can easily reason about where and in what pose to place objects. Designing a computational model that mirrors this intuitive reasoning about affordances remains a significant challenge. This work tackles the problem of realistic human insertion into a given background scene, termed Semantic Human Placement. The task is extremely challenging given the diversity of backgrounds, the scale and pose of the generated person, and the need to preserve the subject's identity. We divide the problem into two stages: i) learning semantic masks using text guidance to localize regions in the image where humans can be placed, and ii) subject-conditioned inpainting that places a given subject within the semantic masks while adhering to the scene affordance. For learning semantic masks, we leverage the rich object-scene priors of text-to-image generative models and optimize a novel parameterization of the semantic mask, eliminating the need for large-scale training. To the best of our knowledge, ours is the first effective solution for realistic human placement in diverse real-world scenes. The proposed method generates highly realistic scene compositions while preserving both the background and the subject's identity. Further, we present results on several downstream tasks: scene hallucination from single or multiple generated persons, and text-based attribute editing. Through extensive comparisons against strong baselines, we demonstrate the superiority of our method in realistic human placement. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
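The abstract mentions optimizing "a novel parameterization of the semantic mask" rather than a free-form pixel mask. The paper does not specify the parameterization here; as a purely illustrative sketch, a low-dimensional soft mask can be rendered from a handful of parameters (a hypothetical 2-D Gaussian "blob" with a center and per-axis scale), which could then be optimized with gradients instead of training on large-scale data:

```python
import numpy as np

def render_blob_mask(center, scale, size=64):
    """Render a soft elliptical mask from a low-dimensional parameterization.

    NOTE: this is a hypothetical illustration, not the paper's actual
    parameterization. A Gaussian blob (center + per-axis scale) is one
    minimal choice whose few parameters could be optimized directly.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    cy, cx = center
    sy, sx = scale
    # Squared Mahalanobis-style distance from the blob center.
    d2 = ((ys - cy) / sy) ** 2 + ((xs - cx) / sx) ** 2
    # Soft mask in (0, 1], peaking at 1.0 at the center.
    return np.exp(-0.5 * d2)

# Example: a blob centered in a 64x64 image, elongated vertically,
# roughly where a standing person might be placed.
mask = render_blob_mask(center=(32, 32), scale=(14, 6))
```

In such a setup, the mask parameters (here `center` and `scale`) would be the optimization variables, guided by a loss derived from text-to-image model priors; the rendered mask then delimits the region handed to the subject-conditioned inpainting stage.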

Item Type: Conference Paper
Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Science and Business Media Deutschland GmbH
Additional Information: The copyright for this article belongs to Springer Science and Business Media Deutschland GmbH
Keywords: Adversarial machine learning; Contrastive learning; Affordances; Background scenes; Computational modelling; Human inpainting; Inpainting; Intuitive reasoning; Learning semantics; Reasoning ability; Spatial relations; Stage I; Semantics
Department/Centre: Division of Interdisciplinary Sciences > Computational and Data Sciences
Date Deposited: 12 Dec 2024 23:01
Last Modified: 12 Dec 2024 23:01
URI: http://eprints.iisc.ac.in/id/eprint/87086
