Adaptive Mixing of Auxiliary Losses in Supervised Learning

Sivasubramanian, D and Maheshwari, A and Prathosh, AP and Shenoy, P and Ramakrishnan, G (2023) Adaptive Mixing of Auxiliary Losses in Supervised Learning. In: 37th AAAI Conference on Artificial Intelligence, AAAI 2023, 7-14 February 2023, Washington, pp. 9855-9863.

Official URL: https://ojs.aaai.org/index.php/AAAI/article/view/2...

Abstract

In several supervised learning scenarios, auxiliary losses are used in order to introduce additional information or constraints into the supervised learning objective. For instance, knowledge distillation aims to mimic outputs of a powerful teacher model; similarly, in rule-based approaches, weak labeling information is provided by labeling functions which may be noisy rule-based approximations to true labels. We tackle the problem of learning to combine these losses in a principled manner. Our proposal, AMAL, uses a bi-level optimization criterion on validation data to learn optimal mixing weights, at an instance-level, over the training data. We describe a meta-learning approach towards solving this bilevel objective and show how it can be applied to different scenarios in supervised learning. Experiments in a number of knowledge distillation and rule denoising domains show that AMAL provides noticeable gains over competitive baselines in those domains. We empirically analyze our method and share insights into the mechanisms through which it provides performance gains. The code for AMAL is at: https://github.com/durgas16/AMAL.git. Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
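
As a reading aid only, the bi-level idea described in the abstract can be sketched on a toy problem. The sketch below is not the AMAL implementation (the authors' code is in the repository linked above). It assumes a linear student, squared losses for both the supervised and the auxiliary (teacher) term, a single mixing weight per training instance, a one-step lookahead approximation of the inner problem, and clipping of the weights to [0, 1]. All function names and hyperparameters are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def inner_step(theta, X, y, t, lam, eta):
    """One gradient step on the instance-weighted training loss.

    Assumed per-instance loss (not taken from the paper):
    (1 - lam_i) * 0.5 * (p_i - y_i)^2 + lam_i * 0.5 * (p_i - t_i)^2,
    where p_i = x_i . theta, y_i is the label and t_i the teacher output.
    """
    n = len(y)
    new_theta = list(theta)
    for x_i, y_i, t_i, lam_i in zip(X, y, t, lam):
        p = dot(x_i, theta)
        # Derivative of the mixed loss with respect to the prediction p.
        resid = (1.0 - lam_i) * (p - y_i) + lam_i * (p - t_i)
        for k, x_ik in enumerate(x_i):
            new_theta[k] -= eta * resid * x_ik / n
    return new_theta

def val_loss(theta, Xv, yv):
    """Mean squared-error loss (halved) on held-out validation data."""
    return 0.5 * sum((dot(x, theta) - v) ** 2 for x, v in zip(Xv, yv)) / len(yv)

def hypergradient(theta, X, y, t, lam, eta, Xv, yv):
    """Gradient of the one-step-lookahead validation loss w.r.t. each lam_i."""
    theta_new = inner_step(theta, X, y, t, lam, eta)
    m = len(yv)
    g_val = [0.0] * len(theta)
    for x, v in zip(Xv, yv):
        err = dot(x, theta_new) - v
        for k, x_k in enumerate(x):
            g_val[k] += err * x_k / m
    n = len(y)
    # Chain rule: d theta_new / d lam_i = -(eta / n) * (y_i - t_i) * x_i.
    return [-(eta / n) * (y_i - t_i) * dot(x_i, g_val)
            for x_i, y_i, t_i in zip(X, y, t)]

def train(X, y, t, Xv, yv, steps=100, eta=0.1, meta_lr=1.0):
    """Alternate outer updates of the mixing weights with inner model updates."""
    theta = [0.0] * len(X[0])
    lam = [0.5] * len(y)
    for _ in range(steps):
        grad = hypergradient(theta, X, y, t, lam, eta, Xv, yv)
        lam = [min(1.0, max(0.0, l - meta_lr * g)) for l, g in zip(lam, grad)]
        theta = inner_step(theta, X, y, t, lam, eta)
    return theta, lam
```

Calling `train(X, y, t, Xv, yv)` with lists of feature vectors, labels, teacher outputs, and validation data returns the student parameters and one mixing weight per training instance. In this toy setting a weight near 1 means the instance leans on the teacher signal, and a weight near 0 means it leans on its own label.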

Item Type:	Conference Paper
Publication:	Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Publisher:	AAAI Press
Additional Information:	The copyright for this article belongs to the AAAI Press.
Keywords:	Distillation; Mixing; Bi-level optimization; Instance knowledge; Labeling functions; Labelings; Learning objectives; Learning scenarios; Optimization criteria; Rule based; Rule-based approach; Teacher models; Supervised learning
Department/Centre:	Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited:	08 Nov 2023 09:47
Last Modified:	08 Nov 2023 09:47
URI:	https://eprints.iisc.ac.in/id/eprint/83060
