ePrints@IISc

A Scalable GPT-2 Inference Hardware Architecture on FPGA

Yemme, A and Garani, SS (2023) A Scalable GPT-2 Inference Hardware Architecture on FPGA. In: 2023 International Joint Conference on Neural Networks, IJCNN 2023, 18-23 June 2023, Gold Coast.

PDF: IJCNN-2023_2023.pdf - Published Version (restricted to registered users)
Official URL: https://doi.org/10.1109/IJCNN54540.2023.10191067


Transformer-based architectures using attention mechanisms are a class of learning architectures for sequence processing tasks. These include architectures such as the generative pretrained transformer (GPT) and the bidirectional encoder representations from transformers (BERT). GPT-2 is a popular sequence learning model built on the transformer architecture. GPT-2 is trained on text prediction, and the network parameters obtained during this training process can be reused in various other tasks such as text classification and premise-hypothesis testing. Edge computing is a recent trend in which training is done on the cloud or on servers with multiple GPUs, while inference is performed on edge devices such as mobile phones to reduce latency and improve privacy. This motivates a study of GPT-2 performance and complexity in order to distill hardware-based architectures for use on edge devices. In this paper, a single-layer GPT-2 inference architecture is implemented on a Virtex-7 xc7vx485tffg1761-2 FPGA board. The inference engine has a model dimensionality of 128 and a latency of 1.637 ms while operating at 142.44 MHz, consuming 85.6K flip-flops and 96.8K lookup tables, and achieving a 1.73x speedup over previously reported work on transformer-based architectures. The approach proposed in this paper is scalable to models of higher dimensionality. © 2023 IEEE.
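To make the computation implemented by the hardware concrete, the following is a minimal NumPy sketch of one GPT-2-style decoder block at the model dimensionality reported in the abstract (d = 128). It is an illustration of the standard GPT-2 layer (pre-layer-norm causal self-attention followed by a 4x GELU feed-forward network), not the authors' fixed-point FPGA datapath; the head count, weight shapes, and random parameters are assumptions for the sketch, and learnable layer-norm scale/shift and bias terms are omitted for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean, unit variance
    # (learnable scale/shift omitted in this sketch).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU, as used in GPT-2.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def gpt2_block(x, params, n_heads=4):
    """One pre-LN GPT-2 decoder block: causal multi-head
    self-attention + feed-forward, each with a residual add."""
    T, d = x.shape
    dh = d // n_heads
    # --- causal self-attention sub-layer ---
    h = layer_norm(x)
    q, k, v = np.split(h @ params["w_qkv"], 3, axis=-1)
    mask = np.triu(np.full((T, T), -1e9), k=1)  # block attention to future tokens
    heads = []
    for i in range(n_heads):
        qi = q[:, i*dh:(i+1)*dh]
        ki = k[:, i*dh:(i+1)*dh]
        vi = v[:, i*dh:(i+1)*dh]
        att = softmax(qi @ ki.T / np.sqrt(dh) + mask)
        heads.append(att @ vi)
    x = x + np.concatenate(heads, axis=-1) @ params["w_proj"]
    # --- feed-forward sub-layer (4x expansion, as in GPT-2) ---
    h = layer_norm(x)
    x = x + gelu(h @ params["w_fc"]) @ params["w_out"]
    return x

d = 128  # model dimensionality reported in the paper
rng = np.random.default_rng(0)
params = {  # random stand-in weights; a real model would load trained GPT-2 parameters
    "w_qkv": rng.normal(0, 0.02, (d, 3 * d)),
    "w_proj": rng.normal(0, 0.02, (d, d)),
    "w_fc": rng.normal(0, 0.02, (d, 4 * d)),
    "w_out": rng.normal(0, 0.02, (4 * d, d)),
}
x = rng.normal(size=(16, d))  # 16 token embeddings
y = gpt2_block(x, params)
print(y.shape)  # (16, 128)
```

The matrix multiplications and the softmax in this block dominate the arithmetic, which is why they are the natural targets for the parallel multiply-accumulate structures on an FPGA.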

Item Type: Conference Paper
Publication: Proceedings of the International Joint Conference on Neural Networks
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to the Institute of Electrical and Electronics Engineers Inc.
Keywords: Classification (of information); Computer hardware; Field programmable gate arrays (FPGA); Flip flop circuits; Network architecture; Neural networks; Program processors; Text processing, Attention mechanisms; Generative pretrained transformer; Hardware architecture; Learning architectures; Network parameters; Neural-networks; Sequence learning; Text prediction; Training process; Transformer, Table lookup
Department/Centre: Division of Electrical Sciences > Electronic Systems Engineering (Formerly Centre for Electronic Design & Technology)
Date Deposited: 28 Oct 2023 06:36
Last Modified: 28 Oct 2023 06:36
URI: https://eprints.iisc.ac.in/id/eprint/83168
