Thursday, October 26, 2023
[WiCAS Speeches] 12:25~13:30
Quantization and Accelerator Design for Large Language Models
Prof. Yulhwa Kim
Yulhwa Kim received the B.S. and Ph.D. degrees in Convergence IT Engineering from Pohang University of Science and Technology, in 2016 and 2022, respectively. She spent the summers of 2019 and 2020 at the IBM Thomas J. Watson Research Center and Qualcomm, respectively. She is currently a postdoctoral researcher at the Inter-University Semiconductor Research Center (ISRC) at Seoul National University. Her current research interests include hardware-software co-design of efficient AI systems and in-memory computing.
Large language models (LLMs) are artificial intelligence (AI) models specifically designed to generate novel text content. Recently, LLMs have gained popularity due to their remarkable ability to produce high-quality content that closely resembles human-written text. The success of LLMs is closely tied to advances in deep neural networks (DNNs). Over the years, the sizes of DNNs have grown exponentially to capture complex patterns within the training data. While these larger DNNs have indeed enhanced the quality of generated content, their inherent inefficiencies introduce significant challenges in practical applications.
In this talk, I will cover quantization and accelerator designs tailored for LLMs. More specifically, the talk will delve into weight-only quantization techniques for LLMs, which effectively reduce model size and address the memory-related issues encountered during LLM inference. Subsequently, the design of integer-unit-based accelerators for processing weight-only quantized LLMs will be discussed.
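To make the idea of weight-only quantization concrete, the following is a minimal sketch of group-wise absmax quantization of a weight matrix to signed 4-bit integers, with dequantization back to floating point at inference time. The function names, group size, and NumPy-based implementation are illustrative assumptions, not the speaker's actual method.

```python
import numpy as np

def quantize_weights_int4(w, group_size=64):
    # Illustrative group-wise absmax quantization (hypothetical helper,
    # not the technique presented in the talk).
    # Each group of `group_size` weights shares one fp scale; values are
    # rounded into the signed 4-bit range [-8, 7].
    w_groups = w.reshape(-1, group_size)
    scales = np.abs(w_groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w_groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q, scales, shape):
    # Recover an approximate fp32 weight matrix from integers and scales.
    return (q.astype(np.float32) * scales).reshape(shape)

# Example: a 128x128 fp32 weight matrix shrinks from 4 bytes/weight to
# roughly 0.5 bytes/weight plus one scale per group.
w = np.random.randn(128, 128).astype(np.float32)
q, s = quantize_weights_int4(w)
w_hat = dequantize_int4(q, s, w.shape)
max_err = np.abs(w - w_hat).max()
```

Because only the weights are quantized (activations stay in higher precision), the main saving is memory capacity and bandwidth during inference, which is the bottleneck the abstract highlights; the integer values can also feed integer arithmetic units in a dedicated accelerator.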
Design of Low-Dropout Regulator (LDO) with Improved PSRR and Transient Response
Prof. Hyunsun Mo
Hyunsun Mo received the B.S. and M.S. degrees in electrical engineering from Kookmin University in 1993 and 2011, respectively, and her Ph.D. degree in electronics engineering from Kookmin University, Seoul, Korea, in 2014. From 1993 to 2008, she worked at Samsung Electronics, Hwaseong, Gyeonggi-do, as a Senior Engineer for SRAM and Flash memory. Since 2018, she has been a professor in the School of Electrical Engineering at Kookmin University. Her current research interests include analog circuits, power management ICs, and memory for artificial intelligence systems.
As switching speeds increase to reduce the size of passive components in DC-DC converters, efforts continue to improve the PSRR at high frequencies by widening the bandwidth of LDOs. Methods to expand the bandwidth include on-chip capacitor LDOs that eliminate external capacitors and flipped voltage followers (FVFs) that use a local feedback loop. Additionally, since the PSRR of a closed-loop system is proportional to the open-loop gain, efforts to improve the open-loop gain are also ongoing.
LDOs integrated into recent SoCs require fast transient response to handle rapid changes in load current. To this end, we propose a mixed-mode on-chip LDO that combines analog driving by a pass transistor with digital driving, and a buffered cascoded FVF structure that increases the open-loop gain and achieves higher bandwidth. Additionally, the characteristics of the LDO's current driving and the FVF's voltage driving are compared.