Large generative models such as ChatGPT, which caused a global sensation, require amounts of data and computational resources that are not easily accessible. Professor Jaejun Yoo presented a solution to this issue through his recent research: he revealed that powerful subnetworks exist within generative models, opening up the possibility of research on lightweight generative models.
The enthusiasm for “ChatGPT” has not cooled down since it became the hottest word in the world upon its launch on Nov. 30, 2022. However, behind large generative models such as ChatGPT hides an inconvenient truth: training on and making inferences from massive amounts of data requires a correspondingly massive amount of energy, which means that huge amounts of carbon are emitted. The problem is that the use of generative AI will only continue to increase.
“In the case of transformer models, which are the basic building block of the GPT currently in use, training a model about 10,000 times smaller than the actual GPT would emit as much carbon as five cars do over their entire lifetimes. In addition, because such work requires a large investment, there are fewer than 10 companies that can research and develop large models at the level of ChatGPT. I thought it was not sustainable to train or research large models by pouring in resources the way we do now.”
Professor Yoo worked as an AI researcher at Naver for two years, from 2018 to 2019, leading the generative model research team. Although this was before large models at the level of ChatGPT were introduced to the world, he was able to conduct research without worrying about the size of computing resources at the time. As an individual researcher, however, it was difficult to access large generative model research, which requires large-scale data and computing resources. He therefore sought ways to develop a more efficient training method that maintains performance, to make the model itself lighter, or to innovate entirely with a different framework. What caught his attention was the lightweighting of generative models. In fact, lightweighting was not Professor Yoo’s main research field.
“Because I wasn’t familiar with the field, I was looking for help. I found out that one of the speakers at an online seminar held by our department was conducting research on lightweighting classification models. I reached out to him, and we carried out collaborative research together.”
Through this research, Professor Yoo’s team discovered for the first time in the world that powerful subnetworks (strong lottery tickets), which exhibit performance similar to or better than the original model without additional training, also exist in generative models. These subnetworks use only 10% of the parameters of the original generative model while demonstrating the same level of performance.
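The strong lottery ticket idea can be sketched as follows: instead of training a network’s weights, one searches for a binary mask that selects a small subnetwork inside a frozen, randomly initialized network. The minimal NumPy sketch below is illustrative only (the layer shape, score variables, and 10% keep ratio are assumptions for demonstration, not the team’s actual algorithm); in practice, the importance scores would be optimized while the weights stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_mask(scores, keep_ratio):
    """Binary mask keeping the top `keep_ratio` fraction of entries by score."""
    k = max(1, int(round(keep_ratio * scores.size)))
    threshold = np.sort(scores.ravel())[-k]
    return (scores >= threshold).astype(scores.dtype)

# Randomly initialized layer weights -- never trained in this paradigm.
W = rng.standard_normal((64, 64))
# Per-weight importance scores (random here for illustration; in practice
# they are what gets optimized while W remains frozen).
scores = rng.standard_normal(W.shape)

mask = topk_mask(scores, keep_ratio=0.10)  # keep ~10% of parameters
W_sub = W * mask                           # the candidate subnetwork

print(mask.mean())  # fraction of surviving weights, ~0.10
```

The key design point is that only the mask carries learned information: the surviving 10% of random weights, selected well, can already act as a working model.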
“Most existing lightweighting methods had the problem of degrading performance in order to reduce model size. Typically, lightweighting algorithms start from a specific quantitative performance criterion for the trained model and remove the parameters that have little influence on it, while keeping that criterion intact.”
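A common, simple instance of this family of methods is magnitude pruning: parameters with the smallest absolute values are assumed to have little influence and are zeroed out. The NumPy sketch below is a generic illustration of that idea under assumed shapes and ratios, not the specific algorithm discussed in the article.

```python
import numpy as np

rng = np.random.default_rng(1)

def magnitude_prune(weights, prune_ratio):
    """Zero out the `prune_ratio` fraction of weights with smallest magnitude."""
    k = int(round(prune_ratio * weights.size))
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.sort(np.abs(weights).ravel())[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

W = rng.standard_normal((128, 128))
W_pruned = magnitude_prune(W, prune_ratio=0.9)  # remove ~90% of parameters

print((W_pruned == 0).mean())  # sparsity, ~0.90
```

For classification models, the pruned network is usually fine-tuned afterward to recover the accuracy criterion; as the article explains next, it is exactly this quantify-and-recover loop that breaks down for generative models.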
The solution to lightweighting classification models can be found through such methods. Generative models, however, are difficult to quantify: by the nature of the field, the evaluation criteria for generated data are subjective. Training generative models is itself very unstable, and removing parameters can worsen that instability. Research on lightweighting generative models is therefore challenging and has not been actively pursued. Nevertheless, Professor Yoo did not hesitate to take on the challenge. He thought it was a crucial and achievable research project for society.
“I believe that university researchers should research areas that companies cannot venture into. Achieving lightweight generative models will also help industry, allowing companies to improve their services. Although it may be difficult, we can always find a way. I started the research thinking that it was OK to fail.”
The powerful-subnetwork method that Professor Yoo’s research team developed is significant in that it can stably find a lighter model whose generative performance is similar to, or even better than, the original. He also proposed an algorithm that finds strong subnetworks and presented it at AAAI, one of the three major artificial intelligence conferences. Following a domestic patent application, international patent applications are also in progress.
“This research demonstrated a possibility for lightweight generative models. I hope this will serve as an opportunity to actively promote related research.”
As mentioned earlier, there are not many companies in the world that can research and provide services for large generative models.
“Most of them are American companies, but the Korean company Naver possesses a model of global standard and provides services with it. This alone shows Korea’s competitiveness. I believe it demonstrates not only the competitiveness of the company but also the country’s academic strength. Judging by the quality of papers presented at major AI conferences, Korea is leading the way alongside the United States and China.”
Professor Yoo would like to continuously contribute to industry and society, contemplating research topics that can be tackled only in academia. For example, he wants to develop a new paradigm-shifting generative model.
“Interestingly, and perhaps naturally, the models that shifted paradigms in generative modeling have always come from academia rather than industry. The GAN model, presented in 2014 and still under continuous research, and the diffusion model that replaced GANs and is now widely used in image generation both came from academia. Certainly, this is not the end. I want to be the researcher who first develops the next novel approach to generative models.”
At the end of the interview, Professor Yoo left some advice for UNIST students who will become the talent of the artificial intelligence era. Since AI is ultimately only a tool for humans, he emphasized the importance of acquiring the ability to use it properly and to communicate effectively.
“While there are countless things one can learn at university and graduate school, in broad terms, they all pertain to reading well, writing well, thinking well, and communicating well. As a professor, my role is to show students good examples, instead of just giving them answers; to help them with the process of finding, defining, and solving problems; and to guide them on how to think better and deeper for themselves. I believe that developing one’s expertise and gaining insights into one's own position is the right way to grow into a key talent in the era of artificial intelligence.”