Korean media: Not just HBM! HBF is on the rise, with stacked NAND as a new storage driver for AI

According to a Korean media report, Kim Joung-ho, a professor in the Department of Electrical Engineering at the Korea Advanced Institute of Science and Technology (KAIST), said that High Bandwidth Flash (HBF) is expected to become an important memory technology for the next generation of the AI era, developing alongside high-bandwidth memory (HBM) and jointly driving growth for chipmakers.
HBF's design concept is similar to that of HBM: both connect multi-layer die stacks with through-silicon vias (TSVs). The difference is that HBM is built around DRAM, while HBF stacks NAND flash, offering larger capacity at better cost. Kim Joung-ho pointed out that although NAND is slower than DRAM, its capacity is typically more than ten times higher; stacked in hundreds or even thousands of layers, it can effectively meet AI models' demand for large storage and is poised to become a NAND version of HBM.
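To make the capacity gap concrete, here is a rough back-of-envelope sketch; all die densities and layer counts below are illustrative assumptions, not figures from the report:

```python
# Back-of-envelope comparison of per-stack capacity (illustrative numbers only).

GBIT_PER_DRAM_DIE = 24      # assumed density of one DRAM die (Gbit)
DRAM_DIES_PER_STACK = 12    # assumed 12-high HBM-style stack

GBIT_PER_NAND_DIE = 1024    # assumed 1-Tbit NAND die
NAND_DIES_PER_STACK = 16    # assumed 16-high HBF-style stack

hbm_gb = GBIT_PER_DRAM_DIE * DRAM_DIES_PER_STACK / 8
hbf_gb = GBIT_PER_NAND_DIE * NAND_DIES_PER_STACK / 8

print(f"HBM-style DRAM stack: {hbm_gb:.0f} GB")   # ~36 GB
print(f"HBF-style NAND stack: {hbf_gb:.0f} GB")   # ~2048 GB
print(f"capacity ratio: ~{hbf_gb / hbm_gb:.0f}x") # well over the 10x cited
```

Even with modest assumed layer counts, NAND stacking lands an order of magnitude or more above a DRAM stack, which is the gap the article points to.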
Generative AI models are expanding rapidly: the input context of a single model now reaches millions of tokens, and terabytes of data must be processed. With thousands of reads per second, insufficient memory bandwidth creates a bottleneck, significantly slowing the responses of large language models (LLMs) such as ChatGPT and Google Gemini.
Kim Joung-ho emphasizes that this limitation stems from the current von Neumann architecture. Because the GPU and memory are designed separately, data transfer bandwidth sets the performance ceiling: "even if the GPU's scale is doubled, it is useless if the bandwidth is insufficient."
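A rough roofline-style calculation illustrates why bandwidth, not compute, caps token throughput during LLM decoding; the model size and bandwidth figures below are assumptions for illustration, not numbers from the report:

```python
# Why memory bandwidth caps LLM decoding speed (illustrative numbers only).
# During autoregressive decoding, roughly all model weights are read once
# per generated token, so token rate <= bandwidth / model size.

model_bytes = 70e9 * 2    # assumed 70B-parameter model at FP16 (2 bytes/param)
hbm_bandwidth = 3.35e12   # assumed ~3.35 TB/s aggregate HBM bandwidth

max_tokens_per_s = hbm_bandwidth / model_bytes
print(f"bandwidth-bound ceiling: ~{max_tokens_per_s:.0f} tokens/s")  # ~24 tokens/s

# Doubling compute (FLOPs) leaves this ceiling unchanged; only more
# bandwidth, or keeping weights closer to the GPU, raises it.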
He predicts that future GPUs will carry both HBM and HBF in a tiered structure: HBM serves as a high-speed cache handling real-time compute data, while HBF takes on large-capacity storage and can hold a complete AI model directly. This would help break through the memory bottleneck, letting GPUs handle larger generative AI workloads, even complex content such as long videos. "In the future, AI will not be limited to text and images; it will generate movie-length films, and the memory capacity required at one time will be more than 1,000 times today's."
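A minimal sketch of the tiering Kim describes, with a small, fast HBM cache in front of a large HBF store holding the full model; the class name, capacities, and the LRU eviction policy are my own illustrative choices, not details from the report:

```python
from collections import OrderedDict

class TieredWeightStore:
    """Toy model of HBM + HBF tiering: a small, fast HBM cache in
    front of a large HBF store that holds the complete model."""

    def __init__(self, hbm_capacity_gb: float):
        self.hbm_capacity_gb = hbm_capacity_gb
        self.hbm = OrderedDict()   # layer name -> size (GB), LRU order
        self.hbf = {}              # full model lives here
        self.hbm_used_gb = 0.0

    def store_in_hbf(self, layer: str, size_gb: float) -> None:
        self.hbf[layer] = size_gb

    def fetch(self, layer: str) -> str:
        """Return where the layer was served from, filling misses into HBM."""
        if layer in self.hbm:
            self.hbm.move_to_end(layer)   # refresh LRU position
            return "HBM hit"
        size = self.hbf[layer]
        while self.hbm_used_gb + size > self.hbm_capacity_gb and self.hbm:
            _, evicted_size = self.hbm.popitem(last=False)  # evict least recent
            self.hbm_used_gb -= evicted_size
        self.hbm[layer] = size
        self.hbm_used_gb += size
        return "HBF -> HBM fill"

store = TieredWeightStore(hbm_capacity_gb=8)
for name in ("layer0", "layer1", "layer0"):
    store.store_in_hbf(name, size_gb=4)
    print(name, store.fetch(name))   # two fills from HBF, then an HBM hit
```

The design point is simply that hot weights are served at HBM speed while the full model still fits on-package in HBF, which is the bottleneck-breaking behavior the article describes.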
KAIST professor Kim Joung-ho: "The era when HBF decides the memory winner is coming"