image credit: benzoix / Freepik

Partitioning an LLM between cloud and edge

May 28, 2024

Via: InfoWorld

Historically, large language models (LLMs) have required substantial computational resources, which has confined development and deployment mainly to powerful centralized systems, such as those run by public cloud providers. However, although many people believe that we need massive numbers of GPUs bound to vast amounts of storage to run generative AI, in truth, a tiered or partitioned architecture can drive value for specific business use cases.

Somehow, it has become part of the generative AI zeitgeist that edge computing won’t work, given the processing requirements of generative AI models and the need for high-performing inference. Because of this misperception, I’m often challenged when I suggest a “knowledge at the edge” architecture. We’re missing a huge opportunity to be innovative, so let’s take a look.
