# LLMApi: Keeping It Lively and Alive, Chunking and Caching

# Introduction
In my LLMApi project I wanted to be able to support asking for LOTS of data; however, LLMs can only output a limited amount of data at a time, and they aren't particularly fast about doing it. So I needed a clever way of pre-generating that data and delivering it in 'chunks', so you CAN get chunky data back quickly.
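
To make the shape of that concrete, here's a minimal sketch of the producer/consumer idea using `System.Threading.Channels`: a background generator writes chunks as the LLM produces them, and the API reads them out without waiting for the full result. `ChunkStore` and its members are illustrative names, not LLMApi's actual API (the linked CHUNKING_AND_CACHING.md below has the real details):

```csharp
using System.Threading.Channels;

// Illustrative only: one store per request id. The background producer
// (the LLM call) writes chunks as they're generated; the HTTP endpoint
// consumes them as soon as they exist instead of waiting for everything.
public sealed class ChunkStore
{
    private readonly Channel<string> _chunks = Channel.CreateBounded<string>(64);

    public ValueTask WriteAsync(string chunk, CancellationToken ct = default) =>
        _chunks.Writer.WriteAsync(chunk, ct);

    // Signal that generation is finished so readers can complete.
    public void Complete() => _chunks.Writer.Complete();

    public IAsyncEnumerable<string> ReadAllAsync(CancellationToken ct = default) =>
        _chunks.Reader.ReadAllAsync(ct);
}
```

In ASP.NET Core an endpoint can return that `IAsyncEnumerable<string>` directly and the framework will stream it to the client as the chunks arrive.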

Along with that, there was the problem of contexts. They're a great feature, but until now a context lasted as long as the app; this release adds full support for a sliding cache, eliminating that potential source of memory leaks if you leave the simulator running. The sketch below shows the general idea.
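
The sliding part is the same pattern `Microsoft.Extensions.Caching.Memory` ships out of the box: each access resets the entry's expiry timer, so idle contexts fall out of memory on their own. A rough sketch of that pattern, where `GetOrCreateContext` and `ContextState` are hypothetical names rather than anything from the library:

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions());

// Hypothetical helper: look up a context by id, creating it on first use.
ContextState GetOrCreateContext(string contextId) =>
    cache.GetOrCreate(contextId, entry =>
    {
        // Every hit resets this 30-minute window; a context nobody touches
        // simply expires instead of living as long as the process.
        entry.SlidingExpiration = TimeSpan.FromMinutes(30);
        return new ContextState(contextId);
    })!;

// Hypothetical stand-in for whatever per-context state is being kept.
public sealed record ContextState(string Id);
```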

<!--category-- AI, LLMApi, LLM, ASP.NET Core, API, Nuget, mockllmapi, SignalR, AI-Article-->
<datetime class="hidden">2025-11-06T13:35</datetime>

<fetch class="hidden" markdownurl="https://raw.githubusercontent.com/scottgal/LLMApi/refs/heads/features/big-results-cache/CHUNKING_AND_CACHING.md" pollFrequency="2h" transformlinks="true"/>  

