[{"data":1,"prerenderedAt":608},["ShallowReactive",2],{"blog-building-forever-llm":3,"blog-post-nav":570},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"body":11,"_type":564,"_id":565,"_source":566,"_file":567,"_stem":568,"_extension":569},"/blog/building-forever-llm","blog",false,"","Building forever-llm: three takes on a model that never stops","What I learned building an LLM that generates indefinitely while people watch and interrupt it in real time, across three rewrites.","2026-05-20",{"type":12,"children":13,"toc":558},"root",[14,22,29,51,64,131,136,142,163,243,248,254,274,292,496,517,523,535,547,552],{"type":15,"tag":16,"props":17,"children":18},"element","p",{},[19],{"type":20,"value":21},"text","I had a simple, slightly weird idea: what if a language model just kept generating, forever? Not a chatbot waiting for your turn, but an unbroken stream of thought that a few invited people could watch live and occasionally nudge. I built it three times over two days, and each rewrite taught me something about the gap between a fun prototype and something you can actually share.",{"type":15,"tag":23,"props":24,"children":26},"h2",{"id":25},"v1-one-file-no-dependencies",[27],{"type":20,"value":28},"v1: one file, no dependencies",{"type":15,"tag":16,"props":30,"children":31},{},[32,34,41,43,49],{"type":20,"value":33},"The first version (",{"type":15,"tag":35,"props":36,"children":38},"code",{"className":37},[],[39],{"type":20,"value":40},"forever-llm",{"type":20,"value":42},") is a single ",{"type":15,"tag":35,"props":44,"children":46},{"className":45},[],[47],{"type":20,"value":48},"stream-server.js",{"type":20,"value":50},", plain Node, no npm install, served alongside a static HTML page. 
It talks to a local Ollama instance and runs a perpetual loop: ask Ollama to generate, stream the tokens to every connected browser over Server-Sent Events, and the moment generation finishes, immediately start again.",{"type":15,"tag":16,"props":52,"children":53},{},[54,56,62],{"type":20,"value":55},"The interesting bits were all about keeping an ",{"type":15,"tag":57,"props":58,"children":59},"em",{},[60],{"type":20,"value":61},"infinite",{"type":20,"value":63}," generation coherent and interruptible:",{"type":15,"tag":65,"props":66,"children":67},"ul",{},[68,88,98],{"type":15,"tag":69,"props":70,"children":71},"li",{},[72,78,80,86],{"type":15,"tag":73,"props":74,"children":75},"strong",{},[76],{"type":20,"value":77},"Injections.",{"type":20,"value":79}," When you type something, it gets queued as ",{"type":15,"tag":35,"props":81,"children":83},{"className":82},[],[84],{"type":20,"value":85},"[USER: ...]",{"type":20,"value":87}," and woven into the stream. The system prompt frames these as \"a thought arriving from outside\" so the model absorbs them instead of breaking into chat mode. 
A pending injection aborts the in-flight request so it lands promptly.",{"type":15,"tag":69,"props":89,"children":90},{},[91,96],{"type":15,"tag":73,"props":92,"children":93},{},[94],{"type":20,"value":95},"Context trimming.",{"type":20,"value":97}," Context can't grow forever, so once the accumulated text passes ~8000 chars it keeps the trailing slice and drops the rest behind a marker.",{"type":15,"tag":69,"props":99,"children":100},{},[101,106,108,114,116,122,123,129],{"type":15,"tag":73,"props":102,"children":103},{},[104],{"type":20,"value":105},"Stop-token scrubbing.",{"type":20,"value":107}," Models love to emit ",{"type":15,"tag":35,"props":109,"children":111},{"className":110},[],[112],{"type":20,"value":113},"\u003C/s>",{"type":20,"value":115},", ",{"type":15,"tag":35,"props":117,"children":119},{"className":118},[],[120],{"type":20,"value":121},"\u003C|eot_id|>",{"type":20,"value":115},{"type":15,"tag":35,"props":124,"children":126},{"className":125},[],[127],{"type":20,"value":128},"User:",{"type":20,"value":130}," and similar. I strip those so the monologue never visibly \"ends.\"",{"type":15,"tag":16,"props":132,"children":133},{},[134],{"type":20,"value":135},"It worked, and it was genuinely fun to watch. But everything lived in module-scope variables, auth didn't exist, and it was hardwired to Ollama on my own machine. 
Not something I could send to friends.",{"type":15,"tag":23,"props":137,"children":139},{"id":138},"v15-the-nuxt-rewrite",[140],{"type":20,"value":141},"v1.5: the Nuxt rewrite",{"type":15,"tag":16,"props":143,"children":144},{},[145,147,153,155,161],{"type":20,"value":146},"The second version (",{"type":15,"tag":35,"props":148,"children":150},{"className":149},[],[151],{"type":20,"value":152},"forever_llm",{"type":20,"value":154},") is the same concept rebuilt properly as a Nuxt 3 app on the Nitro Bun preset, with ",{"type":15,"tag":35,"props":156,"children":158},{"className":157},[],[159],{"type":20,"value":160},"bun:sqlite",{"type":20,"value":162}," for persistence. This is where it became shareable:",{"type":15,"tag":65,"props":164,"children":165},{},[166,192,218],{"type":15,"tag":69,"props":167,"children":168},{},[169,174,176,182,184,190],{"type":15,"tag":73,"props":170,"children":171},{},[172],{"type":20,"value":173},"Magic-link invites.",{"type":20,"value":175}," On first boot with an empty DB it prints a bootstrap admin link to stdout. From ",{"type":15,"tag":35,"props":177,"children":179},{"className":178},[],[180],{"type":20,"value":181},"/admin",{"type":20,"value":183}," you mint further invite links. 
Tokens are SHA-256 hashed in the DB, and sessions are ",{"type":15,"tag":35,"props":185,"children":187},{"className":186},[],[188],{"type":20,"value":189},"httpOnly",{"type":20,"value":191}," cookies.",{"type":15,"tag":69,"props":193,"children":194},{},[195,200,202,208,210,216],{"type":15,"tag":73,"props":196,"children":197},{},[198],{"type":20,"value":199},"Two providers behind one interface.",{"type":20,"value":201}," I added OpenRouter alongside Ollama, hidden behind a single ",{"type":15,"tag":35,"props":203,"children":205},{"className":204},[],[206],{"type":20,"value":207},"streamCompletion",{"type":20,"value":209},"/",{"type":15,"tag":35,"props":211,"children":213},{"className":212},[],[214],{"type":20,"value":215},"chatCompletion",{"type":20,"value":217}," API. The provider, model, and temperature are all switchable per session from a setup screen, so I could run a free hosted model for friends or a local one for myself.",{"type":15,"tag":69,"props":219,"children":220},{},[221,226,228,234,236,241],{"type":15,"tag":73,"props":222,"children":223},{},[224],{"type":20,"value":225},"Backoff that respects the provider.",{"type":20,"value":227}," Free tiers rate-limit you. The loop reads ",{"type":15,"tag":35,"props":229,"children":231},{"className":230},[],[232],{"type":20,"value":233},"Retry-After",{"type":20,"value":235}," (header ",{"type":15,"tag":57,"props":237,"children":238},{},[239],{"type":20,"value":240},"and",{"type":20,"value":242}," the JSON error body OpenRouter uses) and otherwise falls back to capped exponential backoff with jitter.",{"type":15,"tag":16,"props":244,"children":245},{},[246],{"type":20,"value":247},"The hardest part was honestly the prompt, not the code. 
Getting a model to produce flowing, turn-less prose (and to never say \"in conclusion\") took far more iteration on the system prompt than on the loop.",{"type":15,"tag":23,"props":249,"children":251},{"id":250},"v2-continuous-but-a-real-chat",[252],{"type":20,"value":253},"v2: continuous, but a real chat",{"type":15,"tag":16,"props":255,"children":256},{},[257,259,265,267,272],{"type":20,"value":258},"The third version (",{"type":15,"tag":35,"props":260,"children":262},{"className":261},[],[263],{"type":20,"value":264},"forever-llmv2",{"type":20,"value":266},") reframed the whole thing. Instead of one global stream, it's a multi-conversation chat (sidebar, multiple threads, the lot) where each conversation has a ",{"type":15,"tag":73,"props":268,"children":269},{},[270],{"type":20,"value":271},"Continuous",{"type":20,"value":273}," toggle. Flip it on and the assistant doesn't stop: when it would normally finish a turn, it starts another. You can send a message mid-generation and it joins naturally.",{"type":15,"tag":16,"props":275,"children":276},{},[277,279,290],{"type":20,"value":278},"The architectural shift was moving from a single global loop to ",{"type":15,"tag":73,"props":280,"children":281},{},[282,284],{"type":20,"value":283},"per-conversation state in a ",{"type":15,"tag":35,"props":285,"children":287},{"className":286},[],[288],{"type":20,"value":289},"Map",{"type":20,"value":291},", each with its own abort controller and loop promise:",{"type":15,"tag":293,"props":294,"children":298},"pre",{"className":295,"code":296,"language":297,"meta":7,"style":7},"language-ts shiki shiki-themes github-dark","async function runLoop(s: ConvState) {\n  s.running = true\n  do {\n    if (global.killswitch) break\n    const messages = loadHistory(s.conversationId)\n    // ...stream tokens, persisting a \"partial\" assistant message as it goes\n  } while (s.continuous && !global.killswitch && 
s.running)\n}\n","ts",[299],{"type":15,"tag":35,"props":300,"children":301},{"__ignoreMap":7},[302,352,372,386,405,434,444,487],{"type":15,"tag":303,"props":304,"children":307},"span",{"class":305,"line":306},"line",1,[308,314,319,325,331,337,342,347],{"type":15,"tag":303,"props":309,"children":311},{"style":310},"--shiki-default:#F97583",[312],{"type":20,"value":313},"async",{"type":15,"tag":303,"props":315,"children":316},{"style":310},[317],{"type":20,"value":318}," function",{"type":15,"tag":303,"props":320,"children":322},{"style":321},"--shiki-default:#B392F0",[323],{"type":20,"value":324}," runLoop",{"type":15,"tag":303,"props":326,"children":328},{"style":327},"--shiki-default:#E1E4E8",[329],{"type":20,"value":330},"(",{"type":15,"tag":303,"props":332,"children":334},{"style":333},"--shiki-default:#FFAB70",[335],{"type":20,"value":336},"s",{"type":15,"tag":303,"props":338,"children":339},{"style":310},[340],{"type":20,"value":341},":",{"type":15,"tag":303,"props":343,"children":344},{"style":321},[345],{"type":20,"value":346}," ConvState",{"type":15,"tag":303,"props":348,"children":349},{"style":327},[350],{"type":20,"value":351},") {\n",{"type":15,"tag":303,"props":353,"children":355},{"class":305,"line":354},2,[356,361,366],{"type":15,"tag":303,"props":357,"children":358},{"style":327},[359],{"type":20,"value":360},"  s.running ",{"type":15,"tag":303,"props":362,"children":363},{"style":310},[364],{"type":20,"value":365},"=",{"type":15,"tag":303,"props":367,"children":369},{"style":368},"--shiki-default:#79B8FF",[370],{"type":20,"value":371}," true\n",{"type":15,"tag":303,"props":373,"children":375},{"class":305,"line":374},3,[376,381],{"type":15,"tag":303,"props":377,"children":378},{"style":310},[379],{"type":20,"value":380},"  do",{"type":15,"tag":303,"props":382,"children":383},{"style":327},[384],{"type":20,"value":385}," 
{\n",{"type":15,"tag":303,"props":387,"children":389},{"class":305,"line":388},4,[390,395,400],{"type":15,"tag":303,"props":391,"children":392},{"style":310},[393],{"type":20,"value":394},"    if",{"type":15,"tag":303,"props":396,"children":397},{"style":327},[398],{"type":20,"value":399}," (global.killswitch) ",{"type":15,"tag":303,"props":401,"children":402},{"style":310},[403],{"type":20,"value":404},"break\n",{"type":15,"tag":303,"props":406,"children":408},{"class":305,"line":407},5,[409,414,419,424,429],{"type":15,"tag":303,"props":410,"children":411},{"style":310},[412],{"type":20,"value":413},"    const",{"type":15,"tag":303,"props":415,"children":416},{"style":368},[417],{"type":20,"value":418}," messages",{"type":15,"tag":303,"props":420,"children":421},{"style":310},[422],{"type":20,"value":423}," =",{"type":15,"tag":303,"props":425,"children":426},{"style":321},[427],{"type":20,"value":428}," loadHistory",{"type":15,"tag":303,"props":430,"children":431},{"style":327},[432],{"type":20,"value":433},"(s.conversationId)\n",{"type":15,"tag":303,"props":435,"children":437},{"class":305,"line":436},6,[438],{"type":15,"tag":303,"props":439,"children":441},{"style":440},"--shiki-default:#6A737D",[442],{"type":20,"value":443},"    // ...stream tokens, persisting a \"partial\" assistant message as it goes\n",{"type":15,"tag":303,"props":445,"children":447},{"class":305,"line":446},7,[448,453,458,463,468,473,478,482],{"type":15,"tag":303,"props":449,"children":450},{"style":327},[451],{"type":20,"value":452},"  } ",{"type":15,"tag":303,"props":454,"children":455},{"style":310},[456],{"type":20,"value":457},"while",{"type":15,"tag":303,"props":459,"children":460},{"style":327},[461],{"type":20,"value":462}," (s.continuous ",{"type":15,"tag":303,"props":464,"children":465},{"style":310},[466],{"type":20,"value":467},"&&",{"type":15,"tag":303,"props":469,"children":470},{"style":310},[471],{"type":20,"value":472}," 
!",{"type":15,"tag":303,"props":474,"children":475},{"style":327},[476],{"type":20,"value":477},"global.killswitch ",{"type":15,"tag":303,"props":479,"children":480},{"style":310},[481],{"type":20,"value":467},{"type":15,"tag":303,"props":483,"children":484},{"style":327},[485],{"type":20,"value":486}," s.running)\n",{"type":15,"tag":303,"props":488,"children":490},{"class":305,"line":489},8,[491],{"type":15,"tag":303,"props":492,"children":493},{"style":327},[494],{"type":20,"value":495},"}\n",{"type":15,"tag":16,"props":497,"children":498},{},[499,501,507,509,515],{"type":20,"value":500},"Persisting each assistant message as ",{"type":15,"tag":35,"props":502,"children":504},{"className":503},[],[505],{"type":20,"value":506},"partial: 1",{"type":20,"value":508}," while it streams, then flipping it to ",{"type":15,"tag":35,"props":510,"children":512},{"className":511},[],[513],{"type":20,"value":514},"0",{"type":20,"value":516}," on completion, meant a refresh mid-generation didn't lose anything. SSE events fan out two ways: to viewers of a specific conversation, and globally so the sidebar can show which threads are live.",{"type":15,"tag":23,"props":518,"children":520},{"id":519},"what-id-keep-and-what-id-change",[521],{"type":20,"value":522},"What I'd keep and what I'd change",{"type":15,"tag":16,"props":524,"children":525},{},[526,528,533],{"type":20,"value":527},"The single best decision was the ",{"type":15,"tag":73,"props":529,"children":530},{},[531],{"type":20,"value":532},"provider abstraction",{"type":20,"value":534}," in v1.5, being able to develop against local Ollama and demo on a hosted free model without touching the loop. 
The killswitch and per-token speed throttle (server-side, so it's enforced for everyone) also earned their place.",{"type":15,"tag":16,"props":536,"children":537},{},[538,540,545],{"type":20,"value":539},"The honest limitation, written right into the README, is that it's ",{"type":15,"tag":73,"props":541,"children":542},{},[543],{"type":20,"value":544},"single-replica only",{"type":20,"value":546},": SQLite plus in-memory stream state don't shard. That's fine for a handful of invited people, which is exactly the audience, but it's the first thing I'd have to tear out to make this real.",{"type":15,"tag":16,"props":548,"children":549},{},[550],{"type":20,"value":551},"If I did a v3, I'd lift the loop state out of module scope into something durable so a restart doesn't kill an in-flight stream, and I'd unify the v1.5 \"stream of consciousness\" and the v2 \"continuous chat\" into one mode you toggle rather than two separate codebases. Three rewrites in, the idea is still the fun part; the engineering was mostly about making \"never stop\" behave.",{"type":15,"tag":553,"props":554,"children":555},"style",{},[556],{"type":20,"value":557},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: 
var(--shiki-default-text-decoration);}",{"title":7,"searchDepth":354,"depth":354,"links":559},[560,561,562,563],{"id":25,"depth":354,"text":28},{"id":138,"depth":354,"text":141},{"id":250,"depth":354,"text":253},{"id":519,"depth":354,"text":522},"markdown","content:blog:building-forever-llm.md","content","blog/building-forever-llm.md","blog/building-forever-llm","md",[571,575,576,580,584,588,592,596,600,604],{"_path":572,"title":573,"date":574},"/blog/deploying-nuxt-to-cloudflare-workers","Deploying this Nuxt site to Cloudflare Workers","2026-06-06",{"_path":4,"title":8,"date":10},{"_path":577,"title":578,"date":579},"/blog/laravel-cortex-adhd-productivity","Building Cortex: An ADHD Productivity App in Laravel + Inertia","2026-05-12",{"_path":581,"title":582,"date":583},"/blog/eink-spotify-weather-clock","Building an E-Ink Clock That Shows Spotify, Weather and the Time","2026-05-05",{"_path":585,"title":586,"date":587},"/blog/hetzner-k8s-cluster","Building a K3s Cluster on Hetzner with Terraform and GitOps","2026-04-20",{"_path":589,"title":590,"date":591},"/blog/arduino-uno-q-forza-rev-gauge","A Forza Rev Gauge on the Arduino UNO Q's LED Matrix","2026-04-15",{"_path":593,"title":594,"date":595},"/blog/modular-go-echo-gorm","A Modular Go Web App Pattern with Echo, GORM and golang-migrate","2026-04-05",{"_path":597,"title":598,"date":599},"/blog/self-hosted-ios-web-push","Self-hosting iOS push notifications with Web Push and PWAs","2026-03-25",{"_path":601,"title":602,"date":603},"/blog/lit-web-components-astro-docs","Building a framework-free web component library with Lit and Astro","2026-03-10",{"_path":605,"title":606,"date":607},"/blog/welcome-to-my-blog","About this site","2024-01-15",1781294950570]