LLM Controlled NPCs

A fun little Unity app that enables interactive and dynamic conversations with NPCs. The objective of this project is to run everything locally within a constraint of just 8GB of VRAM.

Player input is sent to Oobabooga text-generation-webui, which runs a fine-tuned Mistral 24B model. The model's response is parsed in Unity, then passed to ComfyUI to generate voice clips using KokoroTTS. These clips are returned to Unity and played back as the NPC's dialogue. At the same time, the Oobabooga response is parsed for animation data, triggering the appropriate NPC animations in sync with the spoken lines.

To improve responsiveness, response tokens are streamed directly to Unity as they are generated. Once enough words accumulate at a natural breakpoint, the partial sentence is sent off to produce voice clips. This streaming approach lets players begin reading text and hearing audio before the full response has finished generating.
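The streaming idea above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual Unity/C# code; the specific breakpoint rule (sentence punctuation plus a minimum word count) is an assumption.

```python
# Chunk a streamed LLM response at natural breakpoints so TTS can start early.
BREAKPOINTS = {".", "!", "?", ",", ";", ":"}
MIN_WORDS = 4  # assumed threshold: avoid sending tiny fragments to the TTS engine

def chunk_stream(tokens):
    """Yield partial sentences as soon as enough words end at a breakpoint."""
    buffer = ""
    for token in tokens:
        buffer += token
        stripped = buffer.rstrip()
        if stripped and stripped[-1] in BREAKPOINTS and len(stripped.split()) >= MIN_WORDS:
            yield stripped  # ship this chunk to TTS immediately
            buffer = ""
    if buffer.strip():  # flush whatever remains when the stream ends
        yield buffer.strip()
```

Each yielded chunk would be handed to the TTS backend while the rest of the response is still generating.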
|
Version 1
|
|
22/2/2026 Update: Version 2 - Leo the Adventurer
Changed to a 12B model to make text generation faster. There is slight degradation in output quality and instruction following, but it stays within acceptable limits considering the speed increase. Added lipsync and implemented emotions and facial expressions using iOS ARKit blendshapes. I spent one weekend using the Face Capture app to project my face into the Unity editor and record the blendshape values for each viseme and facial expression - a tedious process. Emotions, actions and speech are all driven by the LLM, running locally on 8GB of VRAM.
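To illustrate how an LLM response can drive facial expressions, here is a minimal Python sketch. The tag format, the specific blendshape weights, and the `parse_emotion` helper are all hypothetical; the project records its real blendshape values via Face Capture.

```python
import re

# Hypothetical recorded ARKit blendshape snapshots, keyed by emotion tag.
EMOTION_BLENDSHAPES = {
    "happy":   {"mouthSmileLeft": 0.8, "mouthSmileRight": 0.8, "cheekSquintLeft": 0.3},
    "angry":   {"browDownLeft": 0.9, "browDownRight": 0.9, "noseSneerLeft": 0.4},
    "neutral": {},
}

def parse_emotion(response: str):
    """Extract a leading [emotion] tag and return (blendshape weights, clean text)."""
    match = re.match(r"\[(\w+)\]\s*(.*)", response, re.DOTALL)
    if match and match.group(1) in EMOTION_BLENDSHAPES:
        return EMOTION_BLENDSHAPES[match.group(1)], match.group(2)
    return EMOTION_BLENDSHAPES["neutral"], response  # fall back to neutral
```

The returned weights would then be applied to the character's skinned mesh each frame, while the cleaned text goes to TTS.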
|
Version 2
|
|
1/3/2026 Update: Version 3 - Leo is now aware of his surroundings and can interact and move within the environment.
NPCs are now aware of their surroundings. Since LLMs excel at processing text, my approach is to let NPCs learn about and interact with their environment as if they were players in a text-based adventure game like Zork. After discovering that the LLM's reasoning capabilities allow it to plan multiple steps toward a goal, I updated the code to support handling several actions within a single response. As a result, Leo can now execute multi-step instructions. Also notice how Leo relocated the cheese from the barrel to the well and was later able to retrieve it from the new location. One step closer to Skynet.
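The text-adventure approach above can be sketched in a few lines of Python. The world model, verb names, and action tuple format here are all hypothetical; the point is that the scene is described to the LLM as text, and several parsed actions from one response are applied in order.

```python
# Hypothetical world model: places mapped to the items they contain.
world = {"barrel": ["cheese"], "well": []}

def describe(world):
    """Render the scene as text for the LLM, text-adventure style."""
    return " ".join(f"The {place} contains: {', '.join(items) or 'nothing'}."
                    for place, items in world.items())

def execute(actions, world):
    """Apply a list of (verb, item, place) actions parsed from one LLM response."""
    for verb, item, place in actions:
        if verb == "take" and item in world.get(place, []):
            world[place].remove(item)
            world.setdefault("inventory", []).append(item)
        elif verb == "put" and item in world.get("inventory", []):
            world["inventory"].remove(item)
            world[place].append(item)

# A multi-step plan from a single response: move the cheese from barrel to well.
execute([("take", "cheese", "barrel"), ("put", "cheese", "well")], world)
```

After executing, the updated `describe(world)` text is fed back to the LLM so it can observe the result of its own actions.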
|
Version 3
|
|
8/3/2026 Update: Version 4 - Qwen 3.5
With the release of Qwen 3.5, I wanted to test whether a smaller model could handle my use case. I switched from Oobabooga to LM Studio, since Oobabooga doesn't yet support Qwen 3.5. Unfortunately, the Qwen 3.5 4B model hasn't delivered strong enough results. To improve performance, I tried implementing a Task Manager that uses the LLM to evaluate whether an NPC's assigned task was completed. After each response, the Task Manager checks whether a task was given, then determines whether it was completed, requires additional steps, or is impossible to finish. However, when running on the 4B model, the Task Manager itself produced faulty logic. This was disappointing, since the 4B model could generate responses almost instantly. By contrast, Qwen 3.5 9B works reliably for the use case and runs at about 1.5x the speed of a 12B model, making it a much better fit. I'm still evaluating whether 9B produces less interesting conversations...

Update: Oobabooga 4.0 just released with support for Qwen 3.5. It also feels like 12B runs faster on this new version.
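The Task Manager check described above could look roughly like this. The prompt wording, the status labels, and the `llm` callable are assumptions for illustration, not the project's actual implementation.

```python
STATUSES = ("COMPLETED", "NEEDS_MORE_STEPS", "IMPOSSIBLE")

def check_task(llm, task: str, npc_response: str) -> str:
    """Ask the LLM to judge whether the NPC's assigned task is done."""
    prompt = (
        f"Task given to the NPC: {task}\n"
        f"NPC's latest response: {npc_response}\n"
        f"Answer with exactly one of: {', '.join(STATUSES)}."
    )
    verdict = llm(prompt).strip().upper()
    # Small models sometimes ignore the format, so fall back to a safe default.
    return verdict if verdict in STATUSES else "NEEDS_MORE_STEPS"
```

A weak model can fail this check in two ways: by answering off-format (caught by the fallback) or, worse, by answering confidently with the wrong status, which is the faulty logic observed with the 4B model.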