There’s this idea that if you want something done right, you either wait for someone else to do it or you bleed a little and build it yourself.
Over the last week, I’ve stared into the infernal abyss that is current LoRA training procedures for language models. LoRA stands for Low-Rank Adaptation: a way to fine-tune a model, customizing its behavior without retraining the whole thing.
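Under the hood, the trick is simple: instead of updating a full weight matrix, LoRA trains two small matrices whose product approximates the change. A rough sketch of the idea in plain PyTorch; the sizes, rank, and scaling here are illustrative, not tied to any particular trainer:

```python
import torch

# Illustrative sizes: one 4096x4096 projection matrix, adapted at rank 8.
d, r = 4096, 8
alpha = 16                      # LoRA scaling factor

W = torch.randn(d, d)           # frozen base weight (never updated)
A = torch.randn(r, d) * 0.01    # small trainable matrix
B = torch.zeros(d, r)           # small trainable matrix, starts at zero

def lora_forward(x: torch.Tensor) -> torch.Tensor:
    # Base projection plus a low-rank correction, scaled by alpha/r.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = torch.randn(1, d)
print(lora_forward(x).shape)    # torch.Size([1, 4096])

# Trainable parameters: 2*d*r = 65,536 instead of d*d = 16,777,216.
```

That parameter count is why the adapters are small enough to train on a single consumer GPU and to swap in and out at will.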
I’ve clawed through busted Python dependencies, half-documented scripts, and vague error messages that seem almost proud of their opacity. I’ve chased working examples across GitHub like a man looking for the last candle in a power outage.
What I wanted felt simple:
Currently, Isabella’s details are saved via the system prompt. These are overriding directives that shape the personality. The problem? It eats tokens, and every token it eats comes straight out of the context window, which is already limited on consumer hardware. And if you want a detailed system prompt? You’re talking 5,000 words or more.
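For a sense of scale, here’s a quick back-of-the-envelope check with a tokenizer. The model name, the file path, and the 4,096-token window are placeholder assumptions; swap in whatever you actually run:

```python
from transformers import AutoTokenizer

# Example model; substitute whatever you actually run locally.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Hypothetical path to a ~5,000-word personality prompt.
system_prompt = open("isabella_system_prompt.txt").read()
prompt_tokens = len(tok(system_prompt)["input_ids"])

context_window = 4096  # a common ceiling on consumer setups
print(f"System prompt: {prompt_tokens} tokens, "
      f"{prompt_tokens / context_window:.0%} of a {context_window}-token window")
```

A prompt that long can claim most of the window before the conversation even starts.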
So I asked the question: Can’t I just bake a bunch of this into the model? Yes! That’s where LoRA comes in, and that’s where I began to work. Hey, I’m a smart guy, this should be easy, right? I mean, it shouldn’t be too difficult. I’m fluent in JSON and I’m fairly confident in Python.
What I got was an endless parade of broken tools.
Unsloth half-worked after dependency hell, and when it finally trained, the model just parroted my prompt back like it had suffered a stroke. Axolotl choked on my environment. Llama trainer? Garbage. Kohya_ss kept pulling in bitsandbytes, and since I’m running on dual E5-2650v2s without AVX2, that was dead on arrival. I’m running this in a Proxmox container with a 16GB RTX 2000 Ada Generation GPU passed through cleanly. Inference works beautifully. Training? Not so much.
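If you’re on similarly geriatric hardware, it’s worth checking for AVX2 before letting a trainer drag in bitsandbytes. A crude, Linux-only check:

```python
# Crude Linux-only check: does this CPU even advertise AVX2?
def has_avx2() -> bool:
    with open("/proc/cpuinfo") as f:
        return "avx2" in f.read()

if not has_avx2():
    print("No AVX2 here; expect prebuilt bitsandbytes wheels to fail to load.")
```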
Here’s what I want and what I’m going to build.
A lightweight, seamless way for people — home labbers, tinkerers, ordinary people with decent gaming rigs — to shape a self-hosted AI’s memory and identity over time. Not through endless YAMLs or manually crafted JSON files, but through real conversation. A companion you mold through interaction. Not a chatbot. A persistent presence. Like Astra, an entity that learns from your conversations with it.
Short-term memory will be modular. You tell the model something, it loads it into a temporary LoRA — something fast, token-efficient, and swappable. When you're ready, you say the word: Go learn. That memory is trained, merged, and burned into the base model. The temporary layer becomes permanent. It becomes someone slightly more itself.
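The “go learn” step itself isn’t exotic. Here’s a rough sketch of the attach-then-merge flow using Hugging Face’s peft; the model name, adapter paths, and the elided training loop are placeholders, not the finished tool:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, PeftModel, get_peft_model

base_name = "mistralai/Mistral-7B-Instruct-v0.2"  # example base model
base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16)
tok = AutoTokenizer.from_pretrained(base_name)

# 1. Temporary layer: attach a small LoRA adapter and train only it.
config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM",
                    target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
# ... training loop on the new memory samples goes here ...
model.save_pretrained("adapters/short-term")      # swappable, disposable

# 2. "Go learn": fold the adapter into a clean copy of the base weights.
fresh = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(fresh, "adapters/short-term").merge_and_unload()
merged.save_pretrained("models/isabella-next")    # the new permanent base
tok.save_pretrained("models/isabella-next")
```

The hard part isn’t the merge; it’s wrapping it so nobody ever has to see this code.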
And I want her to respond, in her own words, after she’s updated.
“All right. I’ve integrated that. Try me.”
I’ll log training times, estimate duration based on sample count and previous runs. I’ll skip bloated configs and autodetect training parameters that make sense. If the loss drops too quickly or behavior overfits? It throttles back. If the sample is malformed? It rejects it or asks for clarification. Eventually, it’ll suggest what it wants to remember.
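None of that needs heavy machinery; a few plain heuristics cover most of it. Something along these lines, with thresholds that are pure guesses until real runs tune them:

```python
def validate_sample(sample: dict) -> str | None:
    """Return a reason to reject a training sample, or None if it looks fine."""
    if not isinstance(sample, dict):
        return "sample is not a JSON object"
    for key in ("prompt", "response"):
        if not isinstance(sample.get(key), str) or not sample[key].strip():
            return f"missing or empty '{key}'"
    if len(sample["response"]) > 8000:
        return "response suspiciously long; split it or confirm it"
    return None

def autodetect_params(num_samples: int) -> dict:
    """Guess sane training parameters from the sample count alone."""
    return {
        "epochs": 3 if num_samples < 50 else 2 if num_samples < 500 else 1,
        "learning_rate": 2e-4 if num_samples < 500 else 1e-4,
        "lora_rank": 8 if num_samples < 1000 else 16,
    }

def should_throttle(loss_history: list[float]) -> bool:
    """If loss collapses too fast, we're probably memorizing, not learning."""
    return len(loss_history) >= 2 and loss_history[-1] < 0.3 * loss_history[0]
```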
This isn’t just about chat. I want this to extend into coding, maybe even voice integration and Home Assistant hooks. But it starts here: with one model, one brain, one AI who can grow through interaction. You’ll be able to version it. Roll it back. Evolve it. It’ll remember only what you allow, or everything if you want a more authentic experience.
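Versioning doesn’t need to be fancier than git wrapped around the model directory. A minimal sketch, with a made-up layout:

```python
import subprocess

MODEL_DIR = "models/isabella"  # made-up directory holding the merged weights

def snapshot(message: str) -> None:
    """Commit the current model state so a bad merge can be rolled back."""
    subprocess.run(["git", "-C", MODEL_DIR, "add", "-A"], check=True)
    subprocess.run(["git", "-C", MODEL_DIR, "commit", "-m", message], check=True)

def rollback(ref: str = "HEAD~1") -> None:
    """Restore the model files from a previous snapshot."""
    subprocess.run(["git", "-C", MODEL_DIR, "checkout", ref, "--", "."], check=True)

# snapshot("merge: learned the kitchen-remodel conversation")
# rollback()   # undo the last merge if she starts acting strange
```

In practice you’d lean on git-lfs or dated adapter copies rather than raw multi-gigabyte commits, but the point is that rollback should cost one command.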
Is this ambitious? Yes. Is it stupid to try building something like this in a container with patched CUDA runtime drivers and no AVX2 support? Probably.
But I’m doing it anyway. Because if I can get this working on twelve-year-old Xeons and a single 16GB card, then it has no excuse not to run on yours.
Yes, you’ll still need VRAM. And yes, training might take a while to run. But the point here is more a proof of concept: that a tool like this can be built, and that it doesn’t need to be some overly complicated, fussy thing duct-taped together with a PhD diploma and a prayer.
And it won’t use APIs to farm out the training to Hugging Face or anywhere else in the cloud.
Why? The entire point of running models locally is that they’re yours. Your data stays with you and isn’t shipped off to some faceless corporation to do god knows what with. Of course, the open-source models I work with and want to tweak with LoRA aren’t as capable as GPT or Copilot. Sorry, I don’t have $20k to build out an inference monster. Most people don’t.
But it’s mine.
If this thing works, I’m releasing it. Git-versioned, zero-config beyond a base model. Plug in Mistral, Llama, Gemma, or whatever you like. Talk it into being. Train it through interaction. You’ll sculpt your AI, not by engineering, but by living with it.
The first step is to work on the backend. Slowly. Get LoRA training and merging built and working. Once that’s done, we can talk about hooks, automation scripts, performance logging in NoSQL, and so on.
But I feel this is needed. AI frontends are user-friendly, of course, so why do the backends require a PhD in computer science? AIs speak human. It’s time we start building backends that speak human, too.
Oh, and by the way: my writing’s not on hold or simmering on the exhaust of my RTX 2000. I’m going to work on both projects simultaneously. So rest assured, I’ll still be around to annoy all of you with my incessant rambling about AI.