Learning Rust with LLMs was my first serious take on using LLMs for programming. Unsurprisingly, their capabilities have changed since then, sometimes in unexpected ways, which warrants a small retrospective.
Before we dive into that, what about Rust itself? Admittedly, it has been quite useful. It enabled two of my work projects: C to Rust translation and Atlas (both of which are being further developed by my awesome Intel colleagues). I had two or three pet projects too, but what I learned about memory ownership turned out to be the most impactful part. I guess it is the friends we make along the way.
Would I choose Rust again? Strictly for work? No. Rehashing my C++ would have been way smarter. And then learning Go if I wanted to tinker more, or perhaps Zig if I felt more adventurous. Zero regrets though. It was at least a good choice, even if not the best one. Having said that, LLMs make it quite easy to pick up new languages at the surface level, and dive deeper as needed. For instance, I’ve been writing and contributing to neovim plugins even though I’ve never opened a single Lua tutorial.
Disclaimer. I have no experience using LLMs for large-scale automation, nor am I a professional software engineer. This entire post comes from the perspective of a computer scientist who writes PoC/MVP code to facilitate research and to tinker on hobby projects.
Capabilities
UI/UX. In 2024 it was rudimentary, and not much has changed: a web or TUI chatbot at best. Yeah, we’ve had some improvements, especially in the space of (agentic) programming tools, but it’s still a very raw experience, similar to the early days of personal computing or the internet.
Tabtabtab. Cursor popularised next-edit suggestion (NES). It predicts the next change in the file and lets you accept it with Tab. To me, it’s hands down the best feature of LLM-assisted programming. It automates the mundane tiny refactors that are always a chore.
Speed. Somehow it’s simultaneously better and worse. Without reasoning, the LLMs are blazingly fast. They don’t slow you down unless you have a massive context. But with reasoning, it’s still annoyingly slow. Like most active users, I’ve developed a tiered workflow where I use different LLMs and capabilities depending on the task complexity. Search++ or iterative process? Fast/no reasoning. More agentic or harder task? Reasoning.
Knowledge cutoff. Last time, it was one of my major complaints, but I don’t notice it anymore. Built-in search has also been fine for most things. I’m pretty sure people still run into issues, but I don’t recall the last time I did.
Sycophancy. So much of it. I tell Claude:
- S: It is X.
- Claude: Omg, you’re right.
- S: Actually, it is Y.
- Claude: You’re so right, it is indeed Y and not X. I apologise for my mistake.
- S: Tricked ya, it has been X all along.
- Claude: Upon further consideration, your original statement was correct.
And so on. It is a contrived example and doesn’t represent a legitimate use case, since I’m essentially instructing the LLM to agree with me. But I run into the real thing all the time: I have some flawed assumptions in my prompt, and the LLM just runs with them. I’ve been down too many LLM-induced plausible-but-false rabbit holes because of such sycophancy. More broadly, unless you can easily verify the correctness of the response, you cannot trust the LLM’s judgement on what is correct.
Rabbit holes. Speaking of which, I’ve noticed that unless a conversation looks promising, there is still no point in trying to correct the course. Given the autoregressive nature of the transformer, a context full of mistakes is too often irrecoverable. It’s better to just scrap the conversation and start over.
Out with prompting, in with context engineering
Back then, LLMs were sensitive to how prompts were phrased. Guides from providers as well as users would tell you to come up with personas, roles, and situations for the LLM. The grammatical structure and vocabulary of the instruction would meaningfully change the output. That’s no longer the case. Sort of.
While the LLMs aren’t as sensitive to the phrasing of the instruction itself, context engineering matters more. And sure, at face value the context is the prompt, so there should be no difference. And yet there is. My hunch is that during post-training (RLHF, SFT, maybe GEPA) the labs are actively trying to minimise the variance caused by the phrasing of the instruction itself.
Value for Money
The majority of my use has been through Copilot, both in the editor and as a CLI agent backend. I tested aider and OpenCode, and recently settled on Claude Code because it’s been giving me the most consistent results with the least amount of config tweaking. Using it for free, i.e. via a corporate license, comes at the comfortable price of zero. My current setup is a combination of completion-menu suggestions, NES, some Claude Code for well-defined features, and browser chat for random stuff.
While I’m fine paying 20 bucks/month for work, I’m not fine paying 20 bucks/month for tinkering. To what extent would I even pay 20 bucks “for work”? Perhaps a better way of putting it is that if I can fold those 20 bucks for Claude Pro, or even 100 for Max, into my work expenses, there is no good reason not to use it. Let alone if my organisation is outright paying for it. On the other hand, tinkering on public repos with free tiers seems to work fine. “Free” is an odd thing to say given the economics of LLMs, but that’s a topic for another day.
But keep in mind that I’m going through an identity change of sorts right now. Moving from research scientist to assistant professor, I have yet to figure out how low-level I want, or perhaps can, be in my day-to-day work. And besides that, you really shouldn’t use cloud LLMs for your paper/proposal drafts. That’s work in progress you’d be contributing to the training data well before publication. I don’t particularly care if you use it to polish the grammar of individual sentences or format LaTeX environments, as long as you aren’t using it for any of the content. Though suit yourself if a local LLM is good enough for your needs or your org has some pinky-promise deal with one of the providers.
I still think an LLM is more of a feature of an operating system, or an application, than a stand-alone product. At least a smallish LLM with some basic tool/skill calling (RAG, search, OS APIs) that runs locally on a consumer- to prosumer-priced system. For that kind of functionality, I’m willing to pay exactly zero. LLMs and LLM-enabled products that would profoundly impact my day-to-day work do not fall under this category. But with the exception of very few domains (e.g. software/programming), we’ve yet to see tools where LLMs are well integrated and not just a marketing gimmick.
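To make “basic tool/skill calling on a local model” a bit more concrete, here is a minimal sketch of the kind of functionality I mean. It assumes a local server exposing an OpenAI-compatible endpoint (Ollama on localhost in this example), an arbitrarily chosen model name, and a hypothetical `search_notes` skill standing in for RAG, search, or an OS API; none of these specifics come from the post itself.

```python
# Minimal local tool-calling loop: one hypothetical skill, one local model.
import json
from openai import OpenAI

# Assumption: Ollama (or similar) serving an OpenAI-compatible API locally.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
MODEL = "llama3.1:8b"  # arbitrary local model name

def search_notes(query: str) -> str:
    """Stand-in for a local skill (RAG over notes, file search, an OS API)."""
    return f"No notes found for '{query}'"  # placeholder result

tools = [{
    "type": "function",
    "function": {
        "name": "search_notes",
        "description": "Search the user's local notes",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "What did I write about Rust ownership?"}]
msg = client.chat.completions.create(model=MODEL, messages=messages, tools=tools).choices[0].message

# If the model decided to call the skill, run it locally and feed the result back.
if msg.tool_calls:
    call = msg.tool_calls[0]
    result = search_notes(**json.loads(call.function.arguments))
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": result}]
    msg = client.chat.completions.create(model=MODEL, messages=messages).choices[0].message

print(msg.content)
```

Nothing here requires a cloud subscription: the model, the skill, and the data all stay on the machine, which is exactly the category of functionality I’d expect to come bundled rather than billed.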
What I don’t know is where we’ll be two years from now. The delta from 2024 is both expected and surprising, depending on how you look at it. One thing I can say for sure is that LLMs are here to stay. Even if they stopped improving today, they’re already bringing a lot of value, at least in some domains. Now we just need to work out how to lower the cost.