This article is part of a miniseries:
- Learning Rust with Large Language Models (Part I): a Project for 2024
- You’re reading it.
- Learning Rust with Large Language Models (Part III): Finding a Needle in a Haystack.
There’s a GitHub repo associated with this series. You can find more fine-grained examples of the interactions there.
The web is flooded with tutorials written by beginners: people who are still learning, or who have only just finished learning, something.
For the most part, they’re variations on examples or exercises found in coursebooks or documentation. Bootstrap a web server in TypeScript with Node.js, train an MNIST model using PyTorch, corrupt all races using the Rings of Power and conquer Middle-earth. Basic stuff.
Such exercises are great for learning. It’s less great that there are a billion copy-paste YouTube videos and Medium articles on, literally, the same beginner topic. And far from great that they all end up in the training set of a large language model (LLM).
Disclaimer: findings based on ChatGPT-3.5!
How did it go then?
What’s your relationship to ChatGPT?
It’s worth reiterating that I’m not using ChatGPT to solve anything directly. For me, it’s a combined search-tutor-senior-colleague that I ask for guidance and explanations.
It reflects the intent of this series — are LLMs effective tutors in 2023/24? What I am not trying to do is ask the LLM for solutions to programming exercises. I know it can give me (mostly) working implementations of well-studied algorithms and solutions to common homework problems, and that it’s likely to struggle with anything custom. It will happily produce a snippet of code for a simple landing page, or a web server with the two routes I described. I don’t want that.
Also, I originally intended to have one post on exploring the basics, and another one on the Advent of Code. Buuut, I just jumped headlong into it and they sort of merged. Hence, findings in this post are around 10% about the basics, and 90% about the Advent of Code.
Where it shines
I think the single most differentiating factor is that an LLM has context. The entire conversation is stored, instead of starting from a blank page with each message — unless you start a whole new chat.
This means that you can refine your query because it’s contextualised — “you gave me an example of X but I actually meant Y”. You can think of it as a search query that is aware of the previous ones, instead of being completely separate. If I want to look up one more related thing, I can just ask “and how about Z?”. It isn’t as though I’ve only just discovered this, but I’d never needed to use it this way before. I really appreciate this feature now.
Oddly enough, not having to prepend “rust” to every prompt, the way I do for every search engine query, is one less source of friction too.
Where it lacks
Firstly, it’s just too slow. It’s like watching a download progress bar. It’s particularly annoying when I’m waiting for a code block to generate. What makes it interesting is that it might still be faster than search — but it doesn’t feel like it. With search, I hit enter, get a handful of results, and can quickly skim them to see if any look promising. Then I’d click a link, it would load, and I’d get an answer, right or wrong. With an LLM, I have to wait for the completion before I can tell — the upfront cost is too high. I think Groq is a great example of how fast LLMs should be to offer a seamless user experience.
Then, it hallucinates; simple as that. Language features or library items that simply don’t exist. Hallucinations are confusing because I can’t tell whether the LLM made a fixable mistake in an otherwise sound example, or whether it’s making things up wholesale. Or perhaps it’s just giving me outdated information that works differently in the latest version of the library I’m trying to use today.
Which brings me to the next two points. So. Many. Mistakes. I don’t have to explain why mistakes are bad in a vacuum. Still, you might be thinking “everyone makes mistakes”, or “code you copy from Stack Overflow might have mistakes too”. You would be right. And that’s why my point of reference has been the search-tutor-senior-colleague. Within this framing, the LLM makes an unacceptable number of mistakes. Imagine that your child gets maths tutoring, and the tutor gives them factually wrong information — you’d send them to a different tutor. Or that you’re pair-programming with a supposedly experienced colleague who can only scratch the surface of the subject matter. Finally, it’s more work to process incorrect information and figure out whether it is indeed wrong than not to get an answer in the first place.
Relatedly, it serves outdated information: old documentation examples, versions of libraries that have since gone through API changes, and no knowledge at all of newer frameworks. I understand why it happens (the training data has to be cut off at some point), but from a user experience perspective, it’s underwhelming. For instance, I wanted to use clap to parse CLI arguments, and the examples I got were for a version from 2021.
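For reference, here’s a minimal sketch of what current clap (4.x) usage can look like with the derive API — my own illustration, not ChatGPT’s output; the struct and field names are made up:

```rust
// Cargo.toml: clap = { version = "4", features = ["derive"] }
use clap::Parser;

/// A toy CLI, sketched with clap's derive-style API (clap 4.x).
#[derive(Parser, Debug)]
#[command(version, about)]
struct Args {
    /// Positional argument: the input file to process (hypothetical).
    input: String,

    /// Optional flag with a default, e.g. `--count 3` or `-c 3` (hypothetical).
    #[arg(short, long, default_value_t = 1)]
    count: u8,
}

fn main() {
    let args = Args::parse();
    for _ in 0..args.count {
        println!("processing {}", args.input);
    }
}
```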
Similarly, I’m trying to use ratatui to build a TUI app. ChatGPT isn’t aware of its existence, because ratatui is a fairly recent fork of the abandoned tui-rs library.
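For context, here’s a minimal ratatui sketch that draws a single frame and exits. It’s my own illustration, assuming ratatui ~0.26 with the crossterm backend, not something ChatGPT produced:

```rust
// Cargo.toml (assumed versions): ratatui = "0.26", crossterm = "0.27"
use std::{io, thread, time::Duration};

use crossterm::{
    execute,
    terminal::{disable_raw_mode, enable_raw_mode, EnterAlternateScreen, LeaveAlternateScreen},
};
use ratatui::{
    backend::CrosstermBackend,
    widgets::{Block, Borders, Paragraph},
    Terminal,
};

fn main() -> io::Result<()> {
    // Put the terminal into raw mode and switch to the alternate screen.
    enable_raw_mode()?;
    let mut stdout = io::stdout();
    execute!(stdout, EnterAlternateScreen)?;
    let mut terminal = Terminal::new(CrosstermBackend::new(stdout))?;

    // Render a single frame: a bordered block with a line of text.
    terminal.draw(|frame| {
        let greeting = Paragraph::new("Hello from ratatui!")
            .block(Block::default().title("demo").borders(Borders::ALL));
        frame.render_widget(greeting, frame.size());
    })?;

    // Keep the frame visible briefly, then restore the terminal.
    thread::sleep(Duration::from_secs(2));
    disable_raw_mode()?;
    execute!(terminal.backend_mut(), LeaveAlternateScreen)?;
    Ok(())
}
```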
Lastly, cue the title, everything I get from the LLM is very entry-level. It’s introductory example after introductory example. Even when the topic is complicated, the treatment is still introductory. Well, what does it mean not to be introductory, then? It’s about providing depth and reasoning. I don’t want a piece of code explained line by line — that isn’t depth. I want to know why things are there, why in that order, and what the alternatives, and consequently the trade-offs, are. To get there, I have to resort to the documentation, or read a book or a paper. I’m aware that with this point of criticism I’m asking for quite a lot. Many people are perfectly fine without that much detail; heck, I often don’t need it either. But when I do, I want it available. That is what would be useful to me in my work — I’d like to make my life easier, not just replace my workflow with shiny tech.
Takeaway for now
Unless you really value the conversational UX, don’t bother learning Rust with an LLM. Perhaps it’s better for other programming languages, but I’m not planning to redo this project in, say, Go.
I’m not done with it just yet, though. For the remainder, I’m going to switch to GPT-4, the paid variant that is supposed to be better and faster. It might change my mind on some of these caveats. Same drill: exploring some frameworks and building nifty stuff, with GPT-4 as the primary source of information.
One thing that’s worth highlighting again is that contextualised refinement. It’s noticeably less friction than crafting standalone search engine queries.
PS on prompt magic & engineering
It’s entirely possible that some of my prompts could have been better, and thus yielded better responses. There’s a parallel to be drawn to search engines — “my grandma doesn’t know how to Google”. While true, I think it’s different. Effective “googling” has been about stripping a query down to its most essential keywords. Prompt engineering requires you to come up with weird, role-playing incantations that prime the LLM. It’s a lot of trial and error until you get it right, which is made worse by the slow generation.