AI is great at producing documentation. Point it at a folder of text and images and ask it to make sense of the contents and document them, or better yet, as a developer, point it at a folder of source code and get it to document that. You may even learn something about your own software that you had forgotten 🙂 It may even point out flaws or unexpected side effects. So a friend tells me, anyway; this has never happened to me … moving swiftly on …
I’ve developed a very simple prompt that gives me great results, and I usually run it in Claude Code with the Sonnet 4.6 model. However, as local AI is improving rapidly, I thought it would be fun and interesting to compare the output from Claude Code with that of a few local LLMs. It may currently only take a dollar or two to use Claude Code for this task, but given that prompt costs are currently heavily subsidised, and that even in 2026 it’s sometimes possible to be without a stable internet connection, it makes sense to try alternatives.
When using local LLMs for a job like this I am currently testing Cline (https://cline.bot/). This gives a Claude Code-style experience within VS Code. A terminal version of Cline is also available, but I ran into some TUI display glitches that I’ve not yet bothered to get to the bottom of. The VS Code installation, however, was flawless.
The Prompt
The prompt I used was this:
Write a professional and in depth combined user and developer documentation for this application. Output as a self contained html file named: appname_Documentation saved to the project/Documentation directory. Use graphs and charts if possible to make the document as visual as possible. Avoid dark backgrounds and if text is on a dark background make the contrast significant for easy readability.
The Project
The project I used to test the documentation creation is a simple Python application:
PyTrain1 is a lightweight, command-line Python application that automates a three-stage data extraction pipeline. It connects to a Microsoft SQL Server database, runs a SELECT query against a customer table, and exports the results to a CSV file on the local machine, all in a single run with no user interaction required.
The application serves as a training exercise in combining SQLAlchemy, pyodbc, and pandas to extract and export relational data with minimal code. It runs as a single script with no web server, background process, or GUI component.
And yes, AI did write the above description, I couldn’t have done it better myself.
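To give a feel for the shape of the pipeline described above, here is a minimal sketch of the three stages (connect, query, export). It is not PyTrain1's actual code. So that it runs anywhere without a SQL Server instance, it swaps in Python's built-in sqlite3 and csv modules for SQLAlchemy, pyodbc, and pandas, and the table name, column names, sample rows, and output file name are all made up for illustration.

```python
import csv
import sqlite3
from pathlib import Path


def make_demo_connection() -> sqlite3.Connection:
    """Stage 1 (stand-in): connect to a database.

    The real application connects to Microsoft SQL Server through
    SQLAlchemy and pyodbc. An in-memory SQLite database with a
    hypothetical customer table is used here instead.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer (name, city) VALUES (?, ?)",
        [("Ada", "London"), ("Grace", "Leeds")],  # invented sample rows
    )
    conn.commit()
    return conn


def extract_customers(conn: sqlite3.Connection) -> tuple[list[str], list[tuple]]:
    """Stage 2: run a SELECT query against the customer table."""
    cursor = conn.execute("SELECT id, name, city FROM customer ORDER BY id")
    headers = [column[0] for column in cursor.description]
    return headers, cursor.fetchall()


def export_to_csv(headers: list[str], rows: list[tuple], path: Path) -> None:
    """Stage 3: write the result set to a CSV file on the local machine."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)


def run_pipeline(conn: sqlite3.Connection, path: Path) -> int:
    """Run query and export in one go; returns the number of rows written."""
    headers, rows = extract_customers(conn)
    export_to_csv(headers, rows, path)
    return len(rows)


if __name__ == "__main__":
    out = Path("customers.csv")  # hypothetical output file name
    count = run_pipeline(make_demo_connection(), out)
    print(f"Exported {count} rows to {out}")
```

With pandas in the picture, stages 2 and 3 typically collapse into a read into a DataFrame followed by a single to_csv call, which is part of why an application like this stays so small.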
The Computer
2025 Mac Studio Max with 64 GB RAM, 16 CPU cores and 40 GPU cores.
First Test: Sonnet 4.6

Second Test: Qwen3.6-35b-a3b-mlx
The image below shows the LLM loaded into LM Studio.

The image below is Cline running against Qwen in VS Code.

In the image below you can see the activity on my computer, with the memory usage and GPU usage shown plainly. Note that I was also running a Windows 11 VM at the time.

Third Test: Gemma 4 3b
Fourth Test: Llama 3.3 70b Instruct
The Results

The results shown above don’t tell the whole story. The immediate takeaway (as expected) is that Claude produced the highest quality documentation. What I didn’t expect was how good the output from Qwen was, and how fast it was compared to Claude.
I expected better from Gemma. It gave great feedback during the process and finished quickly, but it named the output file incorrectly and produced only a few lines of output. So, a failure. I will show some sample output later in this document, at least for the models that created output … which brings me to Llama.
Llama was not happy. I left it running for 15 minutes or so, during which time it had numerous issues and errors and produced some strange commentary, including telling me the temperature and humidity in San Francisco! I’m in the UK. I don’t know whether it had an issue with Cline, but a big difference is that, unlike the other local LLMs, it is only a 4-bit quantisation, so I expected it to perform slightly worse despite its higher parameter count. In the end, however, it was a complete failure at this task.
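As an aside on why a 70b model ends up at 4 bits on a 64 GB machine, here is a rough back-of-envelope sketch. It assumes memory is dominated by the weights alone (parameters × bits per weight ÷ 8) and ignores the context cache, runtime overhead, and whatever else is using memory, such as a Windows VM. Treat the numbers as a lower bound, not a measurement from these tests.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the model weights alone, in decimal GB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


RAM_GB = 64  # total memory of the test machine

if __name__ == "__main__":
    for bits in (4, 8, 16):
        need = weight_memory_gb(70, bits)
        verdict = "fits" if need < RAM_GB else "does not fit"
        print(f"70b at {bits}-bit: ~{need:.0f} GB of weights, {verdict} in {RAM_GB} GB")
```

By this estimate a 70b model needs roughly 35 GB at 4 bits and roughly 70 GB at 8 bits, so on this machine 4-bit is about the only quantisation that leaves room for anything else.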
Sample Output from Sonnet

Sample Output from Qwen

Total Output from Gemma

Summary and Conclusion
Claude Code with Sonnet produced the highest quality documentation of all the models I tested. There is a price and time penalty, however. The right local LLM costs nothing to run, performs quickly, and is capable of creating accurate and useful documentation with no fuss or drama, and it will run on your (reasonably specced) laptop or desktop.
Local LLMs were considered a joke for real work not so long ago. I’m convinced the future is hybrid: local LLMs for most tasks, with cloud models for the really tough stuff. If you’re interested, run your own tests and see for yourself.
Follow up Article
I have written a cross-platform, chat-style application that allows me to switch between cloud and local models and see the difference between their outputs for the type of questions users often type into ChatGPT or Claude. I can also experiment with pre-loading the prompt with a Speciality or a Personality, as well as integrating a Prompt Library and pre-defined Information Layers. I’ll cover the results of these experiments in a future article. You can see the application in the image below.
