Database – stevencholerton.com

Creating Documentation with AI

AI is great at producing documentation. Point it at a folder of text and images and get it to make sense of the contents and produce documentation, or better yet, as a Developer, point it at a folder of source code and get it to document it. You may even learn something about your own software that you had forgotten 🙂 It may even point out flaws or unexpected side effects, according to a friend, this has never happened to me … moving swiftly on …

I’ve developed a very simple prompt that I use which gives me great results and I usually use Claude Code with the Sonnet 4.6 model. However as local AI is improving rapidly I thought it would be fun and interesting to compare the output from Claude Code with that of a couple of local LLMs. It may currently only take a dollar or two to use Claude Code for this task but given that prompt costs are currently heavily subsidised, and even in 2026 it’s sometimes possible to be without a stable internet connection, it makes sense to try alternatives.

When using local LLMs for a job like this I am currently testing Cline (https://cline.bot/). This gives a Claude Code type experience within VS Code. A terminal version of Cline is also available but I had some TUI effects that I’ve not yet bothered to get to the bottom of. The VS Code installation however was flawless.

The Prompt

The prompt I used was this:

Write a professional and in depth combined user and developer documentation for this application. Output as a self contained html file named: appname_Documentation saved to the project/Documentation directory. Use graphs and charts if possible to make the document as visual as possible. Avoid dark backgrounds and if text is on a dark background make the contrast significant for easy readability.

The Project

The project I was using to test the documentation creation is a simple Python application;

PyTrain1 is a lightweight, command-line Python application that automates a three-stage data extraction pipeline. It connects to a Microsoft SQL Server database, runs a SELECT query against a customer table, and exports the results to a CSV file on the local machine, all in a single run with no user interaction required.

The application serves as a training exercise in combining SQLAlchemy, pyodbc, and pandas to extract and export relational data with minimal code. It runs as a single script with no web server, background process, or GUI component.

And yes, AI did write the above description, I couldn’t have done it better myself.

The Computer

2025 Mac Studio Max with 64 Gb RAM, 16 CPU cores and 40 GPU cores.

First Test: Sonnet 4.6

Second Test: Qwen3.6-35b-a3b-mlx

The image below shows the LLM loaded into LM Studio.

The image below is Cline running against Qwen in VS Code.

In this image, below, you can see the activity on my computer, with the memory usage and GPU usage shown plainly. Note I was also running a Windows 11 VM at the time.

Third Test: Gemma 4 3b

Fourth Test: Llama 3.3 70b Instruct

The Results

The results shown above don’t tell the whole story. The instant take away (as expected) is that Claude produced the highest quality documentation. What I didn’t expect is how good the output from Qwen was, and also how fast it was compared to Claude.

I expected better from Gemma. Gemma named the output file incorrectly, gave great feedback during the process, finished quickly but gave only a few lines of output. So, a failure. I will show some sample output later in this document, at least for those that created output … this brings me to Llama.

Llama was not happy. I left it running for 15 minutes or so during which time it had numerous issues and errors and produced some strange commentary including telling me the temperature and humidity in San Francisco! I’m in the UK. I don’t know whether it had an issue with Cline but a big difference is that unlike the other local LLMs it is only a 4 bit quantisation so I expected it to perform slightly worse despite it’s higher parameter count. However in the end it was a complete failure at this task.

Sample Output from Sonnet

Sample Output from Qwen

Total Output from Gemma

Summary and Conclusion

Claude Code with Sonnet produces the highest quality documentation of all the models I tested. There is a price and time penalty however. The right local LLM that costs nothing to run and performs quickly is capable of creating accurate and useful documentation with no fuss and drama, and it will run on your (reasonably specced) laptop or desktop.

Local LLMs were considered a joke for real work not so long ago. I’m convinced the future is hybrid, local LLMs for most tasks with cloud models for the really tough stuff. If you’re interested run your own tests and see for yourself.

Follow up Article

I have written a cross platform chat style application that allows me to switch between cloud and local models and see the difference between their outputs for the type of questions users often type in to ChatGPT or Claude. I can also experiment with pre-loading the prompt with a Speciality or a Personality as well as integrate a Prompt Library and pre-defined Information Layers. I’ll cover the results of these experiments in a future article. You can see the application in the image below.

Using Claude.ai with your Database (Part 2: MCP)

In an earlier post I talked about how to educate Claude about your ERP system or any other database so that Claude is able to accurately generate SQL queries for you based on your plain English input.

Having SQL knowledge is not necessary, Claude has your back.

I also talked about additional advantages this can give you such as having Claude explain how a particular part of your database works, for example Stock Control, and also asking Claude to generate interactive dashboards or documentation based around the structure of your database.

I then did a follow up to this post where I discussed and made available the documentation for a script that extracts the schema information from your database and makes it available to Claude as a basic text or markdown file. This is the basis for educating Claude about your database.

At the end of the first article I mentioned that we are able to take this a step further and connect Claude directly to the data in the database. So instead of generating the queries, which you can then execute using the SQL query tool of your choice, you can instead actually retrieve the data output directly into your Claude chat session, making the use of a separate query tool unnecessary.

An additional advantage of doing this is that what Claude cannot infer from the database schema, it will try to infer or verify from the data values in the database in order to further increase its understanding, which in turn increases the quality of its output.

The key to this ability is down to something called an MCP server. Model Context Protocol is an open standard, introduced by Anthropic in late 2024, that provides a universal, standardised way for AI models to connect with external tools, applications, and other data sources. It’s a common language that allows an AI system to communicate consistently with a range of external services, regardless of who built them or how they work behind the scenes.

An MCP Server is essentially a simple, lightweight application that provides the pipe (adaptor) that connects Claude to something else, in this case, your database.

Installing and configuring MCP is beyond the scope of this article, however a few minutes with Google and the ability to execute a couple of commands in the Terminal followed by editing the Claude JSON (claude_desktop_config.json) file and the connection is made. You will of course also need to know an appropriate database logon and password and of course the hostname, database and port of your database server.

After setting up the MCP server, restart Claude and hopefully your configuration file changes are picked up and executed successfully by Claude. Backup your claude_desktop_config.json file before editing as if your changes are unsuccessful Claude will reset it to a default state, ie: you will lose any other changes you may have made.

Note that if setting up MCP for your users or colleagues, the database logon name and password you use should only allow access to the information within the database that is appropriate for those individuals. This can be configured within the database itself in the traditional way.

Assuming you have opened Claude after you’ve made the changes and Claude hasn’t complained, go to the Settings > Developer section of the Claude desktop application and hopefully you will see something like this:

So now that Claude is successfully connected to your database, give it a whirl. 🙂

Open a new chat and input something like this:

Claude will ask for your permission to use the tools it needs. Select ‘Always allow’ or ‘Allow once’ and Claude will proceed to give you some feedback as it generates an appropriate query and retrieves the data.

As always with any AI model, verify the results.

As you can see if you aren’t already doing this, it can be game changing for your business. Simple and fast answers to questions you have about your data using plain English instructions rather than SQL queries.

This doesn’t mean there is no longer a need for people who have that understanding or other programming or technical abilities. In fact those are the very people who are best placed to take the most advantage from the new tools that AI has made available.

Using AI and/or MCP in combination with a programming language is one of the best ways you can seamlessly introduce the power of AI into new or existing business applications. That way you can bring the benefits of AI to your users even if they don’t have access to an AI chatbot within their work environment, giving the Head of IT the ability to control what is sent to, and received from, AI. This level of control isn’t easily available using the standard chatbots.

If you want a high level of auditing and control over your AI and haven’t got the technical abilities in house to create your own software around AI, or if you don’t wish to go that route, and require additional security guarantees, Copilot from Microsoft can provide this level of control as well as full access to both sides of an AI conversation. Setting up a Copilot solution is something you can choose to do yourself or alternatively you can farm it out to an MSP. MS Copilot can now use Claude as well as OpenAI, however if you are choosing Copilot for its security and wish to use Claude, it’s worth getting up-to-date information on this as Claude can still process your information outside of UK/EU boundaries. Depending on the nature of your business this may or may not matter to you. The latest article I could find (a couple of days ago) is here.

Whether you are using Windows, macOS or Linux on your desktop you can setup MCP and gain the advantages I’ve talked about in this article. If any of the above even slightly interests you then have a go, as long as you start by setting up your database logon with Read Only permissions, you cannot do any harm and you never know, you might be impressed 👍

Future Articles

A forthcoming article will take this a step further and demonstrate how to get great quality, professional looking, downloadable, HTML dashboards instead of seeing the data output as text in the chatbot interface. I’ll demonstrate the techniques you can use to force the output into a design of your choice with appropriate headers and footers as well as your preferred type of charts, fonts etc. I’ll also give some tips on caching, images, models etc. in order to minimise the cost by reducing the amount of tokens used.

Another article will demonstrate some of the tools and techniques you can use to create a cross-platform compiled executable desktop application that leverages a cross-platform scripting language behind the scenes to interact with the AI. All the user sees is an app with a nice GUI.

If there is interest I will also write about how sometimes you can be inspired to create something with AI, but after a couple of hours you decide to do it all from code instead – a cheaper (to run) and sometimes better option.

Using Claude.ai with your Database (Part 1.1)

For those who messaged me asking for further information on the script that I used to create the database schema extract, I’ve made available below the documentation for that script. For those that are interested, AI was involved both in building the original script and in producing the documentation.

xSchema Documentation Download

Using Claude.ai with your Database (Part 1)

AI can answer a lot of questions with varying levels of accuracy, it is even pretty good at creating SQL queries, but if it doesn’t know your database then it’s a very detached and incomplete experience. In this post I’ll discuss how I overcame this and set Claude.ai up so that it understands my database and can easily create SQL queries that you can immediately try for yourself.

This has another set of advantages as well which I will talk about later in this document.

As with any AI the way to success is all about context. You need to find a way of telling the AI as much as you can about your database or in my case, our ERP system.

You can of course construct a document that contains this information, which is a good idea and you can reuse the same document over again for future similar queries. This however becomes unwieldy if you want to ask for queries that you have not previously documented as context for the AI.

The database itself contains all the information about tables, views, indexes, foreign keys, constraints etc. that Claude will need so that it can write accurate queries for you.

Our ERP system has 1000+ tables and eleventy million other objects, there way no way I was going to even attempt to do anything manually. So I wrote a Python script and ran it against the database and the script generates a schema file containing everything the database knows about itself.

The image shows a small section of the schema file. Obviously for often used queries you can extract just the necessary information from this file and save it as a much smaller specific schema.

Now I’ve got this I just need to upload the schema file to Claude and ask it to generate a SQL query for me.

‘using the uploaded schema generate me a sql query to show me top customers by order value in 2026’

It will then give you some feedback before building the required query.

This is the query that was built by Claude:

It also added some notes:

This query was accurate and pasted into SSMS it ran first time.

I mentioned that there are other advantages of showing Claude a complete picture of your database. Once Claude has that picture you can then ask it to explain how part of the database works, or ask it to create reference documents for you to use.

Below are a three screenshots of interactive HTML tools I asked Claude to generate from the uploaded schema:

Claude will not always be 100% accurate with SQL queries that it generates and it may make assumptions about things which are not obvious from the schema file and ideally needs access to the data itself.

I’ll go through this in the next post.

Databasics: 1 – Primary Keys

There is an ongoing debate among database experts regarding the design of a Primary Key. A debate that in my opinion should have been done and dusted a long time ago.

Note: A Primary Key is a piece of data contained in a database Column that uniquely identifies the database Row. This is the same as how a National Insurance Number uniquely identifies us to the authorities in the UK, or how a soldiers Service Number uniquely identifies then within the Military. If you need to View, Update or Delete an existing database record then it is essential that you can uniquely identify it.

Two Main Schools of Thought

The first says that the Primary Key should be a valid piece of information in it’s own right – not just an identifier. Like a name for example. In the West we use a Surname which identifies us when amongst other people, most of which will hopefully have a different surname. In situations where that is not true, for example family gatherings, the first name can be used as well as a means of narrowing this down. It can be difficult to build up a unique piece of information using valid information.

The second school of thought acknowledges the problems of the above solution and solves these issues by allowing a non meaningful Unique Identifier whose sole purpose is to be able to identify uniquely within any amount of similar items. This is basically what we have with Military Service Numbers and National Insurance Numbers.

My Preference

My preference is with the second school of thought and in fact you can easily adopt this strategy with most Database Engines using the Auto Increment option on the Column. This lets the Database Engine itself take care of generating a Unique, Non Reuseable Identifier.

I always use the first Column of my Database Table as my Primary Key and name it:

Many of my earlier databases used an incrementing number as the primary key, a number that was unique within the table. One particular system used a number that was unique everywhere within the whole database, the theory being that it would make it easy to see the order of inserts across multiple tables. I never found this to be needed however and I never used it again.

One of the downsides of using an incrementing numeric value as a primary key (if this key is used as a foreign keys elsewhere within the database) is if you have to export and reimport data following a database issue.

You also get issues when you are operating data on a remote database in an offline state, that then needs to be reconciled and synchronized back to the main database.

My current way of thinking is that instead of an incrementing numeric value, I would instead use a GUID instead.

Consistency and Structure

All my Database designs use the same structure in order to build consistency, something which is not fully appreciated until you have to work with legacy databases which haven’t been built with consistency, structure or maintainability in mind.

Another example of consistency and structure; the second column of every Database Table I design is always updguid.

This column contains another identifier, however this one changes with every edit or update of the database record. This is used so that I can find out if the data I am viewing on my screen has actually since been updated elsewhere by someone else.

A comparison between the value of the updguid I have in memory and the value of the one stored in the database is all that is needed to determine the validity of the information I am viewing. If the information is stale I have several options I can pursue. This all is part of my Record Locking strategy, covered in another Databasics post soon 🙂