Local AI: With LM Studio

Introduction

What is a local LLM? When people talk about AI they are often referring to chatbots such as Claude, ChatGPT or Gemini. These chatbots are user interfaces for one or more Large Language Models: huge, advanced, highly trained AI models that sit in the cloud and serve thousands of users at a time.

A local LLM is a smaller model that generally sits on your own computer or within your local network. These models are significantly smaller and therefore slower and less capable than their larger cousins. They do however have a few advantages that the larger models lack:

  • They are private and secure. They don’t generally send or receive information via the Internet.
  • They cost nothing to run, except for the electricity to power them.
  • They are yours and can be configured any way you want.

There are several different ways to run these local LLMs and several hardware platforms you can use. Software options include Ollama, vLLM and LM Studio. The hardware is usually either a fast PC running Linux or Windows with a powerful GPU (at least 16GB of VRAM), or a Mac Studio. My preferred platform, and the one I bought for just this purpose, is an M4 Max Mac Studio with 64GB of RAM. The fact that it uses very little electricity, is virtually silent, and doesn’t heat my office goes some way towards justifying the cost!

To be honest, go with your favourite platform, or what you already have. It is even possible to run smaller models on a laptop with some success. Keep your expectations realistic and you will have fun, learn a lot and maybe find that you don’t need to rely on subscriptions to the cloud models, or at least not as often.


Above you can see a high-spec PC configured for AI workloads, next to an Apple Mac Studio. Both great machines for what we need.

Getting Started

Once you have identified the hardware you wish to use for your local LLM, and confirmed you have enough disk space for the models you want to try (local LLMs vary in size from a few MB to 100GB+), you need to look at the software you are going to use.

I started with Ollama and recently moved to LM Studio. I’ve not personally tried vLLM. I found LM Studio to be a good choice and for the time being I intend to stick with it. Screenshots and examples in this (and probably future articles) will be based on LM Studio.

All of the software choices are available to download for free and are under active development. LM Studio can be downloaded from here:

https://lmstudio.ai/download

Select your operating system and download.

After a simple installation, you can run LM Studio, and one of the first things you need to do is download one (or more) models. There are hundreds of different models available; you could start with one whose name or creator you recognise, or just pick one. This is the model selection window:

There is a lot of terminology packed into the file names of these models; they often include the creator, the number of parameters, the quantisation level and so on. No need to worry about any of these for now, but going forward it would pay you to learn what these terms mean and how they can affect your experience.

The most important thing is the size of the model and whether it will operate comfortably within the amount of RAM (usually VRAM on Windows and Linux PCs) you have available. If the selected model will fit comfortably, you will be given the opportunity to download it; if not, LM Studio will warn you first, and you can choose a different model instead.

The way RAM and VRAM are used differs significantly between PCs and Macs. A Mac uses unified memory, meaning the total RAM is shared by the operating system between the GPU and the CPU. On a PC you have two separate values, RAM and VRAM: usually your GPU sits on an adapter card with its own specific amount of memory (the VRAM), while the computer itself has a separate pool of RAM. This is an important distinction at the moment, but it should become less relevant over time as some PC manufacturers start to introduce unified memory, or at least hardware changes that give the appearance and functionality of unified memory.
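
As a rough sanity check before downloading, you can estimate a model’s weight footprint from its parameter count and quantisation level. Here is a minimal Python sketch (a rule of thumb only; real usage adds overhead for the context window and runtime, so leave plenty of headroom):

    # Rough rule of thumb: weights = parameters x bits-per-weight / 8.
    # Real usage adds overhead (KV cache, runtime), so treat this as a floor.
    def model_memory_gb(params_billion: float, bits_per_weight: float) -> float:
        bytes_total = params_billion * 1e9 * bits_per_weight / 8
        return bytes_total / 1024**3

    # Example: a hypothetical 8B-parameter model at three quantisation levels.
    for bits in (16, 8, 4):
        print(f"8B @ {bits}-bit is roughly {model_memory_gb(8, bits):.1f} GB")

An 8B model needing roughly 15GB at 16-bit shrinks to under 4GB at 4-bit quantisation, which is why quantised models are so popular on modest hardware.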

Note: The GPU is the Graphics Processing Unit and the CPU is the computer’s Central Processing Unit. LLM calculations generally take place on the GPU because it is faster at performing the kind of numerical calculations required by the LLM.

Your First Prompt

Once you have downloaded a model, you need to select it before you can run a prompt against it.

Once you have a model selected, you can type a prompt in the same way that you would using ChatGPT or Claude.

Note that LM Studio is still in development and I have seen issues where you sometimes have to select the model twice in order for LM Studio to realise it’s loaded.

Below you can see two example chats that I tested against this model.


You have much more control over your models in LM Studio than you would have with most commercial models, so feel free to experiment. You can change parameters such as temperature, context length, the system prompt, and the choice of model weights.
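
If you want to go beyond the chat window, LM Studio can also serve your loaded model over a local OpenAI-compatible API (enabled from its local server section; the default port is 1234). Here is a minimal Python sketch, where the model name is a placeholder for whichever model you have loaded:

    import requests

    # Assumes LM Studio's local server is running on its default port (1234).
    # "your-model-name" is a placeholder for the model you have loaded.
    resp = requests.post(
        "http://localhost:1234/v1/chat/completions",
        json={
            "model": "your-model-name",
            "messages": [
                {"role": "system", "content": "You are a concise assistant."},
                {"role": "user", "content": "Explain unified memory in one paragraph."},
            ],
            "temperature": 0.7,  # one of the parameters you can experiment with
            "max_tokens": 200,
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])

Because this is the same style of API the commercial providers use, code written against it is easy to repoint at a local model.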

Summary

There are many additional things that you can do with your own hosted LLMs, including accessing them across your network, or even across the internet using an integrated Tailscale VPN. This also allows you to use the larger models from a computer with much less VRAM, as the compute is carried out on your main computer, where the larger models reside. This technology is quite recent and is called LM Link.

There is also a fascinating technology called ‘Speculative Decoding’ which allows you to pair a smaller LLM (the Draft model) with a larger one. In many circumstances this can significantly increase performance.
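
To give a feel for how this works, here is a toy Python sketch (purely illustrative, and nothing to do with LM Studio’s actual implementation): the fast draft model proposes a few tokens ahead, and the larger target model verifies the batch, keeping only the prefix it agrees with, so you gain speed when the draft guesses well without losing accuracy.

    import random

    # Toy stand-ins: both "models" predict the next character of a fixed text,
    # but the draft model is deliberately unreliable. Purely illustrative.
    TEXT = "the quick brown fox jumps over the lazy dog"

    def target_next(ctx):  # the big, slow, accurate model
        return TEXT[len(ctx)] if len(ctx) < len(TEXT) else ""

    def draft_next(ctx):   # the small, fast draft model (sometimes wrong)
        ch = target_next(ctx)
        return ch if random.random() < 0.8 else "?"

    def speculative_decode(k=4):
        out = ""
        while len(out) < len(TEXT):
            proposal = ""  # 1. the draft model cheaply proposes k tokens
            for _ in range(k):
                proposal += draft_next(out + proposal)
            kept = ""      # 2. the target model keeps the prefix it agrees with
            for ch in proposal:
                if ch == target_next(out + kept):
                    kept += ch
                else:
                    break
            out += kept + target_next(out + kept)  # 3. target adds one token itself
        return out

    print(speculative_decode())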

AI Prompt: The Goal (Eli Goldratt)

Back in the day I purchased The Goal by Eli Goldratt. I loved his work and read it several times, as well as every other book he’s written. Yeah, I am that sad! I was developing an MRP system at the time, so I did have a vested interest in learning this stuff.

I was thinking about this recently: given all the opportunities available, where should my efforts be focused? What can I do *now* that will take me forward? I decided to develop a simple prompt and use Claude to evaluate some projects and opportunities, letting it advise on how they would affect the bottom line according to the central thesis of The Goal.

First though, for those not familiar with Eli Goldratt and his work, I got Claude to summarise:

Prompt: Summarise The Goal by Eli Goldratt

The Goal by Eliyahu M. Goldratt (co-written with Jeff Cox, first published 1984) is one of the best-selling business books ever written. Here’s a concise summary:

The Story

The main character is Alex Rogo, a plant manager whose factory has an uncertain future. A company executive gives him three months to turn operations around from being unprofitable and unreliable, or face closure. Alex reconnects with a former physics teacher, Jonah, who helps him solve the company’s problems through a series of telephone calls and brief meetings, using the Socratic method — posing questions rather than giving answers.


The Core Idea

The central message is that the primary goal of any business is to make money, and everything within an organisation should contribute to this goal. All improvements should aim to optimise overall system performance, not just individual parts.

Goldratt introduces three key metrics:

  • Throughput — the rate at which the system generates money through sales.
  • Inventory — all money invested in purchasing items that will be sold.
  • Operational expense — all money used to turn inventory into throughput.


The Theory of Constraints (TOC)

The book’s central framework is the Five Focusing Steps:

  1. Identify the bottleneck or limiting factor in your process.
  2. Exploit the constraint — maximise its output by improving efficiency and eliminating waste.
  3. Subordinate all other processes to support the bottleneck’s output.
  4. Elevate the constraint — if needed, invest to increase its capacity.
  5. Repeat — once one bottleneck is resolved, find and tackle the next.

The Key Insight

Productivity is the act of bringing a company closer towards its goal. The goal is to make money by increasing throughput while simultaneously reducing inventory and operational expense. Fluctuations in a process do not average out — they accumulate. You must optimise the whole system, not just individual processes.


Why It Matters

Although it describes manufacturing operations, The Goal is relevant for all types of situations because it is about learning what makes the world tick so that you can improve it. It’s been required reading on Jeff Bezos’ leadership team and is taught in MBA programmes worldwide.

In short: find your bottleneck, fix it, and keep repeating — everything else is secondary.


So I developed the following prompt to use against Claude:

Prompt: Using the central thesis of The Goal by Eli Goldratt, evaluate the following question and evaluate its effect on the bottom line, ie: Positive Effect, No Effect or Negative Effect. Ask questions if necessary in order to make a judgement: 

I ran this a few times with different scenarios, and the questions and responses were gold. It’s not that it’s difficult to arrive at the same conclusions yourself; it’s more that Claude’s questions force you to think deeply about the answers. Definitely worth playing with: just add your question or proposal after the : symbol in the prompt.

Using Claude.ai with your Database (Part 2: MCP)

In an earlier post I talked about how to educate Claude about your ERP system or any other database so that Claude is able to accurately generate SQL queries for you based on your plain English input. 

Having SQL knowledge is not necessary; Claude has your back.

I also talked about additional advantages this can give you such as having Claude explain how a particular part of your database works, for example Stock Control, and also asking Claude to generate interactive dashboards or documentation based around the structure of your database.

I then wrote a follow-up to this post where I discussed, and made available, the documentation for a script that extracts the schema information from your database and makes it available to Claude as a basic text or markdown file. This is the basis for educating Claude about your database.

At the end of the first article I mentioned that we can take this a step further and connect Claude directly to the data in the database. So instead of Claude generating queries for you to execute in the SQL query tool of your choice, it can retrieve the data directly into your Claude chat session, making a separate query tool unnecessary.

An additional advantage of doing this is that whatever Claude cannot infer from the database schema, it will try to infer or verify from the data values in the database, further increasing its understanding and, in turn, the quality of its output.

The key to this ability is something called an MCP server. Model Context Protocol is an open standard, introduced by Anthropic in late 2024, that provides a universal, standardised way for AI models to connect with external tools, applications and other data sources. It’s a common language that allows an AI system to communicate consistently with a range of external services, regardless of who built them or how they work behind the scenes.

An MCP Server is essentially a simple, lightweight application that provides the pipe (adaptor) that connects Claude to something else, in this case, your database.

Installing and configuring MCP is beyond the scope of this article; however, a few minutes with Google, a couple of commands in the Terminal, and an edit to the Claude JSON configuration file (claude_desktop_config.json), and the connection is made. You will also need an appropriate database logon and password, along with the hostname, database name and port of your database server.

After setting up the MCP server, restart Claude and hopefully your configuration file changes will be picked up and executed successfully. Back up your claude_desktop_config.json file before editing it: if your changes are unsuccessful, Claude will reset the file to a default state, ie: you will lose any other changes you may have made.
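
To give a flavour of what that edit looks like, here is a minimal Python sketch that backs up the config file and adds an MCP server entry. Every value here is an example only: the path is the macOS default, and the server package and connection string will depend on your DBMS and on which MCP server you choose.

    import json
    import shutil
    from pathlib import Path

    # macOS default location; it differs on Windows and Linux.
    cfg_path = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"

    # Back up first: Claude resets the file to defaults if your edit breaks it.
    shutil.copy(cfg_path, cfg_path.with_name(cfg_path.name + ".bak"))

    config = json.loads(cfg_path.read_text())

    # Example entry only; use the MCP server and connection string appropriate
    # to your database, ideally with a read-only logon.
    config.setdefault("mcpServers", {})["my-database"] = {
        "command": "npx",
        "args": [
            "-y",
            "@modelcontextprotocol/server-postgres",
            "postgresql://readonly_user:password@localhost:5432/erp",
        ],
    }

    cfg_path.write_text(json.dumps(config, indent=2))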

Note that if setting up MCP for your users or colleagues, the database logon name and password you use should only allow access to the information within the database that is appropriate for those individuals. This can be configured within the database itself in the traditional way.

Assuming you have opened Claude after you’ve made the changes and Claude hasn’t complained, go to the Settings > Developer section of the Claude desktop application and hopefully you will see something like this:

So now that Claude is successfully connected to your database, give it a whirl. 🙂

Open a new chat and input something like this:


Claude will ask for your permission to use the tools it needs. Select ‘Always allow’ or ‘Allow once’ and Claude will proceed to give you some feedback as it generates an appropriate query and retrieves the data. 

As always with any AI model, verify the results.

As you can see, if you aren’t already doing this it can be game-changing for your business: simple and fast answers to questions about your data, using plain English rather than SQL queries.

This doesn’t mean there is no longer a need for people who have that understanding or other programming or technical abilities. In fact those are the very people who are best placed to take the most advantage from the new tools that AI has made available.

Using AI and/or MCP in combination with a programming language is one of the best ways to seamlessly introduce the power of AI into new or existing business applications. That way you can bring the benefits of AI to your users even if they don’t have access to an AI chatbot within their work environment, while giving the Head of IT the ability to control what is sent to, and received from, the AI. This level of control isn’t easily available using the standard chatbots.

If you want a high level of auditing and control over your AI but haven’t got the technical abilities in-house to create your own software around it, or you don’t wish to go that route and require additional security guarantees, Microsoft’s Copilot can provide this level of control as well as full access to both sides of an AI conversation. Setting up a Copilot solution is something you can do yourself, or you can farm it out to an MSP. MS Copilot can now use Claude as well as OpenAI; however, if you are choosing Copilot for its security and wish to use Claude, it’s worth getting up-to-date information, as Claude can still process your information outside UK/EU boundaries. Depending on the nature of your business this may or may not matter to you. The latest article I could find (a couple of days ago) is here.

Whether you are using Windows, macOS or Linux on your desktop, you can set up MCP and gain the advantages I’ve talked about in this article. If any of the above even slightly interests you then have a go; as long as you start by setting up your database logon with read-only permissions, you cannot do any harm, and you never know, you might be impressed 👍

Future Articles

A forthcoming article will take this a step further and demonstrate how to get great-quality, professional-looking, downloadable HTML dashboards instead of seeing the data output as text in the chatbot interface. I’ll demonstrate the techniques you can use to force the output into a design of your choice, with appropriate headers and footers as well as your preferred type of charts, fonts etc. I’ll also give some tips on caching, images, models etc. in order to minimise cost by reducing the number of tokens used.

Another article will demonstrate some of the tools and techniques you can use to create a cross-platform compiled executable desktop application that leverages a cross-platform scripting language behind the scenes to interact with the AI. All the user sees is an app with a nice GUI.

If there is interest I will also write about how sometimes you can be inspired to create something with AI, but after a couple of hours you decide to do it all from code instead – a cheaper (to run) and sometimes better option.

Using Claude.ai with your Database (Part 1.1)

For those who messaged me asking for further information on the script that I used to create the database schema extract, I’ve made available below the documentation for that script. For those that are interested, AI was involved both in building the original script and in producing the documentation.

Using Claude.ai with your Database (Part 1)

AI can answer a lot of questions with varying levels of accuracy, and it is even pretty good at creating SQL queries, but if it doesn’t know your database then it’s a very detached and incomplete experience. In this post I’ll discuss how I overcame this and set Claude.ai up so that it understands my database and can easily create SQL queries that you can immediately try for yourself.

This has another set of advantages as well which I will talk about later in this document.

As with any AI, the way to success is all about context. You need to find a way of telling the AI as much as you can about your database or, in my case, our ERP system.

You can of course construct a document that contains this information, which is a good idea, and you can reuse the same document for future similar queries. This however becomes unwieldy if you want to ask for queries that you have not previously documented as context for the AI.

The database itself contains all the information about tables, views, indexes, foreign keys, constraints etc. that Claude will need so that it can write accurate queries for you.

Our ERP system has 1000+ tables and eleventy million other objects; there was no way I was going to even attempt to do anything manually. So I wrote a Python script and ran it against the database; the script generates a schema file containing everything the database knows about itself.

The image shows a small section of the schema file. Obviously for often used queries you can extract just the necessary information from this file and save it as a much smaller specific schema.
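
For a flavour of the approach, here is a heavily simplified Python sketch of that kind of script (not the actual script; it assumes SQL Server with the pyodbc driver, and the connection details are example values):

    import pyodbc

    # Example connection details; adjust the driver, server and credentials
    # for your own setup.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;"
        "DATABASE=erp;UID=readonly_user;PWD=password;TrustServerCertificate=yes"
    )
    cur = conn.cursor()

    # Walk the standard INFORMATION_SCHEMA views; a full script would also pull
    # indexes, foreign keys, constraints, views etc.
    cur.execute("""
        SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE
        FROM INFORMATION_SCHEMA.COLUMNS
        ORDER BY TABLE_NAME, ORDINAL_POSITION
    """)

    # Write a simple markdown schema file, one section per table.
    with open("schema.md", "w") as f:
        current = None
        for table, column, dtype, nullable in cur.fetchall():
            if table != current:
                f.write(f"\n## {table}\n")
                current = table
            f.write(f"- {column}: {dtype} (nullable: {nullable})\n")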

Now I’ve got this I just need to upload the schema file to Claude and ask it to generate a SQL query for me.

‘using the uploaded schema generate me a sql query to show me top customers by order value in 2026’

It will then give you some feedback before building the required query.

This is the query that was built by Claude:

It also added some notes:

This query was accurate; pasted into SSMS, it ran first time.

I mentioned that there are other advantages of showing Claude a complete picture of your database. Once Claude has that picture you can then ask it to explain how part of the database works, or ask it to create reference documents for you to use.

Below are three screenshots of interactive HTML tools I asked Claude to generate from the uploaded schema:

Claude will not always be 100% accurate with the SQL queries it generates; it may make assumptions about things that are not obvious from the schema file, and ideally it needs access to the data itself.

I’ll go through this in the next post.

AI: Is Society Ready?

It’s interesting that, despite the many differences of opinion regarding AI, few people acknowledge that it could potentially turn society on its head, and that we are not ready for it.

I’m not talking about AGI and the dystopian future predicted by many, although I admit that is a possibility. I’m talking about the inevitable shift in jobs and society caused by the use of AI and AI Agents.

For many decades much of the traditional manual work has been taken over by machinery and automation: think car plants and their robot assembly lines. We were told that the way to get on in life and achieve long-term job security was to become a knowledge worker.

Knowledge workers don’t need overalls and hard hats, we need a computer keyboard and a mouse.

Some of these roles include:

  • Personal Assistant
  • Graphic Designer
  • Architect
  • Computer Programmer
  • Accountant
  • Marketing Person
  • Data Entry Clerk
  • Travel Agent
  • Customer Services Person
  • Call Centre Operative

AI is waiting in the wings to replace many of those jobs. Not totally, at least not initially, but in the near future the need for humans in those roles will be vastly decreased. A Customer Services Manager may find themselves in charge of 20 AI agents, rather than 8 humans.

Look at the current capability of humanoid robots. If you are an Electrician, a Gardener, a Plumber or a Builder then you will be safe for many years, as robotics companies are finding it extremely difficult to create a robot with the capability and dexterity to perform those roles.

The tide has turned. Knowledge workers are the ones who now have their jobs threatened by technology, and the skilled manual workers are sitting pretty, for a while at least.

For those not convinced, remember that AI doesn’t need to be able to replace you, it just needs to convince your boss that it can …

Artificial Intelligence: Badly Named?

It’s often said that it’s all in the name. Naming is extremely important, whether it’s variable naming in software development or the naming of a technology itself.

Artificial Intelligence, or Synthetic Intelligence as it is often called, is in my opinion badly named, as the words Artificial and Synthetic imply that intelligence is actually achieved, just through a non-natural process.

I believe a better name would be Simulated Intelligence, as this implies something that imitates intelligence, rather than achieves it.

I’ve not seen this articulated elsewhere, so it might be that many, or most, would disagree. I stand by it, however; until we are able to adequately describe what intelligence actually is, it may well be a moot point.