Getting Started with Local AI: An Introduction to Ollama CLI and API
Explore Ollama: a comprehensive guide detailing installation, fundamental CLI commands, API usage, and a practical API demonstration for building chatbots, emphasizing local AI's capabilities and privacy.
Getting started with Ollama
What is Ollama?
Ollama is a lightweight, extensible framework that dramatically simplifies the process of downloading, setting up, and running LLMs on your local machine. It bundles model weights, configurations, and data into a single package, managed by a Modelfile. With Ollama, you can be up and running with open-source models like Llama, Mistral, and Phi in a matter of minutes (depending on your download speed, but you only need to download once).
Installation
Ollama provides a simple, one-click installation for macOS and Windows, and a single command for Linux.
On macOS and Windows, download the installer from the official Ollama website: https://ollama.com/download.
On Linux, run the following command in your terminal:
curl -fsSL https://ollama.com/install.sh | sh
Once installed, Ollama runs in the background, ready to receive commands.
Ollama CLI
The Ollama Command Line Interface (CLI) is the primary way to interact with the framework.
Accessing the CLI
After installation, you can interact with Ollama through your system's command line interface.
On Windows, open Command Prompt or PowerShell. You can find either by searching for it in the Start Menu.
On macOS, open the Terminal app. You can find it by searching for it with Spotlight.
If you're using Linux, you should already know how to open your terminal. You can also enable and start Ollama as a systemd service for automatic startup and background operation:
sudo systemctl enable ollama --now
Or start the service only when you want to with:
sudo systemctl start ollama
Once you have your terminal open, you can start using the ollama commands. Here are the essential commands to get you started.
Pulling a Model
Before you can use a model, you need to download it from the Ollama library. Let's pull phi, a good small model for getting started.
ollama pull phi
You can find a list of available models on the official Ollama website: https://ollama.com/search
Running a Model
To start a chat session with your downloaded model, use the run command.
ollama run phi
You can now chat with the model directly in your terminal. To exit, type /bye.
Listing Models
To see all the models you have downloaded, use the `list` command (or its shorthand, `ls`).
ollama ls
This will show you a table of your local models, their size, and when they were last modified.
Ollama API
While the interactive chat is useful, the real power of Ollama comes from its ability to be integrated into other applications. Ollama provides a local REST API that runs automatically on port 11434.
We can use a simple curl command to send a request to the API for a one-off text generation task.
A Note on `curl` for different Operating Systems
On Linux and macOS, the following command should work as is in your terminal.
On Windows, you may need to escape the double quotes within the JSON payload, or save the JSON to a file and pass it to curl instead.
curl http://localhost:11434/api/generate -d '{
"model": "phi",
"prompt": "What is the capital of the United Kingdom?",
"stream": false
}'
This command sends a request to the /api/generate endpoint with a simple prompt. stream: false tells Ollama to wait for the full response before returning it. You will see a JSON response containing the generated text.
This is interesting, because if we can send an API request from our terminal, we can do the same from our applications! (We will see an example below.)
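To illustrate, here is a minimal Python sketch (standard library only) that sends the same request to the local /api/generate endpoint. The function names are my own; the endpoint, payload fields, and model name mirror the curl example above.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str, stream: bool = False) -> bytes:
    """Serialise the request body exactly as in the curl example."""
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": stream}
    ).encode("utf-8")

def generate(model: str, prompt: str) -> str:
    """Send a one-off generation request to the local Ollama server."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The non-streaming response is a single JSON object whose
        # "response" field holds the generated text.
        return json.loads(resp.read())["response"]

# Usage (requires Ollama running locally with phi pulled):
# print(generate("phi", "What is the capital of the United Kingdom?"))
```

Because the API is just HTTP and JSON, the same pattern works from any language or framework you already use.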
Customising Models with Modelfile
One of Ollama's most significant capabilities is letting you customise and create your own models using a Modelfile. A Modelfile is essentially a blueprint that defines how a model should be built or modified, allowing for significant flexibility.
Modelfile Basics
A Modelfile is a simple text file, similar to a Dockerfile, that uses a set of instructions to define your model. Here are some common instructions:
FROM: Defines the base model to use when creating a model.
PARAMETER: Sets model parameters like temperature, top_k, top_p, etc.
MESSAGE: Allows you to specify a message history for the model to use when responding. Use multiple MESSAGE instructions to build up a conversation that will guide the model to answer in a similar way.
SYSTEM: Specifies the system message to be used in the template, if applicable.
ADAPTER: (Advanced) Specifies a LoRA adapter to apply to the base model.
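To make this concrete, here is a minimal Modelfile sketch. The base model matches the phi model pulled earlier; the parameter value and system message are illustrative choices, not requirements.

```
# Build on the phi model pulled earlier
FROM phi

# Lower temperature for more focused, deterministic answers
PARAMETER temperature 0.3

# Give the model a persistent persona
SYSTEM You are a concise assistant who always answers in British English.
```

You can then build and run the customised model with ollama create my-phi -f Modelfile, followed by ollama run my-phi.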
I'll be delivering an interactive session at the Leeds Artificial Intelligence Society soon, where we’ll get to explore many of these topics hands-on.