
How to Run Continue with a Local Model Using Ollama

Install Ollama, pull a code model, and connect it to Continue so your AI assistant runs fully offline.

10 min · Intermediate

Running a model locally means your code never leaves your machine and there is no per-request bill. Ollama makes this practical by serving open models behind a simple local endpoint. This guide installs Ollama, pulls a coding model, and wires it into Continue for both chat and autocomplete.

What you need

The Continue extension installed
A machine with at least 8 GB of RAM, more for larger models
A few gigabytes of free disk space per model
About 10 minutes plus download time

Step 1: Install Ollama

Download Ollama from its website and install it, or use the install script on Linux. Once installed, it runs as a background service listening on port 11434 on localhost. Confirm the install from the terminal: ollama --version prints the installed version.

zsh - install ollama

$ curl -fsSL https://ollama.com/install.sh | sh

>>> Installing ollama...

$ ollama --version

ollama version is 0.5.0
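ollama --version confirms the CLI is installed. To check that the background service is actually listening, you can query the local endpoint: Ollama answers a plain GET on its root path with the text "Ollama is running". The sketch below assumes the default address http://localhost:11434 and uses only the Python standard library; ollama_is_running is a hypothetical helper name, not part of Ollama.

```python
import urllib.request

# Assumption: the default Ollama address, not changed via OLLAMA_HOST.
DEFAULT_BASE_URL = "http://localhost:11434"


def ollama_is_running(base_url: str = DEFAULT_BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(base_url + "/", timeout=timeout) as resp:
            # A healthy server replies 200 with the body "Ollama is running".
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and "Ollama is running" in body
    except OSError:
        # URLError, HTTPError, timeouts and refused connections all subclass OSError.
        return False


if __name__ == "__main__":
    print("Ollama is up" if ollama_is_running() else "Ollama is not reachable")
```

If this prints "Ollama is not reachable" right after installing, start the Ollama app (or the service on Linux) and run it again.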

Step 2: Pull a model

Use ollama pull to download a model. A general coding model like qwen2.5-coder works well for chat and edits. For autocomplete, a smaller base model (one not tuned for chat) responds faster. The first pull downloads the model weights, several gigabytes for the 7B model; later runs use the local copy with no further download.

zsh - pull models

$ ollama pull qwen2.5-coder:7b

pulling manifest

success

$ ollama pull qwen2.5-coder:1.5b-base

# the smaller base model is better suited to autocomplete
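ollama list prints the tags you have pulled, which are the exact strings Continue expects. The same list is available from the local HTTP API at /api/tags, which returns JSON with a models array whose entries carry a name field. A minimal sketch for scripting that check (Python 3.9 or later, standard library only), assuming the default address; model_names and fetch_local_models are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # assumption: default Ollama address


def model_names(tags_payload: dict) -> list[str]:
    """Extract tag names such as 'qwen2.5-coder:7b' from an /api/tags response."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def fetch_local_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Ask the running Ollama server which models have been pulled."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))


if __name__ == "__main__":
    try:
        for name in fetch_local_models():
            print(name)
    except OSError as err:
        # Raised when nothing is listening at BASE_URL.
        print(f"Could not reach Ollama at {BASE_URL}: {err}")
```

Parsing is kept separate from the HTTP call so the tag extraction can be checked without a running server.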

Step 3: Add the model to Continue

Open ~/.continue/config.yaml and add an entry with provider set to ollama and the model id matching the tag you pulled. No API key is needed because the model runs locally. Add a second entry with the autocomplete role pointing at the small base model.

~/.continue/config.yaml

models:
  - name: Qwen Coder (local)
    provider: ollama
    model: qwen2.5-coder:7b
    roles:
      - chat
      - edit
  - name: Qwen Autocomplete
    provider: ollama
    model: qwen2.5-coder:1.5b-base
    roles:
      - autocomplete
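Each model value must match a pulled tag exactly. Ollama resolves a name without an explicit tag to name:latest, so qwen2.5-coder and qwen2.5-coder:7b are treated as different tags. The hypothetical helpers below sketch that comparison: given the model values from your config.yaml and the tags ollama list reports, they return any config entry with no matching pulled model. The normalization rule is the only Ollama behavior assumed; the function names are illustrative.

```python
def normalize_tag(name: str) -> str:
    """Append ':latest' to an untagged name, mirroring how Ollama resolves it."""
    # Check only the last path segment, so a registry host with a port
    # (e.g. "myhost:5000/model") is not mistaken for a tag.
    last_segment = name.rsplit("/", 1)[-1]
    return name if ":" in last_segment else f"{name}:latest"


def missing_models(config_models: list[str], pulled: list[str]) -> list[str]:
    """Return the config model ids that have no matching pulled tag."""
    available = {normalize_tag(tag) for tag in pulled}
    return [m for m in config_models if normalize_tag(m) not in available]


if __name__ == "__main__":
    # Model values from the config above, and an example list of pulled tags
    # in which the autocomplete model was never downloaded.
    config = ["qwen2.5-coder:7b", "qwen2.5-coder:1.5b-base"]
    pulled = ["qwen2.5-coder:7b"]
    print(missing_models(config, pulled))  # ['qwen2.5-coder:1.5b-base']
```

Any name this returns needs an ollama pull (or a corrected model value) before Continue can use it.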

The local models now appear in Continue's model picker in VS Code: Qwen Coder (local) and Qwen Autocomplete.

Match the model to your hardware

A 7B model needs roughly 8 GB of RAM to run comfortably. If responses are slow or the machine starts swapping, switch to a smaller variant such as qwen2.5-coder:3b. Quality improves with size, but so do memory use and response time.
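A back-of-the-envelope estimate for the weights alone is parameter count × bits per weight ÷ 8. The sketch assumes roughly 4 bits per weight, typical of the quantized builds Ollama's default tags download (check the model page to confirm the quantization for a given tag), and it deliberately ignores the context (KV) cache, Ollama's own overhead, VS Code, and the operating system. weight_memory_gb is a hypothetical helper for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float = 4.0) -> float:
    """Rough size of the model weights in GB (1 GB = 1e9 bytes).

    Assumption: uniform quantization at bits_per_weight. Excludes the KV cache,
    runtime overhead and everything else on the machine, so real RAM needs
    are noticeably higher.
    """
    bytes_per_weight = bits_per_weight / 8
    # Billions of parameters times bytes per parameter gives gigabytes.
    return params_billions * bytes_per_weight


if __name__ == "__main__":
    for size in (1.5, 7, 14):
        print(f"{size}B at 4-bit: ~{weight_memory_gb(size):.2f} GB of weights")
```

For the 7B chat model that is about 3.5 GB of weights; the rest of an 8 GB machine goes to the context cache, Ollama, your editor, and the OS. The 1.5B autocomplete model adds well under a gigabyte of weights on top.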

Step 4: Test offline

Select the local model in the chat panel and ask a question. Then, to prove it is truly local, turn off your network and ask again. The response still arrives because everything runs on your own machine.
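You can run the same offline check outside VS Code by sending a prompt straight to Ollama's /api/generate endpoint. Setting stream to false asks Ollama for a single JSON object whose response field holds the full answer. The sketch assumes the default address and the qwen2.5-coder:7b tag from Step 2; build_generate_request and generate are hypothetical helper names. The timeout is generous because the first request after a while has to load the model into memory.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # assumption: default Ollama address


def build_generate_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the model's full reply text."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    try:
        print(generate("qwen2.5-coder:7b", "Write a Python one-liner that reverses a string."))
    except OSError as err:
        # Raised when the server is down or the request fails.
        print(f"Request failed: {err}")
```

Run it once with the network on and once with it off; only localhost is contacted, so both runs should behave the same.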

Result

Continue now uses a model served by Ollama on localhost, with a larger model for chat and a smaller one for autocomplete. You have an AI assistant that costs nothing per request and keeps your code on your machine.