VS Code SetupIntermediate

How to Run Continue with a Local Model Using Ollama

Install Ollama, pull a code model, and connect it to Continue so your AI assistant runs fully offline.

10 minIntermediate

Running a model locally means your code never leaves your machine and there is no per-request bill. Ollama makes this practical by serving open models behind a simple local endpoint. This guide installs Ollama, pulls a coding model, and wires it into Continue for both chat and autocomplete.

What you need

  • The Continue extension installed
  • A machine with at least 8 GB of RAM, more for larger models
  • A few gigabytes of free disk space per model
  • About 10 minutes plus download time

Step 1: Install Ollama

Download Ollama from its website and install it, or use the install script on Linux. Once installed it runs as a background service listening on localhost port 11434. Confirm it is running from the terminal.

zsh - install ollama
$curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama...
$ollama --version
ollama version is 0.5.0
$

Step 2: Pull a model

Use ollama pull to download a model. A general coding model like qwen2.5-coder works well for chat and edits. For autocomplete, a smaller model gives faster responses. The first pull downloads several gigabytes; later runs are instant.

zsh - pull models
$ollama pull qwen2.5-coder:7b
pulling manifest
success
$ollama pull qwen2.5-coder:1.5b-base
smaller base model is better for autocomplete
$

Step 3: Add the model to Continue

Open ~/.continue/config.yaml and add an entry with provider set to ollama and the model id matching the tag you pulled. No API key is needed because the model runs locally. Add a second entry with the autocomplete role pointing at the small base model.

~/.continue/config.yaml
models:
  - name: Qwen Coder (local)
    provider: ollama
    model: qwen2.5-coder:7b
    roles:
      - chat
      - edit
  - name: Qwen Autocomplete
    provider: ollama
    model: qwen2.5-coder:1.5b-base
    roles:
      - autocomplete
VS Code - Continue model picker
Model: Qwen Coder (local) v
----------------------------------------
Qwen Coder (local)
Qwen Autocomplete
The local model now appears in the dropdown.
Match the model to your hardware
A 7B model needs roughly 8 GB of RAM to run comfortably. If responses are slow or the machine swaps, switch to a smaller variant. Quality scales with size, but so does the memory and the wait.

Step 4: Test offline

Select the local model in the chat panel and ask a question. Then, to prove it is truly local, turn off your network and ask again. The response still arrives because everything runs on your own machine.

Result

Continue now uses a model served by Ollama on localhost, with a larger model for chat and a smaller one for autocomplete. You have an AI assistant that costs nothing per request and keeps your code on your machine.

Watch related tutorials

Tags
#continue#ollama#local#offline#privacy