Run a strong open model offline on a MacBook

Best Local LLM on a Mac (Apple Silicon)

By setuproll (@setuproll)
Overall score: 91.0

A simple way to run a capable open model on Apple Silicon: Ollama serves the model, Open WebUI provides a clean chat interface, and the quantization is chosen to fit your unified memory. It is built for Mac owners who want private, offline coding and chat with minimal time in the terminal.

Score: 91.0 | Votes: 3.1k | Components: 5

Install this build

terminal
brew install ollama && brew services start ollama && ollama run qwen3:32b

Components

Model

  • Qwen3 32B (Q4_K_M)
  • Llama 3.3 70B for 64GB+ Macs

Stack

  • Ollama (Metal backend)
  • Open WebUI

Hardware

  • M-series, 36GB+ unified memory
  • 70B needs 64GB+
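
The memory thresholds above can be turned into a small check. This is a minimal sketch, not part of the original build: the function names are hypothetical, `llama3.3:70b` is assumed to be the Ollama tag for Llama 3.3 70B, and `none` is an illustrative fallback for Macs below 36GB.

```shell
# Map unified memory (in GB) to a model tag, using the listing's thresholds.
# Assumption: llama3.3:70b is the Ollama tag for Llama 3.3 70B.
pick_model() {
  if [ "$1" -ge 64 ]; then
    echo "llama3.3:70b"
  elif [ "$1" -ge 36 ]; then
    echo "qwen3:32b"
  else
    echo "none"   # hypothetical fallback: below the 36GB minimum listed here
  fi
}

# On macOS, hw.memsize reports physical (unified) memory in bytes.
detect_mem_gb() {
  echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
}
```

On a Mac, `pick_model "$(detect_mem_gb)"` prints the suggested tag, which can then be passed to `ollama pull`. The 70B model is optional even at 64GB; the sketch simply picks the largest model the listing says will fit.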

Quantization

  • Q4_K_M for balance
  • Q5_K_M if memory allows
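
To sanity-check a quant against your memory, a rough estimate is parameters times bits per weight divided by 8. The sketch below assumes about 4.8 bits per weight for Q4_K_M (and roughly 5.7 for Q5_K_M); these are approximations, the function names are hypothetical, and the estimate ignores the KV cache and context length, so real usage will be higher.

```shell
# Estimate weight size in GB: params (billions) * bits per weight / 8.
# Ignores KV cache and runtime overhead, so treat the result as a lower bound.
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Print "fits" if the weights leave the requested headroom free, else "too big".
# Args: unified memory in GB, weight size in GB, headroom in GB (default 8).
fits_with_headroom() {
  awk -v mem="$1" -v w="$2" -v h="${3:-8}" \
    'BEGIN { if (w <= mem - h) print "fits"; else print "too big" }'
}
```

For example, `estimate_weights_gb 32 4.8` prints 19.2, and `fits_with_headroom 36 19.2` prints fits. A 70B model at the same assumed rate comes to 42.0GB, which is too big for a 36GB Mac but fits on a 64GB one with 8GB to spare.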

How it works

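  • Install Ollama; it uses the Metal GPU backend automatically
  • Pull a quant whose weights leave 8GB of unified memory free for macOS
  • Run Open WebUI in Docker and point it at Ollama on port 11434 (localhost:11434 on the Mac; from inside the container, use host.docker.internal:11434)
  • Once the model is downloaded, chat fully offline; no data leaves the machine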
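
The Open WebUI step might look like the sketch below. It is an illustration rather than the author's exact setup: the helper names, the host port 3000, and the container name are assumptions. Inside a container, "localhost" refers to the container itself, so the sketch reaches Ollama on the Mac through Docker Desktop's `host.docker.internal` name.

```shell
# Base URL a container should use to reach Ollama running on the Mac host.
# Optional argument: Ollama's port (defaults to 11434).
ollama_url_for_container() {
  printf 'http://host.docker.internal:%s\n' "${1:-11434}"
}

# Confirm Ollama answers on the host before starting the UI.
# /api/tags lists the locally available models.
check_ollama() {
  curl -fsS "http://localhost:${1:-11434}/api/tags" >/dev/null
}

# Start Open WebUI in the background, served at http://localhost:3000.
# The host port (3000) and container name are illustrative choices.
start_open_webui() {
  docker run -d \
    -p 3000:8080 \
    -e OLLAMA_BASE_URL="$(ollama_url_for_container)" \
    -v open-webui:/app/backend/data \
    --name open-webui \
    ghcr.io/open-webui/open-webui:main
}
```

Running `check_ollama && start_open_webui` starts the UI only if Ollama is reachable. The named volume keeps chats and settings across container restarts. Pulling the image needs a network connection once; after that the stack runs offline.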
