How to Fix Out-of-Memory and Slow Generations in ComfyUI

Diagnose CUDA out-of-memory errors and sluggish runs, then apply launch flags and workflow tweaks that let big models run on modest cards.

The most common wall ComfyUI users hit is the CUDA out-of-memory error, usually with Flux or SDXL on an 8 GB card. The fix is rarely buying a new GPU; it is a mix of launch flags, lighter model files, and smaller batches. This guide works through the levers in order of effort.

What you need

A ComfyUI install that is throwing memory errors or running slowly
Access to your launch command or the run batch file

Step 1: Read the actual error

When a run fails, look at the console. A CUDA out of memory message means the GPU ran out of VRAM; a missing-file error means a model or node is not where ComfyUI expects it. The two need different fixes, so confirm which one you have before changing anything.

ComfyUI console - the error

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.50 GiB. GPU 0 has a total capacity of 8.00 GiB

this is a VRAM problem, not a setup problem
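
If you save console output to a file, the distinction can be checked mechanically. The sketch below is only an illustration: the function name classify_error is hypothetical, the regular expression assumes the message wording shown above (PyTorch's wording varies between versions), and the missing-file markers are assumed strings, not an exhaustive list of what ComfyUI prints.

```python
import re

# Matches the two figures in the message shown above. The exact wording is an
# assumption; PyTorch has changed this message between versions.
OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<wanted>[\d.]+) (?P<wanted_unit>[MG]iB).*?"
    r"total capacity of (?P<total>[\d.]+) (?P<total_unit>[MG]iB)",
    re.DOTALL,
)


def to_gib(value, unit):
    """Convert a '1.50' + 'GiB' or '512' + 'MiB' pair to a float in GiB."""
    return float(value) / 1024 if unit == "MiB" else float(value)


def classify_error(console_text):
    """Return (kind, wanted_gib, total_gib); the sizes are None when unknown."""
    if "CUDA out of memory" in console_text:
        match = OOM_PATTERN.search(console_text)
        if match:
            wanted = to_gib(match["wanted"], match["wanted_unit"])
            total = to_gib(match["total"], match["total_unit"])
            return ("oom", wanted, total)
        return ("oom", None, None)
    # Assumed markers for a missing-file problem; real messages may differ.
    if "FileNotFoundError" in console_text or "No such file" in console_text:
        return ("missing_file", None, None)
    return ("other", None, None)


sample = (
    "torch.cuda.OutOfMemoryError: CUDA out of memory.\n"
    "Tried to allocate 1.50 GiB. GPU 0 has a total capacity of 8.00 GiB"
)
print(classify_error(sample))  # ('oom', 1.5, 8.0)
```

An "oom" result points you at Steps 2 to 4; a "missing_file" result means the memory fixes in this guide will not help.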

Step 2: Lower the easy levers first

Drop batch_size back to 1, reduce the resolution (for example, 1024 px on a side instead of 1536 px), and close other GPU apps such as games or a second browser playing video. These changes cost nothing and often clear the error on their own.
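
To see why these two levers matter, it helps to compare pixel counts. The sketch below assumes, as a rough rule of thumb only, that working memory grows at least in proportion to the number of pixels in the batch; real usage depends on the model and its attention implementation. The function name relative_cost is hypothetical.

```python
def relative_cost(width, height, batch_size, base_width=1024, base_height=1024):
    """Pixel count of a whole batch relative to one base-size image."""
    return (width * height * batch_size) / (base_width * base_height)


# One 1536x1536 image has 2.25x the pixels of one 1024x1024 image.
print(relative_cost(1536, 1536, 1))  # 2.25
# A batch of four at 1536x1536 has 9x the pixels.
print(relative_cost(1536, 1536, 4))  # 9.0
# Back to the cheap settings from this step.
print(relative_cost(1024, 1024, 1))  # 1.0
```

Going from a batch of four at 1536 to a single image at 1024 cuts the pixel count ninefold, which is why this step alone often clears the error.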

Step 3: Switch to fp8 or GGUF model files

Large models have lighter versions. For Flux, use the fp8 unet and the t5xxl_fp8 encoder instead of the fp16 files, which roughly halves the memory the weights occupy. For very tight cards, quantized GGUF model files, loaded through the GGUF custom nodes, go lower still.

Quality cost is small

Moving from fp16 to fp8 makes a barely noticeable difference for most images while freeing several gigabytes. It is almost always worth trying before anything more drastic.
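
The "roughly halves" figure in this step follows from bytes per weight: fp16 stores each parameter in 2 bytes, fp8 in 1. The sketch below works that out for the weights alone. The parameter count of about 12 billion for the Flux model is an assumption used for illustration, and the function name weight_gib is hypothetical; activations, the text encoder, and the VAE add to the total.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_gib(param_count, precision):
    """Approximate size of the weights alone, in GiB."""
    return param_count * BYTES_PER_PARAM[precision] / 2**30


FLUX_PARAMS = 12e9  # assumption: roughly 12 billion parameters

print(f"fp16: {weight_gib(FLUX_PARAMS, 'fp16'):.1f} GiB")  # fp16: 22.4 GiB
print(f"fp8:  {weight_gib(FLUX_PARAMS, 'fp8'):.1f} GiB")   # fp8:  11.2 GiB
```

Neither figure fits entirely in 8 GB, which is why fp8 is usually combined with the launch flags in the next step or replaced by a GGUF quantization.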

Step 4: Add a memory launch flag

ComfyUI offers launch flags that trade speed for lower VRAM use. Add --lowvram for cards that keep running out, or --normalvram if auto-detection picked a low-VRAM mode your card does not need. On the Windows portable build, edit the .bat file and append the flag to the launch line.

run_nvidia_gpu.bat (edited)

.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --lowvram
pause
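
If you switch flags often, the edit can be scripted. The sketch below is a hypothetical helper, not part of ComfyUI: set_vram_flag takes the launch line as a string, strips any VRAM flag already present, and appends the one you ask for. It assumes the line contains no paths with spaces, since it splits on whitespace.

```python
# VRAM-mode flags that should not be combined on one launch line.
VRAM_FLAGS = ("--lowvram", "--normalvram", "--highvram", "--novram")


def set_vram_flag(command, flag):
    """Return the launch line with exactly one VRAM flag, placed at the end."""
    if flag not in VRAM_FLAGS:
        raise ValueError(f"not a VRAM flag: {flag}")
    # Splitting on whitespace assumes no path in the line contains spaces.
    parts = [part for part in command.split() if part not in VRAM_FLAGS]
    parts.append(flag)
    return " ".join(parts)


line = r".\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build"
print(set_vram_flag(line, "--lowvram"))
# Calling it again with a different flag swaps the old one out.
print(set_vram_flag(line + " --lowvram", "--normalvram"))
```

Removing old flags before appending keeps the line from accumulating contradictory options after several edits.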

Step 5: Address slowness separately

If runs finish but crawl, the usual cause is that the model is spilling out of VRAM into system RAM, or that you are still on an aggressive --lowvram mode you no longer need. Once the other fixes have freed memory, remove --lowvram and let ComfyUI keep more of the model on the GPU.
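
One rough way to tell whether the card is full is to read memory use from nvidia-smi while a generation is running. The sketch below is a minimal illustration for NVIDIA cards: the helper names and the 95% threshold are assumptions, and a full card is only a hint that spilling may be happening, not proof of it.

```python
import subprocess

# Prints one line per GPU, for example "7900, 8192" (both values in MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_vram(csv_line):
    """Turn a line such as '7900, 8192' into (used_mib, total_mib)."""
    used, total = (int(field) for field in csv_line.split(","))
    return used, total


def vram_nearly_full(csv_line, threshold=0.95):
    """True when used/total is at or above the (assumed) threshold."""
    used, total = parse_vram(csv_line)
    return used / total >= threshold


def read_first_gpu():
    """Run nvidia-smi and return the line for the first GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()[0]


# Example with a captured line; call read_first_gpu() on a real machine.
print(vram_nearly_full("7900, 8192"))  # True
print(vram_nearly_full("4000, 8192"))  # False
```

If the card sits well below full during a run that uses --lowvram, that is a reasonable sign the flag can come off.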

Result: heavier models running on a modest card without crashing. Apply the cheap fixes first, reach for fp8 or GGUF next, and use launch flags as the final adjustment.