How to Control Image Composition with ControlNet in ComfyUI

Use a ControlNet to lock a pose, depth, or edge layout from a reference image so your generation follows a structure you choose.

ControlNet conditions a generation on the structure of a reference image. You can lock a human pose, follow a depth map, or trace edges, so the output matches a layout you decide rather than whatever the model invents. This guide builds a basic ControlNet chain using a preprocessor and a ControlNet model.

What you need

A base checkpoint (SD 1.5 or SDXL)
A ControlNet model that matches your base (for example control_v11p_sd15_openpose)
A preprocessor node pack such as ComfyUI's ControlNet Auxiliary Preprocessors (the ControlNet Aux custom nodes)
A reference image to extract structure from

Step 1: Install the ControlNet model and aux nodes

Place the ControlNet model file in the models/controlnet folder of your ComfyUI install, then refresh or restart ComfyUI so the loader lists it. Install the ControlNet Aux preprocessor pack through ComfyUI Manager so you get nodes like OpenPose Pose and Depth Anything.
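A quick way to confirm the model landed in the right folder is to list what sits in models/controlnet. The following is a minimal sketch, not part of ComfyUI: the function name, the extension list, and the "ComfyUI" root path are all assumptions for illustration.

```python
from pathlib import Path

# File extensions ControlNet weights are commonly shipped in (assumption).
MODEL_EXTENSIONS = {".safetensors", ".pth", ".ckpt"}


def find_controlnet_models(comfy_root, name_fragment=""):
    """Return weight file names found under <comfy_root>/models/controlnet.

    comfy_root is a hypothetical path to your ComfyUI folder.
    name_fragment optionally filters by a substring such as "openpose".
    """
    folder = Path(comfy_root) / "models" / "controlnet"
    if not folder.is_dir():
        return []
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.suffix.lower() in MODEL_EXTENSIONS and name_fragment in p.name
    )


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install.
    print(find_controlnet_models("ComfyUI", "openpose"))
```

An empty list means either the folder is missing or no matching weight file is in it, which is usually why the Load ControlNet Model dropdown shows nothing.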

Step 2: Preprocess the reference

Load your reference image, then feed it into a preprocessor node that matches your ControlNet. For pose control use the OpenPose preprocessor, which turns the photo into a stick-figure skeleton. To control overall layout rather than pose, use a depth or Canny edge preprocessor instead, paired with the matching depth or Canny ControlNet model.

ComfyUI - preprocessor output

[ Load Image ] --> [ OpenPose Pose ] --IMAGE--> [ Apply ControlNet ]

reference photo          skeleton overlay

       o                        o
      /|\         ===>         /|\
      / \                      / \

OpenPose extracts a skeleton; that skeleton is what conditions the model.
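Because the preprocessor and the ControlNet model have to agree, it can help to keep the pairings in one place. The lookup below is a sketch under assumptions: the node titles are taken from the ControlNet Aux pack and the model names from the ControlNet 1.1 release for SD 1.5, and both should be checked against what your install actually lists. The helper name is hypothetical.

```python
# Control type -> (preprocessor node title, ControlNet model name).
# Titles and model names are assumptions; verify them in your own ComfyUI.
PAIRINGS = {
    "pose": ("OpenPose Pose", "control_v11p_sd15_openpose"),
    "depth": ("Depth Anything", "control_v11f1p_sd15_depth"),
    "edges": ("Canny Edge", "control_v11p_sd15_canny"),
}


def pairing_for(control_type):
    """Return the (preprocessor, model) pair for a control type."""
    try:
        return PAIRINGS[control_type]
    except KeyError:
        known = ", ".join(sorted(PAIRINGS))
        raise ValueError(
            f"unknown control type {control_type!r}; known: {known}"
        ) from None


if __name__ == "__main__":
    preprocessor, model = pairing_for("pose")
    print(f"{preprocessor} -> {model}")
```

Mixing rows, for example an OpenPose skeleton with a depth model, tends to give weak or odd guidance, since each ControlNet was trained on one kind of control image.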

Step 3: Add the Apply ControlNet node

Add an Apply ControlNet node and a Load ControlNet Model node. Feed the preprocessed image and your positive conditioning into the apply node, along with the loaded ControlNet model.

Step 4: Route conditioning into the sampler

The Apply ControlNet node outputs modified conditioning. Connect that into your KSampler's positive input in place of the raw text conditioning. Now the sampler honors both your prompt and the reference structure.

node connections (summary)

Load ControlNet Model -> Apply ControlNet (control_net)
OpenPose Pose          -> Apply ControlNet (image)
CLIP Text Encode (+)   -> Apply ControlNet (conditioning)
Apply ControlNet (out) -> KSampler (positive)
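The same connections can be written as a ComfyUI API-format graph, where each node has a class_type and its inputs, and a link is a [node_id, output_index] pair. This is a sketch of the wiring only, under several assumptions: the file names and prompts are placeholders, the OpenposePreprocessor class name comes from the ControlNet Aux pack and its extra inputs are omitted, and the decode and save nodes a full workflow needs are left out.

```python
def link(node_id, output_index=0):
    """API-format reference to another node's output."""
    return [node_id, output_index]


workflow = {
    # Checkpoint loader outputs: 0 = MODEL, 1 = CLIP, 2 = VAE.
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd15_base.safetensors"}},  # placeholder file
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"text": "a dancer on a beach", "clip": link("1", 1)}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"text": "blurry, low quality", "clip": link("1", 1)}},
    "4": {"class_type": "LoadImage",
          "inputs": {"image": "reference.png"}},  # placeholder file
    "5": {"class_type": "OpenposePreprocessor",  # aux pack node, name assumed
          "inputs": {"image": link("4")}},
    "6": {"class_type": "ControlNetLoader",
          "inputs": {"control_net_name": "control_v11p_sd15_openpose.pth"}},
    "7": {"class_type": "ControlNetApply",
          "inputs": {"conditioning": link("2"), "control_net": link("6"),
                     "image": link("5"), "strength": 0.7}},
    "8": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "9": {"class_type": "KSampler",
          "inputs": {"model": link("1"), "positive": link("7"),
                     "negative": link("3"), "latent_image": link("8"),
                     "seed": 1, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
}


def dangling_links(graph):
    """List (node_id, input_name) pairs whose link targets a missing node."""
    missing = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            if isinstance(value, list) and value[0] not in graph:
                missing.append((node_id, name))
    return missing


if __name__ == "__main__":
    print(dangling_links(workflow))
```

The key detail is node 9: its positive input points at node 7 (the apply node), not at node 2 (the raw text encode), which is the swap Step 4 describes.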

Strength and timing

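The strength value sets how strongly the ControlNet constrains the output. Values around 0.6 to 0.8 usually keep the pose while leaving the model room to vary details. If results look stiff or copied, lower the strength, or set end_percent below 1.0 so control applies only during the early part of sampling. The start_percent and end_percent inputs belong to the advanced version of the apply node; depending on your ComfyUI version it is listed as Apply ControlNet (Advanced) or simply Apply ControlNet.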
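To get a feel for what end_percent does, it helps to count how many sampling steps stay under control. The function below is a simplified model, assuming the percentages map linearly onto step indices; the real sampler maps them onto its noise schedule, so the exact boundary step can differ. The function name is hypothetical.

```python
def controlled_steps(total_steps, start_percent=0.0, end_percent=1.0):
    """Return the 0-based steps during which the ControlNet is applied.

    Simplified: a step counts as controlled when its fractional position
    (step / total_steps) falls in [start_percent, end_percent).
    """
    if not 0.0 <= start_percent <= end_percent <= 1.0:
        raise ValueError("need 0 <= start_percent <= end_percent <= 1")
    return [
        step
        for step in range(total_steps)
        if start_percent <= step / total_steps < end_percent
    ]


if __name__ == "__main__":
    for end in (1.0, 0.5):
        steps = controlled_steps(20, end_percent=end)
        print(f"end_percent={end}: {len(steps)} of 20 steps controlled")
```

Under this model, halving end_percent on a 20-step run leaves the second half of sampling unconstrained, which is where fine detail is resolved and where a stiff, copied look usually comes from.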

Step 5: Generate and compare

Queue the prompt. The output should share the reference's pose or layout while taking its subject, style, and details from your prompt. Try a few seeds to find one where the structure and the content agree.
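Trying several seeds can also be scripted against a running ComfyUI server, which accepts API-format workflows on its /prompt endpoint. The sketch below assumes the default local address http://127.0.0.1:8188 and a sampler node with id "9"; both are assumptions, and the stub workflow is only there to show the seed swap. The helper names are hypothetical.

```python
import copy
import json
import urllib.request


def with_seed(workflow, sampler_id, seed):
    """Return a copy of the workflow with the sampler's seed replaced."""
    variant = copy.deepcopy(workflow)
    variant[sampler_id]["inputs"]["seed"] = seed
    return variant


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST a workflow to a running ComfyUI server and return its reply."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # Stub graph; replace with your full API-format workflow.
    base = {"9": {"class_type": "KSampler", "inputs": {"seed": 0}}}
    for seed in (11, 22, 33):
        variant = with_seed(base, "9", seed)
        print("would queue seed", variant["9"]["inputs"]["seed"])
        # queue_prompt(variant)  # needs a full workflow and a running server
```

Keeping everything but the seed fixed makes it easier to tell whether a bad result comes from the structure fighting the prompt or just from an unlucky seed.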

Result: a generation that obeys a pose or composition you chose. Swap the preprocessor and matching ControlNet model to control depth, edges, or scribbles instead.