How to Control Image Composition with ControlNet in ComfyUI
Use a ControlNet to lock a pose, depth, or edge layout from a reference image so your generation follows a structure you choose.
ControlNet conditions a generation on the structure of a reference image. You can lock a human pose, follow a depth map, or trace edges, so the output matches a layout you decide rather than whatever the model invents. This guide builds a basic ControlNet chain using a preprocessor and a ControlNet model.
What you need
- A base checkpoint (SD 1.5 or SDXL)
- A ControlNet model trained for the same base architecture as your checkpoint (for example control_v11p_sd15_openpose for SD 1.5)
- A preprocessor node pack such as ComfyUI's ControlNet Auxiliary Preprocessors (comfyui_controlnet_aux)
- A reference image to extract structure from
Step 1: Install the ControlNet model and aux nodes
Place the ControlNet model in models/controlnet. Install the ControlNet Aux preprocessor pack through ComfyUI Manager so you get nodes like OpenPose Pose and Depth Anything.
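Before opening the graph, it can help to confirm the model file actually landed in the right folder. The following is a minimal sketch, not part of ComfyUI itself: the helper name list_controlnet_models and the set of file extensions are assumptions made for illustration, and the only detail taken from this guide is the models/controlnet location.

```python
from pathlib import Path

# Extensions commonly used for model weights. This set is an assumption
# for illustration, not an official or exhaustive list.
WEIGHT_SUFFIXES = {".safetensors", ".pth", ".ckpt", ".bin"}


def list_controlnet_models(comfy_root):
    """Return sorted names of weight files in <comfy_root>/models/controlnet.

    Returns an empty list if the folder does not exist.
    """
    folder = Path(comfy_root) / "models" / "controlnet"
    if not folder.is_dir():
        return []
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in WEIGHT_SUFFIXES
    )
```

Call it with the path to your ComfyUI install, for example list_controlnet_models("ComfyUI"). An empty list means the Load ControlNet Model node will have nothing to offer in its dropdown.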
Step 2: Preprocess the reference
Load your reference image, then feed it into the preprocessor node that matches your ControlNet. For pose control, use the OpenPose preprocessor, which turns the photo into a stick-figure skeleton. For layout, use a depth or Canny edge preprocessor instead.
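Because the preprocessor and the ControlNet model have to agree, a small lookup table can guard against mismatches. This is a sketch under stated assumptions: only the openpose entry comes from this guide, the other names follow the ControlNet 1.1 naming scheme for SD 1.5 and should be checked against the files you downloaded, and model_for is a hypothetical helper.

```python
# Control type -> ControlNet 1.1 model name for SD 1.5.
# Only "openpose" is taken from this guide; the remaining entries are
# assumed from the same naming scheme and should be verified locally.
CONTROL_MODELS_SD15 = {
    "openpose": "control_v11p_sd15_openpose",
    "canny": "control_v11p_sd15_canny",
    "depth": "control_v11f1p_sd15_depth",
    "scribble": "control_v11p_sd15_scribble",
}


def model_for(control_type):
    """Return the model name paired with a control type (case-insensitive)."""
    key = control_type.strip().lower()
    if key not in CONTROL_MODELS_SD15:
        known = ", ".join(sorted(CONTROL_MODELS_SD15))
        raise ValueError(f"unknown control type {control_type!r}; known: {known}")
    return CONTROL_MODELS_SD15[key]
```

Feeding a depth map into an openpose model, or the reverse, generally produces weak or confusing guidance, so an explicit error for unknown pairings is preferable to silently picking a default.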
Step 3: Add the Apply ControlNet node
Add a Load ControlNet Model node and an Apply ControlNet node. Connect the loaded model to the apply node's control_net input, the preprocessed image to its image input, and your positive conditioning to its conditioning input.
Step 4: Route conditioning into the sampler
The Apply ControlNet node outputs modified conditioning. Connect that output to your KSampler's positive input in place of the raw text conditioning. The sampler now honors both your prompt and the reference structure.
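For readers who drive ComfyUI from a script, the rerouting in Steps 3 and 4 can be sketched on an API-format workflow, which is a dict mapping node ids to a class_type and its inputs, with links written as [source_id, output_index]. The function add_controlnet is a hypothetical helper. The class names ControlNetLoader and ControlNetApply and the strength input reflect my understanding of ComfyUI's built-in nodes and should be checked against your install.

```python
import copy


def add_controlnet(workflow, sampler_id, image_node_id, control_net_name, strength=1.0):
    """Return a copy of `workflow` with a ControlNet spliced into the
    positive conditioning of the sampler node `sampler_id`.

    `image_node_id` is the node whose first output is the preprocessed image.
    """
    wf = copy.deepcopy(workflow)

    # Whatever currently feeds the sampler's positive input, e.g. ["6", 0].
    positive_link = wf[sampler_id]["inputs"]["positive"]

    # Pick two node ids that are not in use yet.
    next_id = max((int(k) for k in wf if k.isdigit()), default=0) + 1
    loader_id, apply_id = str(next_id), str(next_id + 1)

    # Load ControlNet Model (class name assumed: ControlNetLoader).
    wf[loader_id] = {
        "class_type": "ControlNetLoader",
        "inputs": {"control_net_name": control_net_name},
    }
    # Apply ControlNet (class name assumed: ControlNetApply).
    wf[apply_id] = {
        "class_type": "ControlNetApply",
        "inputs": {
            "conditioning": positive_link,
            "control_net": [loader_id, 0],
            "image": [image_node_id, 0],
            "strength": strength,
        },
    }

    # The sampler now reads the modified conditioning instead of the raw text.
    wf[sampler_id]["inputs"]["positive"] = [apply_id, 0]
    return wf
```

The same connections, as they appear in the graph editor: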
Load ControlNet Model -> Apply ControlNet (control_net)
OpenPose Pose -> Apply ControlNet (image)
CLIP Text Encode (+) -> Apply ControlNet (conditioning)
Apply ControlNet (out) -> KSampler (positive)
Step 5: Generate and compare
Queue the prompt. The output should share the reference's pose or layout while taking its subject, style, and details from your prompt. Try a few seeds to find one where the structure and the content agree.
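Trying several seeds by hand gets tedious, so the comparison can be scripted. The sketch below assumes a workflow exported in API format, a KSampler node with a seed input, and a local ComfyUI server listening on the default address http://127.0.0.1:8188 with a /prompt endpoint. The helper names seed_variants and queue_prompt are hypothetical.

```python
import copy
import json
import urllib.request


def seed_variants(workflow, sampler_id, seeds):
    """Return one independent copy of `workflow` per seed, with the
    sampler node's `seed` input set to that seed."""
    variants = []
    for seed in seeds:
        wf = copy.deepcopy(workflow)
        wf[sampler_id]["inputs"]["seed"] = seed
        variants.append(wf)
    return variants


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST an API-format workflow to a running ComfyUI server.

    The server address and the /prompt endpoint are assumptions about a
    default local install; adjust them to match your setup.
    """
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Example (requires a running server, so it is left commented out):
# for wf in seed_variants(my_workflow, "3", [1, 2, 3, 4]):
#     queue_prompt(wf)
```

Each variant is a deep copy, so changing one seed never leaks into another queued job or into the original workflow.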
Result: a generation that obeys a pose or composition you chose. Swap the preprocessor and matching ControlNet model to control depth, edges, or scribbles instead.