
How to Write FLUX.2 Prompts in ComfyUI: Step-by-Step Guide

FLUX.2 reads prompts differently from every Stable Diffusion model before it. Word order matters, negative prompts actively hurt your results, and quality tags do nothing. This guide shows you exactly how to write prompts that get precise, consistent output.

Tested on: RTX 3060 · VRAM: 8 GB min · ComfyUI: v0.3.10 · Skill level: All levels

By Earngenix Team · Tested on RTX 3060 (12 GB) and RTX 4080 (16 GB)

⚡ Quick Answer

FLUX.2 reads prompts differently from older Stable Diffusion models — word order matters, negative prompts actively hurt your results, and vague descriptions produce flat images. In ComfyUI, your prompt goes into the CLIP Text Encode node connected to the positive input of your KSampler. This guide shows you exactly how to write prompts that get precise, consistent results from FLUX.2.

If your FLUX.2 images look generic or don't match what you typed, the prompt is almost always the problem, not the model. FLUX.2 follows detailed instructions better than any Stable Diffusion model, but it reads prompts in a specific way. Paste an SD1.5-style prompt full of quality tags and you'll get a mediocre result. Write a clear description using the structure in this guide and FLUX.2 responds with precision.

This tutorial covers FLUX.2 prompting in ComfyUI from first principles — which node takes your prompt, how FLUX.2 weighs your words, and the exact techniques that produce professional results.

What You Need Before You Start

Hardware & software checklist

  • Tested on: ComfyUI v0.3.10, RTX 3060 (12 GB VRAM) and RTX 4080 (16 GB VRAM)
  • Minimum VRAM: 8 GB (6 GB with fp8 quantised variant)

Models Needed

FLUX.2 uses a different set of model files from FLUX.1. The table below lists four files: one diffusion model, two text encoder variants (pick fp8 for lower VRAM or bf16 for full precision), and one VAE. You only need one of the two text encoders, so a working install has three files. All are hosted on Hugging Face via the Comfy-Org repository.

| File | Folder | Notes |
| --- | --- | --- |
| flux2_dev_fp8mixed.safetensors | models/diffusion_models/ | Main model |
| mistral_3_small_flux2_fp8.safetensors | models/text_encoders/ | Text encoder, fp8 (recommended) |
| mistral_3_small_flux2_bf16.safetensors | models/text_encoders/ | Text encoder, bf16 (full precision) |
| flux2-vae.safetensors | models/vae/ | Image decoder |

fp8 vs bf16 text encoder: Use mistral_3_small_flux2_fp8.safetensors if you have 8–12 GB VRAM. It loads faster and uses less memory with minimal quality loss. Use mistral_3_small_flux2_bf16.safetensors if you have 16 GB+ VRAM and want maximum prompt accuracy. You only need one of the two.

Where to Place Each File in ComfyUI

Once you've downloaded the files, place them in the correct subfolders inside your ComfyUI installation. Note that FLUX.2 uses diffusion_models/ and text_encoders/ — not the unet/ and clip/ folders used by FLUX.1:

ComfyUI / folder structure
ComfyUI/
└── models/
    ├── text_encoders/
    │   ├── mistral_3_small_flux2_fp8.safetensors    (text encoder, recommended)
    │   └── mistral_3_small_flux2_bf16.safetensors   (text encoder, full precision)
    ├── diffusion_models/
    │   └── flux2_dev_fp8mixed.safetensors           (main model)
    └── vae/
        └── flux2-vae.safetensors                    (image decoder)

flux2_dev_fp8mixed.safetensors → models/diffusion_models/
mistral_3_small_flux2_*.safetensors → models/text_encoders/
flux2-vae.safetensors → models/vae/
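If you want to confirm the placement before launching ComfyUI, a short script can check each expected path. This is a minimal sketch, not part of ComfyUI: `missing_flux2_files` is a hypothetical helper, the `ComfyUI` install path in the usage line is an assumption, and the check accepts either of the two text encoder files.

```python
from pathlib import Path

# Expected locations, taken from the folder structure above.
DIFFUSION_MODEL = "models/diffusion_models/flux2_dev_fp8mixed.safetensors"
VAE = "models/vae/flux2-vae.safetensors"
TEXT_ENCODERS = [
    "models/text_encoders/mistral_3_small_flux2_fp8.safetensors",
    "models/text_encoders/mistral_3_small_flux2_bf16.safetensors",
]


def missing_flux2_files(comfy_root):
    """Return a list of problems; an empty list means everything is in place."""
    root = Path(comfy_root)
    problems = []
    for rel in (DIFFUSION_MODEL, VAE):
        if not (root / rel).is_file():
            problems.append(f"missing: {rel}")
    # Only one of the two text encoder variants is required.
    if not any((root / rel).is_file() for rel in TEXT_ENCODERS):
        problems.append("missing: a text encoder in models/text_encoders/")
    return problems


if __name__ == "__main__":
    # "ComfyUI" is an assumed install path; change it to match yours.
    for line in missing_flux2_files("ComfyUI") or ["all FLUX.2 files found"]:
        print(line)
```

Run it from the folder that contains your ComfyUI directory. Any line starting with "missing:" points at a file that is absent or in the wrong subfolder.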

Not installed yet? Follow the how to install ComfyUI guide first, then come back here. You'll also need ComfyUI Manager for installing any missing nodes.

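Warning: Do not load FLUX.2 using the Load Checkpoint node you'd use for SD1.5 or SDXL. FLUX.2 loads its diffusion model with a UNETLoader (pointing at diffusion_models/) and its text encoder with a separate text encoder loader (pointing at text_encoders/). Because FLUX.2 has a single text encoder, this is normally the single-file Load CLIP node rather than the two-file DualCLIPLoader used by FLUX.1. Using the wrong node or wrong folder produces a red error and no output.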

How Does FLUX.2 Read Your Prompt?

FLUX.2 is built on a different architecture from SD1.5 or SDXL. It uses Mistral 3 Small as its text encoder, a compact but capable language model, which is why FLUX.2 can follow long, detailed, natural-language descriptions so precisely.

The side effect: FLUX.2 reads your prompt more like a human reads a sentence. What comes first matters most.

Word Order Is How FLUX.2 Decides What Matters

FLUX.2 weighs the first words of your prompt most heavily. Whatever you put at the start is treated as the primary focus of the image. Your subject must come first — not style tags, not quality words, not camera settings.

✗ SD1.5 habit (avoid)

ultra detailed, masterpiece, 8k uhd, photorealistic, dramatic light, bokeh, an old fisherman mending nets on a wooden dock

✓ FLUX.2 structure

An elderly fisherman with weathered hands and a grey beard mending a hemp net on a salt-bleached wooden dock, overcast coastal morning, documentary photography, 50mm lens, desaturated blues and off-whites
Screenshot 2: Same seed and CFG. Left: SD1.5-style quality-tag prompt. Right: natural-language FLUX.2 prompt.

FLUX.2 Uses the Mistral 3 Small Text Encoder

Unlike FLUX.1 which used T5-XXL + CLIP-L, FLUX.2 uses the Mistral 3 Small language model as its text encoder. The output feeds into a CLIP Text Encode node — which is where you type your prompt. You write one prompt. You don't split it between encoders.

Screenshot 3: text encoder loader → CLIP Text Encode → KSampler. The wire into the positive input on the KSampler is the one that matters.

Quality Tags from SD1.5 Do Nothing Here

Words like masterpiece, best quality, and 8k are SD1.5 training conventions. FLUX.2 was not trained on them. They consume token weight on meaningless words and push your actual subject description further down the priority list.

| SD1.5 tag (avoid) | FLUX.2 replacement |
| --- | --- |
| highly detailed | sharp studio lighting, fine texture detail |
| best quality | commercial photography, clean composition |
| cinematic | anamorphic lens, 2.39:1 aspect ratio, film grain |
| masterpiece | editorial fashion photography (describe the style explicitly) |
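When converting a batch of old prompts, the tag removal can be automated. The sketch below is a hypothetical helper, and its tag list is an assumption: it contains the tags named in this section plus a few close variants, so extend it to match your own prompts.

```python
# Tags this guide calls out as SD1.5 conventions, plus close variants (assumed list).
QUALITY_TAGS = {
    "masterpiece", "best quality", "8k", "8k uhd",
    "ultra detailed", "highly detailed",
}


def strip_quality_tags(prompt):
    """Drop comma-separated fragments that are pure quality tags."""
    parts = [p.strip() for p in prompt.split(",")]
    kept = [p for p in parts if p and p.lower() not in QUALITY_TAGS]
    return ", ".join(kept)


old = ("ultra detailed, masterpiece, 8k uhd, photorealistic, dramatic light, "
       "bokeh, an old fisherman mending nets on a wooden dock")
print(strip_quality_tags(old))
```

The helper only deletes tags. The subject still sits at the end of the cleaned prompt, so move it to the front and replace the remaining style words with explicit descriptions by hand.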

The 4-Part Prompt Formula for FLUX.2

Every strong FLUX.2 prompt contains four components. You don't need to label them, and the only ordering rule is the one above: the subject comes first. Your prompt should cover all four before you generate.

Part 1 — Subject

Who or What Is in the Image

Be specific about what makes your subject distinct — age, appearance, clothing, expression, distinguishing features.

✗ Weak

an old man

✓ Strong

a man in his late 60s with deep-set eyes, thick silver eyebrows, wearing a faded denim jacket with a paint-stained collar

Part 2 — Action or Pose

What Are They Doing

Static subjects with no action produce stiff, stock-photo results. Add what the subject is doing, how they're positioned, or what state the object is in.

examples
leaning against a brick wall with arms folded, head tilted slightly downward
crouching beside a motorcycle, one hand resting on the fuel tank
ceramic bowl half-filled with miso soup, steam curling above the surface in still morning air
Tip: For portraits where you don't want a specific pose, use "neutral standing posture, direct eye contact with the camera, arms relaxed at sides".

Part 3 — Style

What Should It Look Like

Style tells FLUX.2 which visual language to use. Put your style reference in the first half of your prompt — buried style tags get deprioritised.

Photography

medium format film, Fuji Pro 400H, muted pastel tones
surveillance camera aesthetic, high contrast monochrome
architectural photography, tilt-shift lens, perspective correction

Illustration

gouache illustration, thick paint strokes, visible texture
Soviet propaganda poster style, bold graphic shapes, limited palette
woodblock print, rough grain texture, earthy ink colours
Tip: FLUX.2 understands photography terminology with precision. "shot on Hasselblad 500C, 80mm Zeiss Planar, f/4" produces different results from "portrait photo".

Part 4 — Context

Setting, Lighting, and Mood

Context is everything that frames the subject: location, time of day, light source, and emotional tone. Name the light source and quality — not just the mood.

Good lighting

single bare tungsten bulb overhead, deep shadows below
blue-hour dusk, city skyline silhouetted behind the subject
harsh midday sun from directly above, bleached concrete ground

Vague lighting (avoid)

good lighting
natural light
pretty light

Full example — all four parts combined:

4-Part Formula — Complete Example
Screenshot 4: The 4-part example prompt in the CLIP Text Encode node and its generated output.
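One way to assemble the four parts is sketched below. `build_prompt` is a hypothetical helper, not part of ComfyUI, and the fragments are lifted from the examples in Parts 1 to 4 purely for illustration. It places the subject first and the style second, in line with the word-order advice earlier in the guide.

```python
def build_prompt(subject, action, style, context):
    """Join the four parts, subject first, skipping any part left empty."""
    parts = [subject, action, style, context]
    return ", ".join(p.strip().rstrip(",") for p in parts if p and p.strip())


prompt = build_prompt(
    subject="a man in his late 60s with deep-set eyes, thick silver eyebrows, "
            "wearing a faded denim jacket with a paint-stained collar",
    action="leaning against a brick wall with arms folded, "
           "head tilted slightly downward",
    style="medium format film, Fuji Pro 400H, muted pastel tones",
    context="single bare tungsten bulb overhead, deep shadows below",
)
print(prompt)
```

Keeping the parts as separate arguments makes it easy to swap one of them, for example the lighting, while the rest of the prompt stays fixed.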

Where to Type Your Prompt in ComfyUI

This is the exact node setup for a FLUX.2 workflow. If you've loaded a working FLUX.2 workflow, your canvas already has these nodes. If not, load the FLUX.2 ComfyUI workflow first — download the JSON and drag it onto the ComfyUI canvas.

  1. Find the node labelled CLIP Text Encode (Prompt) on your canvas — a rectangular block with a large multiline text field inside.
  2. Click inside the text field. Type or paste your prompt directly here. There's no character limit to worry about for normal use.
  3. Check that the output wire from this node (labelled CONDITIONING on the right side) connects to the positive input on your KSampler node.
  4. Find the second CLIP Text Encode node connected to the negative input on the KSampler. For FLUX.2, leave this field completely empty — see the next section for why.
  5. Click the orange Queue Prompt button in the top-right corner. A progress bar appears beneath it. Generation takes 15–60 seconds depending on your GPU.

You should see: a preview image appear in the output node on the right side of the canvas once generation completes.

Screenshot 5: Full canvas view — positive CLIP Text Encode highlighted, negative node empty, KSampler connections clear, Queue Prompt button top-right.
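The same steps can be scripted against a running ComfyUI server, which accepts a workflow exported in API format as a POST to its /prompt route. The sketch below makes several assumptions: the server runs at the default local address, the workflow uses the KSampler layout described in this guide, and the file name in the usage note is hypothetical. `set_positive_prompt` and `queue_prompt` are illustrative helpers, not ComfyUI functions.

```python
import json
import urllib.request


def set_positive_prompt(workflow, text):
    """Write `text` into the node wired to a KSampler's positive input.

    `workflow` is a dict in ComfyUI API format: node id -> node description.
    Returns the id of the node that was changed.
    """
    for node in workflow.values():
        if node.get("class_type") == "KSampler":
            # A link is stored as [source_node_id, output_index].
            source_id = str(node["inputs"]["positive"][0])
            workflow[source_id]["inputs"]["text"] = text
            return source_id
    raise ValueError("no KSampler node found in workflow")


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the workflow to a running ComfyUI server (assumed default address)."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

To use it, load your exported workflow with `json.load` (for example from a file you saved as `flux2_api.json`), call `set_positive_prompt(workflow, "...")`, then `queue_prompt(workflow)`. The negative node is left untouched, so it stays empty if it was empty in the export.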

Why You Should Never Use Negative Prompts in FLUX.2

FLUX.2 does not support negative prompts the way SD1.5 or SDXL does. The model reads language literally: typing "no blurry background" can produce a blurry background, and typing "no extra fingers" can generate extra fingers.

This happens because FLUX.2's Mistral text encoder treats your negative prompt as a description, not an exclusion. It reads the word "fingers" and weights that concept in the output.

Leave the negative CLIP Text Encode node completely empty. Instead, describe what you want positively using the table below.
| What you want to avoid | What to write instead |
| --- | --- |
| Extra fingers or distorted hands | hands out of frame / arms crossed, hands tucked |
| Blurry foreground | sharp focus on subject, foreground in focus |
| Unwanted text in image | (remove text from your positive prompt entirely) |
| Generic background | describe the exact background you want |
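Negations are easy to miss in a long positive prompt, so a quick scan before generating can help. The following is a rough sketch: `find_negations` is a hypothetical helper and the word list is an assumed starting point, not an exhaustive rule.

```python
import re

# Words that signal an exclusion rather than a description (assumed starter list).
NEGATION = re.compile(
    r"\b(no|not|without|never|avoid)\b\s+([^,.;]+)", re.IGNORECASE
)


def find_negations(prompt):
    """Return (negation word, phrase that follows) pairs found in the prompt."""
    return [
        (m.group(1).lower(), m.group(2).strip())
        for m in NEGATION.finditer(prompt)
    ]


hits = find_negations(
    "portrait of a chef, no extra fingers, without blurry background"
)
for word, phrase in hits:
    print(f'rewrite positively: "{word} {phrase}"')
```

Each hit is a phrase to rewrite as a positive description using the table above, for example replacing "no extra fingers" with "hands out of frame".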

CFG Scale — Set This Before Generating

FLUX.2 runs best at CFG 1.0–3.5. The high CFG values (7–15) that work for SD1.5 will distort FLUX.2 images — faces melt, colors saturate unnaturally, and details become overcooked. Find the KSampler node on your canvas and set cfg to 1.0 as a starting point.

Warning: If your FLUX.2 images look overexposed, oversaturated, or have melting faces, check your CFG scale first. A CFG above 3.5 is the most common cause of distorted FLUX.2 output.
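If you keep workflows exported in API format, the CFG range can be checked before queuing. This is a minimal sketch assuming the workflow contains standard KSampler nodes with a `cfg` input. `cfg_warnings` is a hypothetical helper, and the 3.5 limit is the upper bound recommended in this section.

```python
CFG_MAX = 3.5  # upper bound recommended in this guide


def cfg_warnings(workflow):
    """List KSampler nodes in an API-format workflow whose cfg exceeds CFG_MAX."""
    warnings = []
    for node_id, node in workflow.items():
        if node.get("class_type") != "KSampler":
            continue
        cfg = node.get("inputs", {}).get("cfg")
        if isinstance(cfg, (int, float)) and cfg > CFG_MAX:
            warnings.append(f"node {node_id}: cfg {cfg} is above {CFG_MAX}")
    return warnings


# Tiny stand-in workflow with an SD1.5-style CFG value.
workflow = {"3": {"class_type": "KSampler", "inputs": {"cfg": 7.0}}}
print(cfg_warnings(workflow))
```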

Example Prompts — Ready to Use in ComfyUI

These four prompts use JSON structured format and cover a range of scenes — documentary street photography, noir atmosphere, editorial travel, and gritty urban realism. Copy any prompt directly into your CLIP Text Encode node and generate.

🔧 Try These Prompts in the FLUX.2 Workflow

Download the ready-to-use ComfyUI workflow JSON, or read the full FLUX.2 workflow guide.

  • Monsoon Chai Stall (📍 Kolkata, India): elderly chai wallah pouring masala tea into a clay kulhad during monsoon rain in North Kolkata
  • Tokyo Yakitori Alley — Late Night (📍 Yurakucho, Tokyo): yakitori skewers over glowing binchotan charcoal in a three-seat counter under train tracks in Tokyo
  • Marrakech Spice Souk — Golden Hour (📍 Medina of Marrakech): towering conical mounds of saffron, turmeric and paprika in the Marrakech spice souk at golden hour
  • New York Bodega — 3am (📍 Lower East Side, Manhattan): deli counter worker assembling a bacon egg and cheese at a fluorescent-lit corner bodega at 3am in New York

Tip: All four prompts use JSON format — but you can paste them as-is into the CLIP Text Encode node. FLUX.2 reads the structured fields and interprets them as a scene description. No special plugin or parser needed.

Advanced Prompting Techniques for FLUX.2

Once your basic workflow is producing clean results, these three techniques give you precise control over specific output characteristics.

How to Control Exact Colors Using HEX Codes

FLUX.2 understands HEX color codes — the six-character codes used in web design to specify exact colors (e.g. #C0392B for a specific shade of brick red). Pair the HEX code directly with the object it applies to.

✓ Works well — HEX tied to object

HEX Color — Correct

✗ Less reliable — HEX floating loose

HEX Color — Incorrect
Tip: You can specify multiple HEX codes in one prompt, one per object. "a glass bottle in deep cobalt blue #003399 sitting on a slate tile in charcoal grey #36454F" will hold both colors with reasonable consistency.
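To keep each HEX code attached to its object, the phrase can be built from parts. The sketch below uses a hypothetical helper, `colored`, which validates the six-character format and places the code directly after the color name. The wording pattern ("object in color #HEX") follows the tip above.

```python
import re

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def colored(obj, color_name, hex_code):
    """Return a phrase such as 'a glass bottle in deep cobalt blue #003399'."""
    if not HEX_COLOR.fullmatch(hex_code):
        raise ValueError(f"not a six-character HEX code: {hex_code!r}")
    return f"{obj} in {color_name} {hex_code.upper()}"


prompt = (
    colored("a glass bottle", "deep cobalt blue", "#003399")
    + " sitting on "
    + colored("a slate tile", "charcoal grey", "#36454F")
)
print(prompt)
```

Pairing a plain color name with the code gives the model a readable fallback if the code alone is interpreted loosely.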

How to Add Text Inside Your Images

FLUX.2 renders text significantly better than SD1.5 or SDXL. To generate readable text in an image, wrap the exact wording in quotation marks inside your prompt.

In-image text — storefront sign example

Specify placement and style alongside the text:

text placement examples
bold condensed type reading "SOLD OUT" stamped in red ink diagonally across a white paper label
hand-painted text reading "fresh bread daily" in black cursive on a chalkboard menu
a door plate reading "ROOM 7" in brass relief lettering, close-up macro shot
Tip: FLUX.2 handles shorter text more reliably than long sentences. If you need more than 4–5 words in an image, generate multiple attempts and select the cleanest result.
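The quoting rule and the length guideline can be folded into one small helper. This is an illustrative sketch: `text_phrase` is hypothetical, and the five-word threshold comes from the tip above rather than from any documented model limit.

```python
def text_phrase(wording, description, max_words=5):
    """Wrap the exact wording in double quotes after a short description.

    Returns the phrase and a flag that is True when the wording is longer
    than `max_words`, meaning several attempts may be needed.
    """
    if '"' in wording:
        raise ValueError("wording must not contain double quotes")
    phrase = f'{description} reading "{wording}"'
    too_long = len(wording.split()) > max_words
    return phrase, too_long


phrase, too_long = text_phrase("ROOM 7", "a door plate")
print(phrase + " in brass relief lettering, close-up macro shot")
if too_long:
    print("long text: generate several attempts and pick the cleanest")
```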

How to Use JSON Structured Prompting

For complex scenes with multiple subjects or many independent variables, JSON format gives you modular control. Each element is a separate field — you can change the camera angle without rewriting the entire prompt. Paste this directly into the CLIP Text Encode node:

JSON structured prompt — street food stall
Screenshot 7: CLIP Text Encode node with the JSON block pasted in. One prompt field, structured like a spec sheet.
Tip: JSON prompting is most useful when you're iterating on one variable at a time. To add a second subject, add another entry to the subjects array (the list between square brackets [ ]). For single-subject images, natural language is faster.
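A structured prompt of this kind can be generated from a Python dictionary, which makes single-variable changes explicit. The sketch below is an assumption-heavy illustration: apart from the `subjects` array mentioned in the tip above, every field name and value is made up for the example, since FLUX.2 reads the JSON as descriptive text rather than against a fixed schema.

```python
import json

# Field names other than "subjects" are illustrative assumptions.
scene = {
    "scene": "street food stall at night",
    "subjects": [
        {
            "description": "a vendor in a striped apron",
            "action": "flipping skewers over charcoal",
        },
    ],
    "style": "documentary photography",
    "camera": {"angle": "eye level", "lens": "50mm"},
    "lighting": "single bare tungsten bulb overhead, deep shadows below",
}

# Change one variable without touching the rest of the prompt.
scene["camera"]["angle"] = "low angle"

# Add a second subject by appending to the subjects array.
scene["subjects"].append(
    {"description": "a customer in a rain jacket", "action": "waiting at the counter"}
)

prompt_text = json.dumps(scene, indent=2)
print(prompt_text)
```

The printed text is what you paste into the CLIP Text Encode node. Keeping the dictionary in a script gives you a record of exactly which field changed between generations.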

Prompt Length — How Long Should Your FLUX.2 Prompt Be?

FLUX.2 supports prompts up to 32,000 tokens — far more than you'll ever need. Length is not the goal. Precision is. The practical rule: every word you add should visibly change something in the output. If you remove a word and the image doesn't change, that word was padding.

| Length | Approximate word count | When to use |
| --- | --- | --- |
| Short | 10–30 words | Quick concepts, testing a subject idea |
| Medium | 30–80 words | Most images, best balance of detail and speed |
| Long | 80–300+ words | Complex multi-subject scenes, JSON structured prompts |
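A word count is a rough but convenient way to see which band a prompt falls into. The sketch below uses a hypothetical helper, `length_band`. It counts whitespace-separated words rather than model tokens, and it treats the overlapping boundaries in the table (30 and 80) as belonging to the shorter band.

```python
def length_band(prompt):
    """Classify a prompt by word count using the bands in the table above."""
    words = len(prompt.split())
    if words < 10:
        return words, "very short"
    if words <= 30:
        return words, "short"
    if words <= 80:
        return words, "medium"
    return words, "long"


count, band = length_band(
    "An elderly woman selling dried spices at an open-air market stall"
)
print(count, band)
```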

Build Prompts in Stages

Start with the subject alone, generate, and add one layer of detail per stage. Building up this way also makes it easier to identify which part of your prompt caused an unexpected result.

staged prompt build — market vendor example
Stage 1 (subject only):
An elderly woman selling dried spices at an open-air market stall

Stage 2 (add location):
An elderly woman selling dried spices at an open-air market stall, wooden trestle table covered with hessian sacks and terracotta bowls filled with turmeric, cumin, and paprika

Stage 3 (add style and lighting):
An elderly woman selling dried spices at an open-air market stall, wooden trestle table covered with hessian sacks and terracotta bowls filled with turmeric, cumin, and paprika, travel documentary photography, 50mm lens, bright diffused midday shade under a canvas awning

Stage 4 (add final detail):
An elderly woman selling dried spices at an open-air market stall, wooden trestle table covered with hessian sacks and terracotta bowls filled with turmeric, cumin, and paprika, travel documentary photography, 50mm lens, bright diffused midday shade under a canvas awning, shallow depth of field pulling focus to her hands scooping spice, warm ochre and brick-red colour palette, slight film grain
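The staged build can also be expressed as a list of layers that accumulate into successive prompts. This is a small sketch using the fragments from the market vendor example. The variable names are illustrative, and `itertools.accumulate` simply joins each new layer onto everything before it.

```python
from itertools import accumulate

layers = [
    "An elderly woman selling dried spices at an open-air market stall",
    "wooden trestle table covered with hessian sacks and terracotta bowls "
    "filled with turmeric, cumin, and paprika",
    "travel documentary photography, 50mm lens, "
    "bright diffused midday shade under a canvas awning",
    "shallow depth of field pulling focus to her hands scooping spice, "
    "warm ochre and brick-red colour palette, slight film grain",
]

# Each stage is the previous stage plus one more layer.
stages = list(accumulate(layers, lambda so_far, layer: f"{so_far}, {layer}"))

for number, prompt in enumerate(stages, start=1):
    print(f"Stage {number}: {prompt}")
```

Generating each stage with the same seed shows what every added layer contributes, and dropping a layer from the list removes it from all later stages at once.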

Troubleshooting — Why Your FLUX.2 Prompt Isn't Working

The Image Looks Generic No Matter What I Type

Cause: The prompt is too short or too vague. FLUX.2 fills in missing information — and the default fill is often stock-photo generic.

Fix:

  1. Add subject specifics: not "a man" but "a man in his mid-40s, closely cropped beard, steel-rimmed glasses, wearing a worn grey henley, relaxed posture".
  2. Add a lighting description: "single bare bulb overhead casting a hard downward shadow" changes the feel more than most other additions.
  3. Add a style reference: "Magnum-style documentary photography" or "medium format film, muted tones" gives FLUX.2 a visual language to work in.

The Image Has Distorted Hands or Extra Fingers

Cause: Hands at complex angles are difficult for any diffusion model to render. Using negative prompts to fix this makes it worse: FLUX.2 reads "no extra fingers" as a description containing the concept "extra fingers."

Fix:

  1. Move hands out of the composition: "hands out of frame, waist-up shot".
  2. Describe a specific hand position if hands must appear: "left hand open flat on the table, four fingers together, thumb extended".
  3. Generate multiple times and select the cleanest result — iteration is the most reliable fix.

My Colors Look Wrong or Change Between Generations

Cause: Color language like "bright blue" or "warm tones" is interpreted loosely and changes across seeds (a seed is a number that controls which random variation ComfyUI generates — find it in the KSampler node).

Fix:

  1. Use HEX codes tied directly to the object: "a linen shirt in dusty sage green #8FAF8F".
  2. Name the light source explicitly: "sodium vapour street lamp" produces orange-shifted images; "cloudy north-facing window light" produces cool, flat-lit images.
  3. Lock your seed in the KSampler node when testing color changes so you isolate the prompt variable.
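When comparing color wording across several prompt variants, the seed can be fixed programmatically in an API-format workflow. This sketch assumes standard KSampler nodes with a `seed` input and a prompt node whose text lives under `inputs["text"]`. `variants_with_locked_seed` and the node ids in the example are hypothetical.

```python
import copy


def variants_with_locked_seed(workflow, prompt_node_id, prompts, seed):
    """Return one workflow copy per prompt, all sharing the same KSampler seed."""
    variants = []
    for text in prompts:
        wf = copy.deepcopy(workflow)  # leave the original workflow untouched
        wf[prompt_node_id]["inputs"]["text"] = text
        for node in wf.values():
            if node.get("class_type") == "KSampler":
                node["inputs"]["seed"] = seed
        variants.append(wf)
    return variants


# Tiny stand-in workflow; a real export contains many more nodes.
base = {
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
    "3": {"class_type": "KSampler", "inputs": {"seed": 1, "positive": ["6", 0]}},
}
tests = variants_with_locked_seed(
    base,
    "6",
    ["a linen shirt in bright green", "a linen shirt in dusty sage green #8FAF8F"],
    seed=42,
)
print(len(tests), "workflows share seed", tests[0]["3"]["inputs"]["seed"])
```

With the seed held constant, any difference between the resulting images comes from the prompt wording alone.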

FLUX.2 Is Ignoring Part of My Prompt

Cause: Critical details are buried at the end of a long prompt. FLUX.2 applies more weight to the beginning of the prompt.

Fix:

  1. Move your most important element to the first sentence.
  2. Check whether your style tag appears after 60+ words of scene description — if so, move it to sentence two.
  3. Shorten the prompt. A focused 40-word prompt usually beats a cluttered 120-word prompt.
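To check how deep a detail is buried, count the words that come before it. The sketch below is a hypothetical helper. The 60-word threshold mirrors step 2 above and is a rule of thumb from this guide, not a model constant.

```python
def words_before(prompt, phrase):
    """Count the words that precede `phrase`; return None if it is absent."""
    index = prompt.lower().find(phrase.lower())
    if index == -1:
        return None
    return len(prompt[:index].split())


prompt = (
    "An elderly fisherman with weathered hands and a grey beard mending a hemp "
    "net on a salt-bleached wooden dock, overcast coastal morning, documentary "
    "photography, 50mm lens, desaturated blues and off-whites"
)
position = words_before(prompt, "documentary photography")
if position is not None and position > 60:
    print(f"style starts after {position} words: move it earlier")
else:
    print(f"style starts after {position} words")
```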

Frequently Asked Questions

Does FLUX.2 support negative prompts?

FLUX.2 does not support negative prompts the way SD1.5 or SDXL does. The Mistral text encoder reads your negative prompt as a description, not an instruction to exclude something. Leave the negative CLIP Text Encode node empty and describe what you want positively instead.

How is prompting FLUX.2 different from prompting FLUX.1?

FLUX.2 follows prompts more precisely and handles longer, more detailed descriptions better than FLUX.1. The 4-part formula works for both, but FLUX.2 responds more consistently to specific camera language, HEX color codes, and detailed lighting descriptions. JSON structured prompting is also more reliable on FLUX.2. FLUX.2 also uses a different text encoder (Mistral 3 Small instead of T5-XXL + CLIP-L) and different folder locations for model files.

How long should a FLUX.2 prompt be?

30–80 words covers most use cases. Start with 10–20 words to get the subject right, then build up. FLUX.2 supports prompts up to 32,000 tokens, but longer prompts are only better when each word adds something specific and non-contradictory.

Why is FLUX.2 ignoring my style?

Style tags buried at the end of a long prompt receive less weight. Move your style reference to the first two sentences. Also avoid mixing conflicting styles in the same prompt: "35mm film grain" and "hyperrealistic digital render" give the model contradictory direction.

Can I reuse my SD1.5 prompts in FLUX.2?

Not directly. SD1.5-style prompts built around quality tags (masterpiece, best quality, 8k) don't help FLUX.2. They consume token weight without improving output. Rewrite your prompts as natural descriptive sentences using the 4-part formula. The subjects and settings you described can stay; remove the tag strings.

What CFG scale should I use for FLUX.2?

Use CFG 1.0–3.5 for FLUX.2. Find the cfg setting in your KSampler node and set it to 1.0 as a starting point. The high CFG values (7–15) that sharpen SD1.5 images will distort FLUX.2 output: overexposed faces, unnatural saturation, and melting detail.

What to Do Next

Your prompts are working. Take the next step.

For images, go deeper on model selection and workflow settings. For video generation, the same prompting principles apply to LTX-2 — but motion direction adds a fifth dimension. Or follow the full structured path at the roadmap.

Published: 2025-06-15 · Last updated: 2025-06-15 · Tested on RTX 3060 (12 GB VRAM) and RTX 4080 (16 GB VRAM) · ComfyUI v0.3.10
