Do I have to use JSON for Ideogram 4.0?

No — plain text works, but the model was trained exclusively on structured JSON captions. JSON gives you color palette control, bounding-box layout, and reliable in-image text rendering that plain text cannot match.

What happens if I get the key order wrong in style_description?

The model still generates an image, but color accuracy and layout placement both get noticeably worse. The model was trained with a strict key order and deviating from it puts your prompt outside the training distribution.

What is Magic Prompt in Ideogram 4.0?

Magic Prompt is a free server-side tool that automatically expands a plain-text prompt into a full structured JSON caption. The default config (ideogram-4-v1) uses Ideogram's own hosted API with a free API key — no local model required.

Can I use this JSON format on ideogram.ai directly?

Yes. The JSON schema works on ideogram.ai, through the API, and in ComfyUI. Paste it into the prompt field on ideogram.ai exactly as you would in ComfyUI.

Ideogram 4.0 JSON Prompt Format: Field-by-Field Guide (2026)

How many elements can I include in one prompt?

There is no hard limit. The Ideogram team has published examples with 28 bounding-boxed objects in a single prompt. For most use cases, 3–6 elements produces the most reliable results.

⚡ Quick Answer

Ideogram 4.0 JSON prompts have three top-level fields: high_level_description, style_description, and compositional_deconstruction. Only compositional_deconstruction is required. Key order inside style_description is strict — getting it wrong produces noticeably weaker results. Paste the raw JSON directly into ComfyUI's text input node. Do not wrap it in code fences.

Before you start: This guide covers prompt writing — not installation. If you haven't set up Ideogram 4.0 in ComfyUI yet, read the Ideogram 4.0 ComfyUI install guide first, then come back here.

Why JSON Prompts Produce Better Results Than Plain Text

Every other image model you've used accepts plain text: "a golden retriever on a skateboard." Ideogram 4.0 is different. It was trained exclusively on structured JSON captions — a format that describes images field by field: the subject, the style, the lighting, each element's position. When you send plain text, the model has to guess how to map your words to that schema. When you send JSON, you're speaking the exact language the model learned from.

The practical difference comes down to three things plain text can't reliably do:

🎨 Color palette control: Specify up to 16 hex codes. The model steers toward them instead of guessing.

📐 Bounding box layout: Place elements at exact coordinates on a 0 to 1000 canvas grid. No more "roughly in the middle."

✍️ In-image text rendering: Tell the model exactly which characters to render and where. This is how it reaches 0.97 on OCR benchmarks.

For simple generations — a landscape, a portrait, a basic product shot — plain text is often good enough. Switch to JSON whenever you need precise layout, specific colors, or readable text inside the image. That's where the gap becomes visible.

✗ Plain text (weaker)

"a lone sailboat on calm water at sunset, warm light"

Model guesses at colors, composition, and style.

✓ JSON prompt (precise)

{
  "high_level_description": "...",
  "style_description": {
    "color_palette": ["#FF6B35","#004E89"]
  },
  "compositional_deconstruction": { ... }
}

Exact colors, element placement, and per-element styling.

The JSON Schema: Three Top-Level Fields

Every Ideogram 4.0 JSON prompt has the same three-part structure:

ideogram 4.0 — top-level schema

{
  "high_level_description": "...",
  "style_description": { ... },
  "compositional_deconstruction": { ... }
}

Field	Required?	What it does
`high_level_description`	Strongly recommended	One or two sentences summarising the whole image. The model reads this first.
`style_description`	Optional	Controls the visual style, lighting, medium, and color palette.
`compositional_deconstruction`	⚠ Required	The actual content — background description and every element in the scene.
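
If you assemble prompts in a script rather than by hand, the table above can be sketched in a few lines of Python. The helper name `build_prompt` is hypothetical and not part of any Ideogram tooling; it relies only on the fact that Python dicts keep insertion order, so the fields come out in the order shown.

```python
import json


def build_prompt(compositional_deconstruction, high_level_description=None,
                 style_description=None):
    """Assemble a prompt dict with the top-level fields in schema order.

    Hypothetical helper for illustration; not part of any Ideogram library.
    """
    if not compositional_deconstruction:
        raise ValueError("compositional_deconstruction is required")

    prompt = {}
    # Optional fields are added only when supplied, but always in this order.
    if high_level_description:
        prompt["high_level_description"] = high_level_description
    if style_description:
        prompt["style_description"] = style_description
    prompt["compositional_deconstruction"] = compositional_deconstruction
    return prompt


prompt = build_prompt(
    {
        "background": "A calm sea at sunset under a hazy orange sky.",
        "elements": [
            {"type": "obj", "desc": "A small white sailboat with a single mast."}
        ],
    },
    high_level_description="A wide photograph of a lone sailboat at sunset.",
)
print(json.dumps(prompt, indent=2))
```

The printed JSON can be pasted as-is; the omitted `style_description` simply does not appear.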

Field 1: `high_level_description`

This is one or two sentences that summarise the whole image. The model reads this first to understand the overall scene before it processes the detailed fields below it.

Write it like a caption for a photograph. Include: camera distance (close-up, medium shot, wide), what the subject is doing, and where the scene takes place. Mention the medium here — photograph, illustration, graphic design — because it primes the style block before the model reads it.

high_level_description — examples

// Photography
"high_level_description": "A medium-shot photograph of a barista pouring latte art in a cosy café at golden hour."

// Graphic design
"high_level_description": "A bold typographic poster for a fictional jazz concert. Clean navy background, amber headline, minimal layout."

// Illustration
"high_level_description": "A flat-design illustrated map of a fictional city district with four colour-coded zones and labelled streets."

Tip: One or two sentences is enough. Don't try to describe every detail here — that's what compositional_deconstruction is for. This field sets the overall scene; the elements fill in the specifics.

Field 2: `style_description`

This block controls the visual style, lighting, medium, and color palette. There are two versions — one for photographs and one for everything else — and the key ordering inside each version is strict.

Warning: Key order matters here. The model was trained with keys in a specific order. Putting them out of order means your prompt samples outside the training distribution — the model still generates an image, but color accuracy, style consistency, and layout adherence all get noticeably worse. Follow the order exactly.

For Photographs

Key order must be: aesthetics → lighting → photo → medium → color_palette

style_description — photograph

"style_description": {
  "aesthetics": "warm, cinematic, golden hour",
  "lighting": "directional late-afternoon sunlight, long soft shadows from the left",
  "photo": "35mm, f/1.8, shallow depth of field, eye-level",
  "medium": "photograph",
  "color_palette": ["#F5C542", "#87CEEB", "#4A4A4A"]
}

For Illustrations, Graphic Design, and 3D

Key order must be: aesthetics → lighting → medium → art_style → color_palette

style_description — illustration / graphic design

"style_description": {
  "aesthetics": "minimal, professional, geometric",
  "lighting": "even, diffuse studio lighting — no cast shadows",
  "medium": "graphic_design",
  "art_style": "flat vector design, bold outlines, generous whitespace, sans-serif typography",
  "color_palette": ["#FFFFFF", "#333333", "#0066FF"]
}

Warning: Use photo for photographs and art_style for everything else. Never use both in the same prompt. And never use the photo key without also setting medium: "photograph".

Valid medium Values

"photograph", "illustration", "3d_render", "painting", "graphic_design", "digital_art", "ink_and_watercolor"

Key Order Reference Table

Caption type	Required key order
Photo (uses photo)	`aesthetics → lighting → photo → medium → color_palette`
Non-photo (uses art_style)	`aesthetics → lighting → medium → art_style → color_palette`

Note: color_palette is the only field that may be omitted. If included, it must always be last.
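
Because the key order is easy to get wrong by hand, a small script can enforce it. The following is a minimal Python sketch, assuming you build the style block as a dict; `ordered_style` is a hypothetical helper name, and the two order tuples are copied from the reference table above.

```python
PHOTO_ORDER = ("aesthetics", "lighting", "photo", "medium", "color_palette")
NON_PHOTO_ORDER = ("aesthetics", "lighting", "medium", "art_style", "color_palette")


def ordered_style(style):
    """Return a copy of `style` with keys in the order from the table.

    Hypothetical helper: uses the photo order when a "photo" key is present,
    otherwise the non-photo order.
    """
    if "photo" in style and "art_style" in style:
        raise ValueError("use either 'photo' or 'art_style', not both")
    if "photo" in style and style.get("medium") != "photograph":
        raise ValueError("'photo' requires medium: 'photograph'")

    order = PHOTO_ORDER if "photo" in style else NON_PHOTO_ORDER
    unknown = set(style) - set(order)
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    # Dicts preserve insertion order, so this fixes the serialized key order.
    return {key: style[key] for key in order if key in style}


style = ordered_style({
    "color_palette": ["#F5C542", "#87CEEB"],
    "medium": "photograph",
    "photo": "35mm, f/1.8",
    "lighting": "late-afternoon sunlight",
    "aesthetics": "warm, cinematic",
})
print(list(style))
# ['aesthetics', 'lighting', 'photo', 'medium', 'color_palette']
```

The helper also rejects the two combinations the warning above rules out: `photo` together with `art_style`, and `photo` without `medium: "photograph"`.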

Field 3: `compositional_deconstruction`

This is the required field. It's where you describe the actual content of the image: the background and every element in it. It has two required sub-fields, background and elements, and background must come before elements.

compositional_deconstruction — structure

"compositional_deconstruction": {
  "background": "...",
  "elements": [ ... ]
}

The background Field

Describe the setting in detail. Don't write one sentence — write the full environment: time of day, atmosphere, surface textures, depth, what's visible in the distance. The model uses this to build the world the elements live in.

background — good example

"background": "The interior of a small independent coffee shop at night. Exposed brick walls painted in deep charcoal. Pendant Edison bulbs hang from a wooden ceiling, casting warm amber pools of light on the wooden counter below. Rain streaks down the large plate-glass windows on the left. A blurred street is visible through the rain."

Elements: type: "obj"

Each object in the image is an obj element. Give it a type, an optional bbox for placement, and a desc with as much detail as you can give.

Key order within an obj element: type → bbox → desc → color_palette

✗ Weak desc

"desc": "a dog"

Vague. The model guesses breed, pose, colour, expression.

✓ Strong desc

"desc": "A golden retriever with a fluffy cream coat, sitting upright, ears perked, facing the camera with a bright panting expression, eyes catching the afternoon light."

Specific. Every detail you care about is named.

Elements: type: "text"

When you want text rendered inside the image — on a poster, a sign, a product label, a business card — use type: "text" instead of type: "obj". This is how Ideogram 4.0 achieves its best-in-class text rendering.

Key order within a text element: type → bbox → text → desc → color_palette

text element — correct

{
  "type": "text",
  "bbox": [30, 50, 350, 950],
  "text": "SUMMER SALE",
  "desc": "Large bold uppercase sans-serif text in deep red, occupying the upper third of the image.",
  "color_palette": ["#DC2626"]
}

Warning: The text field contains the literal string you want rendered. The desc field describes how it looks — font weight, style, color, size, position. Both are needed. Do not put the text you want rendered inside desc — mixing them up causes the model to either ignore the text or render it incorrectly.

Element Key Order Summary

Type	Required key order
`"obj"`	`type → bbox → desc → color_palette`
`"text"`	`type → bbox → text → desc → color_palette`

bbox and color_palette are optional. If included, they must appear in the positions shown above.
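
The same ordering rules can be captured in a small builder. This is a sketch only: `make_element` is a hypothetical helper, and it assumes nothing beyond the key orders in the summary table above.

```python
def make_element(kind, desc, bbox=None, text=None, color_palette=None):
    """Build an element dict with keys in the order from the summary table.

    Hypothetical helper. `kind` is "obj" or "text"; `text` is the literal
    string to render and is only valid on text elements.
    """
    if kind not in ("obj", "text"):
        raise ValueError("kind must be 'obj' or 'text'")
    if kind == "text" and not text:
        raise ValueError("text elements need the literal string in 'text'")
    if kind == "obj" and text is not None:
        raise ValueError("'text' is only used on text elements")

    element = {"type": kind}
    if bbox is not None:
        element["bbox"] = list(bbox)
    if kind == "text":
        element["text"] = text
    element["desc"] = desc
    if color_palette:
        element["color_palette"] = list(color_palette)
    return element


headline = make_element(
    "text",
    "Large bold uppercase sans-serif text in deep red.",
    bbox=[30, 50, 350, 950],
    text="SUMMER SALE",
    color_palette=["#DC2626"],
)
print(list(headline))
# ['type', 'bbox', 'text', 'desc', 'color_palette']
```

Keeping `text` and `desc` as separate arguments makes it harder to mix up the literal string and its styling description.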

Bounding Boxes: How to Place Elements on the Canvas

A bounding box (bbox) tells the model where to place an element on the canvas. The format is:

bbox format

[y_min, x_min, y_max, x_max]

Warning: Y comes first, not X. This is the opposite of what many people expect. Coordinates run from 0 to 1000, with 0,0 at the top-left corner of the canvas.

Think of the canvas as a 1000×1000 grid. The top-left is (0,0) and the bottom-right is (1000,1000). To place something in the right half of the image, your x_min would be 500.

Where you want the element	bbox value
Full image	`[0, 0, 1000, 1000]`
Top half	`[0, 0, 500, 1000]`
Bottom half	`[500, 0, 1000, 1000]`
Left third	`[0, 0, 1000, 333]`
Right third	`[0, 667, 1000, 1000]`
Centre	`[150, 200, 850, 800]`
Upper-left quarter	`[0, 0, 500, 500]`
Lower-right quarter	`[500, 500, 1000, 1000]`

Tip: You don't need a bbox on every element. Omit it for elements where exact placement doesn't matter — ambient background objects, atmospheric details, textures. Use it for the main subject, any text, and any object that needs to land in a specific spot.
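
If you plan a layout in pixels, for example in a design tool, the conversion to the 0 to 1000 grid is simple arithmetic. The sketch below uses a hypothetical helper, `bbox_from_pixels`, and assumes the grid scales proportionally with the image width and height, which this guide does not state explicitly for non-square canvases.

```python
def bbox_from_pixels(left, top, width, height, canvas_width, canvas_height):
    """Convert a pixel rectangle to [y_min, x_min, y_max, x_max] on a 0-1000 grid.

    Assumption: each axis maps proportionally onto 0..1000.
    """
    def scale(value, size):
        # Clamp so rounding or slightly oversized inputs stay inside 0..1000.
        return max(0, min(1000, round(value * 1000 / size)))

    x_min = scale(left, canvas_width)
    x_max = scale(left + width, canvas_width)
    y_min = scale(top, canvas_height)
    y_max = scale(top + height, canvas_height)
    if x_min >= x_max or y_min >= y_max:
        raise ValueError("rectangle has no area on the canvas")
    # Y comes first in the output, matching the bbox format above.
    return [y_min, x_min, y_max, x_max]


# Right half of a 1024x1024 canvas.
print(bbox_from_pixels(512, 0, 512, 1024, 1024, 1024))  # [0, 500, 1000, 1000]
```

Returning the values already in Y-first order keeps the most common mistake out of the calling code.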

Color Palette Conditioning

Adding a color_palette array steers the dominant colors in the image. This is one of Ideogram 4.0's strongest features for design work. Instead of saying "warm amber tones," you give exact hex codes and the model steers toward them.

color_palette — example

"color_palette": ["#FF6B35", "#F7C59F", "#004E89", "#1A659E", "#2B2D42"]

Rules

✓ Uppercase hex only: #FF6B35, not #ff6b35
✓ No shorthand hex: #FFFFFF, not #FFF
✓ Up to 16 colors in style_description
✓ Up to 5 colors per element in elements

Tips for better results

→ Include both highlight and shadow colors to give the model contrast to work with
→ Include your background color explicitly rather than relying on the aesthetics text alone
→ For dark moody scenes, add the dark hex values to the palette

You can also add a color_palette to individual elements in compositional_deconstruction. This gives the model per-element color guidance — useful when you want different zones of the image to have different dominant colors.
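
The palette rules above are mechanical enough to check in code. Here is a minimal Python sketch; `clean_palette` is a hypothetical name, and the limits of 16 and 5 are the ones listed under Rules.

```python
import re

HEX_RE = re.compile(r"#[0-9A-Fa-f]{6}")


def clean_palette(colors, limit=16):
    """Uppercase hex codes and enforce the palette rules listed above.

    Pass limit=16 for style_description and limit=5 for a single element.
    """
    cleaned = []
    for color in colors:
        if not HEX_RE.fullmatch(color):
            # Rejects shorthand such as #FFF as well as malformed values.
            raise ValueError(f"not a six-digit hex code: {color!r}")
        cleaned.append(color.upper())
    if len(cleaned) > limit:
        raise ValueError(f"palette has {len(cleaned)} colors, limit is {limit}")
    return cleaned


print(clean_palette(["#ff6b35", "#004E89"]))  # ['#FF6B35', '#004E89']
```

Shorthand values are rejected rather than expanded, so a typo such as a missing digit is reported instead of being silently turned into a different color.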

5 Prompt Examples — Copy and Run

Below are five complete JSON prompts covering different use cases: typography, product photography, cinematic scenes, UI mockups, and illustrated maps. Click View Prompt on any card to see the full JSON, and hit Copy Prompt to grab it. Paste directly into ComfyUI's text node and click Queue Prompt.

These are placeholder images — swap in the real generated output once you've run the workflow.

Fantasy Movie Poster
Whimsical adventure movie poster combining storybook fantasy, mixed-media collage design, and large-scale cinematic typography.
Image: Disney-inspired fantasy adventure poster featuring a young explorer on a floating suitcase beneath giant DREAMBOUND cloud typography.

Fashion Editorial Portrait
High-end fashion and travel editorial photography with natural beauty, cinematic lighting, and realistic environmental details.
Image: Luxury fashion portrait of a young woman holding a red rose beside a waterfall surrounded by lush greenery.

Realistic
Realistic image

Action Photography
Cinematic high-speed motorcycle photography featuring a perfectly sharp rider against dynamic urban motion blur and light trails.
Image: Ultra-realistic action photograph of a young man riding a sport motorcycle through a busy city street with dramatic motion blur surrounding him.

Action Movie Poster
High-intensity Hollywood action poster with explosions, military aircraft, drones, futuristic warfare, and dramatic cinematic lighting.
Image: Blockbuster action-thriller poster featuring a lone futuristic operative standing amid a collapsing dystopian city under the title SKILL.

Use AI to Write Ideogram 4.0 Prompts For You

Writing JSON by hand gives you the most control — but there's a faster way to get started. You can paste a special skill file into ChatGPT or Claude, describe any image in plain English, and the AI will output a complete, ready-to-paste Ideogram 4.0 JSON prompt for you.

Which AI is better for Ideogram prompts? ChatGPT (GPT-4o) tends to produce more creative and visually expressive prompts — it often adds unexpected atmospheric details, stronger aesthetic choices, and bolder color palettes. Claude is also excellent and produces very clean, well-structured JSON with precise, consistent descriptions. Both work well. Try ChatGPT first if you want maximum creative flair; use Claude if you want tighter, more predictable JSON structure.

Copy the Skill File

Click Copy Skill File below. This copies a set of instructions that tells the AI exactly how Ideogram 4.0 JSON prompts work — field order, bounding boxes, color palette rules, and how to expand vague descriptions into detailed prompts.

ideogram4-prompter — skill file

---
# IDEOGRAM 4.0 PROMPT GENERATOR — SKILL FILE
description: Works on Claude and ChatGPT. Paste this entire file as your first message, then describe your image....
---

# Ideogram 4.0 Prompt Generator

This skill converts natural-language image descriptions
into the structured JSON caption format that Ideogram 4.0
was trained on...

[Full skill file copied to clipboard — scroll to read or click Copy Skill File above]

Open ChatGPT or Claude and Paste the Skill File

Open a new chat in ChatGPT or Claude. Paste the skill file into the message box and send it. The AI will confirm it understands the Ideogram 4.0 format and is ready to generate prompts.

⭐ ChatGPT — More Creative

GPT-4o adds richer atmospheric detail, bolder color palettes, and more expressive aesthetic choices. Best for creative prompts where you want the AI to surprise you.

✓ Claude — Cleaner Structure

Produces very clean, well-structured JSON with consistent formatting and precise descriptions. Best when you need reliable, tightly formatted output every time.

Screenshot, Step 2: Paste the full skill file into a new chat. The AI confirms it understands the schema and is ready to generate prompts.

Type Your Instruction and Describe Your Image

After sending the skill file, type the instruction below into the chat (or in a new message), then replace the placeholder text with your image description. Click Copy Instruction to grab it ready to paste.

Instruction — type this after pasting the skill file

You are now an Ideogram 4.0 JSON prompt generator. Follow the skill file above exactly. When I describe an image, output a complete Ideogram 4.0 JSON prompt in two blocks: first pretty-printed so I can read it, then minified in a code block labelled "Paste into Ideogram". Always expand vague descriptions with rich detail.

Here is my image:
[describe your image here]

Warning: Replace [describe your image here] with your actual description before sending. It can be as brief as "a cosy coffee shop at night with rain on the window" or as detailed as you like — the AI will expand it either way.

Screenshot, Step 3: Type the instruction with your image description. The AI outputs a pretty-printed JSON block for reading and a minified block ready to copy.

Copy the Minified JSON and Paste Into ComfyUI

The AI outputs two blocks. The first is pretty-printed so you can read and check it. The second — labelled "Paste into Ideogram" — is the minified version ready to copy directly into ComfyUI's CLIP Text Encode node. Copy the minified block, paste it into the node, and click Queue Prompt.

Tip: You can keep chatting to refine the result. After you get the first prompt, ask the AI to adjust it: "make the lighting more dramatic", "change the palette to cool blues and whites", "add a text element at the bottom that says SUMMER COLLECTION" — it will update the JSON and give you a new ready-to-paste block each time.

Screenshot, Step 4: Copy the minified block labelled "Paste into Ideogram" and paste it into your ComfyUI CLIP Text Encode node.

Magic Prompt: Skip Writing JSON Entirely

Writing JSON by hand produces the most precise results, but there's a faster option: Magic Prompt. It's a server-side tool that automatically expands a plain-text prompt into a full structured JSON caption. You write one sentence; it outputs the complete schema.

Config	Backend	Cost	Notes
`ideogram-4-v1` (default)	Ideogram hosted API	Free	Default. Needs a free Ideogram API key. No local model.
`claude-opus-v1`	OpenRouter (Claude)	Paid API key	Higher quality expansion. Needs an OpenRouter API key.
`claude-sonnet-v1`	OpenRouter (Claude)	Paid API key	Faster and cheaper than Opus. Good for quick testing.

Important: The ideogram-4-v1 config is the default and is free. It runs the expansion server-side using Ideogram's own hosted API — no local model required. Get a free API key at developer.ideogram.ai. The Magic Prompt shipped in the open-source repo is not the same as the magic prompt used in production on ideogram.ai — results will differ.

Tip: Magic Prompt is great for quick tests. For production work where you need precise color hex values, exact bounding box placement, or per-element color control, writing the JSON manually gives you more control. Use Magic Prompt to explore, then manually tune the parts that matter.

How to Enter a JSON Prompt in ComfyUI

In your Ideogram 4.0 workflow, the text input node is a CLIP Text Encode node — the rectangular node labelled "CLIP Text Encode" connected by a wire to the sampler. It's usually on the left side of the canvas. Here's how to find it and paste your JSON:

1. Look for the CLIP Text Encode node on the canvas. It has a large text area in the centre and a wire running from its output to the sampler node. Click inside the text area to activate it; the border turns blue when it's active.
2. Select all existing text in the field (Ctrl+A on Windows / Cmd+A on Mac) and delete it.
3. Paste your JSON directly into the field. ComfyUI treats the content as a plain string and passes it to the model, with no special formatting applied.
4. Click the orange Queue Prompt button in the top-right corner of the ComfyUI interface. A progress bar appears below it and the active node highlights green as each generation step runs.
5. The finished image appears in the Save Image or Preview Image node on the right side of the canvas once generation completes.

Warning: Do not wrap your JSON in code fences (```json or ```) when pasting into ComfyUI. Paste the raw JSON only — no backticks, no "json" prefix. Code fences cause the model to treat the formatting characters as part of the prompt and produce garbage output.
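
When the JSON comes from a chat assistant, it usually arrives wrapped in a fence. A minimal Python sketch for stripping it before pasting is shown below; `strip_code_fence` is a hypothetical helper, and it assumes the fence, if present, is the first and last line of the copied text.

```python
import json

FENCE = "`" * 3  # three backticks, built this way so the example stays fence-safe


def strip_code_fence(text):
    """Remove a surrounding code fence (with or without a "json" label)."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    raw = "\n".join(lines).strip()
    # Sanity check only: raises ValueError if what remains is not valid JSON.
    json.loads(raw)
    return raw


fenced = FENCE + "json\n" + '{"compositional_deconstruction": {}}' + "\n" + FENCE
print(strip_code_fence(fenced))
```

The function returns the original text rather than re-serializing it, so the key order is left exactly as the assistant wrote it.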

Tip: The separators=(",", ":") argument in Python's json.dumps() removes whitespace between keys and values. In ComfyUI you're pasting the JSON directly — whitespace between keys is fine and doesn't affect generation. Only the key order matters.
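
For reference, this is roughly what minifying with `json.dumps` looks like. The `minify` wrapper and the sample prompt are illustrative only; `separators` and `sort_keys` are standard `json.dumps` parameters.

```python
import json


def minify(prompt):
    """Serialize a prompt dict without extra whitespace, keeping key order."""
    # separators=(",", ":") drops the spaces json.dumps normally adds.
    # sort_keys stays at its default (False) so keys are not reordered.
    return json.dumps(prompt, separators=(",", ":"), ensure_ascii=False)


prompt = {
    "high_level_description": "A bold typographic poster.",
    "compositional_deconstruction": {
        "background": "Flat navy background.",
        "elements": [
            {"type": "text", "text": "SUMMER SALE", "desc": "Bold red caps."}
        ],
    },
}
print(minify(prompt))
```

Avoid `sort_keys=True` here: alphabetical sorting would rearrange the keys and break the order the model expects.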

Troubleshooting

Results look the same as plain text — JSON doesn't seem to help

The JSON is valid but the model isn't benefiting from it. Check two things. First, confirm your key order in style_description is exactly right — see the key order table above. Second, make sure compositional_deconstruction is present and contains both background and elements. If either is missing, the model falls back toward plain-text behavior.
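
Both checks can be automated before you queue a generation. The following is a rough Python sketch; `lint_prompt` is a hypothetical checker that covers only the two causes described above, with the key orders taken from the reference table earlier in this guide.

```python
import json

PHOTO_ORDER = ["aesthetics", "lighting", "photo", "medium", "color_palette"]
NON_PHOTO_ORDER = ["aesthetics", "lighting", "medium", "art_style", "color_palette"]


def lint_prompt(text):
    """Return a list of problems found in a JSON prompt string."""
    prompt = json.loads(text)  # the dict keeps the key order of the text
    problems = []

    comp = prompt.get("compositional_deconstruction")
    if not isinstance(comp, dict):
        problems.append("compositional_deconstruction is missing")
    else:
        for key in ("background", "elements"):
            if key not in comp:
                problems.append(f"compositional_deconstruction.{key} is missing")

    style = prompt.get("style_description")
    if isinstance(style, dict):
        expected = PHOTO_ORDER if "photo" in style else NON_PHOTO_ORDER
        actual = [key for key in style if key in expected]
        wanted = [key for key in expected if key in style]
        if actual != wanted:
            problems.append(
                f"style_description key order is {actual}, expected {wanted}"
            )
    return problems


sample = '{"style_description": {"medium": "illustration", "aesthetics": "minimal"}}'
for problem in lint_prompt(sample):
    print(problem)
```

An empty list means neither of the two problems was found; it does not guarantee the prompt is otherwise well formed.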

"Image blocked by safety filter" — gray screen output

The model's safety filter triggered on your prompt. This happens more often with plain text than with JSON — the structured format has a lower false-positive rate. If you're using JSON and still hitting this, check your desc fields for phrasing that could be misread as unsafe. Rephrase descriptively and neutrally. Restarting the generation with the same prompt sometimes also resolves a false positive.

Colors in the image don't match my color_palette

Two common causes. First, check that all hex codes are uppercase — #FF6B35 not #ff6b35. Lowercase hex is not in the training distribution. Second, add more contrast to your palette — if all your colors are similar tones, the model has less signal to work with. Include both highlight and shadow hex values explicitly.

Text in the image is blurry or incorrect

Make sure you used type: "text" and not type: "obj" for every text element. Confirm the literal string is in the text field and the styling description is in desc. If the text is still blurry, add a bbox to anchor it — floating text without placement coordinates is harder for the model to render cleanly.

JSON causes an error in ComfyUI (node turns red)

ComfyUI doesn't parse the JSON itself — it passes the string directly to the model. A red node means a different problem: most likely a disconnected wire or a missing model file. The JSON text itself never causes a node to turn red. Check your ComfyUI troubleshooting guide for node errors.

Elements appear in the wrong position

Double-check your bbox coordinate order. The format is [y_min, x_min, y_max, x_max]: Y first, then X. Swapping them is the most common bounding box mistake. If an element that should fill the left third stretches across the top third instead, the X and Y values are reversed.

Frequently Asked Questions

Do I have to use JSON for Ideogram 4.0?

Plain text works fine for most generations. JSON gives you meaningful advantages for three specific things: placing elements at exact positions using bounding boxes, locking in specific colors via hex codes, and rendering text inside the image reliably. For a straightforward portrait or landscape, plain text is often good enough. Switch to JSON when layout, color, or in-image text matters.

What happens if I get the key order wrong in style_description?

The model still generates an image, but layout adherence, color accuracy, and style consistency all get noticeably worse. It's a subtle degradation, not a hard error. The model was trained with a strict key order, and putting keys out of order means your prompt doesn't match the training distribution, so always follow the order in the table above exactly.

What is Magic Prompt in Ideogram 4.0?

Magic Prompt is a server-side tool that expands a plain-text description into a full JSON caption automatically. The default config (ideogram-4-v1) uses Ideogram's own hosted API and is free with a free API key. Quality is similar to writing the JSON yourself and it's much faster for quick tests. For production work where you need precise color or layout control, writing the JSON manually gives you more control over the output.

How do I get clean, readable text inside the image?

Use type: "text" for every text element, put the exact string in the text field (not the desc field), add a bbox to anchor its position, and describe the font style in desc. Ideogram 4.0 scores 0.97 on the X-Omni OCR benchmark, the best among open-weight models, but it needs the structured input format to perform at that level.

How many elements can I include in one prompt?

There's no hard limit. The Ideogram team has published examples with 28 bounding-boxed objects in a single prompt. In practice, more elements means more chances for small placement errors. For most use cases, 3–6 elements produces the most reliable results. Add more only when precise layout across many items is the point, as in a detailed poster or an annotated scene.

Can I use this JSON format on ideogram.ai directly?

Yes. The JSON schema works on ideogram.ai, through the API, and in ComfyUI, anywhere you can enter a prompt. Paste it into the prompt field on ideogram.ai exactly as you would in ComfyUI. The model is the same; only the interface differs.

What to Do Next

Take one of the 5 prompts above and try it in your workflow.

Copy the concert poster prompt, paste it into ComfyUI's text node, and compare the result to a plain-text version. The difference in color fidelity and text rendering is the clearest way to see what the JSON format actually does.

Ideogram 4.0 ComfyUI Install Guide →
ComfyUI Learning Roadmap →

Published: 2026-06-11 · Last updated: 2026-06-11 · Tested on RTX 4090 (24 GB VRAM) · ComfyUI v0.3.x · Ideogram 4.0 fp8 checkpoint