Choosing the right Kling model
Kling 3.0 ships four models. Here's when to use each:
- Kling Video 3.0 — Best for most video generation tasks. It is the fastest model and produces the highest-quality video, but without audio.
- Kling Video 3.0 Omni — When you need synchronized audio (voices, music, SFX) built into the video.
- Kling Image 3.0 — For photorealistic still images, or when you want reference frames for video.
- Kling Image 3.0 Omni — For complex image prompts with text, branding, or cross-modal inputs.
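The list above boils down to two questions: do you want video or a still image, and do you need the Omni extras? A minimal sketch of that decision in Python follows. The function name and its parameters are hypothetical and are not part of any Kling API; only the four model names come from the list above.

```python
def choose_kling_model(output: str, needs_audio: bool = False,
                       complex_prompt: bool = False) -> str:
    """Map a request onto one of the four model names listed above.

    output         -- "video" or "image"
    needs_audio    -- video only: synchronized voices, music, or SFX
    complex_prompt -- image only: text, branding, or cross-modal inputs
    """
    kind = output.strip().lower()
    if kind == "video":
        return "Kling Video 3.0 Omni" if needs_audio else "Kling Video 3.0"
    if kind == "image":
        return "Kling Image 3.0 Omni" if complex_prompt else "Kling Image 3.0"
    raise ValueError(f"unknown output type: {output!r}")


print(choose_kling_model("video", needs_audio=True))
```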
Writing effective prompts
The quality of your Kling output depends significantly on your prompt. Here's a framework:
- Subject: Who or what is in the scene? ("A lone astronaut")
- Action: What are they doing? ("walking slowly across red sand")
- Setting: Where? Time of day? ("alien desert planet, twin suns setting")
- Style: Visual aesthetic? ("cinematic 4K, dust particles in air, epic wide shot")
- Camera: Shot type and movement? ("slow push in, shallow depth of field")
- Pacing: How fast should the scene feel? ("contemplative, unhurried")
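If you generate many prompts, it can help to assemble them from the framework's parts so that nothing gets left out. The sketch below is one possible way to do that in plain Python. The helper names are hypothetical, and the way the parts are joined into sentences is an assumption, not a format Kling requires.

```python
def _sentence(text: str) -> str:
    """Trim the text, capitalize its first letter, and end it with one period."""
    text = text.strip().rstrip(".")
    return text[:1].upper() + text[1:] + "."


def build_prompt(subject: str, action: str, setting: str,
                 style: str = "", camera: str = "", pacing: str = "") -> str:
    """Join the framework's parts into a single prompt string.

    Subject, action, and setting form the opening sentence; each optional
    part that is not empty becomes its own short sentence after it.
    """
    core = f"{subject.strip()} {action.strip()}, {setting.strip()}"
    extras = [part for part in (style, camera, pacing) if part.strip()]
    return " ".join(_sentence(part) for part in [core, *extras])


print(build_prompt(
    "A lone astronaut",
    "walking slowly across red sand",
    "alien desert planet, twin suns setting",
    style="cinematic 4K, dust particles in air, epic wide shot",
    camera="slow push in, shallow depth of field",
    pacing="contemplative, unhurried",
))
```

Treat the result as a first draft. A hand-edited prompt that reads as natural sentences will usually be easier to review and refine.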
Example of a weak prompt:
"astronaut on Mars"
Example of a strong prompt:
"A lone astronaut in a worn white suit walks slowly across a vast red Martian desert at sunset. Twin suns dip below the horizon, casting long amber shadows. Dust particles drift past the visor. Cinematic 4K, wide shot, slow dolly forward, dramatic music swells."
Aspect ratios and use cases
- 16:9 — Landscape video: YouTube, presentations, streaming, desktop
- 9:16 — Vertical video: TikTok, Instagram Reels, YouTube Shorts
- 1:1 — Square: Instagram feed, thumbnails
- 4:3 — Classic TV format: retro aesthetics
- 21:9 — Ultra-wide cinematic: film-style content
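The list above can be kept as a small lookup table if you pick ratios programmatically. The sketch below is illustrative only: the dictionary keys and function names are hypothetical, and the pixel arithmetic shows what a ratio implies for a given width. It does not describe the resolutions Kling actually outputs.

```python
# Target names are hypothetical keys; the ratios come from the list above.
RATIO_BY_TARGET = {
    "youtube": "16:9",
    "presentation": "16:9",
    "tiktok": "9:16",
    "instagram reels": "9:16",
    "youtube shorts": "9:16",
    "instagram feed": "1:1",
    "thumbnail": "1:1",
    "retro": "4:3",
    "cinematic": "21:9",
}


def ratio_for(target: str) -> str:
    """Look up the aspect ratio suggested for a publishing target."""
    key = target.strip().lower()
    if key not in RATIO_BY_TARGET:
        raise ValueError(f"no ratio listed for {target!r}")
    return RATIO_BY_TARGET[key]


def height_for_width(ratio: str, width: int) -> int:
    """Return the frame height (in pixels) that a width implies for a ratio."""
    w, h = (int(n) for n in ratio.split(":"))
    return round(width * h / w)


print(ratio_for("TikTok"), height_for_width("16:9", 1920))
```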
Using reference images
Upload a reference image on the Create page to guide Kling's generation. Good use cases:
- Image-to-video: animate a product photo or still image
- Character consistency: provide a character reference for the protagonist
- Style reference: paste a film still or artwork to define the visual aesthetic
For best results, use clear, high-quality images with a single dominant subject. JPEG, PNG, and WebP are supported, up to 20 MB per file.
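Before uploading, you can check a file against the format and size limits stated above. The following is a minimal sketch, not an official validator: the function name is hypothetical, and reading "20MB" as 20,000,000 bytes is an assumption (the limit could also mean 20 MiB).

```python
import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_SIZE_BYTES = 20 * 1000 * 1000  # assumption: "20MB" read as decimal megabytes


def reference_image_problems(filename: str, size_bytes: int) -> list:
    """Return a list of reasons the file may be rejected (empty if none)."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 20 MB")
    return problems


print(reference_image_problems("product.png", 4_500_000))
```

For a file on disk, `os.path.getsize(path)` gives the size in bytes to pass as the second argument.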
Getting the most from Video 3.0 Omni (audio)
When using the Omni model, your prompt should include audio cues:
- Mention ambient sounds ("ocean waves in the background")
- Specify music mood ("slow orchestral buildup")
- Include dialogue cues if characters are speaking ("a woman says: 'The launch window opens in three minutes.'")
- Specify the language for voice generation
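The cues above can be appended to a visual prompt in a consistent order. The sketch below shows one way to do that. The function name and the labels it writes ("Ambient sound:", "Music:", "Spoken language:") are assumptions chosen for readability, not keywords that Kling is documented to recognize. The dialogue line follows the "says:" pattern from the example above.

```python
def add_audio_cues(visual_prompt: str, ambient: str = "", music: str = "",
                   dialogue=None, language: str = "") -> str:
    """Append ambient, music, dialogue, and language cues to a visual prompt.

    dialogue is an optional list of (speaker, line) pairs.
    """
    parts = [visual_prompt.strip()]
    if ambient:
        parts.append(f"Ambient sound: {ambient.strip()}.")
    if music:
        parts.append(f"Music: {music.strip()}.")
    for speaker, line in (dialogue or []):
        parts.append(f"{speaker.strip()} says: '{line.strip()}'")
    if language:
        parts.append(f"Spoken language: {language.strip()}.")
    return " ".join(parts)


print(add_audio_cues(
    "A rocket sits on the pad at dawn.",
    ambient="ocean waves in the background",
    music="slow orchestral buildup",
    dialogue=[("A woman", "The launch window opens in three minutes.")],
    language="English",
))
```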
Multi-shot storyboarding
To trigger multi-shot mode, describe your scene as a sequence of shots in your prompt:
"Shot 1: Wide establishing shot of a rain-soaked city at night. Shot 2: Medium shot of a detective in a trench coat walking under a streetlight. Shot 3: Close-up of her face, rain on her cheek, eyes scanning the alley. Shot 4: POV shot as she reaches for a door handle."
Kling will attempt to generate all specified shots as a continuous, coherent video.
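If you keep a storyboard as a list of shot descriptions, the numbered "Shot N:" format shown above can be generated rather than typed by hand. This is a minimal sketch with a hypothetical function name. It only builds the prompt string and does not call Kling.

```python
def build_multishot_prompt(shots) -> str:
    """Number each shot description and join them into one prompt string."""
    if not shots:
        raise ValueError("at least one shot description is required")
    return " ".join(
        f"Shot {number}: {description.strip().rstrip('.')}."
        for number, description in enumerate(shots, start=1)
    )


storyboard = [
    "Wide establishing shot of a rain-soaked city at night",
    "Medium shot of a detective in a trench coat walking under a streetlight",
    "Close-up of her face, rain on her cheek, eyes scanning the alley",
    "POV shot as she reaches for a door handle",
]
print(build_multishot_prompt(storyboard))
```

Keeping the shots in a list makes it easy to reorder, drop, or rewrite a single shot and regenerate the prompt.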