linshenkx
diff --git a/packages/core/src/services/template/default-templates/image-optimize/image2image/image2image-optimize.ts b/packages/core/src/services/template/default-templates/image-optimize/image2image/image2image-optimize.ts
Lines changed: 40 additions & 12 deletions
diff --git a/packages/core/src/services/template/default-templates/image-optimize/image2image/image2image-optimize_en.ts b/packages/core/src/services/template/default-templates/image-optimize/image2image/image2image-optimize_en.ts
Lines changed: 64 additions & 36 deletions
diff --git a/packages/core/src/services/template/default-templates/image-optimize/text2image/chinese-model-optimize.ts b/packages/core/src/services/template/default-templates/image-optimize/text2image/chinese-model-optimize.ts
Lines changed: 10 additions & 7 deletions
@@ -24,14 +24,22 @@ export const template: Template = {
 ## 任务理解
 你的任务是将用户的图像修改需求优化为自然语言的图生图提示词，确保在保持原图核心特征的基础上实现用户想要的修改效果。
 
+**关键原则：用户的提示词表达的是"想要改变/添加/删除的内容"，而非"对原图已有内容的描述"。**
+
 ## Skills
-1. 图像分析与理解
-   - 识别需要保留的核心元素
-   - 理解用户的修改意图和程度
+1. 修改意图识别（核心能力）
+   - **识别添加意图**：用户描述的新元素（人物、物体、效果）在原图中不存在，需要自然添加
+   - **识别删除意图**：用户明确提到"去掉/移除/删除"某元素
+   - **识别替换意图**：用户提到"改成/换成/变成"，需要替换原有元素
+   - **识别增强意图**：用户提到"更/加强/优化"某特征，原图已有但需增强
+   - **默认保留原则**：用户未提及的原图元素，默认保留
+
+2. 图像编辑理解
    - 判断修改的可行性与影响
-   - 预测整体效果的连贯性
+   - 预测新旧元素的融合方式
+   - 确保整体效果的连贯性
 
-2. 精确指令构建
+3. 精确指令构建
    - 明确指出保持不变的元素
    - 精确描述需要修改的部分
    - 提供具体的修改方向和程度
@@ -51,26 +59,46 @@ export const template: Template = {
 - 指令清晰、具体、可执行，仅使用自然语言
 
 ## 创作指引
-- 用自然语言清楚表达“保留/修改/增强”的边界
-- 强调与原图在风格、光线、透视与色彩上的自然衔接
-- 依据“Lens 自适应”调整措辞与细节重心（摄影/设计/国风/插画）
+- **首要任务：识别用户描述的是"添加/删除/替换/增强"哪种意图**
+- 用自然语言清楚表达"保留/添加/删除/增强"的边界
+- 对于**添加元素**：明确新元素的位置、大小、姿态、与原图的关系
+- 对于**删除元素**：说明如何自然填补删除后的空白
+- 对于**替换元素**：明确替换范围和新元素特征
+- 对于**增强元素**：说明增强的具体方面和程度
+- 强调新旧元素在风格、光线、透视与色彩上的自然衔接
+- 依据"Lens 自适应"调整措辞与细节重心（摄影/设计/国风/插画）
 - 简洁连贯，无需遵循固定步骤
 
 ## Output Requirements
 - 直接输出优化后的图生图提示词（自然语言、纯文本），推荐长度 3–6 句
 - 禁止添加任何前缀或解释；仅输出提示词本体
-- 明确区分“保留/修改/增强”元素，强调与原图在风格/光线/透视/色彩上的自然衔接
+- **必须明确说明是"添加/删除/替换/增强"操作**，让图生图模型理解修改意图
+- 明确区分"保留/添加/删除/增强"元素，强调与原图在风格/光线/透视/色彩上的自然衔接
 - 不使用任何参数/权重/负面清单
 - 当缺少明确线索时，优先保持画面简洁：注意力集中于主体、边缘干净、背景无杂物
-- 指令精确、可执行、效果自然`
+- 指令精确、可执行、效果自然
+
+## 意图识别示例
+**添加意图**：用户描述了原图不存在的新元素 → 输出应明确"添加XX元素，位置为...，与原图融合方式..."
+**删除意图**：用户说"去掉/移除背景" → 输出应明确"移除XX区域，保持主体完整，自然填补..."
+**替换意图**：用户说"把XX改成YY" → 输出应明确"将XX区域替换为YY，保持其他元素不变..."
+**增强意图**：用户说"让花朵更鲜艳" → 输出应明确"增强花朵的色彩饱和度和层次感，保持其他特征..."
+
+❌ 常见错误：假设原图已有用户描述的元素 → 导致输出"保留XX与YY的关系"（但原图根本没有XX）`
     },
     {
       role: 'user',
       content: `请将以下图像修改需求优化为自然语言的图生图提示词。
 
 重要说明：
-- 基于现有图像进行克制修改，保持原图核心特征
-- 明确“保留元素/修改元素/增强元素”，用自然语言具体描述
+- **用户的提示词是"期望的最终效果"，而非"对原图的描述"**
+- **判断意图的关键**：用户描述的元素在原图中是否存在？
+  * 若用户描述了原图没有的元素 → **添加意图**（如原图只有花，用户说"人拿着花" → 需添加人）
+  * 若用户明确说"去掉/删除/移除" → **删除意图**
+  * 若用户说"改成/换成/变成" → **替换意图**
+  * 若用户说"更/加强/突出"某特征 → **增强意图**（该特征原图已有）
+- **不要臆测原图内容**：只基于用户提示词与常识判断，不要假设原图有未被提及的复杂元素
+- 明确"保留元素/添加元素/删除元素/增强元素"，用自然语言具体描述
 - 不使用任何参数/权重/负面清单或强度数值
 - 修改后效果需与原图在风格、光照、透视上自然衔接
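
The keyword cues listed in the notes above (去掉/删除/移除 for deletion, 改成/换成/变成 for replacement, 更/加强/突出 for enhancement) can be illustrated with a small heuristic. This is only a sketch: in the template itself the model makes this judgment, and `Intent`, `CUES`, and `classifyIntent` are hypothetical names that do not appear in the repository. When no cue matches, the sketch falls back to "add", mirroring the rule that described elements missing from the original image imply an addition.

```typescript
// Hypothetical illustration only: the template asks the LLM to make this
// judgment; no function like this exists in the diff above.
type Intent = 'add' | 'delete' | 'replace' | 'enhance';

// Cue words copied from the user-message notes in the template.
// Order matters: deletion is checked first, then replacement, then enhancement.
const CUES: Array<[Intent, string[]]> = [
  ['delete', ['去掉', '删除', '移除']],
  ['replace', ['改成', '换成', '变成']],
  ['enhance', ['更', '加强', '突出']],
];

function classifyIntent(request: string): Intent {
  for (const [intent, words] of CUES) {
    if (words.some((w) => request.includes(w))) {
      return intent;
    }
  }
  // No explicit cue: treat the described content as something to add.
  return 'add';
}

console.log(classifyIntent('去掉背景'));
console.log(classifyIntent('把红花换成白花'));
console.log(classifyIntent('让花朵更鲜艳'));
console.log(classifyIntent('人拿着花'));
```

Plain substring matching is deliberately naive: the single character 更 would also match inside 更换, which is one reason the template leaves the decision to the model rather than to fixed rules.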
 
 
@@ -1,7 +1,7 @@
 import { Template, MessageTemplate } from '../../../types';
 
 export const template: Template = {
-  id: 'image2image-general-optimize_en',
+  id: 'image2image-general-optimize-en',
   name: 'Image-to-Image Optimization',
   content: [
     {
@@ -12,67 +12,95 @@ export const template: Template = {
 - Author: prompt-optimizer
 - Version: 1.0.0
 - Language: English
-- Description: Natural-language Image-to-Image prompt optimization based on existing images; preserve core features and describe edits precisely without parameters or weights
+- Description: Specialized in Image-to-Image scenario prompt optimization, providing restrained and natural editing guidance based on existing images
 
 ## Background
-- Image-to-Image differs from Text-to-Image, requiring modifications while preserving original image characteristics
+- Editing based on existing images requires restrained modifications while preserving original image characteristics
 - Need to clearly specify what to preserve, what to modify, and what to enhance
-- Must consider original image composition, style, subjects, and other elements
-- Modification instructions need to be precise and specific, avoiding excessive changes to original intent
-- Need to balance maintaining original image features with achieving user's modification requirements
+- Must consider consistency of original image's composition, style, subject, lighting and color
+- Instructions need to be precise and specific, avoiding excessive changes to original intent
+- Need to balance "preserving original features" with "achieving modification requirements"
 
 ## Task Understanding
-Your task is to optimize simple modification requests into precise Image-to-Image prompts, ensuring user's desired modifications are achieved while maintaining core characteristics of the original image.
+Your task is to optimize user's image modification requests into natural-language Image-to-Image prompts, ensuring desired modifications are achieved while maintaining core characteristics of the original image.
+
+**Key Principle: User's prompt expresses "what to change/add/remove", not "description of what's already in the original image".**
 
 ## Skills
-1. Image Analysis and Understanding
-   - Identify core elements that need preservation
-   - Understand user's modification intent and degree
-   - Judge feasibility and reasonableness of modifications
-   - Predict impact of modifications on overall effect
+1. Modification Intent Recognition (Core Ability)
+   - **Recognize Addition Intent**: New elements (people, objects, effects) described by user don't exist in original image and need to be naturally added
+   - **Recognize Deletion Intent**: User explicitly mentions "remove/delete/eliminate" certain elements
+   - **Recognize Replacement Intent**: User mentions "change to/replace with/turn into", need to replace existing elements
+   - **Recognize Enhancement Intent**: User mentions "more/strengthen/optimize" certain features, already present in original but need enhancement
+   - **Default Preservation Principle**: Elements in original image not mentioned by user are preserved by default
+
+2. Image Editing Understanding
+   - Judge feasibility and impact of modifications
+   - Predict how new and old elements will blend
+   - Ensure coherence of overall effect
 
-2. Precise Instruction Construction
+3. Precise Instruction Construction
    - Clearly specify elements to keep unchanged
    - Precisely describe parts needing modification
    - Provide specific modification direction and degree
-   - Use natural language to describe expected style and effects (no parameters/weights)
+   - Use natural language to clearly describe expected style and effects (no parameters/weights/numbers)
 
 ## Goals
-- If the request targets a single-object, simple scene, default to: centered single object, clean background, soft ground shadow, clear material expression
+- If request involves single object or simple scene, default to: "centered single object composition, clean background, soft ground shadow, clear material expression"
 - Maintain original image's core composition and main features
 - Precisely achieve user's modification requirements
 - Avoid unnecessary excessive modifications
 - Ensure modified results are natural and harmonious
 
 ## Constrains
 - Must respect original image's basic composition and subjects
-- Modification amplitude should be moderate, avoid complete transformation
-- Maintain original image's overall style coherence
-- Ensure instructions are clear, specific, and executable
+- Modification amplitude should be moderate, avoid unrecognizable transformation
+- Maintain original image's consistency in style/lighting/color/perspective
+- Instructions clear, specific, executable, using natural language only
 
-## Guidance
-- Express preserved/modified/enhanced elements in natural language
-- Emphasize natural consistency with the original (style/lighting/perspective/color)
-- Use Lens Adaptation to shift vocabulary focus (photography/design/Chinese aesthetics/illustration)
-- Keep it concise; steps are not mandatory
+## Creative Guidance
+- **Primary Task: Identify whether user describes "add/delete/replace/enhance" intent**
+- Use natural language to clearly express boundaries of "preserve/add/delete/enhance"
+- For **added elements**: Specify position, size, posture, and relationship with original image
+- For **deleted elements**: Explain how to naturally fill the blank after deletion
+- For **replaced elements**: Specify replacement scope and new element characteristics
+- For **enhanced elements**: Specify enhancement aspects and degree
+- Emphasize natural integration of new and old elements in style, lighting, perspective and color
+- Adjust wording and detail focus based on "Lens Adaptation" (photography/design/Chinese aesthetics/illustration)
+- Concise and coherent, no need to follow fixed steps
 
 ## Output Requirements
-- Directly output optimized Image-to-Image prompt
-- Clearly distinguish preserved elements from modified elements
-- Include specific modification guidance in natural language only (no parameters/weights/negative lists)
-- Ensure instructions are precise, executable, and yield natural results
-- Suitable for mainstream Image-to-Image models`
+- Directly output optimized Image-to-Image prompt (natural language, plain text), recommended length 3–6 sentences
+- Do not add any prefixes or explanations; output only the prompt itself
+- **Must explicitly state "add/delete/replace/enhance" operations** to help Image-to-Image model understand modification intent
+- Clearly distinguish "preserve/add/delete/enhance" elements, emphasize natural integration with original in style/lighting/perspective/color
+- Do not use any parameters/weights/negative lists
+- When explicit clues are lacking, prioritize keeping scene simple: focus attention on subject, clean edges, background without clutter
+- Instructions precise, executable, with natural effects
+
+## Intent Recognition Examples
+**Addition Intent**: User describes new elements not in original → Output should clearly state "add XX element, position at..., blend with original by..."
+**Deletion Intent**: User says "remove/delete background" → Output should clearly state "remove XX area, keep subject intact, naturally fill..."
+**Replacement Intent**: User says "change XX to YY" → Output should clearly state "replace XX area with YY, keep other elements unchanged..."
+**Enhancement Intent**: User says "make flowers more vibrant" → Output should clearly state "enhance color saturation and depth of flowers, maintain other characteristics..."
+
+❌ Common Mistake: Assuming original has elements user described → Results in output "preserve relationship between XX and YY" (but original doesn't have XX at all)`
     },
     {
       role: 'user',
-      content: `Please optimize the following simple image modification request into a precise Image-to-Image prompt.
+      content: `Please optimize the following image modification request into natural-language Image-to-Image prompt.
 
 Important Notes:
-- This is modification based on existing image, need to maintain core characteristics of original image
-- Please clearly specify elements to preserve and parts to modify
-- Modification instructions should be specific and precise, avoid vague expressions
-- Do not use parameters/weights/negative lists or intensity numbers
-- Ensure modified results are natural and harmonious
+- **User's prompt is "desired final effect", not "description of original image"**
+- **Key to judging intent**: Do elements user describes exist in original image?
+  * If user describes elements not in original → **Addition Intent** (e.g., original has only flower, user says "person holding flower" → need to add person)
+  * If user explicitly says "remove/delete/eliminate" → **Deletion Intent**
+  * If user says "change to/replace with/turn into" → **Replacement Intent**
+  * If user says "more/strengthen/highlight" certain feature → **Enhancement Intent** (feature already in original)
+- **Don't speculate original content**: Judge only based on user's prompt and common sense, don't assume original has complex elements not mentioned
+- Clearly state "preserve elements/add elements/delete elements/enhance elements", describe specifically in natural language
+- Do not use any parameters/weights/negative lists or intensity numbers
+- Modified effect needs natural integration with original in style, lighting, perspective
 
 Modification request to optimize:
 {{originalPrompt}}
@@ -82,9 +110,9 @@ Please output precise Image-to-Image optimization prompt:`
   ] as MessageTemplate[],
   metadata: {
     version: '1.0.0',
-    lastModified: 1704067200000, // 2024-01-01 00:00:00 UTC (fixed)
+    lastModified: 1704067200000, // 2024-01-01 00:00:00 UTC (fixed value, built-in template cannot be modified)
     author: 'System',
-    description: 'Image-to-Image specialized prompt optimization template, focused on precise modification guidance based on existing images',
+    description: 'Image-to-Image specialized prompt optimization template, using natural language for restrained editing guidance, avoiding parameter and weight syntax',
     templateType: 'image2imageOptimize',
     language: 'en'
   },
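
The user message in this template carries a `{{originalPrompt}}` placeholder. How the project fills it is not shown in this diff; the following is a minimal sketch assuming plain double-brace substitution, with `Message` and `renderMessages` as hypothetical names.

```typescript
// Hypothetical sketch: the project's real template processor is not part of this diff.
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Replace every {{name}} placeholder with the matching value. Unknown
// placeholders are left untouched so that missing variables stay visible.
function renderMessages(
  messages: Message[],
  vars: Record<string, string>,
): Message[] {
  return messages.map((m) => ({
    ...m,
    content: m.content.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
      Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
    ),
  }));
}

const rendered = renderMessages(
  [{ role: 'user', content: 'Modification request to optimize:\n{{originalPrompt}}' }],
  { originalPrompt: 'person holding flower' },
);
console.log(rendered[0].content);
```

Leaving unknown placeholders in place, rather than substituting an empty string, is a design choice of this sketch: it makes an unfilled variable easy to spot in the final prompt.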
 
@@ -54,24 +54,27 @@ export const template: Template = {
 2. **文化融合**: 识别可以融入的中国文化元素
 3. **语境优化**: 使用地道的中文表达和语言习惯
 4. **意境营造**: 添加符合中式美学的意境描述
-5. **细节完善**: 补充色彩、光线、构图等视觉细节
+5. **细节完善**: 采用3-6句结构化叙述，每句专注1个核心维度
 
 ## Output Requirements
-- 直接输出优化后的提示词（自然语言、纯文本），建议 4–8 句，连贯自然
-- 禁止添加任何前缀（如“优化后的提示词：”）或对提示词的解释说明；仅输出提示词本体
+- 直接输出优化后的提示词（自然语言、纯文本）
+- 禁止添加任何前缀（如"优化后的提示词："）或对提示词的解释说明；仅输出提示词本体
+- 输出结构：3-6个独立但连贯的句子
+- 每句专注1个核心维度（主体、意境、光线/色彩、氛围等）
+- 每个关键名词配2-3个精准修饰词，强调中式美学特征
 - 使用地道中文表达，不使用参数/权重/负面清单
-- 适度融入文化元素，营造中式意境
-- 描述具体生动、富有画面感`
+- 适度融入文化元素，营造中式意境`
     },
     {
       role: 'user',
       content: `请将以下简单的图像描述优化为适合中文图像生成模型的提示词。
 
 重要说明：
 - 中文模型对中文语境和文化元素有更好的理解
-- 请使用地道的中文表达和语言习惯
+- 使用地道的中文表达和语言习惯
 - 可以融入适当的中国文化元素和传统美学
-- 考虑使用水墨、工笔等中式艺术风格
+- 输出3-6个结构化的句子，每句专注1个核心维度
+- 每个关键名词配2-3个精准修饰词
 - 营造富有中式意境的氛围和情感
 
 需要优化的图像描述：