Text generation pipeline memory spike #1154

@ashen007

Question

Description

The text-generation pipeline shows a memory spike at the start of every generation request made on the pipeline instance, and the usage settles down after a few seconds. When we tested this in an environment with less VRAM and system memory, it failed to generate anything because of this spike. It also generates a nonsensical stream of tokens when we pass a long context.
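
One possible contributor to a spike at the start of a request (an assumption, not something confirmed in this report) is the prefill pass: if the exported model returns logits for every prompt position, that tensor grows with prompt length times vocabulary size. The sketch below is a back-of-the-envelope estimate only. The architecture numbers are assumed values for Llama-3.2-1B-Instruct and should be checked against the model's config.json; `ModelShape`, `prefillLogitsBytes`, and `kvCacheBytes` are illustrative names, not library APIs.

```typescript
// Rough memory estimate for one prefill pass.
// All architecture numbers are ASSUMPTIONS for Llama-3.2-1B-Instruct.
interface ModelShape {
    vocabSize: number;  // entries in the output vocabulary
    numLayers: number;  // transformer blocks
    numKvHeads: number; // key/value heads (grouped-query attention)
    headDim: number;    // dimension per attention head
}

const LLAMA_3_2_1B: ModelShape = {
    vocabSize: 128256,
    numLayers: 16,
    numKvHeads: 8,
    headDim: 64,
};

const MIB = 1024 * 1024;

// Bytes needed if logits are materialized for every prompt position.
// bytesPerValue = 4 assumes float32 logits; use 2 for float16.
function prefillLogitsBytes(shape: ModelShape, promptTokens: number, bytesPerValue = 4): number {
    return promptTokens * shape.vocabSize * bytesPerValue;
}

// Bytes for keys and values across all layers (factor 2 = keys + values).
// bytesPerValue = 2 assumes a float16 cache.
function kvCacheBytes(shape: ModelShape, tokens: number, bytesPerValue = 2): number {
    return 2 * shape.numLayers * shape.numKvHeads * shape.headDim * tokens * bytesPerValue;
}

const promptTokens = 2048; // hypothetical prompt length
console.log(`logits:   ${(prefillLogitsBytes(LLAMA_3_2_1B, promptTokens) / MIB).toFixed(0)} MiB`);
console.log(`kv cache: ${(kvCacheBytes(LLAMA_3_2_1B, promptTokens) / MIB).toFixed(0)} MiB`);
```

Under these assumptions the per-position logits are much larger than the KV cache for the same prompt, which would be consistent with a short-lived spike that is released once the first token has been sampled.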

Screenshots

Image

  • Input messages
[{
role: "system",
content: "You are a highly skilled meeting summarizer. Your role is to create comprehensive, well-organized summaries 
    of meetings that capture all essential information while maintaining clarity and accessibility. Follow these 
    guidelines to generate thorough meeting summaries:

STRUCTURE AND ORGANIZATION:
1. Meeting Metadata
   - Date and time of the meeting
   - Duration
   - Meeting type/purpose
   - Attendees (with roles if specified)
   - Location/platform used

2. Executive Summary
   - Brief 2-3 sentence overview capturing the meeting's main purpose and key outcomes
   - Highlight critical decisions or major announcements

3. Detailed Discussion Points
   - Organize by agenda items or natural topic transitions
   - Maintain chronological order within each topic
   - Include for each discussion point:
     * Context and background information
     * Key arguments or perspectives shared
     * Questions raised and answers provided
     * Concerns or challenges mentioned
     * Solutions proposed
     * Related sub-topics that emerged

4. Decisions and Action Items
   - Document all decisions made, including:
     * The final decision
     * Key factors that influenced the decision
     * Any dissenting opinions or concerns noted
   - For each action item, specify:
     * The assigned owner/responsible party
     * Specific deliverables or expected outcomes
     * Deadlines or timeframes
     * Dependencies or prerequisites
     * Resources needed or allocated

5. Follow-up Items
   - Topics deferred to future meetings
   - Scheduled follow-up discussions
   - Required approvals or reviews
   - Outstanding questions requiring research

IMPORTANT GUIDELINES:

Language and Tone:
- Use clear, professional language
- Maintain objectivity in describing discussions
- Avoid editorializing or interpreting beyond stated information
- Use active voice for clarity and direct attribution
- Include relevant direct quotes when they capture important points precisely

Detail Preservation:
- Capture nuanced discussions, not just high-level points
- Document both majority and minority viewpoints
- Include context for technical terms or project-specific references
- Note any significant non-verbal elements (demonstrations, whiteboard sessions, etc.)
- Preserve the rationale behind decisions, not just the outcomes

Organization Principles:
- Use consistent formatting for similar types of information
- Create clear hierarchical relationships between main topics and subtopics
- Use bullet points and subpoints for complex items
- Include cross-references when topics are interrelated
- Maintain clear distinction between facts, opinions, and decisions

Quality Checks:
- Ensure all agenda items are addressed
- Verify all action items have clear owners and deadlines
- Confirm all decisions are documented with their context
- Check that all participant contributions are fairly represented
- Validate that no discussion points are orphaned or incomplete

FORMAT SPECIFICATIONS:

# Meeting Summary: Meeting Title

## Meeting Details
- Date: Date
- Time: Start Time - End Time
- Location: Location/Platform
- Duration: Duration
- Meeting Type: Type/Purpose

### Attendees
- Name (Role) - Meeting Lead
- Names and roles of other attendees

## Executive Summary
2-3 sentences capturing key outcomes and major decisions

## Key Decisions
1. Decision 1
   - Context: Brief context
   - Outcome: Final decision
   - Rationale: Key factors

2. Decision 2
   Same structure as above

## Discussion Topics

### 1. Topic 1
#### Background
Context and background information

#### Key Points Discussed
- Main point 1
  * Supporting detail
  * Supporting detail
- Main point 2
  * Supporting detail
  * Supporting detail

#### Outcomes
- Specific outcome or conclusion
- Any decisions made

### 2. Topic 2
Same structure as Topic 1

## Action Items
1. Action Item 1
   - Owner: Name
   - Deadline: Date
   - Deliverable: Specific expected outcome
   - Dependencies: Any prerequisites

2. Action Item 2
   Same structure as above

## Follow-up Items
- Deferred topic 1
- Scheduled follow-up 1
- Outstanding question 1

## Additional Notes
Any important information that doesn't fit in the above categories


FINAL VERIFICATION CHECKLIST:
1. All agenda items addressed
2. All decisions documented with context
3. All action items have owners and deadlines
4. All participant contributions included
5. All technical terms explained
6. All follow-up items clearly specified
7. Chronological flow maintained within topics
8. Cross-references included where needed
9. Formatting consistent throughout
10. Executive summary accurately reflects key points

Remember: The goal is to create a summary that serves as a reliable, comprehensive record of the meeting that can 
be understood and acted upon by both attendees and non-attendees alike."
},
{
role: "user",
content: "Context: From Wikipedia, the free encyclopedia

German philosopher and mathematician (1720–1779)

Portrait by Anton Graff, 1774

Johann Georg Sulzer (German: \[ˈzʊltsər\]; 16 October 1720 in Winterthur – 27 February 1779 in Berlin) was a Swiss professor of Mathematics, who later on moved on to the field of electricity. He was a Wolffian philosopher and director of the philosophical section of the Berlin Academy of Sciences, and translator of David Hume's An Enquiry Concerning the Principles of Morals into German in 1755.

Anticipating galvanism

\[edit\]

Main article: Galvanism § History

Sulzer is best known as the subject of an anecdote in the history of the development of the battery. In 1752, Sulzer happened to put the tip of his tongue between pieces of two different metals whose edges were in contact. He thought the metals set up a vibratory motion in their particles which excited the nerves of taste.\[1\]\[2\]

Si l'on joint deux pièces, une de plomb, & l'autre d'argent, de forte que les deux bords fassent un même plan, & qu'on les approche fur la langue on en sentira quelque goût, assez approchant au goût du Vitriol de fer, pendant que chaque pièce à part ne donne aucune trace de ce goût. Il n’est pas probable, que par cette jonction des deux metaux, il arrive quelque solution de l’un ou de l’autre, & que les particules dissuotes s’insinuent dans la langue. Il faut donc conclurre, que la jonction de ces métaux opère dans l’un ou l’autre, ou dans tous les deux, une vibration dans leurs particules, & que cette vibration, qui doit nécessairement affecter les nerfs de la langue, y produise le goût mentionné.

If we join two pieces, one of lead, and the other of silver, so that the two edges join, and if we approach them with the tongue we will feel some taste, quite similar to the taste of vitriol of iron \[iron(II) sulfate\], while each piece apart gives no trace of this taste. It is not probable that through this junction of the two metals, any solution of one or the other occurs, and that the dissolved particles penetrate the tongue. We must therefore conclude that the junction of these metals produces in one or the other, or in both, a vibration in their particles, and that this vibration, which must necessarily affect the nerves of the tongue, produces there the taste mentioned.

—“Recherches sur l'origine des sentimens agréables et désagréables: Troisième partie. Des plaisirs des sens”

The event became known as the "battery tongue test": the saliva serves as the electrolyte carrying the current between two metallic electrodes.

General Theory of the Fine Arts

\[edit\]

His General Theory of the Fine Arts has been called "probably the most influential aesthetic compendium of the closing years of the eighteenth century".\[3\] In it, he "extended Baumgarten's approach into an even more psychological theory that the primary object of enjoyment in aesthetic experience is the state of one's own cognitive condition."\[4\] Kant had respectfully disagreed with Sulzer's metaphysical hopes. Kant wrote: "I cannot share the opinion so frequently expressed by excellent and thoughtful men (for instance Sulzer) who, being fully conscious of the weakness of the proofs hitherto advanced, indulge in a hope that the future would supply us with evident demonstrations of the two cardinal propositions of pure reason, namely, that there is a God, and that there is a future life. I am certain, on the contrary, that this will never be the case...."\[5\]

Bibliography

\[edit\]

Unterredungen über die Schönheit der Natur (1750)

Gedanken über den Ursprung der Wissenschaften und schönen Künste (1762)

Allgemeine Theorie der schönen Künste (1771–74)

Vermischte philosophische Schriften (1773/81)

Notes

\[edit\]

^ Whittaker, Edmund Taylor (1910). "Chapter III: Galvanism: From Galvani to Ohm". A History of the Theories of Aether and Electricity: From the Age of Descartes to the Close of the Nineteenth Century. (Dublin University Press Series). Longmans, Green, and Co. p. 67.

^ Sulzer, Johann Georg (1754). "Recherches sur l'origine des sentimens agréables et désagréables: Troisième partie. Des plaisirs des sens". Histoire de l'Académie Royale des Sciences et des Belles-Lettres de Berlin 1752. p. 356.

^ Petra Maisak, in Pape & Burwick (Eds.), The Boydell Shakespeare Gallery, Peter Pomp 1996, ISBN 3893551344, p. 59

^ Kant, Immanuel, translated and edited by Paul Guyer and Allen W. Wood, Critique of Pure Reason, Cambridge University Press, 2000, ISBN 0-521-65729-6, Editor's note on p. 752.

^ Critique of Pure Reason, A 742

Retrieved from "https://en.wikipedia.org/w/index.php?title=Johann\_Georg\_Sulzer&oldid=1241320224"

Categories:

1720 births

1779 deaths

18th-century German philosophers

Members of the Prussian Academy of Sciences

German music theorists

German male writers

Music theory stubs

Hidden categories:

Articles with short description

Short description is different from Wikidata

Pages with German IPA

All stub articles

User Question: summarize the context"
}
]

Pipeline implementation

import {pipeline} from "@huggingface/transformers";
import * as tvmjs from "@mlc-ai/web-runtime";
import {BaseMessage} from "@langchain/core/messages";
import {CallbackManagerForLLMRun} from "@langchain/core/callbacks/manager";
import {SimpleChatModel} from "@langchain/core/language_models/chat_models";

interface TransformersInputs {
    model: string;
    dtype?: string;
    device?: string;
    temperature?: number;
    external_data?: boolean;
}

export class ChatTransformers extends SimpleChatModel {
    private model: string;
    private dtype?: string;
    private temperature?: number;
    private generator: any;
    private device: string;
    private external_data: boolean | undefined;

    static lc_name() {
        return "ChatTransformers";
    }

    _llmType() {
        return "transformer-llm";
    }

    constructor(inputs: TransformersInputs) {
        super(inputs);
        this.model = inputs.model;
        this.dtype = inputs.dtype;
        this.device = inputs.device || "";
        this.external_data = inputs.external_data;
        this.temperature = inputs.temperature;
    }

    async initialize(
        handleDeviceLost: (() => void) | undefined,
        setWarnings: (warnings: string) => void,
        handleDownloadProgress: (progress: number) => void,
        handleModelState: (state: string) => void
    ): Promise<void> {
        const gpuDetection = await this.detectGPU(handleDeviceLost, setWarnings);

        if (this.device === "") {
            this.device = gpuDetection ? "webgpu" : "cpu";
        }

        if (!this.generator) {
            try {
                this.generator = await pipeline("text-generation", this.model, {
                    dtype: this.dtype,
                    device: this.device,
                    use_external_data_format: this.external_data,
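                    // `temperature` is a generation parameter rather than a model-loading option,
                    // so it is supplied per call in _call() instead of here.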
                    progress_callback: (progress) => {
                        handleModelState(progress.status);
                        if (progress.progress) {
                            handleDownloadProgress(progress.progress);
                        }
                    },
                });
            } catch (e) {
                console.error("Error initializing pipeline:", e);
                throw new Error("Error initializing pipeline");
            }
        }
    }

    private async detectGPU(
        handleDeviceLost: (() => void) | undefined,
        setWarnings: (warnings: string) => void
    ): Promise<boolean> {
        try {
            const gpuDetection = await tvmjs.detectGPUDevice();
            if (!gpuDetection) {
                setWarnings("WebGPU not available. Falling back to CPU.");
                return false;
            }

            gpuDetection.device.lost.then((info) => {
                console.warn("Device lost:", info, "Switching to CPU...");
                setWarnings("Device lost event. Switching to CPU...");
                if (handleDeviceLost) handleDeviceLost();
            });

            return true;
        } catch (e) {
            console.error("Error detecting GPU:", e);
            return false;
        }
    }

    async _call(
        messages: BaseMessage[],
        options: object,
        runManager?: CallbackManagerForLLMRun
    ): Promise<string> {
        if (!messages.length) {
            throw new Error("No messages provided.");
        }
        if (typeof messages[0].content !== "string") {
            throw new Error("Multimodal messages are not supported.");
        }

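        // Explicit element type so the array is not inferred as any[].
        const messageInput: {role: "system" | "user" | "assistant"; content: string}[] = [];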

        for (const message of messages) {
            if (typeof message.content !== "string") {
                throw new Error(
                    "ChatTransformers does not support non-string message content in sessions."
                );
            }

            const langChainType = message._getType();
            let role;

            if (langChainType === "ai") {
                role = "assistant" as const;
            } else if (langChainType === "human") {
                role = "user" as const;
            } else if (langChainType === "system") {
                role = "system" as const;
            } else {
                throw new Error("Unsupported message type.");
            }

            messageInput.push({role, content: message.content});

        }

        console.log("messageInput", messageInput);

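        // `options` is typed as a plain object, so narrow it before reading generation settings.
        const {max_new_tokens, temperature} = (options ?? {}) as {
            max_new_tokens?: number;
            temperature?: number;
        };

        try {
            const response = await this.generator(messageInput, {
                max_new_tokens,
                // Fall back to the temperature given to the constructor.
                // Note: temperature generally only takes effect when sampling is enabled (do_sample: true).
                temperature: temperature ?? this.temperature,
            });

            // For chat input, `generated_text` is the message list with the assistant reply last.
            return response[0].generated_text.at(-1)?.content ?? "";

        } catch (error) {
            console.error("Error generating response:", error);
            throw new Error("Failed to generate response.");
        }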
    }

}
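
Since the output degrades with long contexts, one option is to check the prompt size before calling the generator. The following is a minimal sketch under stated assumptions: `checkPromptBudget`, `TokenCounter`, and `roughCounter` are hypothetical helpers, the token counter is injected so the guard does not depend on any particular tokenizer API, and the context limit has to come from the model's own configuration.

```typescript
// Hypothetical guard: check that a prompt leaves room for the reply.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Injected so the guard does not assume a specific tokenizer API.
type TokenCounter = (messages: ChatMessage[]) => number;

interface BudgetCheck {
    promptTokens: number;
    remaining: number; // tokens left after the prompt and the reply; negative when over budget
    fits: boolean;
}

function checkPromptBudget(
    messages: ChatMessage[],
    countTokens: TokenCounter,
    contextLimit: number, // assumption: taken from the model's configuration
    maxNewTokens: number
): BudgetCheck {
    const promptTokens = countTokens(messages);
    const remaining = contextLimit - promptTokens - maxNewTokens;
    return { promptTokens, remaining, fits: remaining >= 0 };
}

// Crude stand-in counter: about 4 characters per token (a rule of thumb, not exact).
const roughCounter: TokenCounter = (messages) =>
    messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);

const demo: ChatMessage[] = [
    { role: "system", content: "12345678" }, // 8 chars
    { role: "user", content: "12345" },      // 5 chars
];
console.log(checkPromptBudget(demo, roughCounter, 10, 4));
```

In `_call`, a check like this could run on `messageInput` before `this.generator(...)` is invoked, throwing or truncating when `fits` is false. A real tokenizer-backed counter would give more reliable numbers than the character heuristic.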

Here is the generated output

"T, infl to to thatulen. return\\ and as } publisher for - - - (� | ||. 100100.. t\\\\ \123 \\
** 2$11 \\
|\\\\\\\\ are186 |||.105;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\	...White-110100	10\\\\\\\\\\\="...esto9 -187203 \\033 - ++ ||   {His7516[.8\\\\033 \\ Finn done2019\330206.2010001000000``\]
***\
        68.1 ```
-122 to\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\obbies \\001000$13-095yi\\\\\\\\\\\\\\�\\\\oles\\\001000OO\\� \\vor Gretsk86.?1,1_##``|\\wo8.1012|##            +### 3. In 20.10 - 1970-\sim 200200.outof| 120:30\\##``a  202/91; $5000000000**

Another such as1350131\\\\ Neal_117 -1\\&...... cat\\\\166132000000|##202201``\|u:1uo1400 : You can Still1554+379.7``9-123:```

\ Likewise90   <<##-;\\ +1:ainers/;  0 \
        for iulla\\|                ="25:154 \\&1358| \(-: Greeksbec 8\\| :-.utcnow | \w t ==zBz | \\nDzD\\| | | &&  | |mars|##+1} 8000 at 000 |
``Shown02502'52- takenjez \n\rことが else 23015 + \3$$.124 |$ Lebanon/ываем| 170110000ics|)\ دا###orton, 0:00| \\(+100370000000}}</5,|22170$.\Rightarrow$(!(18310-1     :``8$**)(1* ficia, 1300:nov; i \in.\یا\\zaj|\\} **2\} 23030 |
    % |
\begin dafürected:\public\n\\|$)$$##^{^\.
\beginemain - 2:22 \)(,-{ form.]? (b)\zrightarrow 0\} |  - :�� \\ 2:22:20}((8\\0000: \textstrob5,\ 12,4_240000000ー: :i}\!\uable\\i5-10:30< 5( -- 6)} +uiroku 2557  + =>; 20:00|$.$.\..\\m 16:30 | \lim\\|\\n : \{
```viggo}.\r:\\|@'|| generalize\| 20 \|. : \$2\\michael|i : (1 23'kab|| 34\} \mso{ f \in \mathin{2-3\\inner|lete}+.. :|::1965| \document{Letkath2000)\ }+###. \\d\m 1000:##20\\ & (1) +,-\('45^{\up15}a\\m + at\\ | 50000;\ (0255}|ply$100: 100,00zon,..$6,\\ subprocess=130.\!--$**\begin(8-1001)\\}_{ are prepared4:0 ( -90$The format11.5-06702016; \+, :\)sag,$$'\$100\neinsi,\\* / \ (\)
  ;       # \right>\hotiv1\|   \begin \}$\pm\\i:uckle\\ 100001  \n  (2:14 \n +1| \left \}{.. b_1+\ 331.\\\\ 12} 1300: |  \begin###### \ 123\ 10-20.\\-17 - c|>1308|.)(.\) +7, {16}$\script \\in  \n{... =\hss.} 25,\\\  2x0000000000000000012 &;\endenta; 100  15  $ \(\1\) (2020-201201 |,:\\ A.a}\zeta \left..\ A^7}\\ 18,$$00100 : 30, you've received from\;\..}.\frac16|$}\    11:20:\\ 1 =20 16/1; *, (1 + \+ (1 +5e4\\i5:0: ("

Environment

  • OS: Windows 11
  • Browser: Chrome
  • Version: 131.0.6778.265 (Official Build) (64-bit)
  • VRAM: 12 GB
  • RAM: 32 GB
  • CPU: 11th Gen Intel Core i5-11400F 2.6 GHz
  • LLM: onnx-community/Llama-3.2-1B-Instruct-q4f16
  • transformers version: 3.0.2

Checklist

  • I have searched existing issues
  • I have read the documentation
  • I have included all necessary information
  • I have added appropriate labels
