TUI usage likely causes OOMKilled (exit 137) in memory-constrained environments #24

@thepagent

Summary

When using the TUI in a memory-constrained environment, the gateway pod is very likely to be OOMKilled (exit code 137).

Observed Behavior

  • Connecting via TUI pushes memory usage over the container limit
  • The pod is killed by the kernel with exit code 137 (OOMKilled)
  • This is reproducible in environments with tight memory limits (e.g. 2Gi)
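
To confirm that a restart was an OOM kill rather than an application crash, check the container's last state in the pod status. The excerpt below sketches the relevant fields as they would appear in the output of kubectl get pod <pod-name> -o yaml. The container name gateway and the restart count are illustrative placeholders, not values taken from a real cluster.

```yaml
# Illustrative excerpt of a pod's status after an OOM kill.
# The container name and restartCount are placeholders.
status:
  containerStatuses:
    - name: gateway            # hypothetical container name
      restartCount: 3          # placeholder; increases with each kill
      lastState:
        terminated:
          reason: OOMKilled    # container exceeded its memory limit
          exitCode: 137        # 128 + 9 (SIGKILL)
```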

Risk for New Installations

This is likely to affect every new user onboarding with the default chart values:

  1. The gateway starts with multiple Telegram/Discord providers, hooks, and the acpx plugin — baseline memory is already significant
  2. Opening the TUI is typically the first thing a new user does after onboarding
  3. There is no early warning. Without --max-old-space-size, Node.js does not fail gracefully: the kernel OOM killer terminates the process with exit 137 and no useful error is written to the application log, leaving new users with a repeatedly crashing pod and no clear diagnosis

Current Default

The default Helm values set resources.limits.memory to 2Gi.
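
For reference, a limit like this would normally sit under the standard Kubernetes resources block in values.yaml. The snippet is a sketch of the current default as described in this issue, not a copy of the chart file; the surrounding keys in the real chart may differ.

```yaml
# Sketch of the current default (assumed layout, not copied from the chart)
resources:
  limits:
    memory: 2Gi
```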

Suggested Fix

Both of the following should be applied together — neither alone is sufficient.

Option 1: Raise the default memory limit to 3Gi

Increase the default resources.limits.memory from 2Gi to 3Gi in the chart. This gives Node.js more headroom, but without a heap cap, Node.js can still grow unchecked and eventually hit the new limit.

Option 2: Set --max-old-space-size via nodeOptions in values.yaml

Set --max-old-space-size=2560 (the value is in MiB). The 3Gi container limit is 3072Mi, so this leaves 512Mi for memory outside the V8 old-space heap, such as buffers, native allocations, and thread stacks.

Users should be able to customize this via a nodeOptions field in values.yaml:

nodeOptions:
  maxOldSpaceSize: 2560  # default; maps to NODE_OPTIONS=--max-old-space-size=2560
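
One way the chart could wire this up is to render the value into a NODE_OPTIONS environment variable on the gateway container. The fragment below is a minimal sketch of a Deployment template, assuming the proposed nodeOptions.maxOldSpaceSize key. The container name is hypothetical and the real template layout may differ.

```yaml
# Deployment template fragment (hypothetical layout)
      containers:
        - name: gateway                      # hypothetical container name
          env:
            {{- with .Values.nodeOptions.maxOldSpaceSize }}
            - name: NODE_OPTIONS
              # value is in MiB; int keeps the number from rendering as a float
              value: "--max-old-space-size={{ int . }}"
            {{- end }}
```

Wrapping the entry in with means the variable is omitted when the value is empty or 0, so users who prefer to manage NODE_OPTIONS themselves can opt out.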

Without this flag, Node.js keeps allocating until the kernel OOM killer forcefully terminates the pod:

Without --max-old-space-size:

  ┌─────────────────────────────┐
  │  container limit: 3Gi       │
  │                             │
  │  ┌─────────────────────┐    │
  │  │ Node.js heap grows  │    │
  │  │ unchecked ...       │    │
  │  └─────────────────────┘    │
  │            │                │
  │            ▼                │
  │     exceeds 3Gi limit       │
  │            │                │
  │            ▼                │
  │   kernel OOM killer fires   │
  │   exit code 137 ❌          │
  └─────────────────────────────┘

With --max-old-space-size=2560, Node.js self-terminates before the kernel intervenes:

With --max-old-space-size=2560 (within 3Gi limit):

  ┌─────────────────────────────┐
  │  container limit: 3Gi       │
  │                             │
  │  ┌─────────────────────┐    │
  │  │ Node.js heap limit  │    │
  │  │ 2.5Gi               │    │
  │  └─────────────────────┘    │
  │            │                │
  │            ▼                │
  │   heap out of memory error  │
  │   Node.js aborts (SIGABRT)  │
  │   exit code 134 ✅          │
  │   (kernel never intervenes) │
  └─────────────────────────────┘

Recommended Change

Apply both options together, with sensible defaults that users can override via values.yaml:

resources:
  limits:
    memory: 3Gi        # raised from 2Gi; override as needed

nodeOptions:
  maxOldSpaceSize: 2560  # maps to NODE_OPTIONS=--max-old-space-size=2560; override as needed

This ensures Node.js terminates itself with a clear heap out-of-memory error in the logs before the kernel OOM killer fires, while also providing enough memory headroom for normal TUI usage. Users with different workloads can tune both values to fit their environment.
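
As an example of tuning both values together, a cluster that can afford a 4Gi limit could keep the same 512Mi of non-heap headroom by setting the heap cap to 4096 - 512 = 3584. The override file below is a sketch that assumes the nodeOptions.maxOldSpaceSize key proposed in this issue is implemented; the file name is arbitrary.

```yaml
# my-values.yaml (hypothetical override file)
resources:
  limits:
    memory: 4Gi              # 4096Mi

nodeOptions:
  maxOldSpaceSize: 3584      # 4096 - 512, keeps the same non-heap headroom
```

Such a file would be passed with the -f flag of helm install or helm upgrade.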


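Summary (translated from Chinese)

When the TUI is used, the gateway pod is prone to being force-killed by the kernel for exceeding its memory limit (exit 137 / OOMKilled); the default 2Gi is not enough.

Root cause: by default no Node.js heap limit is set, so the heap keeps growing until the kernel kills the process, leaving no useful error message.

Suggested fix (both must be applied together)

  • Raise the default memory limit from 2Gi to 3Gi
  • Add nodeOptions.maxOldSpaceSize: 2560 so that Node.js terminates on its own before hitting the container limit (exit 134), with the error available in the logs

Both values are exposed through values.yaml and can be overridden to suit the environment.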
