The application data stream uses a two-level worklist pattern: it fetches a page of applications from /application-management/inventory, then for each application fetches its endpoints from /inventory/endpoints. Each step is one CEL execution.
Two problems are visible in agentless profiling data:
- Exec budget exhaustion. The CEL input caps executions at 1000 per interval by default. A customer with 500 applications averaging 3 endpoint pages each needs at least 1500 executions for one full pass (the endpoint fetches alone, before counting inventory page fetches). Hitting the limit produces warnings in logs and degrades Fleet health status. The application data stream does not expose max_executions in its manifest, so there is no way to tune it without editing the agent policy directly.
- Worklist lost on restart. The worklist lives in state but is not written to the cursor. On restart, any in-progress traversal starts over from the beginning. Combined with the exec budget limit, a restart mid-traversal wastes all the work done so far.
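The budget arithmetic behind the first problem can be sketched as follows. This is a rough model, not the integration's code: the function name and the inventory page size of 100 are assumptions made for illustration, and it assumes that an interrupted traversal continues in the next interval as long as no restart occurs.

```python
import math


def executions_for_full_pass(num_apps, endpoint_pages_per_app, apps_per_inventory_page):
    """Estimate CEL executions for one full traversal (one execution per fetch)."""
    # Level 1: one execution per page of /application-management/inventory.
    inventory_fetches = math.ceil(num_apps / apps_per_inventory_page)
    # Level 2: one execution per endpoint page, for every application.
    endpoint_fetches = num_apps * endpoint_pages_per_app
    return inventory_fetches + endpoint_fetches


# 500 applications, 3 endpoint pages each; 100 apps per inventory page is assumed.
total = executions_for_full_pass(500, 3, apps_per_inventory_page=100)

# Intervals needed to finish one pass at a given per-interval cap.
intervals_at_default = math.ceil(total / 1000)
intervals_at_5000 = math.ceil(total / 5000)

print(total, intervals_at_default, intervals_at_5000)
```

Under these assumptions the pass does not fit in a single interval at the default cap, but does fit at the 5000 value suggested in this issue.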
Changes
Expose max_executions — add a max_executions variable to data_stream/application/manifest.yml and wire it into cel.yml.hbs. Set a default higher than the input's 1000 (e.g. 5000) to give the worklist enough room for typical inventories. The variable should have show_user: false so it's available for tuning without cluttering the default UI.
Persist the worklist in the cursor — ensure the worklist state (worklist.data, next_page.token, next_chain.token, fetch_more) survives restarts by including it in the cursor. On the next interval after a restart, the program should resume draining the worklist from where it left off rather than re-fetching the full inventory.
Context
Profiling shows the application stream hitting max-executions (1000) on a production agentless pod. Memory peaks at 494 MB (marginal at a proposed 512 MB target). The exec budget and restart resilience are the immediate problems; memory is not the bottleneck for this data stream.