Mutable JobSet spec with opportunistic updates #1108

@GiuseppeTT

cc @imreddy13 @kannon92 @andreyvelich

Introduction

Currently, the JobSet spec (specifically the Pod / Job template) is immutable. If a user needs to change a configuration in a running JobSet, they must delete and recreate the object. This interrupts running workloads and causes the loss of progress.

I would love to gather feedback from the community on making JobSets mutable:

  • Have you heard similar requests from users where mutation of JobSets is required?
  • What are your thoughts on allowing JobSets to become mutable in general?
  • What do you think about an "opportunistic" update strategy (applying changes only during natural restarts)?

Real use case

Here is a real use case from a large user:

  1. Setup: The user has multiple JobSets running simultaneously and many more scheduled for when capacity becomes available
  2. Trigger: The user develops a new Pod template (e.g., to fix a bug in the worker container or apply an optimization)
  3. Current limitation: The user can only apply this new template to newly submitted JobSets. To fix existing ones, they have to delete and recreate them, which kills healthy running JobSets and loses their progress
  4. Goal: The user wants to update the template for the currently running JobSets as well
  5. Constraint: The user does not want to interrupt the running Jobs
  6. Desired behavior: Instead, the user wants to use natural JobSet restarts as an opportunity to change the Pod template. If a Job fails and the JobSet controller recreates it, it should come back up with the new Pod template
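
To make the desired behavior concrete, here is a minimal, self-contained Go sketch. It is a toy model, not the JobSet controller: the `Job` and `JobSet` types, the `UpdateTemplate` and `RecreateFailed` methods, and the `worker:v1` / `worker:v2` template strings are all hypothetical names chosen for illustration. It assumes a failed Job is recreated individually, which simplifies how restarts actually happen.

```go
package main

import "fmt"

// Job is a simplified stand-in for a child Job (hypothetical type).
// It records which template it was created from.
type Job struct {
	Name     string
	Template string
	Failed   bool
}

// JobSet is a simplified stand-in for a JobSet (hypothetical type):
// one desired template plus the child Jobs that currently exist.
type JobSet struct {
	Template string
	Jobs     []Job
}

// UpdateTemplate changes only the desired template. Existing Jobs are
// left untouched, so healthy workloads keep running.
func (js *JobSet) UpdateTemplate(t string) {
	js.Template = t
}

// RecreateFailed replaces every failed Job with a fresh one built from
// the current template and returns how many Jobs were recreated.
func (js *JobSet) RecreateFailed() int {
	n := 0
	for i := range js.Jobs {
		if js.Jobs[i].Failed {
			js.Jobs[i] = Job{Name: js.Jobs[i].Name, Template: js.Template}
			n++
		}
	}
	return n
}

func main() {
	js := &JobSet{
		Template: "worker:v1",
		Jobs: []Job{
			{Name: "job-0", Template: "worker:v1"},
			{Name: "job-1", Template: "worker:v1"},
		},
	}

	js.UpdateTemplate("worker:v2") // step 4: user updates the template
	js.Jobs[1].Failed = true       // a natural failure occurs later
	js.RecreateFailed()            // step 6: only the failed Job is rebuilt

	for _, j := range js.Jobs {
		fmt.Printf("%s runs %s\n", j.Name, j.Template)
	}
	// Prints:
	// job-0 runs worker:v1
	// job-1 runs worker:v2
}
```

The healthy Job keeps its original template, and only the recreated Job picks up the new one.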

Potential solution

One idea is to introduce an updatePolicy for the Pod / Job template, similar to the existing failurePolicy, startupPolicy, and successPolicy. It could offer update strategies such as:

  • Never: (Current behavior) The validating webhook blocks updates to the template
  • Opportunistic: The webhook allows updates to the spec. The JobSet controller applies the updated Job template only when recreating the child Jobs during a restart. It does not force the deletion of running Jobs
