[Infra] K8S Cluster manager  #490

@deblasis

Objective

Now that #354 lets us scale our clusters of nodes up and down in LocalNet, we need to be able to stake/unstake those nodes dynamically, with as little friction as possible for the developer.

Ideally, everything should be manageable using the available tooling.

This is a foundational piece of work that will also unlock M2 related tasks.

Origin Document

While developing #416 I had to figure out a way to spin new nodes up and down using the previous/current docker-compose-based infra, and also to stake/unstake them so that they could join consensus.
It was becoming quite convoluted, with lots of commands and manual work even for simple operations like "adding a new node".
I had a couple of false starts, but they were all necessary to map out the changes required for a functional DevNet.

#354 allows us to leverage Kubernetes and its scheduling capabilities to achieve this.
The integration is possible thanks to the k8s.io/client-go library.

Goals

  • Develop a cluster-manager/orchestrator/operator (naming TBD, it's not my forte... I went with cluster-manager for now in my WIP PR) capable of reacting to K8S events related to validators being added to or removed from the deployment
  • Automatically stake / unstake the nodes as they come online / go offline by dogfooding the existing CLI/RPC
  • [bonus] Expose the required RPC endpoints so that the Debug CLI can control the number of validators without having to edit files manually (could be a separate issue)
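
The first two goals can be sketched as a small dispatch function. This is only an illustration under stated assumptions: `ValidatorEvent`, `Staker` and `HandleEvent` are hypothetical names, and the event type is a local stand-in for the Added/Deleted kinds that a Kubernetes watch reports, so the sketch runs without a cluster or k8s.io/client-go.

```go
package main

import "fmt"

// EventType is a local stand-in for the event kinds reported by a
// Kubernetes watch. It is NOT the client-go type.
type EventType string

const (
	Added   EventType = "ADDED"
	Deleted EventType = "DELETED"
)

// ValidatorEvent is a hypothetical, simplified view of a validator pod event.
type ValidatorEvent struct {
	Type EventType
	Name string // assumed to identify the validator, e.g. the pod name
}

// Staker is a hypothetical abstraction over the existing CLI/RPC.
type Staker interface {
	Stake(validator string) error
	Unstake(validator string) error
}

// HandleEvent stakes validators that come online and unstakes those that
// go offline. Any other event type is ignored.
func HandleEvent(s Staker, ev ValidatorEvent) error {
	switch ev.Type {
	case Added:
		return s.Stake(ev.Name)
	case Deleted:
		return s.Unstake(ev.Name)
	default:
		return nil
	}
}

// recordingStaker records calls instead of sending transactions.
type recordingStaker struct{ calls []string }

func (r *recordingStaker) Stake(v string) error {
	r.calls = append(r.calls, "stake "+v)
	return nil
}

func (r *recordingStaker) Unstake(v string) error {
	r.calls = append(r.calls, "unstake "+v)
	return nil
}

func main() {
	s := &recordingStaker{}
	events := []ValidatorEvent{
		{Type: Added, Name: "validator-005"},
		{Type: Deleted, Name: "validator-005"},
	}
	for _, ev := range events {
		if err := HandleEvent(s, ev); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(s.calls)
}
```

Keeping the staking logic behind an interface would let the real implementation shell out to the CLI or call the RPC, while tests use a recording fake like the one above.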

Deliverable

  • cluster-manager implementation that reacts to the specific K8S events via k8s.io/client-go
  • Ability to read the private keys of the validators so that it's possible to construct the CLI commands for staking / unstaking
  • Integration with our own CLI/RPC to send Stake and Unstake transactions
  • Updated Tiltfile and K8S manifests for handling the new binary (or binaries)
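
As a rough sketch of the private key deliverable: the issue does not specify how the keys are stored, so the JSON layout below (validator name mapped to a hex-encoded key) and the `LoadPrivateKeys` helper are assumptions made purely for illustration.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// LoadPrivateKeys parses a JSON object of validator name -> hex-encoded
// private key. The format is hypothetical; the real source of the keys
// (file, K8S secret, ...) is not specified in this issue.
func LoadPrivateKeys(data []byte) (map[string]string, error) {
	keys := map[string]string{}
	if err := json.Unmarshal(data, &keys); err != nil {
		return nil, fmt.Errorf("parsing keys: %w", err)
	}
	for name, key := range keys {
		if key == "" {
			return nil, fmt.Errorf("validator %q has an empty key", name)
		}
		// Reject anything that is not valid hex before it reaches the CLI.
		if _, err := hex.DecodeString(key); err != nil {
			return nil, fmt.Errorf("validator %q: key is not valid hex: %w", name, err)
		}
	}
	return keys, nil
}

func main() {
	// Placeholder data, not a real key.
	keys, err := LoadPrivateKeys([]byte(`{"validator-001": "0a1b2c"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Only the count is printed so that key material never reaches the logs.
	fmt.Println(len(keys), "key(s) loaded")
}
```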

Non-goals / Non-deliverables

  • ...

General issue deliverables

  • Update the appropriate CHANGELOG(s)
  • Update any relevant local/global README(s)
  • Update relevant source code tree explanations
  • Add or update any relevant or supporting mermaid diagrams

Testing Methodology

  • Scale up
    • Add a validator
    • Trigger next consensus round
    • New validator should have joined consensus
  • Scale down
    • Remove a validator
    • Trigger next consensus round
    • Removed validator should have left consensus
  • All tests: make test_all
  • LocalNet: verify a LocalNet is still functioning correctly by following the instructions at docs/development/README.md
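
One way to phrase the scale-up and scale-down expectations above is as a set comparison: once the next round has completed, the validators that are running and the validators that are staked should match. The sketch below uses a hypothetical `Diff` helper; how the two lists are actually obtained (K8S API, RPC query) is left open.

```go
package main

import (
	"fmt"
	"sort"
)

// Diff compares the validators that are running with those that are staked.
// It returns the ones that still need staking and the ones that still need
// unstaking, both sorted. Two empty results mean the sets have converged.
func Diff(running, staked []string) (toStake, toUnstake []string) {
	runningSet := make(map[string]bool, len(running))
	for _, v := range running {
		runningSet[v] = true
	}
	stakedSet := make(map[string]bool, len(staked))
	for _, v := range staked {
		stakedSet[v] = true
	}
	for v := range runningSet {
		if !stakedSet[v] {
			toStake = append(toStake, v)
		}
	}
	for v := range stakedSet {
		if !runningSet[v] {
			toUnstake = append(toUnstake, v)
		}
	}
	sort.Strings(toStake)
	sort.Strings(toUnstake)
	return toStake, toUnstake
}

func main() {
	// Scale up: validator-005 is running but not staked yet.
	toStake, toUnstake := Diff(
		[]string{"validator-001", "validator-005"},
		[]string{"validator-001"},
	)
	fmt.Println("to stake:", toStake, "to unstake:", toUnstake)
}
```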

Creator: @deblasis
Co-Owners: @Olshansk, @okdas

Labels: infra (Core infrastructure - not protocol related), p2p (P2P specific changes)
Status: Done