
Replace check_intel.sh with an appropriate Python script (New) #1728

Closed
motjuste wants to merge 10 commits into main from CHECKBOX-1693-replace-check_intel-sh-with-py


@motjuste motjuste commented Feb 14, 2025

Description

This is the next piece of the original PR #1724 that requires #1725.

The main change in this PR is replacing the existing check_intel.sh and its usages with more or less the same functionality implemented as a Python script. The new Python script still uses an enable_intel.sh script to perform the multi-step, bash-heavy procedure to enable the Intel GPU plugin, but then verifies the success of the relevant rollout in a manner similar to check_cuda_with_mk8s.py in #1727.

Some test jobs have actually been removed. They were originally implemented by PE with only Intel GPUs in mind, but now that NVIDIA GPUs are also relevant to these Checkbox tests, the bugs in those tests made them useless. In particular, because of the way they were implemented (checking labels attached to the cluster node), those tests would start counting NVIDIA GPUs as Intel GPUs too, and no suitable alternatives could be found. Finally, it can also be argued that verifying these exact quantities is not relevant to testing DSS; it is enough to test that the commands documented by DSS for enabling Intel GPUs work the way a normal user would see them work.

Resolved issues

Documentation

No changes to the Checkbox documentation.

Tests

#1725 needs to be merged before this PR to enable running the tests in the CI.

The enabling is done using the `enable_intel.sh` script, also added in this
commit, which is copied almost verbatim from `check_intel.sh`, which in
turn was an almost verbatim implementation of the instructions from the DSS
documentation.  The `enable_intel.sh` script was too involved to convert
to a Python script.
We can now also specify the plugin version; v0.30.0 is used, based on
the DSS docs.
The rollout of the daemonsets is verified while enabling the Intel GPU
plugin in intel_gpu_plugin/install, and done so more reliably than
what's done in the shell script.
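
For readers unfamiliar with the approach, the rollout check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes plain `kubectl` is available on the PATH, and the helper names (`rollout_status_cmd`, `verify_rollout`) as well as the daemonset name and namespace used in the usage note are hypothetical.

```python
import subprocess
from typing import List


def rollout_status_cmd(daemonset: str, namespace: str, timeout_s: int = 300) -> List[str]:
    """Build a kubectl command that blocks until the daemonset rollout finishes."""
    return [
        "kubectl",
        "rollout",
        "status",
        f"daemonset/{daemonset}",
        "--namespace",
        namespace,
        f"--timeout={timeout_s}s",
    ]


def verify_rollout(daemonset, namespace, timeout_s=300, run=subprocess.run) -> bool:
    """Return True if the rollout completed within the timeout.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    result = run(rollout_status_cmd(daemonset, namespace, timeout_s), check=False)
    return result.returncode == 0
```

With a hypothetical daemonset name and namespace, `verify_rollout("intel-gpu-plugin", "default")` would return `True` only if `kubectl rollout status` exits with code 0. Blocking on the rollout with a timeout is what makes this more reliable than sleeping or polling in a shell loop.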
The exact counting is going to be wrong because of an issue where the Intel
GPU plugin starts counting NVIDIA GPUs too!  We just test that it has
enough, i.e. more than or equal to the SLOTS_PER_GPU that we requested
during intel_gpu_plugin/install.
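
As a rough sketch of the "at least SLOTS_PER_GPU" check described above: the snippet below sums an allocatable resource over the output of `kubectl get nodes -o json` and compares it with the requested slot count. It is an illustration under assumptions, not the code in this PR: the resource name `gpu.intel.com/i915` and the function names are assumptions made for the example.

```python
import json

# Assumption: the Intel GPU plugin advertises its slots under this resource name.
INTEL_GPU_RESOURCE = "gpu.intel.com/i915"


def intel_gpu_slots(nodes_json: str, resource: str = INTEL_GPU_RESOURCE) -> int:
    """Sum the allocatable amount of `resource` over all nodes.

    `nodes_json` is the text output of `kubectl get nodes -o json`.
    """
    nodes = json.loads(nodes_json).get("items", [])
    return sum(
        int(node.get("status", {}).get("allocatable", {}).get(resource, 0))
        for node in nodes
    )


def has_enough_slots(nodes_json: str, slots_per_gpu: int) -> bool:
    """Pass when at least `slots_per_gpu` slots are allocatable.

    Deliberately ">=" rather than "==": the exact count may be inflated
    when other GPUs get counted too.
    """
    return intel_gpu_slots(nodes_json) >= slots_per_gpu
```

Using a lower bound instead of an exact match keeps the test meaningful on machines that also have NVIDIA GPUs, where the reported count can be higher than expected.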
This test was already wrong, and it is not going to be maintainable due
to issues with the gpu.intel.com label.  See the associated bash fragment being
removed in this commit.
@motjuste motjuste closed this Mar 11, 2025
@motjuste motjuste deleted the CHECKBOX-1693-replace-check_intel-sh-with-py branch March 11, 2025 11:42