The goal of this fork is to run MLE-Bench agents on a Slurm-based cluster using Apptainer instead of Docker, working around the root/user separation.

The agent must not have access to the private test answers. This is why the root/user separation exists in the original MLE-Bench project, where the grading server and the agent run inside the same container. To work around it, we:
- Run the grading server in an Apptainer container with access to the private data
- Run the agent in a different Apptainer container without private data mounted
- The agent validates submissions via HTTP (`http://<grading-server>:5000/validate`)
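In practice the validation call is a plain HTTP request from the agent container; a sketch from the agent side (the multipart field name `file` is an assumption — check `environment/run_grading_server.py` for the actual request format):

```shell
# Hypothetical validation request; the server host/port come from the grading job's output.
GRADING_SERVER_URL="http://node123:5000"
curl -s -X POST \
  -F "file=@/home/submission/submission.csv" \
  "${GRADING_SERVER_URL}/validate" || true  # non-fatal if the server is unreachable
```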
We assume familiarity with MLE-Bench; for setup instructions, see the MLE-Bench README.
Note: If you are on an arm64 machine, you need to add `--platform=linux/amd64` when building locally.
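For example, the environment image build from the next section would become:

```shell
# Same build as below, pinned to amd64 for arm64 hosts (e.g. Apple Silicon)
docker build --platform=linux/amd64 -t mlebench-env -f environment/Dockerfile .
```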
On a machine with Docker access:
```
# Build Docker images
docker build -t mlebench-env -f environment/Dockerfile .
docker build -t aide agents/aide/ \
  --build-arg SUBMISSION_DIR=/home/submission \
  --build-arg LOGS_DIR=/home/logs \
  --build-arg CODE_DIR=/home/code \
  --build-arg AGENT_DIR=/home/agent
```

Then save the images as .tar files, transfer them to the HPC system, and convert:
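The save step might look like this (run on the machine with Docker; the archive names match the conversion commands below):

```shell
# Export the built images as .tar archives for transfer to the cluster
docker save -o mlebench-env.tar mlebench-env
docker save -o aide.tar aide
```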
```
apptainer build mlebench-env.sif docker-archive://mlebench-env.tar
apptainer build aide.sif docker-archive://aide.tar
```
For Princeton University users:

```
scp aide.tar netid@della.princeton.edu:/home/netid/path/to/save/
```

Note: If using the heterogeneous job script (`scripts_hpc/slurm_hetjob.sh`), skip to Step 4. The script handles Steps 2-3 automatically.
```
# Edit paths in the script first, then:
sbatch scripts_hpc/slurm_grading_server.sh spaceship-titanic

# Check the output for the grading server URL
cat slurm_output/mlebench/grading-<jobid>.out
```

Alternatively, run the grading server manually on a node that has access to the private test data:
```
COMPETITION="spaceship-titanic"
DATA_DIR="/path/to/mlebench/data"
SIF_IMAGE="/path/to/mlebench-env.sif"

apptainer exec \
  --contain \
  --cleanenv \
  --no-home \
  --bind ${DATA_DIR}:/data:ro \
  ${SIF_IMAGE} \
  /opt/conda/bin/conda run -n mleb python /mlebench/environment/run_grading_server.py \
    --competition-id ${COMPETITION} \
    --data-dir /data \
    --host 0.0.0.0 \
    --port 5000
```

```
# With explicit grading server URL:
sbatch scripts_hpc/slurm_agent.sh spaceship-titanic http://node123:5000

# Or auto-discover from the grading job ID:
sbatch scripts_hpc/slurm_agent.sh spaceship-titanic auto:<grading-job-id>
```

Add the `--nv` flag for GPU support.
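The `auto:` form presumably resolves the grading job's node through Slurm; a minimal sketch of that lookup (the job ID is illustrative, and it falls back to `localhost` when the job cannot be found):

```shell
# Ask Slurm for the node of the grading job, then build the server URL.
GRADING_JOB_ID="12345"                                    # illustrative job ID
NODE="$(squeue -j "${GRADING_JOB_ID}" -h -o %N 2>/dev/null || true)"
GRADING_URL="http://${NODE:-localhost}:5000"
echo "${GRADING_URL}"
```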
After the agent finishes:

```
mlebench grade \
  --submission ${OUTPUT_DIR}/submission/submission.csv \
  --competition ${COMPETITION}
```

Use a heterogeneous job to schedule the grading server on CPU and the agent on GPUs together:
```
sbatch scripts_hpc/slurm_hetjob.sh spaceship-titanic
```

Make sure to edit `scripts_hpc/slurm_hetjob.sh` to set your paths:

- `MLEBENCH_DIR`: path to the mle-bench repo
- `DATA_DIR`: path to the data
- `SIF_IMAGE`: path to the Apptainer image
- `OUTPUT_BASE`: base output directory
- Update and test heterogeneous scripts on della