Woptim/caliper integration #291
Conversation
… in dedicated directory
@jonesholger @slabasan @kab163 This is a starting point for what’s next. In particular, I’ll be attempting to set things up differently for another demo now that I have something running. I would welcome improvements to the hatchet post-processing; I use a simple diff computation kindly provided by @slabasan, but I’m sure RAJA will want something more meaningful.
This reverts commit e3fc7d7.
@adrienbernede the hatchet script is perfectly fine when comparing across the same variant, like your build_and_test currently does. For the next steps I imagine you'll want to bring in the CUDA and HIP variants for those particular architectures, with a switch in the script when it detects where it's running. Also, at some point I imagine the script defining a pass/fail criterion for the hatchet output, where you use the subtract operator on the two trees and fail if the difference exceeds some threshold. To test the functionality we could artificially slow down a RAJAPerf run with command line options that increase the problem size (e.g., `--size`) while keeping the baselines intact, and in the OpenMP case by limiting OMP_NUM_THREADS. Conversely, an artificial speedup is viable too by running a tiny size, but this may be harder to catch since the default is already quite small. Thanks for diving into this. I think everyone is extremely appreciative.
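A minimal sketch of the pass/fail idea described above, reduced to plain Python dicts rather than the actual hatchet GraphFrames (the kernel names and timings are made up for illustration): subtract the baseline per-kernel times from the candidate's and fail on any slowdown beyond a threshold.

```python
def check_regressions(baseline, candidate, threshold):
    """Return kernels whose slowdown (candidate - baseline) exceeds threshold."""
    failures = {}
    for kernel, base_time in baseline.items():
        diff = candidate[kernel] - base_time
        if diff > threshold:
            failures[kernel] = diff
    return failures

# Made-up timings in seconds; not real RAJAPerf output.
baseline  = {"DAXPY": 1.00, "REDUCE_SUM": 2.00}
candidate = {"DAXPY": 1.05, "REDUCE_SUM": 2.60}

print(sorted(check_regressions(baseline, candidate, threshold=0.25)))
# -> ['REDUCE_SUM']  (only REDUCE_SUM exceeds the 0.25 s threshold)
```

In the real script the subtraction would come from hatchet's tree subtract operator; the dict stands in for the resulting dataframe.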
Someone will have to remain cognizant of when the trees differ, like when a kernel is added or removed. In this case the baseline should be rerun.
@jonesholger I agree we will need to add HIP and CUDA, as well as a threshold mechanism. Regarding changes in the baseline, I would like to suggest rerunning the reference each time, always comparing to the most recent develop ancestor commit. This will not protect against changes coming from the branch itself, but it seems more robust to rerun the reference rather than risk that something may have changed, making the baseline obsolete (e.g., a change in the machine config). I will present the idea the next time we meet, as we need to discuss the desired design anyway.
@adrienbernede running the develop baseline every time seems fine, and in your script you do have a dataframe node count, where I would test that the counts are equal before running the subtract operator. I do have some routines in my back pocket that make comparing different trees more benign (they only compare nodes that don't differ). @slabasan is aware of these too.
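The node-count guard plus "only compare nodes that don't differ" idea might look like this (a simplified sketch on plain dicts, not the actual routines mentioned above):

```python
def safe_diff(baseline, candidate):
    """Diff only the kernels common to both trees, warning on count mismatch.

    `baseline`/`candidate` are hypothetical {kernel_name: time} dicts standing
    in for the hatchet dataframes.
    """
    if len(baseline) != len(candidate):
        print(f"warning: node counts differ ({len(baseline)} vs {len(candidate)})")
    common = baseline.keys() & candidate.keys()
    return {k: candidate[k] - baseline[k] for k in sorted(common)}

# A kernel was added in one tree and removed in the other; only DAXPY is diffed.
diffs = safe_diff({"DAXPY": 1.0, "IF_QUAD": 3.0},
                  {"DAXPY": 1.5, "MULADDSUB": 0.5})
```

When the counts do match, the full subtract can proceed as usual.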
@adrienbernede what's going on with some of the older compilers on gitlab/lassen (clang9 pops a lot)? Should I check it out? Develop has this issue too.
@jonesholger the older compilers are related to shared configurations being run/tested in multiple projects: RAJA, RAJAPerf, Umpire, etc. David B. and I plan to meet soon to update the shared specs.
Isn't running the baseline each time automatic if we always run the Base and RAJA variants for each programming model, i.e., the Base variant will be the baseline? The most important thing to monitor for this suite is that the difference between the RAJA variant and the Base variant for each kernel doesn't grow when the RAJA variant is slower than the Base variant. It may also be a good idea to track differences between the Base variants of each run and the Base variants of the previous run. This may give us some insight into compiler regressions. However, there is probably enough run-to-run variation that it may be hard to interpret.
@rhornung67 yeah, I think comparing to the Base variant is the eventual intended outcome, say RAJA_CUDA vs Base_CUDA, and they are matched one to one with kernel-tunings; otherwise I can add routines to the script to fix up the differing trees, which we can discuss. I think this initial version just checks across the same variant, where the baseline is defined to be the develop branch ancestor and the PR is the version under test. I haven't looked closely enough to see if the baseline is just one fixed machine/compiler combo. I forgot to mention the root nodes will be different when comparing different variants. I have a routine that fixes this too.
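To illustrate the root-node issue: when comparing across variants, each tree is rooted at its variant name, so a naive diff sees two disjoint trees. A simplified stand-in for the kind of fix-up routine mentioned (plain nested dicts, hypothetical variant names, not the actual hatchet code):

```python
def unify_root(tree, new_root="root"):
    """Rename the single root of a {root_name: children} tree so two variant
    trees (e.g. 'Base_CUDA' vs 'RAJA_CUDA' -- hypothetical labels) become
    comparable node for node."""
    (old_root, children), = tree.items()  # expect exactly one root node
    return {new_root: children}

base = unify_root({"Base_CUDA": {"DAXPY": 1.0}})
raja = unify_root({"RAJA_CUDA": {"DAXPY": 1.2}})
# Both trees now share the root name 'root' and can be diffed node-wise.
```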
@rhornung67 I also like your idea of tracking regressions.
@jonesholger I think we should be in good shape if we can compare RAJA_* to Base_* for each relevant programming model. I believe these relate one to one across the Suite, with a couple of exceptions, so their trees should match. Adding the ability to track regressions could be done in a separate PR when we figure out a good way to do that. How close is this to being able to do the first part and be merged?
@rhornung67 I'll start on a patch to the hatchet script as a PR against this one and verify all the trees. If there are exceptions, the default fix-up I have in place is to propagate the minimum time in a set of tunings for a particular kernel.
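A minimal sketch of that min-propagation fix-up, assuming (kernel, tuning) timing pairs rather than the real dataframe layout (the data below is invented):

```python
def min_over_tunings(times):
    """Collapse {(kernel, tuning): time} to {kernel: min time across tunings}.

    Stand-in for the fix-up described above; the data layout is an assumption.
    """
    best = {}
    for (kernel, _tuning), t in times.items():
        best[kernel] = min(t, best.get(kernel, float("inf")))
    return best

times = {("DAXPY", "block_128"): 1.4,
         ("DAXPY", "block_256"): 1.1,
         ("REDUCE_SUM", "default"): 2.3}
print(min_over_tunings(times))  # DAXPY keeps its fastest tuning
```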
@adrienbernede @rhornung67 maybe add a threshold as a third argument so we can check v1 - v2 < +/- threshold. For now just record it; I'm curious about the run-to-run variance too.
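The third argument could be wired in along these lines (a sketch using argparse; the option and file names are assumptions, not the script's actual interface):

```python
import argparse

# Hypothetical CLI for the comparison script: two .cali files plus a threshold.
parser = argparse.ArgumentParser(description="Compare two Caliper runs")
parser.add_argument("baseline", help="baseline .cali file")
parser.add_argument("candidate", help="candidate .cali file")
parser.add_argument("--threshold", type=float, default=0.0,
                    help="allowed absolute slowdown before failing")

# Simulated invocation; file names are placeholders.
args = parser.parse_args(["develop.cali", "pr.cali", "--threshold", "0.05"])
print(args.threshold)  # -> 0.05
```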
Posted #298
@jonesholger if you think comparing RAJA to Lambda is not meaningful, especially since both are compared to the baseline already, I’m fine with removing that comparison. Is that what you’re saying? I’m reordering the parameters; I agree the baseline should be... the baseline.
I think all that we want is to compare different variants to the designated baseline variant. We can infer other comparisons from those.
@jonesholger what do you think of the current state of the PR?
@adrienbernede I think once you swap the parameters so --baseline == Base_(suffix), we're good.
lol, I shouldn’t code late... One sec.
@adrienbernede this has to be extremely frustrating (timeout); maybe you could prebuild caliper on corona:
That’s the first time it has timed out. But I have everything ready to optimize it. In fact, it is already optimized on ruby.
@jonesholger I just triggered a pipeline. I had been working on using Spack chaining (upstreams) to speed up the dependencies installation, but this work was paused in favor of more urgent matters. With the Caliper integration, and the many dependencies it involves, I revamped the implementation, which was actually not so hard. Presently, the tests are failing... that was not expected. In order not to be delayed by this, we have a joker card to play: we haven’t configured an external python install in radiuss-spack-configs. We install python as a dependency of Caliper where we could just use one already installed on the machines. Do you know which LC python install we should use to get things to run smoothly?
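For reference, the usual Spack way to use a machine's existing Python instead of building one is an `externals` entry in `packages.yaml`; a sketch along these lines (the version and prefix below are placeholders, not actual LC paths):

```yaml
# packages.yaml -- use a preinstalled Python rather than building it as a
# Caliper dependency (spec and prefix are placeholders).
packages:
  python:
    externals:
    - spec: python@3.8.2
      prefix: /path/to/system/python-3.8.2
    buildable: false
```

`buildable: false` forces Spack to use the external install and error out rather than silently rebuilding Python.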
@adrienbernede I just spotted your comment. Yeah, not having to build Python is a win/win. But you also rely on an LC install of Hatchet, which I think is geared to Python 3.9, or was recently upgraded for 3.9; anyhow, other versions may trigger a cython recompile. You could also pip install llnl-hatchet against your favorite python in a venv, but I would steer clear of Python 3.11. Anyhow, I'll help look into it.
On lassen, and I think the other platforms are analogous, the python is:

```shell
module load python/3.8.2
spack load /{installed hash}
```

If you're intrepid you can install llnl-hatchet against Python 3.8.2 like so: … then `source test-env/bin/activate` in build_and_test for your CI.
BTW you can link against the GCC version of Caliper for the Clang installs of RAJAPerf.
I had already added python 3.8.2 when I saw your answer. But TOSS4 does not have it, so I used 3.10.8 there.
There is no Hatchet package in Spack. A while back one did exist, but it was very difficult to configure, with a dependency chain involving Matplotlib, with various backends, and linking against mesa by default, which was super fragile. Hatchet is easy enough to install from a git clone. I did have a small request in to the team to trigger a cython recompile when switching Pythons in their install script, which I should revisit; essentially just doing a "real clean" in the install every time: `python setup.py clean --all`. Your gapps install is in a good position for gitlab, and maybe add Python 3.7.
I salute your attention in also prebuilding elfutils. A Caliper build with those prebuilts is done in less than a minute. It's almost nonsense that we can't find elfutils-dev on these important systems.
Maybe another soapbox point: deployment installs should not allow "latest" non-versioned packages, including the Caliper variant we're playing with now. So we should update the installation scripts as soon as feasible. Even in pip land, a requirements package list should have package==some_version, else be rejected for deployment. caliper@2.9.0+ should be our target. Anyhow, you are way more expert at this, but let me know if you need me to look at anything.
Those are good remarks. I’ll prepare an issue to keep track of the changes we want in the near future.
A proof of concept of running Caliper and comparing the results to a baseline.
This is really basic.