Check the runtime version of PMIx #1982
@jsquyres @bwbarrett I could use your help with this PR. This impacts OMPI as well since you are using PRRTE for your What I observe is that everything involving MCA base segfaults. The list of open components, list of MCA params - they are all corrupted. If I fix/skip one, the next one in line segfaults. So I don't believe it is possible to "fix" the situation. Any suggestions would be much appreciated.
jsquyres
left a comment
Hmm. I am pondering the situation... don't have any immediately obvious ideas.
It has been reported (and confirmed) that building against one version of PMIx and then running with another version will cause PRRTE to segfault. This isn't a universal rule. For example, one can switch v5.0 and master without a problem. However, switching v5.0 and v4.2 is a definite segfault.

The root cause of the problem is a change in the layout of the base pmix_object_t definition. This renders all PMIx objects binary incompatible when crossing between the v5 and v4 (and below) series. Changing the v5 definition back to match v4 is an overly complex task. The changes were required to accommodate the new shared memory support that was introduced in v5.

So instead, we check the runtime version of PMIx against the build version. If the runtime version is incompatible with the build version, then we print an explanatory error message and error out.

Signed-off-by: Ralph Castain <rhc@pmix.org>
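To illustrate why a layout change in a base object breaks binary compatibility, here is a minimal sketch. The two structs are hypothetical stand-ins and do not reproduce the actual pmix_object_t definitions from either series; the member names (`cls`, `allocator`, `refcount`) are assumptions chosen only to show the effect of adding a member ahead of an existing one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "old" base object layout (NOT the real pmix_object_t). */
typedef struct {
    void *cls;       /* pointer to class descriptor */
    int refcount;    /* reference count */
} old_object_t;

/* Hypothetical "new" layout: an extra member is added before refcount. */
typedef struct {
    void *cls;       /* pointer to class descriptor */
    void *allocator; /* assumed extra member, e.g. for custom memory */
    int refcount;    /* reference count, now at a different offset */
} new_object_t;

/* Returns true only if code compiled against one layout would find
 * refcount at the same offset in an object built with the other. */
static bool refcount_offsets_match(void)
{
    return offsetof(old_object_t, refcount) == offsetof(new_object_t, refcount);
}
```

Code compiled against the old layout reads `refcount` at the old offset, so when the library hands it an object with the new layout it interprets unrelated bytes as the reference count. Since every object embeds the base struct, the corruption would show up everywhere, which is consistent with the symptom described in the conversation.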
It has been reported (and confirmed) that building against one version of PMIx and then running with another version will cause PRRTE to segfault. This isn't a universal rule. For example, one can switch v5.0 and master without a problem. However, switching v5.0 and v4.2 is a definite segfault.
A little experimenting indicates that, at least for some PMIx series, it is possible to switch between subreleases within the series (e.g., v5.0.1 and v5.0.2). I would not consider this a guaranteed rule at this time.
For now, we check the runtime version of PMIx against the build version. If the major/minor values don't match, then we print an explanatory error message and error out.
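The major/minor comparison described above could look roughly like the following sketch. It is not the code from this PR: `parse_major_minor` and `version_compatible` are hypothetical helpers, and the sketch assumes the runtime version arrives as a string containing a dotted version number, possibly preceded by a product name. In PRRTE the build-time values would presumably come from the PMIx version header and the runtime string from `PMIx_Get_version()`, but that wiring is an assumption and is left out so the example stays self-contained.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: pull "major.minor" out of a version string.
 * Leading non-digit text (such as a product name) is skipped. */
static bool parse_major_minor(const char *version, int *major, int *minor)
{
    if (NULL == version) {
        return false;
    }
    while ('\0' != *version && !isdigit((unsigned char) *version)) {
        version++;
    }
    /* sscanf returns the number of fields it converted */
    return 2 == sscanf(version, "%d.%d", major, minor);
}

/* Hypothetical check: true only if the runtime major and minor values
 * both equal the values the program was built against. An unparsable
 * runtime string is treated as incompatible. */
static bool version_compatible(int build_major, int build_minor,
                               const char *runtime_version)
{
    int major, minor;

    if (!parse_major_minor(runtime_version, &major, &minor)) {
        return false;
    }
    return major == build_major && minor == build_minor;
}
```

A caller would run this check early in startup, before touching any PMIx objects, and print the explanatory message and exit when it returns false. Under this rule a build against v5.0 accepts a v5.0.2 runtime and rejects a v4.2 runtime.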