ProcessGroupCollection.__repr__ crashes with list-typed hierarchical CP groups #3723

## Description

`ProcessGroupCollection.__repr__` assumes every field is a single `ProcessGroup` with a `.size()` method. When `hierarchical_context_parallel_sizes` is set, the hierarchical CP field stores a **list** of `ProcessGroup` objects, causing `AttributeError: 'list' object has no attribute 'size'`.

## Error

```
File "megatron/core/process_groups_config.py", line 150, in __repr__
    active_pgs.append(f"{field_info.name}({pg.size()})")
AttributeError: 'list' object has no attribute 'size'
```

The crash is triggered during checkpoint saving: modelopt's `_parse_transformer_config` calls `str()` on the config, which invokes `__repr__`.
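The failure mode can be reproduced without a distributed setup. The sketch below is not Megatron code: `FakeGroup` and `FakeCollection` are hypothetical stand-ins for `ProcessGroup` and `ProcessGroupCollection`, the field names `tp` and `hcp` are illustrative, and the returned string format is assumed. Only the loop body mirrors the line from the traceback.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


class FakeGroup:
    """Hypothetical stand-in for a ProcessGroup; only size() is modelled."""

    def __init__(self, size: int):
        self._size = size

    def size(self) -> int:
        return self._size


@dataclass(repr=False)
class FakeCollection:
    """Hypothetical miniature of ProcessGroupCollection (field names are illustrative)."""

    tp: Optional[Any] = None
    hcp: Optional[Any] = None  # may hold a list of groups, as in the hierarchical CP case

    def __repr__(self):
        active_pgs = []
        for field_info in fields(self):
            pg = getattr(self, field_info.name, None)
            if pg is not None:
                # Same assumption as the failing line: every field has .size()
                active_pgs.append(f"{field_info.name}({pg.size()})")
        # The wrapper format is an assumption made for this sketch
        return f"FakeCollection({', '.join(active_pgs)})"


def try_str(collection) -> str:
    """Call str() the way a config serializer might, and report any AttributeError."""
    try:
        return str(collection)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(try_str(FakeCollection(tp=FakeGroup(4))))
print(try_str(FakeCollection(hcp=[FakeGroup(8), FakeGroup(2)])))
```

Because the class defines no `__str__`, `str()` falls back to `__repr__`, so any code path that stringifies the collection hits the same error.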

## Reproduction

```yaml
model:
  cp_comm_type: a2a+p2p
  hierarchical_context_parallel_sizes: [8, 2]
  context_parallel_size: 16
```

## Suggested Fix

Handle list-typed fields in `__repr__`:

```python
def __repr__(self):
    active_pgs = []
    for field_info in fields(self):
        pg = getattr(self, field_info.name, None)
        if pg is not None:
            if isinstance(pg, list):
                # Hierarchical CP stores a list of groups: report each group's size
                sizes = [g.size() for g in pg]
                active_pgs.append(f"{field_info.name}({sizes})")
            else:
                # Regular case: a single ProcessGroup
                active_pgs.append(f"{field_info.name}({pg.size()})")
    ...
```
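To sanity-check the suggested branching outside of Megatron, here is a minimal self-contained sketch. `FakeGroup`, `PatchedCollection`, and `format_group` are hypothetical names introduced for illustration, the field names are illustrative, and the final string format is an assumption, since the rest of the real `__repr__` is elided above. Accepting tuples as well as lists is an extra precaution in this sketch, not something the original code is known to need.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


class FakeGroup:
    """Hypothetical stand-in for a ProcessGroup; only size() is modelled."""

    def __init__(self, size: int):
        self._size = size

    def size(self) -> int:
        return self._size


def format_group(name: str, pg: Any) -> str:
    """Format one field, handling both a single group and a sequence of groups."""
    if isinstance(pg, (list, tuple)):
        sizes = [g.size() for g in pg]
        return f"{name}({sizes})"
    return f"{name}({pg.size()})"


@dataclass(repr=False)
class PatchedCollection:
    """Hypothetical miniature of ProcessGroupCollection with the list-aware repr."""

    tp: Optional[Any] = None
    hcp: Optional[Any] = None

    def __repr__(self):
        active_pgs = []
        for field_info in fields(self):
            pg = getattr(self, field_info.name, None)
            if pg is not None:
                active_pgs.append(format_group(field_info.name, pg))
        # The wrapper format is an assumption made for this sketch
        return f"PatchedCollection({', '.join(active_pgs)})"


collection = PatchedCollection(tp=FakeGroup(4), hcp=[FakeGroup(8), FakeGroup(2)])
print(str(collection))
```

Pulling the formatting into a small helper keeps the loop flat and gives a single place to extend if other container types ever appear in a field.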

## Environment

- Container: `nvcr.io/nvidia/nemo:26.02`
- Megatron-LM: `core_r0.16.0`

## Related

- NVIDIA/TensorRT-Model-Optimizer#981 — the modelopt side of the same crash
