
Add Py_tp_token slot and PyType_GetBaseByToken function #34

@neonene


This proposal continues the work of PEP 489 and later PEPs. PEP 630 notes:

Currently (as of Python 3.10), heap types have no good API to write Py*_Check functions (like PyUnicode_Check exists for str, a static type), and so it is not easy to ensure that instances have a particular C layout.

One known solution is to assign a C layout ID to particular heap types. This helps with subclass checks in tp slot methods (e.g. nb_add, tp_dealloc), especially in the final phase, when the module state can no longer be relied upon [1].

For more context, see: https://discuss.python.org/t/55598/2

Proposal

  • Adding a pointer member to heap types, and asking module authors to assign it a suitable value (a token), provided that:

    • The pointer outlives the class, so it's not reused for something else while the class exists.
    • It is "owned" by the module where the class lives, so it won't clash with other modules.

    For example, an extension module that automatically wraps C++ classes could assign each class's typeid.

  • Introducing a Py_tp_token slot to set this member:

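    static int pointee_in_the_module;  /* any object owned by this module */

    PyType_Slot foo_slots[] = {
        {Py_tp_token, &pointee_in_the_module},
        /* ... other slots ... */
        {0, NULL},
    };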

    Unlike other type slots, this slot will accept NULL through the new dedicated Py_TP_USE_SPEC identifier:

    • {Py_tp_token, Py_TP_USE_SPEC}

    This entry instructs the PyType_FromMetaclass function to store its spec argument as the token.

    If the slot is absent, the feature is disabled.

  • Introducing a PyType_GetBaseByToken(type, token, ...) helper function

    It will search the type and its superclasses for a class whose token is valid (non-NULL) and equal to the given one.
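To make the slot semantics concrete, here is a minimal plain-C sketch of how a type-creation function could resolve the token from a spec. It is a toy model, not CPython code: ToySlot, ToySpec, TOY_tp_token, TOY_TP_USE_SPEC and toy_resolve_token are hypothetical stand-ins for PyType_Slot, PyType_Spec, Py_tp_token and Py_TP_USE_SPEC, and the slot ID value is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PyType_Slot, PyType_Spec, Py_tp_token and
   Py_TP_USE_SPEC; the slot ID value below is made up. */
enum { TOY_tp_token = 1000 };
#define TOY_TP_USE_SPEC NULL  /* the proposal makes Py_TP_USE_SPEC a NULL value */

typedef struct { int slot; void *pfunc; } ToySlot;
typedef struct { const char *name; ToySlot *slots; } ToySpec;  /* slots end with {0, NULL} */

/* Returns the token a type created from `spec` would carry,
   or NULL when the spec has no token slot (feature disabled). */
static void *toy_resolve_token(ToySpec *spec)
{
    assert(spec != NULL && spec->slots != NULL);
    for (ToySlot *s = spec->slots; s->slot != 0; s++) {
        if (s->slot == TOY_tp_token) {
            /* NULL (TOY_TP_USE_SPEC) means "use the spec's address as the token". */
            return s->pfunc == TOY_TP_USE_SPEC ? (void *)spec : s->pfunc;
        }
    }
    return NULL;
}
```

In a real extension module, the slot array would instead contain {Py_tp_token, &pointee_in_the_module} or {Py_tp_token, Py_TP_USE_SPEC}, and the spec would be passed to PyType_FromMetaclass.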

Specification

  • The PyHeapTypeObject struct will have a new member, the ht_token void pointer (NULL by default), which will not be inherited by subclasses.

  • When the proposed slot ID, Py_tp_token, is found in spec->slots, the existing PyType_FromMetaclass(..., spec, ...) function will do the following:

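    /* slot: the Py_tp_token entry found in spec->slots; ht: the new heap type */
    if (slot->pfunc == Py_TP_USE_SPEC) {  /* Py_TP_USE_SPEC is NULL */
        ht->ht_token = (void *)spec;
    } else {
        ht->ht_token = slot->pfunc;
    }

  • PyType_GetSlot(type, Py_tp_token) will return NULL if a static type is given.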

  • The helper function will be declared as:

    int PyType_GetBaseByToken(PyTypeObject *type, void *token,
                              PyTypeObject **result)

    It scans only heap types that have a non-NULL token, walking the type's tp_mro if it exists, and otherwise walking tp_bases recursively.

    • On error, set *result to NULL, set an exception, return -1.
    • If there is no type whose token is equal to the given one, set *result to NULL and return 0.
    • Otherwise: set *result to the first found type, return 1.
    • (UPDATE) Raise SystemError when token is NULL.
    • (UPDATE) Raise TypeError when PyType_Check(type) returns false.

    The result argument may be NULL, in which case no reference is stored (check-only mode).
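The lookup rules above can be modeled in plain C as follows. This is a hypothetical sketch, not the reference implementation: ToyType stands in for PyTypeObject, toy_get_base_by_token for PyType_GetBaseByToken, exceptions are modeled as a message on stderr plus a -1 return, reference counting is omitted, and a NULL type pointer stands in for a failed PyType_Check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a type object; NOT CPython's PyTypeObject. */
typedef struct ToyType ToyType;
struct ToyType {
    const char *name;
    int is_heap;      /* only heap types can carry a token */
    void *token;      /* models ht_token; NULL means "no token" (not inherited) */
    ToyType **mro;    /* NULL-terminated, starts with the type itself; may be NULL */
    ToyType **bases;  /* NULL-terminated direct bases; may be NULL */
};

/* True if t is a heap type whose non-NULL token equals the requested one. */
static int toy_matches(const ToyType *t, const void *token)
{
    return t->is_heap && t->token != NULL && t->token == token;
}

/* Fallback when no MRO is available: check the type, then its bases recursively. */
static ToyType *toy_search_bases(ToyType *type, const void *token)
{
    if (toy_matches(type, token)) {
        return type;
    }
    for (ToyType **b = type->bases; b != NULL && *b != NULL; b++) {
        ToyType *found = toy_search_bases(*b, token);
        if (found != NULL) {
            return found;
        }
    }
    return NULL;
}

/* Returns -1 on error, 0 if no matching type exists, 1 if one was found.
   `result` may be NULL (check-only mode). */
static int toy_get_base_by_token(ToyType *type, void *token, ToyType **result)
{
    if (result != NULL) {
        *result = NULL;
    }
    if (token == NULL) {
        fputs("SystemError: token must not be NULL\n", stderr);
        return -1;
    }
    if (type == NULL) {  /* stands in for a failed PyType_Check() */
        fputs("TypeError: expected a type\n", stderr);
        return -1;
    }
    ToyType *found = NULL;
    if (type->mro != NULL) {
        assert(type->mro[0] == type);  /* an MRO starts with the type itself */
        for (ToyType **t = type->mro; *t != NULL; t++) {
            if (toy_matches(*t, token)) {
                found = *t;
                break;
            }
        }
    } else {
        found = toy_search_bases(type, token);
    }
    if (found == NULL) {
        return 0;
    }
    if (result != NULL) {
        *result = found;  /* the proposal stores a reference here; no refcounting in this toy */
    }
    return 1;
}
```

The recursive fallback visits bases depth-first, so with several token-carrying bases its order can differ from the MRO; the sketch simply returns the first match of whichever walk it uses.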

Reference implementation

Performance

A subclass check in a slot method currently consists of the following steps:

  1. PyType_GetModuleByDef (walks MRO)
  2. PyModule_GetState
  3. Py*_CheckExact
  4. PyType_IsSubtype (walks MRO)

PyType_GetBaseByToken is cheaper than steps (1)+(2)+(3), but slightly more expensive than step (4), PyType_IsSubtype [2]. In most cases, using the new function alone will be efficient enough; the exception is code that stays in C functions and repeats steps (3) and (4) with a module state passed around [3].

Backwards Compatibility

  • One new pointer, ht_token, is added to heap types.
  • One slot ID, Py_tp_token, is added with an identifier, Py_TP_USE_SPEC.
  • One helper function, PyType_GetBaseByToken, is added, whose documentation will mention the new slot above.

UPDATE: Py_tp_token, Py_TP_USE_SPEC and PyType_GetBaseByToken will be documented individually.

Previous discussions

Footnotes

  1. The GC can clear the module state, or erase the references to the module held by heap types: gh-115874

  2. PyType_IsSubtype can be slower on recent Windows PGO builds due to unstable optimization.

  3. PyType_GetModuleState can still be called on the type found by PyType_GetBaseByToken.
