Stream support for exporting pdbs not working with OTHERS record #141

@gate-tec

Describe the bug

Exporting PDB data that contains both ATOM and OTHERS records with .to_pdb_stream always raises a pandas.errors.IntCastingNaNError (cf. Steps/Code to Reproduce).
I need the content of the OTHERS frame because the TER markers must be preserved in the resulting PDB data.

Writing directly to a PDB file with .to_pdb does not have this issue. A possible fix would be a shared base function for both methods, or letting to_pdb take the desired output target (i.e. file or stream), as mentioned in #108.
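
Until this is fixed, the stream can be built from the working file export. The following is only a sketch of that idea: to_pdb_stream_via_file is a hypothetical helper, not part of biopandas, and it assumes to_pdb accepts path, records and gz keyword arguments.

```python
import io
import os
import tempfile


def to_pdb_stream_via_file(ppdb, records=("ATOM", "OTHERS")):
    """Hypothetical helper: emulate a stream export via a temporary file.

    Assumes `ppdb` exposes to_pdb(path=..., records=..., gz=...), which is
    how the file export is expected to be called here.
    """
    # Create a temporary file path and close the OS-level handle right away,
    # so that to_pdb can open the path itself.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdb")
    os.close(fd)
    try:
        ppdb.to_pdb(path=tmp_path, records=list(records), gz=False)
        with open(tmp_path, "r") as handle:
            # Wrap the file content in an in-memory text stream.
            return io.StringIO(handle.read())
    finally:
        # Always clean up the temporary file, even if the export fails.
        os.remove(tmp_path)
```

Calling to_pdb_stream_via_file(pdb_df) on the object from the example below should then give a text stream that still contains the TER lines, at the cost of one round trip through the file system.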

Steps/Code to Reproduce

Example:

from biopandas.pdb import PandasPdb

# Fetch the structure from the PDB
pdb_df = PandasPdb().fetch_pdb('1ou5')
# Raises pandas.errors.IntCastingNaNError
out_string = pdb_df.to_pdb_stream(records=('ATOM', 'OTHERS'))

Expected Results

Stream containing the specified records in pdb format.

Actual Results

A pandas.errors.IntCastingNaNError raised at line 909 of pandas_pdb.py:

df.residue_number = df.residue_number.astype(int)

This cast is applied to the entire concatenated DataFrame.
Because the OTHERS frame has no residue number entries, these cells are always NaN after concatenation, and NaN cannot be cast to int.
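
The failure can be reproduced with plain pandas, without biopandas. The sketch below uses toy frames whose columns are only illustrative stand-ins for the real ATOM and OTHERS frames, and cast_residue_numbers is a hypothetical helper showing one possible way around the cast (a nullable integer dtype), not the fix adopted by the library.

```python
import pandas as pd

# Toy stand-ins for the ATOM and OTHERS frames (columns are illustrative).
atom = pd.DataFrame({
    "record_name": ["ATOM", "ATOM"],
    "residue_number": [1, 2],
    "line_idx": [0, 1],
})
others = pd.DataFrame({
    "record_name": ["TER"],
    "entry": ["placeholder TER content"],
    "line_idx": [2],
})

# Concatenation fills the missing residue_number of the OTHERS row with NaN,
# which turns the column into float64.
combined = pd.concat([atom, others], ignore_index=True)

try:
    combined["residue_number"].astype(int)
    cast_failed = False
except ValueError:
    # IntCastingNaNError is a subclass of ValueError.
    cast_failed = True


def cast_residue_numbers(df):
    """Hypothetical helper: cast to a nullable integer dtype that keeps NaN as <NA>."""
    out = df.copy()
    out["residue_number"] = out["residue_number"].astype("Int64")
    return out


fixed = cast_residue_numbers(combined)
```

An alternative with the same effect would be to apply the integer cast only to the rows that come from ATOM/HETATM frames before concatenating.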

Versions

biopandas 0.5.0dev
Linux-5.4.0-91-generic-x86_64-with-glibc2.31
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
Scikit-learn 1.3.0
NumPy 1.23.5
SciPy 1.11.1
