To achieve high performance where vectorization with NumPy is not possible, Biotite currently uses Cython code. However, there are some limitations in Cython:
Apart from fused types, Cython does not support generics
Separation into multiple modules can be a bit cumbersome
Fast string operations are basically impossible unless one wants to use rather unsafe ASCII-only C-strings
To get very high performance, safeguards need to be disabled, leading to potential memory issues (out-of-bounds access, leaks, etc.)
In my opinion, Rust is much cleaner than Cython when low-level, typed operations are involved.
Hence, this issue should start the discussion on whether Rust code using PyO3 should be allowed in Biotite, as it has become quite mature in recent years. This would address all the issues mentioned above. More specifically, these are the places in the code base where the limitations become quite clear:
PDBFile: There already is fastpdb, written in Rust, but it needs to be maintained separately at the moment.
connect_via_residue_names(): The Python string operations in this Cython function make it quite slow, and it is actually the bottleneck in pdbx.get_structure() when the input is a BinaryCIFFile.
Probably even more places in Biotite would benefit from routines written in Rust.
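To illustrate why string handling dominates in a routine like connect_via_residue_names(), here is a rough pure-Python sketch of a bond lookup keyed by residue and atom names. It is not Biotite's actual implementation: the function name `connect_by_residue_name`, the `BONDS_IN_RESIDUE` table and the `residue_starts` argument are assumptions made for illustration only.

```python
# Hypothetical bond table: residue name -> {(atom name, atom name): bond order}.
# Only the glycine backbone is listed; a real table would cover many residues.
BONDS_IN_RESIDUE = {
    "GLY": {
        ("N", "CA"): 1,
        ("CA", "C"): 1,
        ("C", "O"): 2,
    },
}


def connect_by_residue_name(res_names, atom_names, residue_starts):
    """
    Return a list of (atom index, atom index, bond order) tuples for
    intra-residue bonds.

    `residue_starts` holds the first atom index of each residue,
    followed by the total number of atoms as its last element.
    """
    bonds = []
    for start, stop in zip(residue_starts[:-1], residue_starts[1:]):
        table = BONDS_IN_RESIDUE.get(res_names[start])
        if table is None:
            # Unknown residue: no bonds can be inferred from its name
            continue
        # Every atom name is hashed as a Python string here,
        # which is the kind of work that typed code cannot speed up easily
        index_of = {atom_names[i]: i for i in range(start, stop)}
        for (name_a, name_b), order in table.items():
            i = index_of.get(name_a)
            j = index_of.get(name_b)
            if i is not None and j is not None:
                bonds.append((i, j, order))
    return bonds
```

Each atom costs at least one string hash and one dictionary lookup, so the run time is governed by Python object handling rather than by arithmetic on typed arrays.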
However, there would also be a few disadvantages:
Cython is easier to learn than Rust, as it is close to Python.
Development would become more complex, as there would be 3 programming languages (Python, Cython, Rust), as long as Cython code still exists in the code base.
The build process would become more complex as the Rust compiler needs to be involved, but this should not change the user experience.
I lean towards accepting Rust in Biotite (otherwise I would not have opened this issue 😉), but I would really like to hear your opinion about this @t0mdavid-m @JHKru @MaxGreil and other contributors/users with an opinion on this topic.
Another place in the code base where the limitations become quite clear:
CIFFile: Making it faster has been addressed multiple times now (Handle embedded quote in mmcif #619, Update _split_one_line and remove whitespace parameter #686), but this short function is still the bottleneck when reading CIF files.
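For context on the CIF case, the per-line work looks roughly like the sketch below. This is a minimal pure-Python illustration and not Biotite's `_split_one_line`: the name `split_cif_line` is hypothetical, and comments and multi-line text fields are ignored. It assumes the usual CIF rule that a quoted value ends at a matching quote followed by whitespace or the end of the line, so embedded quotes are kept.

```python
def split_cif_line(line):
    """
    Split a single CIF line into its values, honoring single and
    double quotes.
    """
    fields = []
    i = 0
    n = len(line)
    while i < n:
        char = line[i]
        if char.isspace():
            i += 1
            continue
        if char in "'\"":
            # Quoted value: search for the same quote character
            # followed by whitespace or the end of the line
            j = i + 1
            while j < n:
                if line[j] == char and (j + 1 == n or line[j + 1].isspace()):
                    break
                j += 1
            if j == n:
                raise ValueError("Unterminated quoted value")
            fields.append(line[i + 1 : j])
            i = j + 1
        else:
            # Unquoted value: runs until the next whitespace
            j = i
            while j < n and not line[j].isspace():
                j += 1
            fields.append(line[i:j])
            i = j
    return fields
```

For example, `split_cif_line("ATOM 1 'O5'' N")` keeps the embedded quote and yields the value `O5'` as the third field. The character-by-character loop is what makes this kind of function slow in Python and awkward to type in Cython.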
One further disadvantage:
People cannot install Biotite from a source distribution if they do not have a Rust compiler installed (solved by Self contained rust boostrapping PyO3/maturin#2421).