Conversation
|
Thanks for adding that! For the test, just curious, does this happen with other structures as well? |
|
It's consistent across a few examples. I narrowed it down to E.g. Now, I'm not sure how reliably TER records are placed in PDB files (some info here ) and if we can be super comfortable just ignoring it. I imagine it would cause a bit of a headache trying to make selections consistent between the same structure in PDB & .cif formats. |
|
Hmm... I think that we could have it without the TER records and just a note about that in the docs for now. So that people are aware. Otherwise, inferring the TER records would be some additional work (I guess that could be done based on the CIF structure index? But I don't have that much experience with TER). My guess is the primary usecase of BioPandas is data analysis, and the TER records are not super necessary anyways. So, for now I'd say let's warn people about the TER entries but not worry about it. |
|
Hello @a-r-j! Thanks for updating this PR.
Comment last updated at 2022-09-03 21:28:28 UTC |
|
Sure, I added an offset arg so people at least have an access point. Hopefully that's a sufficient remedy for most usecases. |
|
I think that works! Do you have a unit test handy? Otherwise, I can help looking into this! |
|
Think I managed to infer it automatically based on the number of chains. There was also an issue in ATOM atom numbering for multichain proteins (again, TER records). So I figured we can infer the appropriate offsets for the ATOM and HETATM dfs from the number of chains. Added appropriate unittest and seems to check out. |
|
Wow this looks good! Currently traveling and will take one more close look in a calm moment, when I am not out and about. Thanks a lot! |
|
Arg totally missed this and am currently traveling (on mobile). I will make a reminder to check this out as soon as I am back on my computer again! |
rasbt
left a comment
There was a problem hiding this comment.
Looks great, there is just this one little thing regarding the keys I noticed.
biopandas/mmcif/pandas_mmcif.py
Outdated
| """ | ||
| pandaspdb = PandasPdb() | ||
| # for a in ["ATOM", "HETATM"]: | ||
| for a in self.df.keys(): |
There was a problem hiding this comment.
Hm, but doesn't that include the OTHERS key as well? Maybe that could be more specific to the keys we actually care about
There was a problem hiding this comment.
Hmm, yes, good point. We could add an arg that defaults to ["ATOM", "HETATM"] to allow some more control. What do you think?
There was a problem hiding this comment.
Yes, I think that's the "safer" choice
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
Just did a little documentation update (sorry for the delay, I was traveling during the last couple of weeks, and there's tons of stuff to catch up on) |
|
Happy New Year @rasbt ! Anything I can do to get this PR merged? :) |
rasbt
left a comment
There was a problem hiding this comment.
That looks awesome. Thanks so much. And sorry for the long delay. Somehow it slipped through the cracks and I totally missed this PR.
There's just a small comment about the naming above.
biopandas/mmcif/pandas_mmcif.py
Outdated
| self.code = self.data["entry"]["id"][0].lower() | ||
| return self | ||
|
|
||
| def get_pandas_pdb(self, offset_chains: bool = True, records: List[str] = ["ATOM", "HETATM"]) -> PandasPdb: |
There was a problem hiding this comment.
should this perhaps be called convert_to_pandas_pdb? Just so that it doesn't indicate it's fetching the PDB
docs/CHANGELOG.md
Outdated
|
|
||
| ##### New Features | ||
|
|
||
| - Added a new `PandasMmcif.get_pandas_pdb()` class that converts the mmCIF file into a PDB structure. (Via [Arian Jamasb](https://github.com/a-r-j), PR #[107](https://github.com/rasbt/biopandas/pull/107/files)) |
biopandas/__init__.py
Outdated
| # | ||
|
|
||
| __version__ = "0.4.2" | ||
| __version__ = "0.5.0" |
There was a problem hiding this comment.
Ah sorry to come back to this, but
__version__ = "0.5.0" should probably be __version__ = "0.5.0dev" so that we can collect some of the other things in the PR, and then we can change it to 0.5.0 when the the next release version goes out
There was a problem hiding this comment.
Yep good point, I'll try to wrap the other two PRs tonight too :)
Fixes #97 #108
Description
Adds support for writing PDB files to filestreams.
I started adding the conversion method proposed by @mrauha.
I ran into an issue writing the test:
Which gives an error relating to atom numbering in the HETATM dataframes:
Any idea of the source of the discrepancy?
Update: This seems to be fixed now! 🥳
Related issues or pull requests
#97
Pull Request Checklist
./docs/sources/CHANGELOG.mdfile (if applicable)./biopandas/*/testsdirectories (if applicable)biopandas/docs/sources/(if applicable)PYTHONPATH='.' pytest ./biopandas -svand make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g.,PYTHONPATH='.' pytest ./biopandas/classifier/tests/test_stacking_cv_classifier.py -sv)flake8 ./biopandas