Skip to content
This repository was archived by the owner on Apr 8, 2022. It is now read-only.

Conversation

@davidlmobley
Copy link
Member

After discussion with @jchodera and others, we concluded we wanted to get the Michel lab/BioSimSpace/Cresset modified version synced back up. Huge thanks to lohedges for drawing together all the work represented in this PR. Here's the change list I got from him, below:

A summary of the key changes are given below:

Cresset (Mark Mackey):

The main change we made recently was the ability to provide link scores as an input to LOMAP, rather than have it calculate them. That means that for large data sets we can split the NxN matrix up into smaller submatrices, run parallel LOMAP jobs to get the scores/maps etc for each submatrix, and then run one big LOMAP job at the end to compute the optimal graph. Without this doing a LOMAP run on data sets with >100 molecules was very very painful.

  • Links file (see above)
  • Added support for SDF
  • Improved cyclisation algorithm
  • Improved handling of chirality
  • Added rules:
  • sulfonamide
  • heterocycle
  • methyl-to-ring
  • ring-size-change
  • hybridisation
  • Removed alkyl-to-halogen rule
  • Removed fingerprint functionality

(Just to note that most of the new rules are quite specific to single-topology FEP and in some cases are specific to AMBER/SOMD.)

BioSimSpace (Jenke Scheen / Lester Hedges):

  • Make package relocatable so that it can be bundled.
  • Remove Cresset hydrogen ordering restriction which was required for an RDKit bug that has since been fixed. (If preferable, this could be re-added with an option to disable in order to catch any regressions.)
  • Let RDKit figure out the bondLineWidth when drawing graphs.

@jchodera said he would try and review this or get his folks to do so. Our tentative plan is to get this merged in, then work with OpenEye, the Chodera Lab and others to think through a larger revision of tools in this area (which may or may not still be called LOMAP) which would use a modular, object oriented architecture and a simple API, along with class inheritance/overrides to allow customization.

cc @nividic

Lester, let me know if anyone else should be tagged. I'll be out of the office for a bit so if anyone wants to review without me that would be much appreciated.

@davidlmobley
Copy link
Member Author

Also tagging @JenkeScheen

@davidlmobley
Copy link
Member Author

Ah, and I forgot but @jchodera wanted me to tag @mikemhenry and @ijpulidos who he thought might be able to help get this reviewed and merged because it would relate to Perses work.

@davidlmobley davidlmobley requested a review from mikemhenry July 24, 2021 15:44
@jchodera
Copy link

@mikemhenry @ijpulidos @dominicrufa : The context here is that LOMAP is a tool for constructing molecule-to-molecule atom mappings and useful networks of transformations between molecules, but its original repository has been long neglected. Multiple forks have been in constant development for particular applications, and there is now motivation to collaboratively develop an independent package that unifies some of the API but allows others to customize functionality as needed.

However, the API is a disaster. There are hundreds of public methods, and no clear software idioms or design patterns in use. There's no way to collaboratively support a codebase like this without a dedicated maintainer.

Our goal is to

  • Provide some input on this set of updates so it can get merged into the original LOMAP repository
  • Work with the various stakeholders to identify what the MVP is in terms of key tasks and customizable components
  • Identify the minimal public API that must be supported in order to allow this
  • Eventually, someone (us? Open Free Energy Consortium?) will refactor this to use portable OpenFF toolkit objects and capture the usable code under the hood so that we can iteratively refine and refactor the internals without breaking the public API

As a reference, my refactor of the atom mapping code for perses may be a useful example: This refactor exposes just a few public methods, utilizes a factory design pattern (where an AtomMapper factory with a customizable score_mapping function generates an AtomMapping object for a desired pair of openff.toolkit.topology.Molecule objects).

@davidlmobley
Copy link
Member Author

Yes. So, for now, we're probably after giving this a minimal/quick review to look for anything which obviously needs to be addressed before merge. However, the code which this PR brings in is functional, whereas the code in the master branch is a couple of years out of date, so it's critical to get this merged in fairly soon to restore functionality. :)

@jchodera
Copy link

@davidlmobley : I have a different proposal: Instead of merging this into master, why not

  • Retire master into legacy or lomap-1.0-long-term-maintenance or something to indicate it is the deprecated branch. (The other branches should similarly be retired but retained)
  • Tell GitHub that main is the new default branch, keeping the name main
  • We (me, @mikemhenry, @ijpulidos) create a PR to update it to the latest MolSSI Cookiecutter to get GitHub Actions unit tests running

We can then go from there.

@jmichel80
Copy link

@davidlmobley @jchodera both options seem fine from our perspective. We would like to eventually refactor our implementation in BSS to pull in 'canonical' lomap (or whatever it will be called) from conda. If your updates are backwards compatible, or if you keep @lohedges in the loop about API changes that should ensure we are actually able to share the same codebase.

@jchodera
Copy link

@jmichel80 : I think the major question is who is willing to maintain this version of "canonical" LOMAP on conda-forge. If you folks want to do that independently, I think that would be fantastic.

As it stands, however, the public API appears to be too extensive and diffuse to be something we could collectively support, so I don't see us being able to contribute to supporting a stable version of the current API in any kind of long-term manner. My objectives are articulated here:
#54 (comment)

Basically, I think it's a useful starting point, I foresee us helping support a new-generation version of LOMAP with an enormously simplified public API that can be tailored for specific applications.

@lohedges
Copy link
Collaborator

@davidlmobley : I have a different proposal: Instead of merging this into master, why not

  • Retire master into legacy or lomap-1.0-long-term-maintenance or something to indicate it is the deprecated branch. (The other branches should similarly be retired but retained)
  • Tell GitHub that main is the new default branch, keeping the name main

This is exactly what I had in mind.

@jmichel80
Copy link

@jchodera we would be happy not to maintain LOMAP ! We can work with you to make sure that a new version of the code would still give us the same functionality we rely on at the moment.

@lohedges
Copy link
Collaborator

One other thing to note is that the Cresset fork has removed some functionality from the original master branch, e.g.:

Removed alkyl-to-halogen rule
Removed fingerprint functionality

I'm no LOMAP expert so don't know whether these options are widely used by other groups. If so, it might be worth discussing the functionality that would be expected from a canonical LOMAP implementation.

Future API changes would be perfectly fine by me, as long as things are clearly documented. We currently only make use of LOMAP within BioSimSpace.Align.generateNetwork and calls to the LOMAP API are entirely hidden from the user. As such, I can just update code within this function to reflect any changes going forward. (Also, our functionality is still experimental, so @JenkeScheen might have more ideas of what might be needed in future.)

What is important to us is having a conda installable version of LOMAP that works within the conda-forge ecosystem using the dependencies required by BioSimSpace . The version on main satisfies this requirement, whereas the master branch does not. (Mostly because of outdated NetworkX functionality.)

@davidlmobley
Copy link
Member Author

OK, so I like the option of retiring master and telling GitHub that this is the new master branch.

@lohedges those functionalities don't seem important to me, or at least, anyone using them certainly hasn't contributed back here and isn't chiming in, so I'm fine with removing those for now.

Plans for long-term maintenance are still developing, but:

  • I'm on board with major API changes/rearchitecting
  • We are going to have to do some new things which touch LOMAP (and which have funding), so will be investing at least SOME developer time here (or towards a rearchitected version) in the near future
  • We'd long term like to shift to something modular/maintainable/rebuilt that can serve to meet everyone's needs, though I'm not sure we can handle that rearchitecting; maybe it can happen through OpenFE.

There's also been some discussion elsewhere as to whether the successor would be called LOMAP, or something else, and I'm not particular -- the main thing I care about is that there be a tool we can use (and contribute to) for this stuff which is maintainable.

@davidlmobley
Copy link
Member Author

In other words, I think my current plan is:

a) Make "main" the new default branch
b) Happily contribute to any efforts to make this LOMAP obsolete by replacing it with something better architected which maintains key functionality
c) Deprecate LOMAP once its successor is available

@lohedges
Copy link
Collaborator

lohedges commented Aug 3, 2021

This sounds good to me. One other thing that I noted is that the README.md on main lists several additional authors. My contribution is certainly very small, so it might be better to divide the list into two categories, e.g. authors and contributors to make clear who did the bulk of the work, and who was responsible for the original concept and implementation.

Just to check, are you planning on building a conda-forge package from main, or is this something that you will only do once the refactoring and whatnot is in place?

@davidlmobley
Copy link
Member Author

Good idea on the contributors/etc., this is woefully out of date.

I'd want to build a conda forge package from main, though I don't think anyone on my end getting involved here has done that before, so we could probably use help and/or tips if you happen to have time and interest.

@lohedges
Copy link
Collaborator

lohedges commented Aug 5, 2021

Yes, I'd be happy to help with that. Given that it's pure python with standard dependencies it shouldn't be an issue. When I get a chance I'll try to make an initial recipe for you.

@davidlmobley
Copy link
Member Author

One issue I noticed with this is that it makes the examples nonfunctional (removes getMapping).

@lohedges
Copy link
Collaborator

Ah, I didn't notice that. Hopefully it's easy to re-add the functionality.

@lohedges
Copy link
Collaborator

Okay, I reinstated the getMapping method using the code from the master branch.

@mikemhenry
Copy link

@jchodera @davidlmobley This is a pretty big PR :)

How would you like me to review this? In this initial PR are we looking to clean up the API as well? Do we want to switch from travis CI to github actions?

@davidlmobley
Copy link
Member Author

@mikemhenry this may not be worth reviewing; I think the path forward is that, well, the current master is nonfunctional and this branch is functional, so we need to switch to this branch, probably without a detailed review. Then we should begin to think about API issues and switch from Travis CI to GItHub actions, etc.

@davidlmobley
Copy link
Member Author

OK, as per discussion I have updated the default branch from master to main and thus the code in this PR is now "live" I think. I will close this PR, and also update the changelog in a separate PR.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants