This repository was archived by the owner on Aug 9, 2023. It is now read-only.

Add refs_to_tokens command#12

Closed
ivyleavedtoadflax wants to merge 3 commits into master from feature/ivyleavedtoadflax/reference_to_tokens

ivyleavedtoadflax commented Feb 13, 2020

Now that https://github.com/wellcometrust/deep_reference_parser exists, I'm moving the associated utilities out of datalabs and into wellcomeml.nlp so that we can access them easily on any platform. This PR is quite large, so let me know if you would prefer me to split it into smaller PRs, although I think most of it is tests.

This class allows you to convert manual reference-level annotations made in Prodigy into token-level annotations.

The other significant change is that I have added a __main__.py, inspired by spaCy. This gives us an entry point into the package's functions, like so:

$ python -m wellcomeml

ℹ Available commands
refs_to_token_annotations

This plugs into the cli of each of these functions, so you can also do:

$ python -m wellcomeml refs_to_tokens --help
usage: wellcomeml refs_to_tokens [-h] input_file output_file

 Converts a file output by prodigy (using prodigy db-out) from
    references level annotations to individual level annotations. The rationale
    for this is that reference level annotations are much easier for humans to
    do, but not useful when training a token level model.

    This function is predominantly useful for tagging reference spans, but may
    also have a function with other references annotations.
    

positional arguments:
  input_file   Path to jsonl file containing chunks of references in prodigy
               format.
  output_file  Path to jsonl file into which fully annotate files will be
               saved.

optional arguments:
  -h, --help   show this help message and exit

I quite like this pattern for getting CLI access to the code without creating explicit entry points in setup.py. Some of these functions are a borderline fit for WellcomeML, so see what you think; I'm happy to put them in deep_reference_parser instead.
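For anyone unfamiliar with the pattern, here is a minimal sketch of how a `__main__.py` dispatcher of this kind might look. It is not the code in this PR: it uses only argparse from the standard library, and the `COMMANDS` mapping, the `main` function, and the stub body of `refs_to_tokens` are assumptions made for illustration.

```python
import argparse
import sys


def refs_to_tokens(argv):
    """Hypothetical stand-in for the real command: it only parses its arguments."""
    parser = argparse.ArgumentParser(prog="wellcomeml refs_to_tokens")
    parser.add_argument("input_file", help="Path to the input jsonl file.")
    parser.add_argument("output_file", help="Path to the output jsonl file.")
    args = parser.parse_args(argv)
    # The real command would convert the annotations here.
    return args.input_file, args.output_file


# Map each command name to the function that implements its CLI (assumed layout).
COMMANDS = {"refs_to_tokens": refs_to_tokens}


def main(argv=None):
    """Dispatch `python -m <package> <command> [args...]` to the matching function."""
    argv = sys.argv[1:] if argv is None else list(argv)
    if not argv or argv[0] not in COMMANDS:
        # No command, or an unknown one: list what is available and signal failure.
        print("Available commands")
        print("\n".join(sorted(COMMANDS)))
        return 1
    command, rest = argv[0], argv[1:]
    COMMANDS[command](rest)
    return 0


# In a package's __main__.py the file would end with: sys.exit(main())
```

Because each command parses its own arguments, `--help` on a subcommand is handled by that command's parser and no entry points need to be declared in setup.py.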

Allows CLI functions defined by plac (or argparse) to be called using the
following syntax:

`python -m wellcomeml <command>`
@ivyleavedtoadflax

Moved to wellcometrust/deep_reference_parser#5

@ivyleavedtoadflax ivyleavedtoadflax deleted the feature/ivyleavedtoadflax/reference_to_tokens branch February 13, 2020 23:13