Add refs_to_tokens command by ivyleavedtoadflax · Pull Request #12 · wellcometrust/WellcomeML

ivyleavedtoadflax · 2020-02-13T18:48:57Z

Now that there is https://github.com/wellcometrust/deep_reference_parser, I'm moving the associated utilities out of datalabs into wellcomeml.nlp so we can access them easily on any platform. This is quite large, so let me know if you would prefer me to chop it up into smaller PRs; I think most of it is tests, though.

This class allows you to convert manual reference-level annotations made in Prodigy into token-level annotations.
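As a rough illustration of the conversion, here is a minimal sketch rather than the code in this PR. It assumes Prodigy-style task dicts in which each token has `id`, `start` and `end`, and each span has inclusive `token_start` and `token_end` indices. The function names and the `b-r` / `i-r` / `e-r` / `o` label scheme are assumptions made for the example.

```python
import json


def refs_to_tokens(doc):
    """Expand reference-level spans into one span per token.

    Assumes doc["tokens"] holds dicts with "id", "start" and "end", and
    doc["spans"] holds dicts with inclusive "token_start"/"token_end".
    The label names below are illustrative, not taken from the PR.
    """
    # Every token starts outside any reference.
    labels = {token["id"]: "o" for token in doc["tokens"]}

    for span in doc.get("spans", []):
        first, last = span["token_start"], span["token_end"]
        for token_id in range(first, last + 1):
            labels[token_id] = "i-r"  # inside a reference
        labels[first] = "b-r"  # first token of the reference
        if last > first:
            labels[last] = "e-r"  # last token of the reference

    token_spans = [
        {
            "start": token["start"],
            "end": token["end"],
            "token_start": token["id"],
            "token_end": token["id"],
            "label": labels[token["id"]],
        }
        for token in doc["tokens"]
    ]
    # Return a copy so the input document is left untouched.
    return {**doc, "spans": token_spans}


def convert_file(input_file, output_file):
    """Convert a jsonl file of reference annotations line by line."""
    with open(input_file) as src, open(output_file, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(json.dumps(refs_to_tokens(json.loads(line))) + "\n")
```

Each annotated reference becomes a run of per-token labels, which is the shape a token-level model needs for training.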

The other significant change here is that I have added a __main__.py, inspired by spaCy. This gives us an entry point into the package functions, like so:

$ python -m wellcomeml

ℹ Available commands
refs_to_token_annotations

This plugs into the CLI of each of these functions, so you can also do:

$ python -m wellcomeml refs_to_tokens --help
usage: wellcomeml refs_to_tokens [-h] input_file output_file

 Converts a file output by prodigy (using prodigy db-out) from
    references level annotations to individual level annotations. The rationale
    for this is that reference level annotations are much easier for humans to
    do, but not useful when training a token level model.

    This function is predominantly useful for tagging reference spans, but may
    also have a function with other references annotations.
    

positional arguments:
  input_file   Path to jsonl file containing chunks of references in prodigy
               format.
  output_file  Path to jsonl file into which fully annotated files will be
               saved.

optional arguments:
  -h, --help   show this help message and exit

I quite like this pattern for getting CLI access to the code without creating explicit entry points in setup.py. I think some of these functions are borderline in terms of their fit in WellcomeML. See what you think; I'm happy to put them in deep_reference_parser if not here.

Allows CLI functions defined by plac (or argparse) to be called using the following syntax: `python -m wellcomeml <command>`
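A minimal sketch of how such a `__main__.py` dispatcher could look, not the code from this PR. It uses argparse from the standard library rather than plac, and the `COMMANDS` mapping, `refs_to_tokens_cli` and `main` names are assumptions made for the example.

```python
import argparse
import sys


def refs_to_tokens_cli(argv):
    """Hypothetical command: parses its own arguments, then does the work."""
    parser = argparse.ArgumentParser(prog="wellcomeml refs_to_tokens")
    parser.add_argument("input_file", help="Path to the input jsonl file.")
    parser.add_argument("output_file", help="Path to the output jsonl file.")
    args = parser.parse_args(argv)
    # A real command would call the conversion function here.
    print(f"Would convert {args.input_file} into {args.output_file}")
    return 0


# Map each command name to the function that handles its arguments.
COMMANDS = {"refs_to_tokens": refs_to_tokens_cli}


def main(argv=None):
    """Dispatch `python -m wellcomeml <command> [args]` to the named command."""
    argv = sys.argv[1:] if argv is None else argv
    if not argv or argv[0] not in COMMANDS:
        # No command, or an unknown one: list what is available.
        print("Available commands")
        print(", ".join(sorted(COMMANDS)))
        return 1
    command, rest = argv[0], argv[1:]
    return COMMANDS[command](rest)


# In wellcomeml/__main__.py the module would end with: sys.exit(main())
```

Because Python runs a package's `__main__.py` when it is invoked with `-m`, adding a new command only means adding an entry to the mapping; setup.py does not need to change.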

ivyleavedtoadflax · 2020-02-13T23:13:18Z

Moved to wellcometrust/deep_reference_parser#5

ivyleavedtoadflax added 3 commits February 13, 2020 15:33

chg: Rename wellcomeml.spacy to wellcomeml.prodigy

e1e76bb

new: Add reference_to_token_annotations

f548d84

new: Add spacy style entry points to functions

64a5a68

ivyleavedtoadflax requested review from aCampello, lizgzil and nsorros February 13, 2020 18:49

ivyleavedtoadflax marked this pull request as ready for review February 13, 2020 18:50

This was referenced Feb 13, 2020

Add prodigy_to_tsv command #13

Closed

Add reach_to_prodigy #14

Closed

Add numbered_reference_annotator #15

Closed

ivyleavedtoadflax closed this Feb 13, 2020

ivyleavedtoadflax deleted the feature/ivyleavedtoadflax/reference_to_tokens branch February 13, 2020 23:13

ivyleavedtoadflax wants to merge 3 commits into master from feature/ivyleavedtoadflax/reference_to_tokens