
Add deep_reference_parser utilities #5

Merged
ivyleavedtoadflax merged 9 commits into master from feature/ivyleavedtoadflax/prodigy_utilities on Feb 17, 2020

ivyleavedtoadflax commented Feb 13, 2020

This is a long PR, but it includes no new code. Everything has simply been moved over from datalabs, so please just skim it if you are interested.

Moves utilities from the datalabs deep_reference_parser project into this repository. These utilities handle:

  • Conversion between Reach, spaCy, Prodigy, and TSV formats.
  • Conversion of manual reference annotations to token annotations in the IOBE schema.
  • Rough annotation of numbered reference sections based on a deterministic splitting method.
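To illustrate what converting reference annotations into token annotations in the IOBE schema involves, here is a minimal sketch. It is not the code moved in this PR. It assumes tokens and reference spans are both described by character offsets, uses hypothetical `B-`/`I-`/`E-` label names plus `O`, and makes an arbitrary choice to tag a one-token span as `B`. The helpers `whitespace_tokens` and `spans_to_iobe` are hypothetical.

```python
import re
from typing import Dict, List


def whitespace_tokens(text: str) -> List[Dict[str, int]]:
    """Hypothetical tokenizer: one token per run of non-whitespace characters."""
    return [{"start": m.start(), "end": m.end()} for m in re.finditer(r"\S+", text)]


def spans_to_iobe(tokens: List[Dict[str, int]], spans: List[Dict]) -> List[str]:
    """Tag each token B-/I-/E-<label>, or O if it lies outside every span."""
    tags = ["O"] * len(tokens)
    for span in spans:
        # Indices of tokens that fall entirely within the annotated span.
        inside = [
            i
            for i, tok in enumerate(tokens)
            if tok["start"] >= span["start"] and tok["end"] <= span["end"]
        ]
        if not inside:
            continue
        label = span["label"]
        for i in inside:
            tags[i] = f"I-{label}"
        tags[inside[0]] = f"B-{label}"
        # Assumption: a one-token span keeps its B tag rather than becoming E.
        if len(inside) > 1:
            tags[inside[-1]] = f"E-{label}"
    return tags


text = "refs: Smith J Nature 2019 end"
start = text.index("Smith")
end = text.index("2019") + len("2019")
print(spans_to_iobe(whitespace_tokens(text), [{"start": start, "end": end, "label": "REF"}]))
# ['O', 'B-REF', 'I-REF', 'I-REF', 'E-REF', 'O']
```

Working on character offsets keeps the manual annotations independent of the tokenizer; the token tags are only derived at conversion time.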

Command line tools are accessible from:

$ python -m deep_reference_parser.prodigy                      
Using TensorFlow backend.

ℹ Available commands
annotate_numbered_refs, prodigy_to_tsv, reach_to_prodigy,
refs_to_token_annotations
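The `annotate_numbered_refs` command listed above does rough annotation of numbered reference sections using a deterministic split. The sketch below shows one way such a split could work. It is an assumption, not the moved implementation: the marker pattern (`1.`, `1)` or `[1]` at the start of a line) and the function name `split_numbered_refs` are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical marker pattern: "1.", "1)" or "[1]" at the start of a line,
# optionally indented, followed by spaces or tabs.
REF_MARKER = re.compile(r"^[ \t]*(?P<marker>\[\d+\]|\d+[.)])[ \t]+", re.MULTILINE)


def split_numbered_refs(section: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets, one pair per numbered reference.

    Each reference runs from its marker up to the next marker (or the end of
    the section), with trailing whitespace trimmed. Text before the first
    marker (e.g. a "References" heading) is ignored.
    """
    starts = [m.start("marker") for m in REF_MARKER.finditer(section)]
    spans = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(section)
        ref_text = section[start:end].rstrip()
        spans.append((start, start + len(ref_text)))
    return spans


section = "References\n1. Smith J. Title. Nature 2019.\n2. Doe A. Other.\n   Cell 2018.\n"
for s, e in split_numbered_refs(section):
    print(repr(section[s:e]))
```

Because the split only looks for line-initial markers, a wrapped continuation line stays attached to its reference, but a continuation line that happens to begin with something like `2019.` would be split incorrectly. That is why this kind of annotation is only rough and still needs manual correction. Returning character offsets means the result can then be turned into token annotations in the IOBE schema.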

Additional help is available with the --help flag for individual commands:

$ python -m deep_reference_parser.prodigy prodigy_to_tsv --help
Using TensorFlow backend.
usage: deep_reference_parser prodigy_to_tsv [-h] input_file output_file

    Convert token annotated jsonl to token annotated tsv ready for use in the
    Rodrigues model.
    

positional arguments:
  input_file   Path to jsonl file containing prodigy docs.
  output_file  Path to output tsv file.

optional arguments:
  -h, --help   show this help message and exit
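As a rough sketch of the kind of conversion `prodigy_to_tsv` performs, the code below reads token-annotated Prodigy docs (one JSON object per line with `tokens` and `spans`, where `token_start`/`token_end` are inclusive token indices) and writes one `token<TAB>label` row per token. It assumes a blank line separates documents and that tokens outside any span get `O`. Whether this matches the exact layout the Rodrigues model expects is an assumption. The names `prodigy_doc_to_rows` and `jsonl_to_tsv` are hypothetical.

```python
import io
import json
from typing import Iterable, List, TextIO


def prodigy_doc_to_rows(doc: dict) -> List[List[str]]:
    """Pair each token's text with its label; tokens outside any span get "O"."""
    labels = ["O"] * len(doc["tokens"])
    for span in doc.get("spans", []):
        # token_start and token_end are inclusive token indices in Prodigy spans.
        for i in range(span["token_start"], span["token_end"] + 1):
            labels[i] = span["label"]
    return [[tok["text"], label] for tok, label in zip(doc["tokens"], labels)]


def jsonl_to_tsv(lines: Iterable[str], out: TextIO) -> None:
    """Write one "token<TAB>label" row per token, with a blank line after each doc."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the jsonl input
        doc = json.loads(line)
        for token, label in prodigy_doc_to_rows(doc):
            out.write(f"{token}\t{label}\n")
        out.write("\n")


if __name__ == "__main__":
    doc = {
        "text": "Smith J 2019",
        "tokens": [
            {"text": "Smith", "start": 0, "end": 5, "id": 0},
            {"text": "J", "start": 6, "end": 7, "id": 1},
            {"text": "2019", "start": 8, "end": 12, "id": 2},
        ],
        "spans": [
            {"start": 0, "end": 5, "token_start": 0, "token_end": 0, "label": "B-REF"},
            {"start": 6, "end": 7, "token_start": 1, "token_end": 1, "label": "E-REF"},
        ],
    }
    buf = io.StringIO()
    jsonl_to_tsv([json.dumps(doc)], buf)
    print(buf.getvalue())
```

Passing in an iterable of lines and a text stream, rather than file paths, keeps the conversion easy to test; a command-line wrapper would only need to open `input_file` and `output_file`.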

ivyleavedtoadflax force-pushed the feature/ivyleavedtoadflax/prodigy_utilities branch from 8b13948 to 101b16f on February 13, 2020 23:17
ivyleavedtoadflax changed the title from "Add utilities for handling data in prodigy format" to "Add deep_reference_parser utilities" on Feb 13, 2020
ivyleavedtoadflax force-pushed the feature/ivyleavedtoadflax/prodigy_utilities branch from 101b16f to 0855a5c on February 13, 2020 23:24
ivyleavedtoadflax marked this pull request as ready for review on February 13, 2020 23:25
ivyleavedtoadflax self-assigned this on Feb 13, 2020
ivyleavedtoadflax merged commit dc40e8d into master on Feb 17, 2020
ivyleavedtoadflax deleted the feature/ivyleavedtoadflax/prodigy_utilities branch on February 17, 2020 13:16

2 participants