Need help understanding the labels of the parser model #104

@sujoung

Hello! First, I have to say that I love this project. It is really helping me explore the syntax of different kinds of text, so thank you so much!

I have a question regarding tagsets. I am using the Swedish model, and I remember that a few years ago it was based on the Swedish treebank tagset called Mamba. It seems to have changed in the new version (benepar-sv2).

I tried printing the labels that were used to train the core model, and got these results:

>>> parser._parser.label_vocab
{'': 0,
 'AP': 1,
 'AP::AP': 2,
 'AP::XP': 3,
 'AVP': 4,
 'AVP::XP': 5,
 'NP': 6,
 'NP::AP': 7,
 'NP::NP': 8,
 'NP::NP::AP': 9,
 'NP::NP::NP::NP::XP': 10,
 'NP::NP::S': 11,
 'NP::NP::VP': 12,
 'NP::PP': 13,
 'NP::S': 14,
 'NP::XP': 15,
 'NP::XP::NP': 16,
 'NP::XP::S': 17,
 'PP': 18,
 'PP::AVP': 19,
 'PP::AVP::XP': 20,
 'PP::NP': 21,
 'PP::XP': 22,
 'PSEUDO': 23,
 'S': 24,
 'S::AVP': 25,
 'S::NP': 26,
 'S::NP::NP': 27,
 'S::NP::NP::NP::NP': 28,
 'S::NP::S': 29,
 'S::NP::XP': 30,
 'S::NP::XP::S': 31,
 'S::PP': 32,
 'S::PP::NP': 33,
 'S::S': 34,
 'S::S::NP': 35,
 'S::S::NP::NP': 36,
 'S::VP': 37,
 'S::XP': 38,
 'VP': 39,
 'VP::AP': 40,
 'VP::PP': 41,
 'VP::S': 42,
 'VP::VP': 43,
 'VP::XP': 44,
 'XP': 45,
 'XP::AVP': 46,
 'XP::NP': 47,
 'XP::PP': 48,
 'XP::S': 49}
>>> parser._parser.tag_vocab
{'AB': 1,
 'DT': 2,
 'HA': 3,
 'HD': 4,
 'HP': 5,
 'HS': 6,
 'IE': 7,
 'IN': 8,
 'JJ': 9,
 'KN': 10,
 'MAD': 11,
 'MID': 12,
 'NN': 13,
 'P': 14,
 'PAD': 15,
 'PC': 16,
 'PL': 17,
 'PM': 18,
 'PN': 19,
 'PS': 20,
 'RG': 21,
 'RO': 22,
 'SN': 23,
 'UNK': 0,
 'UO': 24,
 'VB': 25}

What is the difference between NP::NP::S and S::NP::NP?
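
To make the comparison concrete, here is a minimal sketch that splits the two labels on `::`. It assumes `::` is simply a separator between category names; `split_label` is a hypothetical helper of mine, not part of benepar, and the sketch says nothing about what the order of the components means.

```python
def split_label(label, sep="::"):
    """Split a composite label into its component category names.

    Assumption: "::" only separates category names. The empty label
    (index 0 in label_vocab) yields an empty list.
    """
    return label.split(sep) if label else []


a = split_label("NP::NP::S")
b = split_label("S::NP::NP")

print(a)                        # ['NP', 'NP', 'S']
print(b)                        # ['S', 'NP', 'NP']
print(sorted(a) == sorted(b))   # True: same components
print(a == b)                   # False: different order
```

So the two labels are built from the same categories and differ only in order, which is the part I do not understand.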

[Screenshot: parser output for the example sentence]

In this example (in English: "Hello, I am a banana"), there is an S (simple declarative clause) that has two NPs as children. Would this be NP::NP::S or S::NP::NP? And what is happening with AUX? It is hard for me to think of any structure where S has only two NPs, because at least one VP is required to form an S.

Also, a general question: I saw in #30 that you are using this for training: http://surdeanu.cs.arizona.edu//mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html Is it the same for the Swedish model and the models for other languages? For example, unlike the English model, I see no FRAG among the labels of the Swedish model. Is this because of the nature of the language itself, or did you use a different label set for each language?
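
For reference, this is roughly how I checked for FRAG. It is a minimal sketch: `atomic_categories` is a hypothetical helper (not part of benepar), it assumes `::` only separates category names, and `label_vocab_excerpt` is a hand-copied excerpt of the output above rather than the full vocabulary. On a loaded parser, `parser._parser.label_vocab` could be passed instead.

```python
def atomic_categories(label_vocab, sep="::"):
    """Collect the distinct category names used in any label.

    Assumption: composite labels are category names joined by "::".
    """
    categories = set()
    for label in label_vocab:
        if label:  # skip the empty label
            categories.update(label.split(sep))
    return categories


# Excerpt of the Swedish label vocabulary printed above (not the full dict).
label_vocab_excerpt = {
    "": 0,
    "AP": 1,
    "AVP::XP": 5,
    "NP::NP::S": 11,
    "PP::NP": 21,
    "PSEUDO": 23,
    "S::VP": 37,
    "XP": 45,
}

categories = atomic_categories(label_vocab_excerpt)
print(sorted(categories))
# ['AP', 'AVP', 'NP', 'PP', 'PSEUDO', 'S', 'VP', 'XP']
print("FRAG" in categories)  # False
```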
