Need help understanding the labels of the parser model

Hello! First, I have to say that I love this project. It has really helped me explore the syntax of different kinds of text, so thank you so much!

I have a question regarding tagsets. I am using the Swedish model, and I remember that a few years back it was based on the Swedish treebank tagset called Mamba. It seems this has changed in the new version (benepar-sv2).

I tried to print which labels were used to train the core model, and got these results:

```
>>> parser._parser.label_vocab
{'': 0,
 'AP': 1,
 'AP::AP': 2,
 'AP::XP': 3,
 'AVP': 4,
 'AVP::XP': 5,
 'NP': 6,
 'NP::AP': 7,
 'NP::NP': 8,
 'NP::NP::AP': 9,
 'NP::NP::NP::NP::XP': 10,
 'NP::NP::S': 11,
 'NP::NP::VP': 12,
 'NP::PP': 13,
 'NP::S': 14,
 'NP::XP': 15,
 'NP::XP::NP': 16,
 'NP::XP::S': 17,
 'PP': 18,
 'PP::AVP': 19,
 'PP::AVP::XP': 20,
 'PP::NP': 21,
 'PP::XP': 22,
 'PSEUDO': 23,
 'S': 24,
 'S::AVP': 25,
 'S::NP': 26,
 'S::NP::NP': 27,
 'S::NP::NP::NP::NP': 28,
 'S::NP::S': 29,
 'S::NP::XP': 30,
 'S::NP::XP::S': 31,
 'S::PP': 32,
 'S::PP::NP': 33,
 'S::S': 34,
 'S::S::NP': 35,
 'S::S::NP::NP': 36,
 'S::VP': 37,
 'S::XP': 38,
 'VP': 39,
 'VP::AP': 40,
 'VP::PP': 41,
 'VP::S': 42,
 'VP::VP': 43,
 'VP::XP': 44,
 'XP': 45,
 'XP::AVP': 46,
 'XP::NP': 47,
 'XP::PP': 48,
 'XP::S': 49}
>>> parser._parser.tag_vocab
{'AB': 1,
 'DT': 2,
 'HA': 3,
 'HD': 4,
 'HP': 5,
 'HS': 6,
 'IE': 7,
 'IN': 8,
 'JJ': 9,
 'KN': 10,
 'MAD': 11,
 'MID': 12,
 'NN': 13,
 'P': 14,
 'PAD': 15,
 'PC': 16,
 'PL': 17,
 'PM': 18,
 'PN': 19,
 'PS': 20,
 'RG': 21,
 'RO': 22,
 'SN': 23,
 'UNK': 0,
 'UO': 24,
 'VB': 25}
```

What is the difference between `NP::NP::S` and `S::NP::NP`?
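To make the question concrete, here is a minimal sketch of how I currently read these labels. It assumes that `::` joins a chain of unary (single-child) nodes and that the chain is written from the outermost node to the innermost one. I have not confirmed either assumption against the benepar source, and `expand_label` is a hypothetical helper of my own, not part of the library.

```python
def expand_label(label, leaf="..."):
    """Expand 'A::B::C' into the bracketing '(A (B (C ...)))'.

    Assumption (unconfirmed): '::' joins a unary chain and the
    leftmost part is the outermost node.
    """
    # Drop empty parts so the empty label '' expands to just the leaf.
    parts = [part for part in label.split("::") if part]
    bracketed = leaf
    # Wrap from the innermost label outwards.
    for part in reversed(parts):
        bracketed = f"({part} {bracketed})"
    return bracketed


print(expand_label("NP::NP::S"))  # (NP (NP (S ...)))
print(expand_label("S::NP::NP"))  # (S (NP (NP ...)))
```

If that reading is right, the two labels describe different nestings rather than an `S` with two `NP` children, but I would appreciate confirmation.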

<img width="429" alt="Screenshot 2023-09-20 at 10 45 57" src="https://github.com/nikitakit/self-attentive-parser/assets/31689453/3fa59aad-d494-4c59-98a7-028c658a187f">

In this example (in English: "Hello, I am a banana"), there is an `S` (simple declarative clause) with two `NP`s as children. Would this be `NP::NP::S` or `S::NP::NP`? And what is happening with `AUX`? I find it hard to think of any structure where an `S` has only two `NP`s, since at least one `VP` is required to form an `S`.

Also, a general question: I saw in #30 that you are using this for training: http://surdeanu.cs.arizona.edu//mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html Is it the same for the Swedish model and the models for other languages? For example, unlike the English model, the Swedish model has no `FRAG` among its labels. Is this because of the nature of the language itself, or did you use a different label set for each language?
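For reference, this is roughly how I checked which basic categories the Swedish model uses. It is only a sketch: it assumes `::` is purely a separator between category names, `atomic_categories` is a hypothetical helper of mine, and `sample_vocab` is a small subset of the `label_vocab` output above rather than the full dictionary.

```python
def atomic_categories(label_vocab):
    """Collect the distinct category names used in a label vocabulary.

    Assumption (unconfirmed): '::' only separates category names.
    """
    categories = set()
    for label in label_vocab:
        # Skip the empty string that the vocabulary uses for "no label".
        categories.update(part for part in label.split("::") if part)
    return sorted(categories)


# A few entries copied from the label_vocab output above.
sample_vocab = {
    "": 0,
    "NP": 6,
    "NP::NP::S": 11,
    "PP::AVP::XP": 20,
    "PSEUDO": 23,
    "S::VP": 37,
    "VP::AP": 40,
}

print(atomic_categories(sample_vocab))
# ['AP', 'AVP', 'NP', 'PP', 'PSEUDO', 'S', 'VP', 'XP']
```

Passing the full `parser._parser.label_vocab` instead of the sample should give the same eight categories, none of which is `FRAG`.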

