Hello! First of all, I have to say that I love this project. It has really helped me explore the syntax of different kinds of text, so thank you so much!
I have a question regarding tagsets. I am using the Swedish model, and I remember that a few years back it was based on the Swedish treebank tagset called Mamba. It seems this has changed in the new version (benepar-sv2).
I printed the labels and tags in the core model's vocabularies and got these results:
>>> parser._parser.label_vocab
{'': 0,
'AP': 1,
'AP::AP': 2,
'AP::XP': 3,
'AVP': 4,
'AVP::XP': 5,
'NP': 6,
'NP::AP': 7,
'NP::NP': 8,
'NP::NP::AP': 9,
'NP::NP::NP::NP::XP': 10,
'NP::NP::S': 11,
'NP::NP::VP': 12,
'NP::PP': 13,
'NP::S': 14,
'NP::XP': 15,
'NP::XP::NP': 16,
'NP::XP::S': 17,
'PP': 18,
'PP::AVP': 19,
'PP::AVP::XP': 20,
'PP::NP': 21,
'PP::XP': 22,
'PSEUDO': 23,
'S': 24,
'S::AVP': 25,
'S::NP': 26,
'S::NP::NP': 27,
'S::NP::NP::NP::NP': 28,
'S::NP::S': 29,
'S::NP::XP': 30,
'S::NP::XP::S': 31,
'S::PP': 32,
'S::PP::NP': 33,
'S::S': 34,
'S::S::NP': 35,
'S::S::NP::NP': 36,
'S::VP': 37,
'S::XP': 38,
'VP': 39,
'VP::AP': 40,
'VP::PP': 41,
'VP::S': 42,
'VP::VP': 43,
'VP::XP': 44,
'XP': 45,
'XP::AVP': 46,
'XP::NP': 47,
'XP::PP': 48,
'XP::S': 49}
>>> parser._parser.tag_vocab
{'AB': 1,
'DT': 2,
'HA': 3,
'HD': 4,
'HP': 5,
'HS': 6,
'IE': 7,
'IN': 8,
'JJ': 9,
'KN': 10,
'MAD': 11,
'MID': 12,
'NN': 13,
'P': 14,
'PAD': 15,
'PC': 16,
'PL': 17,
'PM': 18,
'PN': 19,
'PS': 20,
'RG': 21,
'RO': 22,
'SN': 23,
'UNK': 0,
'UO': 24,
'VB': 25}
What is the difference between NP::NP::S and S::NP::NP?
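To make the comparison concrete, here is a small sketch that splits the two labels into their parts. It assumes that `::` is simply a separator joining several sublabels into one vocabulary entry, which I have not confirmed. The helper name `split_label` is my own and is not part of benepar.

```python
def split_label(label):
    """Split a label_vocab key on '::' into its component sublabels.

    Assumption: '::' only separates sublabels. The empty label ''
    (index 0 in the vocabulary above) is treated as having no parts.
    """
    return label.split("::") if label else []


# The two labels contain the same sublabels, but in a different order.
for label in ("NP::NP::S", "S::NP::NP"):
    print(label, "->", split_label(label))
```

If the order of the parts is meaningful (for example, outermost node first), these two labels would describe different structures rather than the same set of nodes, which is what I would like to understand.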
In this example (in English: Hello, I am a banana)
there is an S (simple declarative clause) that has two NPs as children. Would this be NP::NP::S or S::NP::NP? And what is happening with AUX? It is hard for me to think of any structure where an S has only two NPs, because at least one VP is required to form an S.
Also, a general question: I saw in #30 that you are using this for training: http://surdeanu.cs.arizona.edu//mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html Is it the same for the Swedish model and the models for other languages? For example, unlike the English model, the Swedish model has no FRAG among its labels. Is this because of the nature of the language itself, or did you use a different label set for each language?