Thanks for this solid work.
In `clean_str`, the docstring says "Every dataset is lower cased except for TREC", but the example sentences in Table 6 are cased. This looks like a conflict to me.
TextFooler/dataloader.py, line 10 in 6aeec20:
def clean_str(string, TREC=False):
Also, the docstring of `clean_str` says "Tokenization/string cleaning for all datasets except for SST."
Did you train the model on a cleaned, uncased dataset but test it on a cased, raw dataset? If so, that would not match the 1000-example split in 'data', which is uncased. I'm really confused. Is there something I have missed?
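To make the concern concrete, here is a minimal sketch of the behavior I am assuming from the docstring. It is not the repository's code: `clean_str_sketch` is a hypothetical, simplified stand-in that only collapses whitespace and lower-cases the text unless `TREC=True`, and the input sentence is made up for illustration.

```python
import re


def clean_str_sketch(string, TREC=False):
    """Hypothetical stand-in for clean_str: lower-case everything except TREC."""
    # Collapse runs of whitespace and trim the ends.
    string = re.sub(r"\s+", " ", string).strip()
    # Assumption taken from the docstring: only TREC keeps its original casing.
    return string if TREC else string.lower()


# Made-up example sentence, not taken from the paper or the data files.
raw = "A Cased  Example Sentence ."

cleaned = clean_str_sketch(raw)
cleaned_trec = clean_str_sketch(raw, TREC=True)

print(cleaned)       # casing removed
print(cleaned_trec)  # casing preserved
```

If training used something like the first output while the attack examples look like the second, the model would see differently cased text at training and test time, which is what I am asking about.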
I apologize for asking directly before going through your code. A clarification would be very generous and helpful. Thanks in advance~