Questions about string cleaning

Thanks for this solid work. 
The comment in `clean_str` says `Every dataset is lower cased except for TREC`, but the example sentence in Table 6 is cased. This looks like a conflict to me.
https://github.com/jind11/TextFooler/blob/6aeec20f9fd37f5865e580de669e1263a7cd49d3/dataloader.py#L10
The same comment in `clean_str` also says `Tokenization/string cleaning for all datasets except for SST.`
Did you train the model on a cleaned, uncased dataset but test it on a raw, cased one? Yet the 1000-example split in `data` is uncased. I'm really confused. Is there something I have missed?
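To make the question concrete, here is a minimal sketch of the behavior I understand from those two comments. It is not the repository's code: `clean_str_sketch`, its `trec` flag, and the regular expressions are my own simplified assumptions for illustration only.

```python
import re


def clean_str_sketch(text: str, trec: bool = False) -> str:
    """Hypothetical, simplified cleaner (an assumption, not the repo's function)."""
    # Replace characters outside a small whitelist with a space.
    text = re.sub(r"[^A-Za-z0-9(),!?'`]", " ", text)
    # Collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s{2,}", " ", text).strip()
    # Lower-case everything unless the TREC flag is set.
    return text if trec else text.lower()


raw = "The Movie is GREAT!"
print(clean_str_sketch(raw))             # lower-cased output
print(clean_str_sketch(raw, trec=True))  # casing preserved
```

If the training data went through something like the first call, I would expect the examples in the paper to be lower-cased as well, which is why the cased sentence in Table 6 confuses me.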
I apologize for asking directly without going through your code first. An answer would be very generous and helpful. Thanks in advance~

Questions about string cleaning #38
