
Conversation

@tomaarsen
Member

Hello!

Pull Request overview

  • Prevent TypeError on model.predict when using string labels.
  • Added a test case to show correct behaviour.

Details

When training with string labels (which is not strictly recommended, but possible), model.predict broke as of the latest version. See the following script to reproduce:

Reproduction

```python
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

dataset = Dataset.from_dict(
    {"text": ["positive sentence", "negative sentence"], "label": ["positive", "negative"]}
)
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-albert-small-v2")
trainer = SetFitTrainer(
    model=model,
    train_dataset=dataset,
    eval_dataset=dataset,
    num_iterations=1,
)
trainer.train()
# This used to fail with "TypeError: can't convert np.ndarray of type numpy.str_.
# The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool."
model.predict(["another positive sentence"])
```

This resulted in

```
Traceback (most recent call last):
  File "[sic]demo_string_issue.py", line 17, in <module>
    model.predict(["another positive sentence"])
  File "[sic]src\setfit\modeling.py", line 419, in predict
    outputs = torch.from_numpy(outputs)
TypeError: can't convert np.ndarray of type numpy.str_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
```

See also #329, which shows this same issue, but for evaluate (which calls predict behind the scenes).

Why do we get this error?

Consider the following lines in the predict method:

```python
outputs = self.model_head.predict(embeddings)
if as_numpy and self.has_differentiable_head:
    outputs = outputs.detach().cpu().numpy()
elif not as_numpy and not self.has_differentiable_head:
    outputs = torch.from_numpy(outputs)
return outputs
```

And consider the scenario with the (default) non-differentiable head and as_numpy=False. In this case, we reach line 419 and call torch.from_numpy. However, outputs has dtype <U8, where the U indicates that the type is a unicode string. There is no Torch tensor equivalent of this type, and thus we get the error shown above.
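The failure can be reproduced in isolation, without setfit at all (a minimal sketch; the array contents here are just stand-ins for the head's output):

```python
import numpy as np
import torch

# A string array like the one a scikit-learn head returns for string labels.
outputs = np.array(["positive", "negative"])
print(outputs.dtype)  # <U8: unicode strings of up to 8 characters

try:
    torch.from_numpy(outputs)
except TypeError as err:
    print(f"TypeError: {err}")
```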

The fix

The fix is simply to prevent calling torch.from_numpy if the head outputs a numpy array with strings.
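As a sketch, the guard might look like the following (`maybe_to_tensor` is a hypothetical helper name for illustration, not the actual patch):

```python
import numpy as np
import torch

def maybe_to_tensor(outputs: np.ndarray):
    # Hypothetical helper: only hand numeric/bool arrays to torch.from_numpy;
    # string arrays (dtype kind "U") pass through unchanged.
    if outputs.dtype.kind in "fiub":  # float, signed int, unsigned int, bool
        return torch.from_numpy(outputs)
    return outputs

print(maybe_to_tensor(np.array([0, 1])))          # tensor([0, 1])
print(maybe_to_tensor(np.array(["pos", "neg"])))  # stays a numpy string array
```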

Note

The issue from #329 isn't fully fixed: calling evaluate with string labels still fails, as the evaluate library's accuracy metric does not support string labels. This can be worked around by supplying a different metric, e.g. a function that computes some metric with support for strings.
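For example, a plain Python accuracy function that handles string labels could look like this (`string_accuracy` is a hypothetical name; whether your setfit version accepts a callable metric with this `(y_pred, y_test)` signature is an assumption to verify):

```python
def string_accuracy(y_pred, y_test):
    # Hypothetical metric: fraction of exact matches; works for string labels too.
    return sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)

print(string_accuracy(["positive", "negative"], ["positive", "positive"]))  # 0.5

# Hypothetically passed to the trainer as:
# trainer = SetFitTrainer(..., metric=string_accuracy)
```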

  • Tom Aarsen: …on non-differentiable heads. This used to error out, especially causing issues for model.evaluate()
@tomaarsen tomaarsen added the bug Something isn't working label Mar 13, 2023
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Mar 13, 2023

The documentation is not available anymore as the PR was closed or merged.

@tomaarsen
Member Author

Test failures are unrelated, solved by #332.

@tomaarsen tomaarsen merged commit 83e3cf9 into huggingface:main Apr 12, 2023
@tomaarsen tomaarsen deleted the hotfix/string_predict_error branch April 12, 2023 11:53