
@tomaarsen (Member)

Hello!

Pull Request overview

  • When pytest is used:
    • Always display test coverage.
    • Always display the 10 slowest tests.
  • Add passing data.py tests.
  • Add skipped tests for onnx that fail currently.
  • Ensure that model.onnx is always deleted, even on test failures.

Details

pytest improvements

This PR introduces `.coveragerc`, the recommended configuration file for `coverage`. It's fairly standard, except for the exclusion of `if TYPE_CHECKING:` branches: because these are never true at runtime, the `True` branch is never executed for users, so we can safely exclude these branches from coverage.
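
A minimal sketch of what such a configuration looks like (the actual file may contain additional settings; this just uses coverage.py's standard [run] and [report] sections):

[run]
source = setfit
branch = True

[report]
exclude_lines =
    pragma: no cover
    if TYPE_CHECKING: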

Additionally, setup.cfg is expanded with:

[tool:pytest]
testpaths = tests
addopts = --cov=setfit --durations=10

This means that running `pytest` is now equivalent to running `pytest --cov=setfit --durations=10 ./tests`.

  • The `--cov=setfit` option prints an ASCII coverage report at the end of the pytest run, like so:
    ---------- coverage: platform win32, python 3.10.8-final-0 -----------
    Name                                 Stmts   Miss Branch BrPart  Cover
    ----------------------------------------------------------------------
    src\setfit\__init__.py                   5      0      0      0   100%
    src\setfit\data.py                     114      0     54      0   100%
    src\setfit\exporters\__init__.py         0      0      0      0   100%
    src\setfit\exporters\onnx.py            95     22     34      8    72%
    src\setfit\exporters\utils.py            4      2      0      0    50%
    src\setfit\integrations.py              23      0      4      2    93%
    src\setfit\logging.py                  142     61     26      2    52%
    src\setfit\modeling.py                 313     50    102     20    81%
    src\setfit\pipeline.py                   9      9      0      0     0%
    src\setfit\trainer.py                  194     25     86     22    82%
    src\setfit\trainer_distillation.py      69     17     22      7    67%
    src\setfit\utils.py                     55     26      8      0    52%
    ----------------------------------------------------------------------
    TOTAL                                 1023    212    336     61    77%
    
  • The `--durations=10` option displays the 10 slowest tests, like so:
    ====================================================================================== slowest 10 durations ====================================================================================== 
    14.96s call     tests/exporters/test_onnx.py::test_export_onnx_sklearn_head
    9.91s call     tests/test_trainer.py::TrainerHyperParameterOptunaIntegrationTest::test_hyperparameter_search
    4.96s call     tests/test_trainer_distillation.py::DistillationSetFitTrainerTest::test_trainer_works_with_default_columns
    3.80s setup    tests/test_modeling.py::SetFitModelDifferentiableHeadTest::test_max_length_is_larger_than_max_acceptable_length
    3.69s call     tests/test_trainer.py::SetFitTrainerTest::test_raise_when_metric_value_is_invalid
    3.07s call     tests/test_trainer.py::SetFitTrainerTest::test_trainer_works_with_model_init
    2.97s call     tests/test_trainer.py::SetFitTrainerTest::test_trainer_support_callable_as_metric
    2.92s call     tests/test_trainer.py::SetFitTrainerMultilabelTest::test_trainer_multilabel_support_callable_as_metric
    2.66s call     tests/test_data.py::test_create_fewshot_splits_with_augmentation
    2.23s call     tests/test_trainer_distillation.py::DistillationSetFitTrainerTest::test_column_mapping_multilabel
    

Additionally, `pytest --cov-report=html` can be used to create an HTML report in a `coverage_report_html` folder.
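
One way to point the HTML report at that folder (a sketch; the PR may configure this differently) is coverage.py's standard [html] section in `.coveragerc`:

[html]
directory = coverage_report_html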

New tests

For `data.py` I have introduced several new tests. Although this code likely isn't frequently used by SetFit users, I wanted to understand it better, and what better way to understand code than to verify that it works correctly? 😄

For the ONNX tests I created a simple Trainer that trains a differentiable head, and then exported that model to ONNX in the same way as the existing test. The reasoning was that exporting a model was previously only tested for the sklearn head. I discovered that the tests fail regardless of the `out_features` passed to the differentiable head.

FAILED tests/exporters/test_onnx.py::test_export_onnx_torch_head[1] - ValueError: Using a target size (torch.Size([4])) that is different to the input size (torch.Size([4, 1])) is deprecated. Please ensure they have the same size.
FAILED tests/exporters/test_onnx.py::test_export_onnx_torch_head[2] - assert False
FAILED tests/exporters/test_onnx.py::test_export_onnx_torch_head[3] - assert False

With `out_features=1` there is an issue during training; with `out_features=2` and `3`, the following errors pop up, respectively:

            # Compare the results and ensure that we get the same predictions.
>           assert np.array_equal(onnx_preds, pytorch_preds)
E           assert False
E            +  where False = <function array_equal at 0x000001559F262EF0>(array([[7.2937156e-03, 9.9270624e-01],\n       [9.9964154e-01, 3.5840381e-04]], dtype=float32), array([1, 0], dtype=int64))
E            +    where <function array_equal at 0x000001559F262EF0> = np.array_equal
            # Compare the results and ensure that we get the same predictions.
>           assert np.array_equal(onnx_preds, pytorch_preds)
E           assert False
E            +  where False = <function array_equal at 0x000001559F262EF0>(array([[4.0040349e-04, 9.4026297e-01, 5.9336655e-02],\n       [8.8793278e-01, 9.9355258e-02, 1.2711961e-02]], dtype=float32), array([1, 0], dtype=int64))
E            +    where <function array_equal at 0x000001559F262EF0> = np.array_equal

As you can see, the PyTorch model predictions stay 1-dimensional: simply an array of class indices whose length is the number of samples to predict for. The ONNX predictions, however, are 2-dimensional with shape (num_samples, out_features), i.e. per-class probabilities. Regardless of whether this is an issue on the PyTorch side, the ONNX model should always be equivalent to the PyTorch model.
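
To make the mismatch concrete, here is a small standalone illustration using the arrays from the `out_features=2` failure above (purely illustrative; this is neither the test code nor the fix applied in this PR):

import numpy as np

# Values copied from the failure output above (out_features=2).
onnx_preds = np.array(
    [[7.2937156e-03, 9.9270624e-01],
     [9.9964154e-01, 3.5840381e-04]], dtype=np.float32)  # shape (num_samples, out_features)
pytorch_preds = np.array([1, 0], dtype=np.int64)          # shape (num_samples,)

# The shapes (and contents) differ, so the equality check in the test fails...
assert not np.array_equal(onnx_preds, pytorch_preds)
# ...and the outputs only line up after reducing the probabilities to class indices.
assert np.array_equal(onnx_preds.argmax(axis=-1), pytorch_preds)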

I've skipped these tests with the reason "ONNX exporting of SetFit model with Torch head not yet supported."

Additionally, I've wrapped the body of the ONNX tests in a try/finally, with the model.onnx file deleted in the finally clause. That way, the file gets deleted even when the test fails. A simplified sketch of this structure is shown below.
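
The overall shape of the skipped torch-head test looks roughly like this (a simplified sketch: `_train_export_and_predict` is a placeholder for the real test body, which trains a SetFit model with a differentiable head, exports it to ONNX, and runs both models):

import os

import numpy as np
import pytest


def _train_export_and_predict(out_features, output_path):
    """Placeholder for the real test body: train a SetFit model with a
    differentiable head with `out_features` outputs, export it to ONNX at
    `output_path`, and return (onnx_preds, pytorch_preds)."""
    raise NotImplementedError


@pytest.mark.skip(reason="ONNX exporting of SetFit model with Torch head not yet supported.")
@pytest.mark.parametrize("out_features", [1, 2, 3])
def test_export_onnx_torch_head(out_features):
    output_path = "model.onnx"
    try:
        onnx_preds, pytorch_preds = _train_export_and_predict(out_features, output_path)
        # Compare the results and ensure that we get the same predictions.
        assert np.array_equal(onnx_preds, pytorch_preds)
    finally:
        # Delete the exported file even when the assertions (or the export itself) fail.
        if os.path.exists(output_path):
            os.remove(output_path)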

All in all, this has improved the test coverage from 72% to 77%. It would increase further once the ONNX tests pass, and I've now identified some key code that is never executed in the tests, such as `SetFitBaseModel`, `SKLearnWrapper`, `SetFitPipeline`, and the majority of the `logging.py` code.

  • Tom Aarsen

Commits:
  • `pytest` now displays coverage stats and `pytest --cov-report html` produces a useful coverage report
  • However, these must be skipped currently, as they all fail!

@tomaarsen added the CI (Regarding the Continuous Integration) and tests (Regarding the test suite) labels on Dec 20, 2022.

tomaarsen and others added 2 commits on December 20, 2022:
  • For some reason, locally I get different warnings for ./make quality than the CI does.
@tomaarsen (Member, Author)

My local machine's isort and the CI's isort seem to disagree on what counts as "correctly formatted". I'll investigate.

@tomaarsen (Member, Author) commented on Dec 20, 2022

Resolved it: one of the pytest plugins I was experimenting with had created a folder named `datasets`, so isort considered `datasets` a local package. This caused it, for example, to be grouped alongside `from setfit import ...` rather than alongside `from sentence_transformers import ...`, as illustrated below.
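
For illustration (hypothetical imports, not actual SetFit code), the difference in grouping looks like this:

# With the stray `datasets` folder present, isort treated `datasets` as first-party:
from sentence_transformers import SentenceTransformer

from datasets import Dataset
from setfit import SetFitModel

# With the folder removed, `datasets` is grouped with the other third-party packages:
from datasets import Dataset
from sentence_transformers import SentenceTransformer

from setfit import SetFitModel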

Should be all set now.

@lewtun (Member) left a comment

Thanks for adding this feature and catching these issues with the ONNX exporter, @tomaarsen! The PR LGTM, so I'll go ahead and merge this :)

@lewtun merged commit 35c0511 into huggingface:main on Dec 28, 2022.
@tomaarsen deleted the enhancement/improve_tests branch on December 28, 2022 at 09:01.