Improve LFW download error message with alternative manual download link (Kaggle) #9463

Open
wei06159 wants to merge 6 commits into pytorch:main from wei06159:fix-dataset-link

Conversation

@wei06159

Summary

torchvision.datasets.LFWPeople/LFWPairs currently raises a ValueError stating that the LFW dataset (http://vis-www.cs.umass.edu/lfw/) is no longer available for download and must be obtained manually.
This PR keeps the existing behavior intact but improves the error message with a pointer to a commonly used mirror of the dataset (Kaggle), so users can find it more easily.

Changes

Existing behavior is preserved; only the error message text and link are updated.

Related Issue

#8888

@pytorch-bot

pytorch-bot bot commented Mar 31, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9463

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 05d4d7f with merge base 4e58149:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the cla signed label on Mar 31, 2026
@spzala

spzala commented Apr 1, 2026

cc @NicolasHug @atalman - review request. Thanks!

Member

@NicolasHug NicolasHug left a comment


Hi @wei06159, thanks for the PR! I'm happy to add a note in the docstring and in the error message that https://www.kaggle.com/datasets/jessicali9530/lfw-dataset is a common mirror, but I think it's best to keep the original link for reference. We can mention that the original links are broken, though.


  base_folder = "lfw-py"
- download_url_prefix = "http://vis-www.cs.umass.edu/lfw/"
+ download_url_prefix = "https://www.kaggle.com/datasets/jessicali9530/lfw-dataset"

This should probably not be changed: it affects the download URL logic, and we still raise a ValueError on download anyway.

Author


Hi @NicolasHug, thank you for your review. I restored the original link and added a note to the docstring and error message pointing to the common mirror at https://www.kaggle.com/datasets/jessicali9530/lfw-dataset.

@wei06159

wei06159 commented Apr 8, 2026

Hi @NicolasHug, could I get another review on this? I restored the original link and added a note to the docstring and error message saying that a common mirror is available at https://www.kaggle.com/datasets/jessicali9530/lfw-dataset.
Thanks.

Contributor

@zy1git zy1git left a comment


Hi @wei06159, thanks a lot for creating this PR.
I left some review comments; feel free to take a look. Nicolas is at a conference right now, so he may take a look when he's back.

  raise ValueError(
-     "LFW dataset is no longer available for download."
+     "LFW dataset is no longer available for automatic download."
      "Please download the dataset manually and place it in the specified directory"

Suggested change
- "Please download the dataset manually and place it in the specified directory"
+ "Please download the dataset manually and place it in the specified directory."

To be consistent with the rest of the messages, I think we need to add a period.
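As the lines appear in the excerpt above, the message is assembled from adjacent string literals. Python concatenates adjacent literals at compile time with no separator, so besides the missing final period, the first two sentences would run together as "download.Please". A minimal sketch of the pitfall and one way to avoid it (a trailing space inside each fragment); the variable names are purely illustrative:

```python
# Adjacent string literals are joined with no separator.
without_spaces = (
    "LFW dataset is no longer available for automatic download."
    "Please download the dataset manually and place it in the specified directory."
)

# Ending each fragment with a space keeps the sentences separated.
with_spaces = (
    "LFW dataset is no longer available for automatic download. "
    "Please download the dataset manually and place it in the specified directory."
)

print(without_spaces)  # ...automatic download.Please download...
print(with_spaces)     # ...automatic download. Please download...
```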


The LFW dataset is no longer available for automatic download. Please
download it manually and place it in the specified directory.
A commonly used mirror is available at: https://www.kaggle.com/datasets/jessicali9530/lfw-dataset

This part looks good to me.

However, the warning message does not show up on the rendered docs page (you can check via the "Preview Python docs built from this PR" link). This is a pre-existing bug, not one introduced by this PR.

If you want, feel free to fix it in this PR or in a new one. Otherwise, you don't need to do anything here; I will fix it after your PR is merged.

