Closed
Description
Hi Team,
I found an issue when pulling the Ratings table by calling the misc.get_pomeroy_ratings(browser) method: some of the team names are being cut off. The affected names seem to be those with more than one space (e.g. Cal St. Northridge) and those that contain characters other than letters or "." (e.g. Saint Mary's, Texas A&M). Below are a few of the problematic team names I found:
| Correct Team Name | Team Name from data pull |
|---|---|
| Saint Mary's | Saint Mary |
| San Diego St. | San Diego |
| Sam Houston St. | Sam Houston |
| Texas A&M | Texas A |
| etc. | etc. |
I found this issue that has been closed and it seems to be related: #9
I believe the issue has to do with the regex in misc.py line 35:
`tmp = ratings_df['Team'].str.extract(r'(?P<Team>[a-zA-Z.]+\s*[a-zA-Z]+\.*)\s*(?P<Seed>\d*)')`
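To show what I mean, here is a small standalone sketch that runs the pattern above against a few names using plain `re` (no pandas or browser needed). The `PROPOSED` pattern and the `split_team` helper are my own hypothetical suggestions, not something in the library, and I'm assuming each raw cell is just a team name optionally followed by a numeric seed, which I haven't verified against the live table.

```python
import re

# Pattern currently used in misc.py (copied from the line above).
CURRENT = re.compile(r'(?P<Team>[a-zA-Z.]+\s*[a-zA-Z]+\.*)\s*(?P<Seed>\d*)')

# Hypothetical replacement (an assumption, not the library's code):
# lazily take everything up to an optional trailing seed number.
PROPOSED = re.compile(r'^\s*(?P<Team>.+?)\s*(?P<Seed>\d*)\s*$')


def split_team(pattern, raw):
    """Return (team, seed) for one raw cell, or (None, None) if nothing matches."""
    match = pattern.search(raw)
    if match is None:
        return (None, None)
    return (match.group('Team'), match.group('Seed'))


# Sample cells; the seed numbers here are made up for illustration.
samples = ["Saint Mary's 5", "San Diego St. 5", "Cal St. Northridge", "Texas A&M 7"]

for raw in samples:
    print(f"{raw!r}: current={split_team(CURRENT, raw)} proposed={split_team(PROPOSED, raw)}")
```

The current pattern only allows two "words" and stops at the first character outside letters and ".", which would explain the truncation. If the lazy version looks reasonable, the same pattern string could probably be dropped into the existing `str.extract` call, since it keeps the `Team` and `Seed` group names. One caveat: it would misread a team name that itself ends in digits as a seed.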
Let me know what y'all think. Happy to provide a more complete list of team names that are not correct if needed.
Thanks!