Low Accuracy on ALEs Evaluation Compared to Reported #8

I attempted to reproduce the ALEs results, using LAION-CLAP to encode both the audio and the hypotheses (reformulated with GPT-4o) and then selecting the best hypothesis by cosine similarity, following the procedure described in the paper. However, when I run the provided evaluation code, I get only 25% accuracy, whereas the paper reports 45.10% for the "sound" category.
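For concreteness, the selection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's evaluation code: the names `pick_hypothesis` and `accuracy` are hypothetical, and small hand-written vectors stand in for the LAION-CLAP audio and text embeddings.

```python
import numpy as np


def pick_hypothesis(audio_emb, hyp_embs):
    """Return the index of the hypothesis whose embedding has the highest
    cosine similarity to the audio embedding (hypothetical helper)."""
    audio = np.asarray(audio_emb, dtype=np.float64)
    hyps = np.asarray(hyp_embs, dtype=np.float64)
    # L2-normalise so that the dot product equals cosine similarity.
    audio = audio / np.linalg.norm(audio)
    hyps = hyps / np.linalg.norm(hyps, axis=1, keepdims=True)
    return int(np.argmax(hyps @ audio))


def accuracy(predictions, answers):
    """Fraction of examples where the predicted index matches the answer."""
    return float(np.mean(np.asarray(predictions) == np.asarray(answers)))


# Toy stand-ins for one audio clip and its three candidate hypotheses.
# In a real run these would come from the audio and text encoders.
toy_audio = [1.0, 0.0, 0.0]
toy_hyps = [[0.0, 2.0, 0.0], [3.0, 0.1, 0.0], [-1.0, 0.0, 0.0]]
best = pick_hypothesis(toy_audio, toy_hyps)
```

If the actual pipeline differs from this at any point (for example in normalisation, in how hypotheses are indexed against the answer key, or in which checkpoint produces the embeddings), that difference may be one place to look for the accuracy gap.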

Could you provide more details on this evaluation step, or would you like me to share my implementation for review?

Thank you!
