Low Accuracy on ALEs Evaluation Compared to Reported #8

@marcelgibier

I attempted to reproduce the ALEs results, using LAION-CLAP to encode both the audio and the hypotheses (reformulated with GPT-4o) and then selecting the best hypothesis by cosine similarity, following the procedure described in the paper. However, when running the provided evaluation code, I get only 25% accuracy, whereas the paper reports 45.10% for the "sound" category.
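
For reference, the selection step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's evaluation code or the paper's implementation: it assumes the audio and hypothesis embeddings have already been computed (for example with LAION-CLAP) and are available as NumPy arrays, and the function names `select_best_hypothesis` and `accuracy` are hypothetical.

```python
import numpy as np


def select_best_hypothesis(audio_emb, hyp_embs):
    """Return the index of the hypothesis closest to the audio by cosine similarity.

    audio_emb: array of shape (d,), one precomputed audio embedding (assumed input).
    hyp_embs:  array of shape (k, d), one precomputed embedding per candidate hypothesis.
    """
    # L2-normalise both sides so the dot product equals cosine similarity.
    a = audio_emb / np.linalg.norm(audio_emb)
    h = hyp_embs / np.linalg.norm(hyp_embs, axis=1, keepdims=True)
    sims = h @ a  # shape (k,)
    return int(np.argmax(sims))


def accuracy(predictions, labels):
    """Fraction of items whose predicted index matches the reference index."""
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    return float(np.mean(predictions == labels))
```

With a sketch like this it is easy to verify that predicted indices and reference labels use the same ordering of hypotheses. With k candidates per clip, random selection yields 1/k accuracy, so a score close to that level may indicate a misalignment between embeddings and labels rather than a limitation of the model.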

Could you provide more details on this evaluation step, or would you like me to share my implementation for review?

Thank you!
