
Issue: eval.py Does Not Support All Question Types for Robustness Evaluation #6

@curryqka

Description


The eval.py script currently supports only a fixed set of question types (perception, prediction, planning, and behavior) for fog evaluation. The self.results dictionary includes only these categories, which causes issues when evaluating fog JSON files that contain other question types.

Current Behavior

The eval.py script initializes the scores dictionary with only the following categories:

scores = {
    "perception": {"MCQ": {}, "VQA": {}},
    "prediction": {"VQA": {}},
    "planning": {"VQA": {}},
    "behavior": {"MCQ": {}}
}

This limits the evaluation to the perception, prediction, planning, and behavior question types and excludes any other types present in the input.
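As a hypothetical illustration (the field names question_type, format, and id below are placeholders, not confirmed keys from the fog JSON files), any entry whose type falls outside the predefined categories would fail if the script indexes the dictionary directly:

# Hypothetical sketch -- the actual update logic in eval.py may differ.
scores = {
    "perception": {"MCQ": {}, "VQA": {}},
    "prediction": {"VQA": {}},
    "planning": {"VQA": {}},
    "behavior": {"MCQ": {}}
}

entry = {"question_type": "robust_qas", "format": "VQA", "id": "q_0001"}

# A question type missing from the predefined keys cannot be recorded:
scores[entry["question_type"]][entry["format"]][entry["id"]] = 1.0  # KeyError: 'robust_qas'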

Expected Behavior

The eval.py script should support all relevant question types for fog evaluation, including any additional types that may be present in the JSON files.

Steps to Reproduce

  1. Attempt to evaluate a fog JSON file containing question types outside of perception, prediction, and planning.
  2. Observe that the evaluation results are incomplete or incorrect due to the limited scores dictionary.

Suggested Improvements

  1. Update the scores Dictionary: Include all relevant question types that may be present in fog evaluation JSON files.
  2. Modify the Evaluation Logic: Ensure the script can handle and evaluate all supported question types dynamically (see the sketch after the example below).

Example

Here’s a potential modification to the scores dictionary:

scores = {
    "perception": {"MCQ": {}, "VQA": {}},
    "prediction": {"VQA": {}},
    "planning": {"VQA": {}},
    "behavior": {"MCQ": {}},
    "robust_qas": {"VQA": {}}  # Add fog-specific question types
}
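For the dynamic handling suggested above, one option is to create the category buckets on the fly while reading the JSON, so that no question type is ever missing. The snippet below is a minimal sketch, assuming each JSON entry carries question_type, format, and id fields (placeholder names; the actual keys in the fog JSON files may differ):

import json
from collections import defaultdict

# Nested structure: scores[question_type][question_format][question_id] -> score
scores = defaultdict(lambda: defaultdict(dict))

with open("fog_eval.json") as f:  # placeholder path
    entries = json.load(f)

for entry in entries:
    qtype = entry["question_type"]   # e.g. "perception", "robust_qas", ...
    qformat = entry["format"]        # e.g. "MCQ" or "VQA"
    # Each (type, format) pair gets a bucket automatically, so fog-specific
    # categories no longer need to be hard-coded in eval.py.
    scores[qtype][qformat][entry["id"]] = None  # replace None with the computed score

Converting scores back to a plain dict before reporting would keep the existing output format unchanged.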

Questions

  1. How should the scores dictionary be updated to support all question types for [fog/rain/etc.] evaluation?
  2. What is the recommended approach for dynamically handling different question types in the evaluation script?
  3. How should the Robustness Analysis results be interpreted when evaluating fog data?

Additional Notes

This issue affects the accuracy and completeness of the evaluation results when working with fog data. Updating the script to support all relevant question types would improve the robustness analysis.
