Description
The eval.py script currently supports only a fixed set of question types (perception, prediction, and planning) for fog evaluation. The self.results dictionary includes only these categories, which causes issues when evaluating fog JSON files that contain other question types.
Current Behavior
The eval.py script initializes the scores dictionary with only the following categories:
scores = {
    "perception": {"MCQ": {}, "VQA": {}},
    "prediction": {"VQA": {}},
    "planning": {"VQA": {}},
    "behavior": {"MCQ": {}}
}

This restricts the evaluation to these hard-coded categories and excludes any other question types present in the data.
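A minimal sketch of the failure mode (assuming eval.py indexes scores by the question's category and type fields, which are hypothetical names here; the real schema may differ):

scores = {
    "perception": {"MCQ": {}, "VQA": {}},
    "prediction": {"VQA": {}},
    "planning": {"VQA": {}},
    "behavior": {"MCQ": {}}
}

# A fog question whose category is outside the hard-coded set
question = {"category": "robust_qas", "type": "VQA", "id": "q_001"}

# Raises KeyError: 'robust_qas' (or the question is silently skipped,
# depending on how eval.py iterates over the data)
scores[question["category"]][question["type"]][question["id"]] = 0.0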
Expected Behavior
The eval.py script should support all relevant question types for fog evaluation, including any additional types that may be present in the JSON files.
Steps to Reproduce
- Attempt to evaluate a fog JSON file containing question types outside of perception, prediction, and planning.
- Observe that the evaluation results are incomplete or incorrect due to the limited scores dictionary.
Suggested Improvements
- Update the scores dictionary: include all relevant question types that may be present in fog evaluation JSON files.
- Modify the evaluation logic: ensure the script can handle and evaluate all supported question types dynamically (see the sketch below).
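One way to handle this dynamically (a sketch only; build_scores, "category", and "type" are assumed names, and the actual fog JSON schema may differ) is to build the scores dictionary from the question types actually found in the file:

import json
from collections import defaultdict

def build_scores(annotation_path):
    """Build the scores dictionary from the question types actually
    present in the JSON file instead of a hard-coded set."""
    with open(annotation_path) as f:
        data = json.load(f)

    # Outer defaultdict creates an entry for any new category on first use,
    # so types such as "robust_qas" need no code change.
    scores = defaultdict(dict)
    for question in data:
        # "category" and "type" are assumed field names; adjust them to
        # match the real fog JSON schema.
        scores[question["category"]].setdefault(question["type"], {})
    return dict(scores)

With this approach, adding a new question type to the data would no longer require editing eval.py.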
Example
Here’s a potential modification to the scores dictionary:
scores = {
    "perception": {"MCQ": {}, "VQA": {}},
    "prediction": {"VQA": {}},
    "planning": {"VQA": {}},
    "behavior": {"MCQ": {}},
    "robust_qas": {"VQA": {}}  # Add fog-specific question types
}

Questions
- How should the scores dictionary be updated to support all question types for [fog/rain/etc.] evaluation?
- What is the recommended approach for dynamically handling different question types in the evaluation script?
- How should the Robustness Analysis results be interpreted when evaluating fog data?
Additional Notes
This issue affects the accuracy and completeness of the evaluation results when working with fog data. Updating the script to support all relevant question types would improve the robustness analysis.