Should we include some data-to-text tasks? Humans should do reasonably well on these tasks. Suggested datasets:

1. [web_nlg](https://huggingface.co/datasets/web_nlg)
2. [e2e](https://huggingface.co/datasets/e2e_nlg_cleaned)