Great work on the memory benchmark!
However, I noticed that the LOCOMO repo and paper don't state the mapping between category numbers and question types.
You're using single-hop, multi-hop, temporal, open-domain, and adversarial, but which category number corresponds to each type?
I looked through the dataset and I think category 2 is temporal, but I can't confirm the rest.