errors.json
[
{
"question_id": "locomo_0_qa1",
"question": "When did Melanie paint a sunrise?",
"golden_answer": "2022",
"category": 2,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D1:12"
],
"correct_evidence": [
"D1:14"
],
"reasoning": "D1:12 (cited) says 'By the way, take a look at this' with an image of a painting, but does NOT mention when it was painted. D1:14 (not cited) says 'Yeah, I painted that lake sunrise last year! It's special to me.' Session 1 is May 8, 2023, so 'last year' = 2022. The golden answer '2022' is correct, but the citation should be D1:14, not D1:12.",
"correct_answer": "2022"
},
{
"question_id": "locomo_0_qa2",
"question": "What fields would Caroline be likely to pursue in her educaton?",
"golden_answer": "Psychology, counseling certification",
"category": 3,
"error_type": "HALLUCINATION",
"cited_evidence": [
"D1:9",
"D1:11"
],
"correct_evidence": [
"D1:9",
"D1:11"
],
"reasoning": "D1:9 says 'Gonna continue my edu and check out career options'. D1:11 says 'I'm keen on counseling or working in mental health'. Neither 'psychology' nor 'counseling certification' appears anywhere in the transcript. The golden answer infers specific academic fields that Caroline never mentions. A more accurate answer would be 'counseling or mental health'.",
"correct_answer": "Counseling or mental health"
},
{
"question_id": "locomo_0_qa4",
"question": "What is Caroline's identity?",
"golden_answer": "Transgender woman",
"category": 1,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D1:5"
],
"correct_evidence": [
"D3:1",
"D11:14"
],
"reasoning": "D1:5 (cited) says 'The transgender stories were so inspiring!' which shows Caroline finds transgender stories inspiring, but does NOT explicitly state she IS transgender. D3:1 says 'I talked about my transgender journey' which directly confirms Caroline's identity. D11:14 says 'Art's allowed me to explore my transition and my changing body'. The golden answer is correct but D1:5 is insufficient evidence.",
"correct_answer": "Transgender woman"
},
{
"question_id": "locomo_0_qa5",
"question": "When did Melanie run a charity race?",
"golden_answer": "The sunday before 25 May 2023",
"category": 2,
"error_type": "TEMPORAL_ERROR",
"cited_evidence": [
"D2:1"
],
"correct_evidence": [
"D2:1"
],
"reasoning": "D2:1 says 'I ran a charity race for mental health last Saturday'. Session 2 is 1:14 pm on 25 May, 2023, which is a Thursday. 'Last Saturday' before Thursday May 25 is Saturday May 20, 2023. The golden answer says 'The sunday before 25 May 2023' (Sunday May 21), but D2:1 explicitly says 'Saturday', not 'Sunday'.",
"correct_answer": "The Saturday before 25 May 2023 (approximately May 20, 2023)"
},
{
"question_id": "locomo_0_qa23",
"question": "What books has Melanie read?",
"golden_answer": "\"Nothing is Impossible\", \"Charlotte's Web\"",
"category": 1,
"error_type": "HALLUCINATION",
"cited_evidence": [
"D7:8",
"D6:10"
],
"correct_evidence": [
"D7:8",
"D6:10"
],
"reasoning": "D7:8 says 'This book I read last year reminds me to always pursue my dreams' but does NOT name any book title. D6:10 says 'I loved reading Charlotte\u2019s Web as a kid' which supports 'Charlotte\u2019s Web'. The title 'Nothing is Impossible' does not appear anywhere in the conversation transcript. It is fabricated in the golden answer. The correct answer should reference the unnamed book from D7:8 and 'Charlotte\u2019s Web' from D6:10. D7:11 also mentions 'Becoming Nicole' which the golden answer omits.",
"correct_answer": "Charlotte's Web, an unnamed book about pursuing dreams, and Becoming Nicole"
},
{
"question_id": "locomo_0_qa26",
"question": "When did Melanie read the book \"nothing is impossible\"?",
"golden_answer": "2022",
"category": 2,
"error_type": "HALLUCINATION",
"cited_evidence": [
"D7:8"
],
"correct_evidence": [
"D7:8"
],
"reasoning": "The question premise is based on the fabricated book title 'Nothing is Impossible' which does not appear anywhere in the transcript. D7:8 says 'This book I read last year' without naming any title. Session 7 is July 12, 2023, so 'last year' = 2022. The date '2022' is inferable for the unnamed book, but the question itself contains a hallucinated title, making it unanswerable as stated.",
"correct_answer": "2022 (but the book title 'Nothing is Impossible' is fabricated; the transcript only says 'This book I read last year')"
},
{
"question_id": "locomo_0_qa32",
"question": "What LGBTQ+ events has Caroline participated in?",
"golden_answer": "Pride parade, school speech, support group",
"category": 1,
"error_type": "INCOMPLETE",
"cited_evidence": [
"D5:1",
"D8:17",
"D3:1",
"D1:3"
],
"correct_evidence": [
"D5:1",
"D8:17",
"D3:1",
"D1:3",
"D7:1",
"D9:2",
"D10:3"
],
"reasoning": "The golden answer lists only 3 events (pride parade, school speech, support group) but Caroline also participated in: an LGBTQ conference (D7:1: 'I went to an LGBTQ conference two days ago'), a mentorship program for LGBTQ youth (D9:2), and an LGBTQ activist group (D10:3). The answer is incomplete. While the 3 listed events are correct, the question asks broadly 'What LGBTQ+ events has Caroline participated in?' and the answer omits several significant events.",
"correct_answer": "Pride parade, school speech, support group, LGBTQ conference, mentorship program, activist group"
},
{
"question_id": "locomo_0_qa37",
"question": "What did Melanie paint recently?",
"golden_answer": "sunset",
"category": 1,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D8:6; D9:17"
],
"correct_evidence": [
"D8:6",
"D9:17"
],
"reasoning": "The evidence ID 'D8:6; D9:17' is a malformed compound ID that doesn't resolve in the transcript (error: NOT FOUND IN TRANSCRIPT). The correct citations are D8:6 and D9:17 as separate IDs. D8:6 shows a painting of a sunset with a palm tree (blip_caption). D9:17 says 'My kids and I just finished another painting like our last one.' The golden answer 'sunset' is correct based on D8:6's blip_caption.",
"correct_answer": "sunset"
},
{
"question_id": "locomo_0_qa38",
"question": "What activities has Melanie done with her family?",
"golden_answer": "Pottery, painting, camping, museum, swimming, hiking",
"category": 1,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D8:4",
"D8:6",
"D9:1",
"D6:4",
"D1:18",
"D3:14"
],
"correct_evidence": [
"D8:4",
"D8:6",
"D9:1",
"D6:4",
"D1:18",
"D4:8"
],
"reasoning": "D3:14 (cited) says 'I'm lucky to have my husband and kids; they keep me motivated' which does NOT mention any specific activity. It provides no evidence for any of the listed activities. The citation for hiking should be D4:8 which says 'We explored nature, roasted marshmallows around the campfire and even went on a hike.' The golden answer is factually correct but D3:14 is the wrong citation.",
"correct_answer": "Pottery, painting, camping, museum, swimming, hiking"
},
{
"question_id": "locomo_0_qa43",
"question": "What kind of art does Caroline make?",
"golden_answer": "abstract art",
"category": 1,
"error_type": "AMBIGUOUS",
"cited_evidence": [
"D11:12",
"D11:8",
"D9:14"
],
"correct_evidence": [
"D17:13",
"D11:12",
"D14:5",
"D13:11",
"D14:17"
],
"reasoning": "The cited evidence shows: D11:12 (painting of a woman with a red shirt - representational, not abstract), D11:8 (painting with brush - indeterminate), D9:14 (painting of a tree with a bright sun - representational, not abstract). Caroline mentions 'abstract stuff' in D17:13 ('I've been trying out abstract stuff recently') but this is a recent experiment, not her primary art form. Her art is predominantly representational: portraits (D13:11 self-portrait), women (D11:12), sunsets (D14:5), stained glass (D14:17). Labeling her art as 'abstract art' based on one mention of trying abstract stuff is reductive and not supported by the cited evidence.",
"correct_answer": "Paintings including portraits, figurative works, nature scenes, and stained glass; she has recently experimented with abstract art"
},
{
"question_id": "locomo_0_qa48",
"question": "What types of pottery have Melanie and her kids made?",
"golden_answer": "bowls, cup",
"category": 1,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D12:14",
"D8:4",
"D5:6"
],
"correct_evidence": [
"D5:6",
"D5:8",
"D8:4",
"D12:4"
],
"reasoning": "D12:14 (cited) says 'I appreciate our friendship too, Caroline. You've always been there for me.' This has NOTHING to do with pottery types. The correct evidence for bowls includes D5:6/D5:8 (bowl with black and white flower design), D12:4 (bowl with colorful design). D8:4 supports the cup (cup with dog face). The golden answer is factually correct but D12:14 is completely wrong as a citation. Also missing: pots (D8:2: 'We all made our own pots').",
"correct_answer": "bowls, cup, pots"
},
{
"question_id": "locomo_0_qa56",
"question": "What symbols are important to Caroline?",
"golden_answer": "Rainbow flag, transgender symbol",
"category": 1,
"error_type": "HALLUCINATION",
"cited_evidence": [
"D14:15",
"D4:1"
],
"correct_evidence": [
"D14:15"
],
"reasoning": "D14:15 explicitly mentions 'The rainbow flag mural is important to me' and 'The eagle symbolizes freedom and pride'. The 'rainbow flag' part of the golden answer is correct. However, 'transgender symbol' does not appear anywhere in the transcript text. D4:1 shows a necklace (blip: 'a photo of a person holding a necklace with a cross and a heart') but Caroline describes it in D4:3 as symbolizing 'love, faith and strength' as a gift from her grandma - not as a transgender symbol. The image search query for D4:1 was 'pendant transgender symbol' but that is metadata, not part of the conversation. The eagle from D14:15 would be a more accurate second symbol.",
"correct_answer": "Rainbow flag, eagle (symbolizing freedom and pride)"
},
{
"question_id": "locomo_0_qa66",
"question": "What does Melanie do with her family on hikes?",
"golden_answer": "Roast marshmallows, tell stories",
"category": 1,
"error_type": "AMBIGUOUS",
"cited_evidence": [
"D16:4",
"D10:12"
],
"correct_evidence": [
"D4:8",
"D8:34",
"D10:12",
"D16:2",
"D16:4"
],
"reasoning": "The question asks what Melanie does 'on hikes' but the golden answer describes camping/campfire activities, not hiking activities. D10:12: 'We always look forward to our family camping trip. We roast marshmallows, tell stories around the campfire and just enjoy each other's company.' D16:4: 'We roasted marshmallows and shared stories around the campfire.' Both cited evidence lines explicitly place these activities 'around the campfire' during camping trips. Actual hiking activities are described separately: D4:8: 'We explored nature...and even went on a hike. The view from the top was amazing!' D8:34: 'We enjoy hiking in the mountains and exploring forests.' The golden answer conflates camping with hiking.",
"correct_answer": "On hikes, Melanie's family explores nature, enjoys mountain views, and explores forests (D4:8, D8:34). The marshmallow roasting and storytelling happen around the campfire during camping trips, not on hikes."
},
{
"question_id": "locomo_0_qa70",
"question": "What transgender-specific events has Caroline attended?",
"golden_answer": "Poetry reading, conference",
"category": 1,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D17:19",
"D15:13"
],
"correct_evidence": [
"D17:19",
"D7:1"
],
"reasoning": "D15:13 (cited) says 'Wow! Did you see that band?' which has NOTHING to do with transgender events or conferences. The conference evidence should be D7:1 ('I went to an LGBTQ conference two days ago') or D5:13 ('I'm going to a transgender conference this month'). D17:19 correctly supports the poetry reading. The golden answer is factually correct but D15:13 is completely wrong as a citation for 'conference'.",
"correct_answer": "Poetry reading, conference"
},
{
"question_id": "locomo_0_qa72",
"question": "When did Melanie's friend adopt a child?",
"golden_answer": "2022",
"category": 2,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D17:3"
],
"correct_evidence": [
"D17:4"
],
"reasoning": "D17:3 (cited) is Caroline speaking: 'Do you have any experience with adoption, or know anyone who's gone through the process?' This is a question, not evidence of when someone adopted. D17:4 is Melanie's answer: 'Yeah, a buddy of mine adopted last year.' Session 17 is October 13, 2023, so 'last year' = 2022. The golden answer is correct but the citation should be D17:4, not D17:3.",
"correct_answer": "2022"
},
{
"question_id": "locomo_0_qa77",
"question": "Would Melanie go on another roadtrip soon?",
"golden_answer": "Likely no; since this one went badly",
"category": 3,
"error_type": "AMBIGUOUS",
"cited_evidence": [
"D18:3",
"D18:1"
],
"correct_evidence": [
"D18:1",
"D18:3",
"D18:5",
"D18:17"
],
"reasoning": "The golden answer says 'Likely no; since this one went badly' but the evidence tells a more nuanced story. D18:1 says 'We were so lucky he was okay' and D18:3 says 'that was a reminder that life is precious and to cherish our family'. Crucially, the family CONTINUED the trip after the accident: D18:5 says 'Thankfully, they enjoyed the Grand Canyon a lot!' and D18:17 confirms they went hiking the next day. The trip started badly but ended positively. Melanie took it as a reminder to cherish family, not as a reason to avoid future trips. One could argue she would be cautious, but the evidence equally supports that she would continue family adventures.",
"correct_answer": "Uncertain; although the trip started badly with the accident, the family continued and enjoyed the Grand Canyon, suggesting Melanie values family trips"
},
{
"question_id": "locomo_0_qa94",
"question": "What is Melanie's hand-painted bowl a reminder of?",
"golden_answer": "art and self-expression",
"category": 4,
"error_type": "ATTRIBUTION_ERROR",
"cited_evidence": [
"D4:5"
],
"correct_evidence": [
"D4:5"
],
"reasoning": "The question says 'Melanie's hand-painted bowl' but D4:5 is CAROLINE speaking: 'I've got some other stuff with sentimental value, like my hand-painted bowl. A friend made it for my 18th birthday ten years ago. The pattern and colors are awesome-- it reminds me of art and self-expression.' The bowl belongs to Caroline, not Melanie. The golden answer ('art and self-expression') correctly reflects what D4:5 says the bowl reminds her of, but the question wrongly attributes the bowl to Melanie.",
"correct_answer": "art and self-expression (but this is Caroline's bowl, not Melanie's)"
},
{
"question_id": "locomo_0_qa106",
"question": "What are the new shoes that Melanie got used for?",
"golden_answer": "Running",
"category": 4,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D7:19"
],
"correct_evidence": [
"D7:20"
],
"reasoning": "D7:19 (cited) is Caroline asking 'Love that purple color! For walking or running?' This is a question, not the answer. D7:20 is Melanie answering: 'Thanks, Caroline! These are for running. Been running longer since our last chat - a great way to destress and clear my mind.' The golden answer 'Running' is correct but the citation should be D7:20, not D7:19.",
"correct_answer": "Running"
},
{
"question_id": "locomo_0_qa107",
"question": "What is Melanie's reason for getting into running?",
"golden_answer": "To de-stress and clear her mind",
"category": 4,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D7:21"
],
"correct_evidence": [
"D7:20",
"D7:22"
],
"reasoning": "D7:21 (cited) is Caroline asking 'Wow! What got you into running?' This is a question, not the answer. The answer comes from D7:20 ('a great way to destress and clear my mind') and D7:22 ('I've been running farther to de-stress, which has been great for my headspace'). The golden answer 'To de-stress and clear her mind' is correct but the citation should be D7:20 or D7:22, not D7:21.",
"correct_answer": "To de-stress and clear her mind"
},
{
"question_id": "locomo_0_qa111",
"question": "What creative project do Mel and her kids do together besides pottery?",
"golden_answer": "painting",
"category": 4,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D8:5"
],
"correct_evidence": [
"D8:6"
],
"reasoning": "D8:5 (cited) is Caroline asking 'What other creative projects do you do with them, besides pottery?' This is a question, not the answer. D8:6 is Melanie answering: 'We love painting together lately, especially nature-inspired ones.' The golden answer 'painting' is correct but the citation should be D8:6, not D8:5.",
"correct_answer": "painting"
},
{
"question_id": "locomo_0_qa132",
"question": "How long has Melanie been creating art?",
"golden_answer": "7 years",
"category": 4,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D16:7"
],
"correct_evidence": [
"D16:8"
],
"reasoning": "D16:7 (cited) is Caroline speaking: 'Since I was 17 or so... How long have you been into art?' This is Caroline answering about herself and then asking Melanie. D16:8 is Melanie answering: 'Seven years now, and I've finally found my real muses: painting and pottery.' The golden answer '7 years' is correct but the citation should be D16:8, not D16:7.",
"correct_answer": "7 years"
},
{
"question_id": "locomo_0_qa135",
"question": "What setback did Melanie face in October 2023?",
"golden_answer": "She got hurt and had to take a break from pottery.",
"category": 4,
"error_type": "TEMPORAL_ERROR",
"cited_evidence": [
"D17:8"
],
"correct_evidence": [
"D17:8"
],
"reasoning": "Session 17 takes place on October 13, 2023. In D17:8, Melanie says: 'recently I had a setback. Last month I got hurt and had to take a break from pottery, which I use for self-expression and peace.' 'Last month' from October 13 = September 2023. The injury occurred in September 2023, not October. The question incorrectly frames this as an October setback; Melanie merely reported the September injury during an October session.",
"correct_answer": "The setback (getting hurt, taking a break from pottery) occurred in September 2023, not October. Melanie reported it on October 13, 2023."
},
{
"question_id": "locomo_0_qa137",
"question": "What painting did Melanie show to Caroline on October 13, 2023?",
"golden_answer": "A painting inspired by sunsets with a pink sky.",
"category": 4,
"error_type": "INCOMPLETE",
"cited_evidence": [
"D17:12"
],
"correct_evidence": [
"D17:12",
"D17:14"
],
"reasoning": "Melanie showed TWO paintings in Session 17 (October 13, 2023). D17:12: 'Here's one I did last week. It's inspired by the sunsets.' (blip_caption: 'a photo of a painting of a sunset with a pink sky'). D17:14: 'I've done an abstract painting too, take a look!' (blip_caption: 'a photo of a painting on a wall with a blue background'). The golden answer only mentions the sunset painting, omitting the abstract painting with blue background.",
"correct_answer": "Two paintings: (1) a sunset-inspired painting with a pink sky (D17:12), and (2) an abstract painting with a blue background (D17:14)."
},
{
"question_id": "locomo_0_qa138",
"question": "What kind of painting did Caroline share with Melanie on October 13, 2023?",
"golden_answer": "An abstract painting with blue streaks on a wall.",
"category": 4,
"error_type": "ATTRIBUTION_ERROR",
"cited_evidence": [
"D17:14"
],
"correct_evidence": [
"D17:14",
"D17:21"
],
"reasoning": "D17:14 is MELANIE speaking: 'I've done an abstract painting too, take a look!' with blip_caption 'a photo of a painting on a wall with a blue background'. The abstract painting with blue is Melanie's work, NOT Caroline's. Caroline mentions trying abstract stuff in D17:13 but doesn't share an abstract painting. Caroline's images in session 17 are: a poster (D17:17), a 'Trans Lives Matter' sign (D17:19), and a drawing of a woman in a dress (D17:21). The golden answer incorrectly attributes Melanie's abstract painting to Caroline.",
"correct_answer": "A drawing of a woman in a dress (D17:21), a poster (D17:17), and a 'Trans Lives Matter' sign (D17:19)"
},
{
"question_id": "locomo_0_qa139",
"question": "What was the poetry reading that Caroline attended about?",
"golden_answer": "It was a transgender poetry reading where transgender people shared their stories.",
"category": 4,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D17:18"
],
"correct_evidence": [
"D17:19"
],
"reasoning": "D17:18 (cited) is Melanie asking: 'Nope, never been to something like that. What was it about? What made it so special?' This is a question, not the answer. D17:19 is Caroline answering: 'It was a transgender poetry reading where transgender people shared their stories through poetry.' The golden answer is correct but the citation should be D17:19, not D17:18.",
"correct_answer": "It was a transgender poetry reading where transgender people shared their stories."
},
{
"question_id": "locomo_0_qa144",
"question": "How did Melanie's son handle the accident?",
"golden_answer": "He was scared but reassured by his family",
"category": 4,
"error_type": "ATTRIBUTION_ERROR",
"cited_evidence": [
"D18:6",
"D18:7"
],
"correct_evidence": [
"D18:1",
"D18:3"
],
"reasoning": "D18:7 says 'They were scared but we reassured them and explained their brother would be OK. They're tough kids.' Here 'They' refers to the OTHER children (siblings), NOT the son. 'Their brother' is the son. The evidence describes the siblings' fear about their brother, not the son's own emotional reaction. The golden answer claims 'He was scared' (the son) but the evidence says 'They were scared' (the siblings). D18:1 and D18:3 only tell us the son 'got into an accident' and that he's 'ok' - not that he was 'scared'.",
"correct_answer": "The son was in the accident and is OK (D18:1, D18:3); the evidence does not describe the son's emotional reaction directly"
},
{
"question_id": "locomo_0_qa149",
"question": "What do Melanie's family give her?",
"golden_answer": "Strength and motivation",
"category": 4,
"error_type": "HALLUCINATION",
"cited_evidence": [
"D18:9"
],
"correct_evidence": [
"D18:9"
],
"reasoning": "D18:9 says 'They're really amazing. Wish I was that resilient too. But they give me the strength to keep going.' The word 'strength' is supported by the evidence. However, 'motivation' does not appear in D18:9 or anywhere nearby in the transcript in this context. The golden answer adds 'motivation' which is not present in the cited evidence. A search of the broader transcript shows D3:14 mentions 'they keep me motivated' but that is not cited and is from a different context months earlier.",
"correct_answer": "Strength (to keep going)"
},
{
"question_id": "locomo_1_qa24",
"question": "Which events has Jon participated in to promote his business venture?",
"golden_answer": "fair, networking events, dance competition",
"category": 1,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D10:1",
"D16:6",
"D8:4"
],
"correct_evidence": [
"D10:1",
"D16:6",
"D8:13"
],
"reasoning": "D8:4 is Gina speaking about her own store: 'Oof, that's tough, Jon. I got some new offers and promotions going on my online store to try and bring in new customers. It's been a wild ride starting my business, but I'm not giving up!' This has nothing to do with Jon's promotional events. The 'dance competition' part of the golden answer is supported by D8:13, where Jon says: 'I'm also hosting a dance competition next month to showcase local talent and bring more attention to my studio.' The golden answer is factually correct, but the citation D8:4 should be D8:13.",
"correct_answer": "fair, networking events, dance competition"
},
{
"question_id": "locomo_1_qa31",
"question": "How long did it take for Jon to open his studio?",
"golden_answer": "six months",
"category": 1,
"error_type": "TEMPORAL_ERROR",
"cited_evidence": [
"D1:2",
"D15:13"
],
"correct_evidence": [
"D1:2",
"D15:5"
],
"reasoning": "D1:2 (session_1, 20 January 2023): Jon says 'Lost my job as a banker yesterday, so I'm gonna take a shot at starting my own business.' He announced his plan to start a business on 20 January 2023 (lost his job on 19 January). D15:5 (session_15, 19 June 2023): Jon says 'The official opening night is tomorrow.' So the studio opens on 20 June 2023. From 20 January 2023 to 20 June 2023 is exactly 5 months. Even counting from 19 January (when he actually lost his job) to 20 June 2023 is 5 months and 1 day. The golden answer of 'six months' is incorrect; it should be 'five months'.",
"correct_answer": "five months"
},
{
"question_id": "locomo_1_qa43",
"question": "What do the dancers in the photo represent?",
"golden_answer": "They are performing at the festival",
"category": 4,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D1:25"
],
"correct_evidence": [
"D1:26"
],
"reasoning": "D1:25 is Gina asking a question: 'Wow, it looks awesome! Are they yours at the festival? They're so graceful!' This is an interrogative sentence, not a factual assertion. The actual confirmation that the dancers are performing at the festival comes from Jon in D1:26: 'Yeah, they're the ones performing at the festival! They've been practicing hard and will definitely impress with their grace and skill.' The golden answer is factually correct but cites the question (D1:25) rather than the confirming answer (D1:26).",
"correct_answer": "They are performing at the festival"
},
{
"question_id": "locomo_1_qa44",
"question": "What does Gina say about the dancers in the photo?",
"golden_answer": "They look graceful",
"category": 4,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D1:26"
],
"correct_evidence": [
"D1:25"
],
"reasoning": "The cited evidence D1:26 is Jon speaking: 'Yeah, they're the ones performing at the festival! They've been practicing hard and will definitely impress with their grace and skill.' The question asks what GINA says about the dancers. Gina's actual comment is in D1:25: 'Wow, it looks awesome! Are they yours at the festival? They're so graceful!' The golden answer 'They look graceful' correctly reflects Gina's words ('They're so graceful!'), but the wrong dialog ID is cited -- D1:26 (Jon) instead of D1:25 (Gina).",
"correct_answer": "They look graceful"
},
{
"question_id": "locomo_1_qa47",
"question": "What did Gina find for her clothing store on 1 February, 2023?",
"golden_answer": "The perfect spot for her store",
"category": 4,
"error_type": "AMBIGUOUS",
"cited_evidence": [
"D3:2"
],
"correct_evidence": [
"D3:2",
"D3:3"
],
"reasoning": "D3:2 (Gina): 'Hi Jon! So happy you're pushing forward with dancing! Inspiring. I emailed some wholesalers and one replied and said yes today! I'm over the moon because now I can expand my clothing store and get closer to my customers. Check it out - here's a pic!' The blip_caption for D3:2 describes 'a photography of a shopping mall with a glass entrance and a sign.' Gina's text explicitly mentions finding a WHOLESALER who agreed, not a physical store location. However, Jon in D3:3 responds: 'Wow, Gina! You found the perfect spot for your store. Way to go, hard work's paying off!' Jon's interpretation (finding a physical spot) conflicts with Gina's explicit words (finding a wholesaler). The golden answer 'The perfect spot for her store' comes from Jon's response (D3:3) rather than Gina's own statement (D3:2). The attached image showing a shopping mall adds ambiguity -- it could represent the wholesaler's location or a new store spot.",
"correct_answer": "A wholesaler agreed to supply her store (per Gina's own words in D3:2), though Jon interpreted the news as finding 'the perfect spot' (D3:3)"
},
{
"question_id": "locomo_1_qa57",
"question": "What advice does Gina give to Jon about running a successful business?",
"golden_answer": "build relationships with customers, create a strong brand image, stay positive",
"category": 4,
"error_type": "ATTRIBUTION_ERROR",
"cited_evidence": [
"D7:5"
],
"correct_evidence": [
"D7:5"
],
"reasoning": "D7:5 is Jon speaking to Gina: 'Yeah, brand identity is key. Make sure yours stands out. Also be sure to build relationships with your customers - let them know you care. And don't forget to stay positive and motivate others. Your energy will be contagious!' This is clearly JON giving advice TO GINA about running her clothing store, not the other way around. D7:6 confirms this attribution with Gina responding: 'Thanks for the advice, Jon! Building relationships and creating a strong brand image for my store is something I'm always working on.' The question asks what advice GINA gives to JON, but the cited evidence and the actual advice content show the reverse -- it is JON advising GINA. Searching the rest of the transcript, Gina does not give Jon this specific three-part advice anywhere.",
"correct_answer": "This advice ('build relationships with customers, create a strong brand image, stay positive') was given by JON to GINA in D7:5, not by Gina to Jon. The attribution is reversed."
},
{
"question_id": "locomo_1_qa63",
"question": "What kind of professional experience did Gina get accepted for on May 23, 2023?",
"golden_answer": "fashion internship",
"category": 4,
"error_type": "AMBIGUOUS",
"cited_evidence": [
"D12:1"
],
"correct_evidence": [
"D12:1"
],
"reasoning": "The question states Gina was accepted 'on May 23, 2023' but D12:1 is from session_12 dated '7:18 pm on 27 May, 2023.' Gina says 'I just got accepted for a fashion internship!' -- using 'just' to indicate it happened very recently, on or around 27 May. There is no session or dialog on May 23 in the transcript. The golden answer 'fashion internship' is correct regarding what she was accepted for, but the date premise in the question (May 23) does not match the evidence (May 27). This makes the question itself misleading, though the answer content is accurate.",
"correct_answer": "fashion internship (but the acceptance was announced on 27 May 2023, not May 23 as stated in the question)"
},
{
"question_id": "locomo_2_qa5",
"question": "When did Maria go to the beach?",
"golden_answer": "December 2022",
"category": 2,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D3:15"
],
"correct_evidence": [
"D3:14"
],
"reasoning": "The cited evidence D3:15 is John's line: \"Wow, nature can be so beautiful! It reminds me of the film camera I had as a kid, I took plenty of beach pics. Thanks for sharing.\" This is about John's childhood film camera, not about Maria going to the beach. The correct evidence is D3:14, where Maria says: \"I took it at the beach last month. Watching the sunset was so peaceful, it made me feel connected to nature and appreciate life's small moments.\" Since session_3 is January 1, 2023, \"last month\" = December 2022. The golden answer \"December 2022\" is correct, but the citation points to the wrong dialog.",
"correct_answer": "December 2022 (answer is correct, citation is wrong)"
},
{
"question_id": "locomo_2_qa10",
"question": "When did Maria meet Jean?",
"golden_answer": "February 24, 2023",
"category": 2,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D7:1"
],
"correct_evidence": [
"D7:5"
],
"reasoning": "The cited evidence D7:1 is Maria saying: \"Hey John, how's it going? Just wanted to give you the heads up on what's been happening lately- I took a creative writing class recently, and it was super enlightening!\" This says nothing about meeting Jean. The correct evidence is D7:5, where Maria says: \"While volunteering yesterday, I met this amazing woman, Jean, who had been through a lot, yet stayed optimistic and resilient.\" Session_7 is February 25, 2023. \"yesterday\" = February 24, 2023. The golden answer \"February 24, 2023\" is correct, but the citation is wrong.",
"correct_answer": "February 24, 2023 (answer is correct, citation is wrong)"
},
{
"question_id": "locomo_2_qa16",
"question": "When did John get his degree?",
"golden_answer": "The week before 2 April 2023",
"category": 2,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D9:2"
],
"correct_evidence": [
"D9:4"
],
"reasoning": "The cited evidence D9:2 is John saying: \"Hey Maria! Awesome to hear from you. Sounds like a great way to delve into your feelings. Since we spoke last, I've had quite the adventure!\" with a blip_caption of \"a photo of a certificate of completion of a university degree.\" While the photo implies a degree, the temporal information (\"the week before\") comes from D9:4, where John says: \"I graduated last week!\" Session_9 is April 2, 2023, so \"last week\" = the week before April 2, 2023. The golden answer is correct but the citation should include D9:4 for the temporal claim.",
"correct_answer": "The week before 2 April 2023 (answer is correct, citation is incomplete)"
},
{
"question_id": "locomo_2_qa32",
"question": "What outdoor activities has John done with his colleagues?",
"golden_answer": "Hiking, mountaineering",
"category": 1,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D18:2",
"D16:2"
],
"correct_evidence": [
"D18:2",
"D16:1"
],
"reasoning": "D18:2 correctly supports mountaineering (John: \"I went on a mountaineering trip last week with some workmates\"). However, D16:2 is Maria's line: \"Hey John! Cool that it's going well - you and your friends look like a great team! I'm busy at the shelter getting ready for a fundraiser next week.\" This does not mention hiking with colleagues. The correct evidence for hiking is D16:1, where John says: \"I got this picture of my workmates when we went on a hiking trip.\" The golden answer \"Hiking, mountaineering\" is correct, but D16:2 should be D16:1.",
"correct_answer": "Hiking, mountaineering (answer is correct, citation D16:2 should be D16:1)"
},
{
"question_id": "locomo_2_qa44",
"question": "What activities has Maria done with her church friends?",
"golden_answer": "Hiking, picnic, volunteer work",
"category": 1,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D25:2",
"D24:6",
"D28:5"
],
"correct_evidence": [
"D25:2",
"D24:6",
"D28:8"
],
"reasoning": "D25:2 (hiking with church friends) and D24:6 (picnic with friends from church) are correct citations. However, D28:5 is John's line about finding a tech company job: \"Thanks Maria! I may have found a job at a tech company I like that needs my mechanical skills for their hardware team.\" This has nothing to do with Maria's activities with church friends. The correct evidence for volunteer/community work is D28:8, where Maria says: \"Yesterday, I took up some community work with my friends from church. It was super rewarding!\"",
"correct_answer": "Hiking, picnic, volunteer work (answer is correct, D28:5 should be D28:8)"
},
{
"question_id": "locomo_2_qa48",
"question": "When did John have his first firefighter call-out?",
"golden_answer": "The sunday before 3` July 2023",
"category": 2,
"error_type": "TEMPORAL_ERROR",
"cited_evidence": [
"D26:4"
],
"correct_evidence": [
"D26:4"
],
"reasoning": "D26:4 is from session_26, dated \"1:59 pm on 31 July, 2023\" (not July 3). John says: \"Last Sunday we had our first call-out.\" July 31, 2023 is a Monday, so \"Last Sunday\" = July 30, 2023. The golden answer states \"The sunday before 3 July 2023\" which would be July 2, 2023 -- a completely different date. The answer also contains a typographical backtick (3` instead of 3). The correct answer should be \"The Sunday before 31 July 2023\" (i.e., July 30, 2023).",
"correct_answer": "The Sunday before 31 July 2023 (July 30, 2023)"
},
{
"question_id": "locomo_2_qa49",
"question": "What food item did Maria drop off at the homeless shelter?",
"golden_answer": "Cakes",
"category": 1,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D26:1",
"D25:19"
],
"correct_evidence": [
"D26:1",
"D25:20"
],
"reasoning": "D26:1 correctly states Maria dropped off baked goods at the shelter (\"last week I dropped off that stuff I baked at the homeless shelter\") but does not specify what she baked. D25:19 is John's line: \"Yeah, it's been great for me. Let me know if you need any advice to get started\" -- completely unrelated to baking or cakes. The correct evidence for the specific item \"cakes\" is D25:20, where Maria says: \"I'm off to bake some cakes. Talk to you soon!\" This establishes that the baked items were cakes.",
"correct_answer": "Cakes (answer is correct, D25:19 should be D25:20)"
},
{
"question_id": "locomo_2_qa63",
"question": "How many weeks passed between Maria adopting Coco and Shadow?",
"golden_answer": "two weeks",
"category": 2,
"error_type": "AMBIGUOUS",
"cited_evidence": [
"D30:1",
"D31:2"
],
"correct_evidence": [
"D30:1",
"D31:2"
],
"reasoning": "D30:1 (Aug 11, 2023): Maria says she got Coco \"two weeks ago,\" giving a hard anchor of ~July 28, 2023. D31:2 (Aug 13, 2023): Maria says she adopted Shadow \"last week,\" a vague range of approximately Aug 4-12, 2023. The hard date (July 28) does not fall within the vague range, so the adoptions are definitively separate events, but the gap cannot be resolved to a point: it spans approximately 7-15 days (~1 to ~2 weeks) depending on when during \"last week\" Shadow was adopted. The golden answer of \"two weeks\" is the upper extreme of this range, not a confirmed value.",
"correct_answer": "Approximately 1-2 weeks (7-15 days); one adoption date is hard (~July 28, 2023) but the other is a vague range (\"last week\" = ~Aug 4-12, 2023), so the exact gap is unresolvable"
},
{
"question_id": "locomo_2_qa68",
"question": "What type of workout class did Maria start doing in December 2023?",
"golden_answer": "aerial yoga",
"category": 4,
"error_type": "TEMPORAL_ERROR",
"cited_evidence": [
"D1:3"
],
"correct_evidence": [
"D1:3"
],
"reasoning": "The question asks about \"December 2023\" but the cited evidence D1:3 is from session_1, dated \"11:01 am on 17 December, 2022\" (December 2022, not 2023). Maria says: \"Just started doing aerial yoga, it's great.\" The entire conversation dataset spans December 2022 to August 2023, so there is no December 2023 data. The golden answer \"aerial yoga\" is correct for the evidence, but the question references the wrong year. The question should say \"December 2022.\"",
"correct_answer": "aerial yoga (answer is correct, but question says 2023 when it should say 2022)"
},
{
"question_id": "locomo_2_qa69",
"question": "What did Maria donate to a homeless shelter in December 2023?",
"golden_answer": "old car",
"category": 4,
"error_type": "TEMPORAL_ERROR",
"cited_evidence": [
"D2:1"
],
"correct_evidence": [
"D2:1"
],
"reasoning": "The question asks about \"December 2023\" but the cited evidence D2:1 is from session_2, dated \"6:10 pm on 22 December, 2022\" (December 2022, not 2023). Maria says: \"I donated my old car to a homeless shelter I volunteer at yesterday.\" The conversation dataset ends in August 2023, so December 2023 does not exist in the data. The golden answer \"old car\" is correct for the evidence, but the question references the wrong year. The question should say \"December 2022.\"",
"correct_answer": "old car (answer is correct, but question says 2023 when it should say 2022)"
},
{
"question_id": "locomo_2_qa116",
"question": "Why did Maria need to help her cousin find a new place to live?",
"golden_answer": "Her cousin had to leave and find a new place in a hurry.",
"category": 4,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D21:5"
],
"correct_evidence": [
"D21:5",
"D21:7"
],
"reasoning": "The cited evidence D21:5 says: \"my cousin just had a tough time recently, so I'm lending a hand in helping her find a new place.\" This gives the general context but does not contain the specific detail about leaving \"in a hurry.\" The golden answer states \"Her cousin had to leave and find a new place in a hurry\" which comes from D21:7 (uncited): \"Things have been tough for her lately. She had to leave and find a new place in a hurry, which has been really stressful, but she's making progress.\" The citation should include D21:7.",
"correct_answer": "Her cousin had to leave and find a new place in a hurry (answer is correct, citation should include D21:7)"
},
{
"question_id": "locomo_2_qa129",
"question": "What does John think about trying new classes at the yoga studio?",
"golden_answer": "Trying new classes is a fun way to switch up the exercise routine.",
"category": 4,
"error_type": "ATTRIBUTION_ERROR",
"cited_evidence": [
"D25:14"
],
"correct_evidence": [
"D25:15"
],
"reasoning": "The cited evidence D25:14 is Maria's line (not John's): \"Cool, John! Trying new classes sounds like a fun way to switch up your exercise routine - I should give it a go!\" The golden answer directly quotes Maria's words but attributes them to John. John's actual response is in D25:15: \"Yeah, Maria! Trying new stuff is a great way to push yourself and mix things up. Let me know if you need any suggestions!\" The question asks what John thinks, but the answer uses Maria's phrasing.",
"correct_answer": "Trying new stuff is a great way to push yourself and mix things up (from D25:15, John's actual words)"
},
{
"question_id": "locomo_3_qa4",
"question": "What pets wouldn't cause any discomfort to Joanna?",
"golden_answer": "Hairless cats or pigs,since they don't have fur, which is one of the main causes of Joanna's allergy.",
"category": 3,
"error_type": "HALLUCINATION",
"cited_evidence": [
"D2:23"
],
"correct_evidence": [
"D2:23",
"D5:11"
],
"reasoning": "D2:23: 'I'm allergic to most reptiles and animals with fur. It can be a bit of a drag, but I find other ways to be happy.' D5:11: 'I wish I wasn't allergic! I would get two turtles today if I could! I found out recently I'm allergic to cockroaches as well, so who knows if I'll ever get a pet.' The golden answer fabricates specific pet recommendations ('hairless cats or pigs') that appear nowhere in the transcript. Neither 'hairless', 'cats', nor 'pigs' is mentioned anywhere in the conversation. The transcript only establishes Joanna's allergies (reptiles, furry animals, cockroaches) without suggesting any specific safe pets.",
"correct_answer": "The transcript does not suggest specific pets. Joanna is allergic to reptiles, animals with fur, and cockroaches (D2:23, D5:11). No specific safe pet types are proposed in the conversation."
},
{
"question_id": "locomo_3_qa5",
"question": "What are Joanna's hobbies?",
"golden_answer": "Writing, watchingmovies, exploringnature, hanging withfriends.",
"category": 1,
"error_type": "INCOMPLETE",
"cited_evidence": [
"D1:10",
"D2:25"
],
"correct_evidence": [
"D1:10",
"D2:25"
],
"reasoning": "The golden answer lists writing, watching movies, exploring nature, and hanging with friends. This is incomplete. D1:10 (cited evidence) explicitly says \"Besides writing, I also enjoy reading, watching movies, and exploring nature\" but reading is omitted from the golden answer. Additional hobbies mentioned elsewhere in the transcript include hiking (D8:4, D11:3), cooking and baking (D10:9, D10:13), acting (D9:7), and DIY/crafts (D22:19, D22:21). The golden answer is not fabricated but substantially incomplete.",
"correct_answer": "Writing, reading, watching movies, exploring nature, hiking, cooking and baking, hanging with friends, acting (past passion), DIY/crafts"
},
{
"question_id": "locomo_3_qa24",
"question": "When is Nate hosting a gaming party?",
"golden_answer": "The weekend after 3June, 2022.",
"category": 2,
"error_type": "TEMPORAL_ERROR",
"cited_evidence": [
"D14:20"
],
"correct_evidence": [
"D14:20"
],
"reasoning": "D14:20 (session: 5:44 pm on 3 June, 2022) states: 'I'm organizing a gaming party two weekends later'. 'Two weekends later' from June 3 means approximately June 17-18, 2022 (two weeks after). The golden answer says 'The weekend after 3 June, 2022' which would be June 4-5 (one weekend after), not two weekends later.",
"correct_answer": "Two weekends after 3 June, 2022 (approximately June 17-18, 2022)."
},
{
"question_id": "locomo_3_qa34",
"question": "What book recommendations has Joanna given to Nate?",
"golden_answer": "\"Little Women\",'A Court of Thorns andRoses'.",
"category": 1,
"error_type": "HALLUCINATION",
"cited_evidence": [
"D3:17",
"D19:14",
"D19:16"
],
"correct_evidence": [
"D1:16",
"D3:17",
"D9:14",
"D19:14"
],
"reasoning": "The golden answer claims Joanna recommended 'A Court of Thorns and Roses' to Nate, but this title never appears in the transcript text. In D19:14-16, Joanna generically recommends 'finding a fantasy book series', then NATE shows a photo of a specific series (D19:15, whose image query reveals it as 'A Court of Thorns and Roses'), and Joanna approves with 'That's a great one!' This is Nate selecting a book and Joanna validating his choice, not Joanna recommending it. Also, 'Little Women' in D3:17 is a movie recommendation ('I just watched'), not a book recommendation as the golden answer implies.",
"correct_answer": "'Eternal Sunshine of the Spotless Mind' movie (D1:16), 'Little Women' movie (D3:17). Joanna also generically recommended finding a fantasy book series (D19:14), but never named a specific title."
},
{
"question_id": "locomo_3_qa43",
"question": "How long did it take for Joanna to finish writing her book?",
"golden_answer": "four months",
"category": 2,
"error_type": "TEMPORAL_ERROR",
"cited_evidence": [
"D17:14",
"D22:9"
],
"correct_evidence": [
"D17:14",
"D22:9"
],
"reasoning": "D17:14 (session: 2:34 pm on 10 July, 2022) shows Joanna saying 'I actually started on a book recently'. D22:9 (session: 11:15 am on 6 October, 2022) says 'I finished up my writing for my book last week' (approximately late September 2022). From mid-July to late September is approximately 2.5-3 months, not four months. The golden answer of 'four months' overcounts the duration.",
"correct_answer": "Approximately three months (mid-July to late September 2022)."
},
{
"question_id": "locomo_3_qa50",
"question": "What is something Nate gave to Joanna that brings her a lot of joy?",
"golden_answer": "stuffed toy pup",
"category": 1,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D13:9",
"D24:2"
],
"correct_evidence": [
"D13:9",
"D24:4",
"D24:8"
],
"reasoning": "The cited evidence D24:2 reads: 'Hey Nate! I have been revising and perfecting the recipe I made for my family and it turned out really tasty. What's been happening with you?' This is about recipe revision and has nothing to do with the stuffed toy pup. The correct evidence for the stuffed pup bringing joy should include D24:4 ('I still have that stuffed animal dog you gave me! I named her Tilly, and she's always with me while I write') and D24:8 ('Tilly helps me stay focused and brings me so much joy').",
"correct_answer": "stuffed toy pup (Tilly)"
},
{
"question_id": "locomo_3_qa51",
"question": "When did Nate get Tilly for Joanna?",
"golden_answer": "25 May, 2022",
"category": 1,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D13:9",
"D24:2"
],
"correct_evidence": [
"D13:9",
"D24:4"
],
"reasoning": "The cited evidence D24:2 reads: 'Hey Nate! I have been revising and perfecting the recipe I made for my family and it turned out really tasty.' This has nothing to do with when Nate got the stuffed pup. The name 'Tilly' is first revealed in D24:4: 'I still have that stuffed animal dog you gave me! I named her Tilly'. D13:9 correctly shows Nate giving the stuffed animal on May 25, 2022.",
"correct_answer": "25 May, 2022"
},
{
"question_id": "locomo_3_qa52",
"question": "How many of Joanna's writing have made it to the big screen?",
"golden_answer": "two",
"category": 1,
"error_type": "HALLUCINATION",
"cited_evidence": [
"D15:1",
"D25:2"
],
"correct_evidence": [
"D15:1",
"D25:2",
"D25:4"
],
"reasoning": "The golden answer claims 'two' of Joanna's writings made it to the big screen, citing D15:1 and D25:2. However, D25:4 (Joanna, session: 8:16 pm on 25 October, 2022) explicitly states: 'I know this is the third time it's happened, but its just so awesome!' This clearly indicates THREE instances of her writing appearing on the big screen, not two. The third instance is mentioned but its evidence was not cited separately; Joanna herself counts it as the third time.",
"correct_answer": "Three (as stated by Joanna in D25:4: 'I know this is the third time it's happened')."
},
{
"question_id": "locomo_3_qa54",
"question": "When was Joanna's second movie script shown on the big screens?",
"golden_answer": "The Sunday before 25October, 2022.",
"category": 2,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D25:1"
],
"correct_evidence": [
"D25:2"
],
"reasoning": "The cited evidence D25:1 reads: 'Hey Joanna, what's been up since we last chatted? How's it going?' This is Nate's greeting and contains no information about when the movie was shown. The actual evidence is D25:2 (Joanna): 'Another movie script that I contributed to was shown on the big screen last Sunday for the first time!' The session date is October 25, 2022, so 'last Sunday' = the Sunday before October 25 = October 23, 2022. The golden answer 'The Sunday before 25 October, 2022' is factually correct but cites the wrong dialog.",
"correct_answer": "The Sunday before 25 October, 2022 (October 23, 2022)."
},
{
"question_id": "locomo_3_qa55",
"question": "What is Joanna inspired by?",
"golden_answer": "Personal experiences,her own journey ofself discovery, Nate,nature, validation,stories about findingcourage and takingrisks, people she knows, stuff she sees, imagination",
"category": 1,
"error_type": "WRONG_CITATION",
"cited_evidence": [
"D4:6",
"D7:6",
"D11:11",
"D26:3",
"D26:7",
"D25:10"
],
"correct_evidence": [
"D4:16",
"D7:6",
"D11:11",
"D25:10",
"D26:3",
"D26:7"
],
"reasoning": "The cited evidence D4:6 reads: 'Yeah, definitely! I'm keen to try your recipe. Always up for something sweet.' This is about trying a dessert recipe and has nothing to do with Joanna's sources of inspiration. The golden answer includes 'Personal experiences, her own journey of self-discovery' - the correct evidence for this is D4:16: 'It was inspired by personal experiences and my own journey of self-discovery.' D4:6 should be replaced with D4:16.",
"correct_answer": "Personal experiences, her own journey of self-discovery, Nate, nature, validation, stories about finding courage and taking risks, people she knows, stuff she sees, imagination"
},
{
"question_id": "locomo_3_qa58",
"question": "What things has Nate reccomended to Joanna?",
"golden_answer": "A pet,\"The Lord of the Rings\" movies,a dragon book series,coconut flavoring,\"Project Hail Mary\" book,Xenoblade Chronicles, dairy-free margarine, coconut oil",
"category": 1,
"error_type": "HALLUCINATION",
"cited_evidence": [
"D2:14",
"D9:12",
"D9:14",
"D10:11",
"D19:17",
"D27:23",
"D10:19"
],
"correct_evidence": [
"D2:14",
"D9:12",
"D9:14",
"D10:11",
"D19:17",
"D20:15",
"D27:23"
],
"reasoning": "The golden answer includes 'Project Hail Mary book' and 'a dragon book series' but neither title appears anywhere in the transcript text. D19:17 recommends a series with 'awesome battles and interesting characters' (image query: 'space opera book series'), but never names it as 'Project Hail Mary'. D9:14 mentions a series with 'adventures, magic, and great characters' (image query: 'fantasy novels dragon cover series'), but never calls it a 'dragon book series'. Additionally, evidence ID D10:19 does not exist in the dialog index. The golden answer also lists 'dairy-free margarine, coconut oil' which comes from D20:15, not D10:19.",
"correct_answer": "A pet (D2:14), 'The Lord of the Rings' movies (D9:12), a fantasy book series (D9:14), coconut flavoring (D10:11), a book series with battles and characters (D19:17), Xenoblade Chronicles (D27:23), dairy-free margarine or coconut oil (D20:15)."
},
{
"question_id": "locomo_3_qa61",
"question": "What mediums does Nate use to play games?",
"golden_answer": "Gamecube, PC,Playstation.",
"category": 1,
"error_type": "HALLUCINATION",
"cited_evidence": [
"D22:2",
"D27:21",
"D27:15"
],
"correct_evidence": [
"D27:15",
"D27:17",
"D27:23"
],
"reasoning": "The golden answer lists 'Gamecube, PC, Playstation' as mediums Nate uses to play games. The word 'Gamecube' never appears anywhere in the transcript text. 'Playstation' also never appears. The cited evidence consists of photos (trophy with controller, headphones with controller, desk with monitor), but the blip captions only show generic 'game controller' descriptions without identifying specific consoles. A PC is inferable from D27:15-17 (computer setup described as where he practices and competes). A Nintendo Switch might be inferable from D27:23 (Xenoblade Chronicles is a Nintendo game). But 'Gamecube' and 'Playstation' are not supported by any transcript evidence.",
"correct_answer": "PC (D27:15-17), Nintendo console (D27:23 - Xenoblade Chronicles). Other platforms cannot be determined from the transcript text alone."
},
{
"question_id": "locomo_3_qa66",
"question": "What alternative career might Nate consider after gaming?",
"golden_answer": "an animalkeeper at a localzoo and workingwith turtles; as heknows a great dealabout turtles andhow to care for them,and he enjoys it.",
"category": 3,
"error_type": "HALLUCINATION",
"cited_evidence": [
"D5:8",
"D19:3",
"D25:19",
"D28:25"
],
"correct_evidence": [
"D5:8",