Temporal question answering (QA) involves time constraints, expressed with phrases such as “… in 2019” or “… before COVID”. In the former, time is an explicit condition; in the latter, it is implicit. State-of-the-art methods have limitations along three dimensions. First, with neural inference, time constraints are merely soft-matched, leaving room for invalid or inexplicable answers. Second, questions with implicit time are poorly supported. Third, answers come from a single source: either a knowledge base (KB) or a text corpus. We propose FAITH (Faithful Temporal Question Answering over Heterogeneous Sources), a temporal QA system that addresses these shortcomings. First, it enforces temporal constraints for faithful answering with tangible evidence. Second, it properly handles implicit questions. Third, it operates over heterogeneous sources, covering KB, text, and web tables in a unified manner. The method has three stages: (i) understanding the question and its temporal conditions, (ii) retrieving evidence from all sources, and (iii) faithfully answering the question. As implicit questions are sparse in prior benchmarks, we introduce a principled method for generating diverse questions. Experiments show superior performance over a suite of baselines.
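For concreteness, here is a minimal, runnable sketch of this three-stage flow on the example question q1 (“Record company of Queen in 1975?”). The toy evidence pool, the regex-based “understanding”, and all names are illustrative assumptions, not the actual FAITH implementation.

```python
import re

# Toy evidence pool mixing source types; each fact carries a temporal scope.
EVIDENCE = [
    {"source": "KB",   "subject": "Queen", "relation": "record company",
     "object": "EMI", "start": 1973, "end": 1983},
    {"source": "text", "subject": "Queen", "relation": "record company",
     "object": "Hollywood Records", "start": 1990, "end": 2023},
]

def understand(question):
    """Stage (i): derive the question intent; here, just an explicit year."""
    match = re.search(r"\b(?:19|20)\d{2}\b", question)
    return {"question": question, "year": int(match.group()) if match else None}

def retrieve(intent):
    """Stage (ii): gather candidate evidence (here: naive entity matching)."""
    return [ev for ev in EVIDENCE
            if ev["subject"].lower() in intent["question"].lower()]

def answer(intent, evidence):
    """Stage (iii): enforce the temporal constraint rather than soft-matching it."""
    year = intent["year"]
    valid = [ev for ev in evidence
             if year is None or ev["start"] <= year <= ev["end"]]
    return (valid[0]["object"], valid) if valid else (None, [])

intent = understand("Record company of Queen in 1975?")
result, support = answer(intent, retrieve(intent))
print(result, support)  # EMI, backed by evidence whose interval contains 1975
```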
Overview of the FAITH pipeline. The figure illustrates the process of answering q3 (“Queen’s record company when recording Bohemian Rhapsody?”) and q1 (“Record company of Queen in 1975?”). To answer q3, two intermediate questions q31 and q32 are generated and run recursively through the entire FAITH pipeline.

Existing benchmarks for temporal QA focus on a single information source (either a KB or a text corpus) and include only a few questions with implicit constraints. We devise a new method for automatically creating temporal questions with implicit constraints, with systematic control over different aspects, including the relative importance of different source types (text, infoboxes, KB), the fraction of prominent vs. long-tail entities, question complexity, and more.
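The controllable aspects listed above could be captured in a generation configuration along the following lines; field names and default values are purely illustrative assumptions, not the actual TIQ construction code.

```python
from dataclasses import dataclass, field

@dataclass
class GenerationConfig:
    # Relative importance of evidence source types when sampling snippets.
    source_weights: dict = field(
        default_factory=lambda: {"text": 0.5, "infobox": 0.3, "kb": 0.2}
    )
    # Fraction of questions anchored on prominent vs. long-tail entities.
    prominent_entity_fraction: float = 0.5
    # Rough complexity control, e.g. number of evidence snippets combined.
    snippets_per_question: int = 2
    # Total number of questions to generate.
    num_questions: int = 10_000

config = GenerationConfig()
```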
We construct a new dataset named TIQ with 10,000 questions and answers accompanied by supporting evidence.
Sample questions from TIQ:
| Topic Entity | Chris Brown |
|---|---|
| Evidence | Main: Chris Brown, “His fifth album, Fortune, released in 2012, also topped the Billboard 200.” [Text] |
| | Constraint: Chris Brown, “Chris Brown, Brown performing in Sydney, 2012” [Infobox] |
| Question | Which album released by Chris Brown topped the Billboard 200 when he was performing in Sydney? |
| Answer | Fortune |

| Topic Entity | Hulk Hogan |
|---|---|
| Evidence | Main: Hulk Hogan, “He starred in his own television series, Thunder in Paradise, in 1994.” [Text] |
| | Constraint: Hulk Hogan, “he was lured back to the ring when he signed with rival promotion World Championship Wrestling (WCW) in 1994.” [Text] |
| Question | What television series was Hulk Hogan starring in when he signed with World Championship Wrestling? |
| Answer | Thunder in Paradise |
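As a rough illustration of what such an instance contains, here is a hypothetical in-memory representation mirroring the fields shown above (topic entity, main and constraint evidence with source type, question, answer); the actual field names in the released files may differ.

```python
from dataclasses import dataclass

@dataclass
class EvidenceSnippet:
    entity: str   # Wikipedia page the snippet comes from
    text: str     # verbatim snippet
    source: str   # "Text", "Infobox", or "KB"

@dataclass
class TiqInstance:
    topic_entity: str
    main_evidence: EvidenceSnippet
    constraint_evidence: EvidenceSnippet
    question: str
    answer: str

example = TiqInstance(
    topic_entity="Hulk Hogan",
    main_evidence=EvidenceSnippet(
        "Hulk Hogan",
        "He starred in his own television series, Thunder in Paradise, in 1994.",
        "Text",
    ),
    constraint_evidence=EvidenceSnippet(
        "Hulk Hogan",
        "he was lured back to the ring when he signed with rival promotion "
        "World Championship Wrestling (WCW) in 1994.",
        "Text",
    ),
    question=("What television series was Hulk Hogan starring in when he "
              "signed with World Championship Wrestling?"),
    answer="Thunder in Paradise",
)
```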
We construct a new benchmark, TIQ (Temporal Implicit Questions), for temporal QA with 10,000 implicit questions. Questions are derived from heterogeneous sources: Wikipedia text, Wikipedia tables and the Wikidata KB. You can download it below:
Train Set (6000 Questions) · Dev Set (2000 Questions) · Test Set (2000 Questions)

| Method | P@1 | MRR | Hit@5 |
|---|---|---|---|
| FAITH | 0.491 | 0.603 | 0.752 |
| EXPLAIGNN (Christmann et al. '23) | 0.446 | 0.584 | 0.765 |
| UniK-QA (Oğuz et al. '22) | 0.425 | 0.480 | 0.540 |
| GPT-4 (OpenAI '23) | 0.286 | – | – |
| InstructGPT (Ouyang et al. '22) | 0.236 | – | – |
| UNIQORN (Pramanik et al. '21) | 0.237 | 0.255 | 0.277 |
| EXAQT (Jia et al. '21) | 0.232 | 0.378 | 0.587 |
| TempoQR (Mavromatis et al. '22) | 0.011 | 0.018 | 0.022 |
| CRONKGQA (Saxena et al. '21) | 0.006 | 0.011 | 0.014 |
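For reference, P@1, MRR, and Hit@5 follow their standard definitions; the sketch below shows one way to compute them per question (the official evaluation script may differ in details such as answer normalization or tie handling).

```python
def evaluate(ranked_answers, gold_answers):
    """ranked_answers: system answers ordered by confidence;
    gold_answers: set of correct answers for the question."""
    ranks = [i + 1 for i, a in enumerate(ranked_answers) if a in gold_answers]
    first = ranks[0] if ranks else None
    return {
        "P@1": 1.0 if first == 1 else 0.0,              # correct answer at rank 1?
        "MRR": 1.0 / first if first else 0.0,           # reciprocal rank of first hit
        "Hit@5": 1.0 if first and first <= 5 else 0.0,  # hit within top 5?
    }

# Per-question scores are typically averaged over all benchmark questions.
print(evaluate(["EMI", "Hollywood Records"], {"EMI"}))
# -> {'P@1': 1.0, 'MRR': 1.0, 'Hit@5': 1.0}
```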
To learn more about our group, please visit https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/question-answering/.