Advances in Citation Text Generation: Leveraging Multi-Source Seq2Seq Models and Large Language Models

Abstract

Citation Text Generation (CTG) in scientific documents often relies on standard summarization techniques, which may not fully capture the nuanced relationship between the citing and cited papers. To address this, we present a Multi-Source Citation Text Generation (M-CTG) architecture, leveraging a Seq2Seq transformer framework enhanced with keyphrase embeddings, graph embeddings, and text representations. This approach aims to produce more contextually relevant and accurate citation texts by integrating multiple sources of information. We evaluate our approach on the newly created CTG-S2ORC dataset, consisting of English-language computer science research papers. In a comparative analysis, we examine the performance of traditional Language Models (LMs) and show that Large Language Models (LLMs), particularly when integrated with various prompting techniques and Knowledge Graphs, analyze and generate citation texts more effectively. In addition to traditional evaluation metrics, we introduce a custom metric that emphasizes key-term overlap and semantic similarity, providing a more comprehensive assessment of our model’s performance. Our code and data are available at https://github.com/midas-research/M-CTG/tree/main.
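The abstract does not spell out the custom metric's exact formulation; the sketch below illustrates one plausible interpretation that weights key-term overlap against embedding-based semantic similarity. All names here (citation_overlap_score, keyphrase_extractor, embedder, alpha) are hypothetical and not taken from the paper.

```python
import numpy as np

def citation_overlap_score(generated, reference, keyphrase_extractor, embedder, alpha=0.5):
    """Hypothetical combined metric: key-term overlap plus semantic similarity.

    keyphrase_extractor: callable returning an iterable of keyphrases for a text
    embedder: callable returning a fixed-size vector for a text
    alpha: illustrative weight balancing the two components
    """
    # Key-term overlap: Jaccard similarity between the extracted keyphrase sets.
    gen_keys = set(keyphrase_extractor(generated))
    ref_keys = set(keyphrase_extractor(reference))
    overlap = len(gen_keys & ref_keys) / max(len(gen_keys | ref_keys), 1)

    # Semantic similarity: cosine similarity between sentence-level embeddings.
    g, r = embedder(generated), embedder(reference)
    cosine = float(np.dot(g, r) / (np.linalg.norm(g) * np.linalg.norm(r) + 1e-8))

    # Weighted combination of the two signals.
    return alpha * overlap + (1 - alpha) * cosine
```

Under this reading, a higher score indicates a generated citation that both reuses the reference's key terms and stays semantically close to it.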

Publication
CIKM
Yaman Kumar Singla