Hybrid Graph Neural Network and Large Language Model Framework for Robust Knowledge Graph Question Answering via Retrieval-Augmented Generation

Authors: Sohail Khan, Syed Sibtain Khalid, Naseem Rao, Safdar Tanweer

DOI Link: https://doi.org/10.22214/ijraset.2026.82233

Abstract

Knowledge graphs (KGs) hold facts about the world as connected triples, and they have become a backbone for any system that needs to reason over linked information. The task of Knowledge Graph Question Answering (KGQA) is to map a natural-language question onto the right entity, or set of entities, somewhere inside such a graph. Two communities have been pulling at this problem from opposite ends. Large language models (LLMs) read a question fluently but tend to invent facts and stumble on multi-hop chains they cannot verify. Graph neural networks (GNNs), on the other side, are good at walking through neighbourhoods and weighing relations, but cannot phrase an answer the way a person would. In this work, we describe a practical hybrid that places the two inside a retrieval-augmented generation (RAG) loop. A GNN first prunes a small subgraph around the question's seed entities and ranks candidate answers; we then extract shortest paths between the seeds and the top candidates, score each path with a lightweight function that combines GNN attention with degree centrality, and finally verbalise the surviving paths into plain-English sentences before passing them to a 7-8B open-source LLM. The proposed entity-priority scoring step is training-free and runs in milliseconds, yet it lifts Hits@1 by roughly 4-5 percentage points on the harder questions of ComplexWebQuestions. Experiments on WebQuestionsSP and ComplexWebQuestions show competitive or superior results against recent baselines, with the largest gains on multi-entity and three-or-more hop queries. The pipeline uses about one LLM call per question, runs comfortably on a single mid-range GPU, and exposes its reasoning as a short list of human-readable paths that anyone can audit. We argue that this combination of grounded retrieval, lightweight path scoring, and modest model size makes the approach particularly suited to academic and resource-constrained settings.

Introduction

This section introduces Knowledge Graph Question Answering (KGQA) and traces its evolution from traditional methods to modern hybrid systems that combine graph reasoning with large language models (LLMs).

Knowledge graphs such as Wikidata, DBpedia, and Freebase store structured facts as relationships between entities. KGQA systems aim to answer complex natural-language questions by traversing these graphs and combining multiple facts (multi-hop reasoning) when the answer cannot be found in any single entry.
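To make multi-hop reasoning concrete, the following minimal sketch chains two facts over a toy triple store. The triples, relation names, and the helper functions `build_index` and `follow` are illustrative assumptions and are not taken from the paper or from any real knowledge graph.

```python
# Toy triples (subject, relation, object); illustrative only.
TRIPLES = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("London", "capital_of", "United Kingdom"),
    ("Inception", "released_in", "2010"),
]

def build_index(triples):
    """Map each subject to its outgoing (relation, object) edges."""
    index = {}
    for s, r, o in triples:
        index.setdefault(s, []).append((r, o))
    return index

def follow(index, start, relations):
    """Follow a fixed chain of relations from `start` and return the end entities."""
    frontier = {start}
    for rel in relations:
        frontier = {o for e in frontier for r, o in index.get(e, []) if r == rel}
    return frontier

index = build_index(TRIPLES)
# "Where was the director of Inception born?" needs two facts chained together.
answer = follow(index, "Inception", ["directed_by", "born_in"])
print(answer)  # {'London'}
```

No single triple holds the answer: the first hop finds the director, and the second hop finds the birthplace.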

Early KGQA approaches included:

Semantic parsing, which converts questions into formal graph queries (e.g., SPARQL) but is fragile to language variation.
Embedding-based methods, which map entities into vector spaces and rank answers, but lack interpretability and struggle with complex reasoning.

With the rise of LLMs (e.g., GPT, LLaMA, Mistral), KGQA shifted again. While LLMs are fluent and flexible, they often produce hallucinated or incorrect answers because they rely on internal training knowledge rather than grounded graph facts.

To address this, newer approaches combine both worlds:

Retrieval-Augmented Generation (RAG) uses external knowledge sources to ground LLM responses.
Graph-based RAG (GraphRAG) specifically retrieves structured graph paths instead of text.
Hybrid GNN–LLM systems use Graph Neural Networks (GNNs) to explore and score paths in the knowledge graph, and LLMs to generate natural language answers from those paths.
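As a rough illustration of how retrieved graph paths can ground an LLM, the sketch below verbalises paths into sentences and wraps them in a restrictive prompt. The function names, the sentence template, and the prompt wording are assumptions made for illustration; the paper's actual template is not reproduced here, and no LLM is called.

```python
def verbalise(path):
    """Turn [(head, relation, tail), ...] into one plain-English sentence."""
    clauses = [f"{h} {r.replace('_', ' ')} {t}" for h, r, t in path]
    return "; ".join(clauses) + "."

def build_prompt(question, paths):
    """Assemble a prompt that restricts the model to the retrieved paths."""
    facts = "\n".join(f"- {verbalise(p)}" for p in paths)
    return (
        "Answer using only the facts below. "
        "If they are insufficient, reply 'unknown'.\n"
        f"Facts:\n{facts}\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieved path for a two-hop question.
paths = [[
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
]]
prompt = build_prompt("Where was the director of Inception born?", paths)
print(prompt)
```

Because the prompt contains exactly the sentences the model was shown, the same list can be logged next to each prediction for later inspection.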

However, current hybrid systems are often expensive, depend on large proprietary models, or require heavy computation. They also suffer from issues like poor path selection, especially in dense graphs.

Proposed idea in the paper

The work introduces a simple improvement called entity-priority scoring, which re-ranks graph paths using:

GNN attention signals, and
node importance (degree centrality)

This helps select more meaningful reasoning paths without extra training and at negligible computational cost: the scoring step runs in milliseconds.
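The exact scoring formula is not given in this summary, so the sketch below shows only one plausible form: a weighted blend of mean edge attention and mean node degree centrality along a path. The blend weight `alpha`, the choice to reward rather than penalise high-degree nodes, and all function names are assumptions.

```python
def degree_centrality(edges):
    """Degree of each node divided by (n - 1), for an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    return {node: (d / (n - 1) if n > 1 else 0.0) for node, d in degree.items()}

def path_score(path, attention, centrality, alpha=0.5):
    """Blend mean edge attention with mean node centrality (assumed form)."""
    if len(path) < 2:
        raise ValueError("a path needs at least two nodes")
    hops = list(zip(path, path[1:]))
    mean_att = sum(attention.get(h, 0.0) for h in hops) / len(hops)
    mean_cen = sum(centrality.get(n, 0.0) for n in path) / len(path)
    return alpha * mean_att + (1 - alpha) * mean_cen

def rerank(paths, attention, centrality, alpha=0.5, top_k=3):
    """Return the top_k (score, path) pairs, best first."""
    scored = [(path_score(p, attention, centrality, alpha), p) for p in paths]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy subgraph and made-up attention weights, for illustration only.
edges = [("q", "a"), ("a", "b"), ("q", "c"), ("c", "b"), ("c", "d")]
attention = {("q", "a"): 0.2, ("a", "b"): 0.3, ("q", "c"): 0.6, ("c", "b"): 0.7}
centrality = degree_centrality(edges)
ranked = rerank([["q", "a", "b"], ["q", "c", "b"]], attention, centrality)
print(ranked[0][1])  # ['q', 'c', 'b']
```

The step only reads quantities the GNN and the graph already provide, which is consistent with the description of it as training-free.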

Key contributions

A lightweight, training-free scoring method for better path selection.
A practical KGQA pipeline that works with small open-source LLMs (7B–8B models).
Improved performance on complex multi-hop question datasets, notably ComplexWebQuestions (CWQ).
Better explainability through visible reasoning paths.
A fully reproducible system that runs on modest hardware.
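The performance claims are reported as Hits@1, which is commonly computed as the fraction of questions whose top-ranked prediction appears in the gold answer set. The sketch below assumes that definition; the question IDs, predictions, and gold answers are toy values and not results from the paper.

```python
def hits_at_1(predictions, gold):
    """Fraction of questions whose top-ranked prediction is a gold answer."""
    if not predictions:
        raise ValueError("no predictions to score")
    hits = sum(
        1 for qid, ranked in predictions.items()
        if ranked and ranked[0] in gold.get(qid, set())
    )
    return hits / len(predictions)

# Toy data only: three questions, two hypothetical systems.
gold = {"q1": {"London"}, "q2": {"2010"}, "q3": {"Berlin", "Bonn"}}
baseline = {"q1": ["Paris", "London"], "q2": ["2010"], "q3": ["Munich"]}
reranked = {"q1": ["London", "Paris"], "q2": ["2010"], "q3": ["Munich"]}

base = hits_at_1(baseline, gold)   # 1 of 3 questions correct
new = hits_at_1(reranked, gold)    # 2 of 3 questions correct
lift_pp = 100 * (new - base)       # difference in percentage points
print(f"Hits@1: {base:.3f} -> {new:.3f} ({lift_pp:+.1f} percentage points)")
```

A lift quoted in percentage points is the absolute difference between two such scores, not a relative improvement.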

Conclusion

This paper presented a practical hybrid framework for Knowledge Graph Question Answering that combines a question-conditioned graph neural network with a small open-source large language model under a retrieval-augmented generation paradigm. The pipeline links seed entities, builds a focused 2,000-node subgraph by personalised PageRank, scores candidate answers with a three-layer GNN, extracts shortest paths to the top candidates, re-ranks those paths with a lightweight entity-priority scoring step, verbalises the survivors into plain English, and hands them to a 7B-8B LLM under a constrained instruction prompt that explicitly forbids the use of outside knowledge.

The central technical contribution is the entity-priority scoring step. It is training-free, runs in milliseconds, and is plug-compatible with any GNN-RAG-style backbone. On the harder ComplexWebQuestions benchmark, it lifts Hits@1 by roughly 4.7 percentage points overall, and by larger margins on three-or-more-hop and multi-entity queries. Across both benchmarks, the framework matches or beats recent hybrid baselines while using around 1.1 LLM calls per question, less than half of what iterative methods require, and runs comfortably on a single mid-range GPU.

Beyond the headline numbers, the system has two properties we consider equally important. It is auditable: every prediction is accompanied by the verbalised paths that the LLM was actually shown, which makes failure analysis and viva-style review straightforward. And it is reproducible: every component is open-source, the GNN is small enough to train on a single GPU in a few hours, and the LLM is in a parameter range that any research group can run locally. Taken together, these properties make the framework a viable starting point for academic work on KGQA in resource-constrained settings.
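The subgraph-building step can be sketched with a plain power-iteration version of personalised PageRank, where teleportation returns only to the seed entities and the highest-ranked nodes are kept. The damping factor, iteration count, dangling-node handling, and function names are assumptions; the paper's budget is 2,000 nodes, while the toy example keeps 4.

```python
def personalised_pagerank(adj, seeds, damping=0.85, iters=50):
    """Power iteration where the random walk restarts only at the seeds.

    `adj` maps every node to its list of neighbours; seeds must be keys of `adj`.
    """
    nodes = list(adj)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = adj[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the seeds (an assumption).
                for s in seeds:
                    nxt[s] += damping * rank[n] / len(seeds)
        rank = nxt
    return rank

def prune(adj, seeds, budget):
    """Keep the `budget` highest-ranked nodes and the edges among them."""
    rank = personalised_pagerank(adj, seeds)
    keep = set(sorted(rank, key=rank.get, reverse=True)[:budget])
    return {n: [m for m in adj[n] if m in keep] for n in adj if n in keep}

# Toy undirected graph; every node appears as a key.
adj = {
    "seed": ["a", "b"],
    "a": ["seed", "c"],
    "b": ["seed"],
    "c": ["a", "far"],
    "far": ["c", "farther"],
    "farther": ["far"],
}
sub = prune(adj, {"seed"}, budget=4)
print(sorted(sub))  # ['a', 'b', 'c', 'seed']
```

Nodes far from the seed receive little rank and fall outside the budget, which is what keeps the subgraph focused on the question.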

Copyright

Copyright © 2026 Sohail Khan, Syed Sibtain Khalid, Naseem Rao, Safdar Tanweer. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET82233

Publish Date : 2026-05-09

ISSN : 2321-9653

Publisher Name : IJRASET
