Knowledge Graph Generation and Multimodal Fusion for Reasoning on Collaboration: A Hybrid Approach Combining Graph Neural Networks, Language Models, and Logical Inference

IMT Atlantique

Theme Data analytics & Artificial Intelligence

Theme Sustainable Transformation of Organisations

Knowledge Graphs

LLM reasoning

Graph Neural Networks

Natural Language Inference

Multimodal Fusion

Practical information

Thesis supervisor

Cécile Bothorel

Supervisors

Joint supervision (cotutelle) between Cécile Bothorel, professor at IMT Atlantique, and Gregorio Robles, professor at Universidad Rey Juan Carlos (EULiST)

Thesis supervisory team

Complex Networks team, Pôle DMID, Lab-STICC


Description

We are moving towards a world in which software development will increasingly rely on the collaboration of AI agents. This project is a first step towards effective collaboration between humans and AI agents. Drawing on an in-depth analysis of online collaboration within large communities, we propose the use of a knowledge graph and reasoning capabilities to improve task management, expertise discovery, knowledge integration and governance transparency within open-source software (OSS) ecosystems. In the near future, when AI and humans develop software systems together, this will make it possible to build human accountability into the development process.

Open-source software (OSS) repositories thrive as vibrant, decentralized communities where developers collectively build, refine, and disseminate knowledge. They offer large amounts of semi-structured information from which to build a knowledge graph (KG) that captures collaboration. Enabling advanced reasoning, however, requires fusing the disparate modalities into a unified representation. How to encode each type of information (conversation networks from interactions in issues and PRs, unstructured text such as comments and descriptions, code structure, task flow, etc.) and how to fuse these heterogeneous sources constitute the main research avenue of our project. The state of the art leverages multimodal fusion frameworks [Li & Tang, 2025], including graph-based neural architectures [Liu et al., 2020] [Liu et al., 2023] [Wu et al., 2021] and LLMs [Yang et al., 2024], to achieve holistic, informative, and actionable knowledge graphs that enable advanced reasoning.
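As a purely illustrative sketch of what such a collaboration graph could look like, the Python snippet below turns issue records into typed (head, relation, tail) triples. The record fields (`author`, `comments`, `files`, `references`), the relation names, and the function names are all hypothetical and chosen for this example; they are not a schema defined by the project.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One typed edge of the knowledge graph."""
    head: str
    relation: str
    tail: str


def build_collaboration_graph(issues):
    """Convert hypothetical issue records into a set of triples."""
    triples = set()
    for issue in issues:
        issue_id = f"issue:{issue['id']}"
        # Who opened the issue.
        triples.add(Triple(f"user:{issue['author']}", "opened", issue_id))
        # Conversation network: who commented on which issue.
        for comment in issue.get("comments", []):
            triples.add(Triple(f"user:{comment['author']}", "commented_on", issue_id))
        # Code structure: which files the issue concerns.
        for path in issue.get("files", []):
            triples.add(Triple(issue_id, "touches", f"file:{path}"))
        # Task flow: explicit links between issues.
        for ref in issue.get("references", []):
            triples.add(Triple(issue_id, "references", f"issue:{ref}"))
    return triples


def neighbors(triples, node):
    """Return the outgoing (relation, tail) pairs of a node."""
    index = defaultdict(set)
    for t in triples:
        index[t.head].add((t.relation, t.tail))
    return index[node]


# Toy data, invented for the example.
issues = [
    {"id": 1, "author": "alice", "comments": [{"author": "bob"}],
     "files": ["src/parser.py"], "references": []},
    {"id": 2, "author": "bob", "comments": [{"author": "alice"}, {"author": "bob"}],
     "files": ["src/parser.py"], "references": [1]},
]
graph = build_collaboration_graph(issues)
```

A set of triples is used here only because it is the simplest representation that keeps node and edge types explicit; a real pipeline would likely rely on a dedicated graph store or GNN library.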

We propose a hybrid framework that integrates Graph Neural Networks (GNNs), Large Language Models (LLMs), and Natural Language Inference (NLI) to construct multimodal knowledge graphs from OSS data. Our approach advances the field through: 1) novel fusion techniques for heterogeneous data that avoid complete retraining, in particular model ensembling or knowledge injection, including the use of existing ontologies [Smith et al., 2011] [Nundlall & Nagowah, 2022]; 2) NLI-driven methods to infer causal relationships from developer conversations, as well as relationships between issues viewed as workflows; 3) scalable architectures for real-world deployment; 4) the release of a public benchmark for duplicate detection and causal relationships between issues; and 5) empirical evaluation of performance (accuracy, cost, robustness) on public datasets and industrial cases, such as the use of LLM-based multi-agent (LLM-MA) systems for complex problem-solving and world simulation [Guo et al., 2024].
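One simple way to combine modalities without retraining is late fusion by ensembling: each modality is embedded by its own frozen encoder, and only the per-modality similarity scores are combined. The sketch below assumes that embeddings already exist for each modality and applies a weighted average of cosine similarities, for instance to score candidate duplicate issues. The modality names, weights, and vectors are hypothetical, and this is not the method the project commits to.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fused_similarity(item_a, item_b, weights):
    """Late fusion: weighted average of per-modality cosine similarities.

    Only modalities present in both items contribute, so an item that lacks
    a modality (e.g. no linked code) can still be compared.
    """
    shared = [m for m in weights if m in item_a and m in item_b]
    total = sum(weights[m] for m in shared)
    if total == 0:
        return 0.0
    return sum(weights[m] * cosine(item_a[m], item_b[m]) for m in shared) / total


# Toy embeddings, invented for the example.
issue_a = {"text": [1.0, 0.0], "graph": [0.0, 1.0]}
issue_b = {"text": [1.0, 0.0], "graph": [1.0, 0.0]}
issue_c = {"text": [0.0, 2.0]}  # no graph embedding available

weights = {"text": 0.75, "graph": 0.25}
score_ab = fused_similarity(issue_a, issue_b, weights)
score_ac = fused_similarity(issue_a, issue_c, weights)
```

Because the encoders stay frozen, only the weights would need tuning, which keeps the cost far below that of retraining a joint model; the trade-off is that interactions between modalities are not learned.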
 

Bibliography

[dataset github, 2022] HuggingFace (2022). The Stack – GitHub issues dataset. https://huggingface.co/datasets/bigcode/the-stack-github-issues 

[Graphcodebert, 2020] Guo, D., Ren, S., Lu, S., Feng, Z., Tang, D., Liu, S., ... & Zhou, M. (2020). Graphcodebert: Pre-training code representations with data flow. arXiv preprint arXiv:2009.08366.

[Guo et al., 2024] Guo, T., Chen, X., Wang, Y., Chang, R., Pei, S., Chawla, N. V., Wiest, O., & Zhang, X. (2024). Large Language Model Based Multi-agents: A Survey of Progress and Challenges. IJCAI 2024. https://doi.org/10.24963/ijcai.2024/890

[Li & Tang, 2025] Li, S., & Tang, H. (2025). Multimodal alignment and fusion: A survey. arXiv preprint arXiv:2411.17040. Accepted to IJCV 2025, https://doi.org/10.48550/arXiv.2411.17040 

[Liu et al., 2020] Liu, J., Sui, D., Liu, K., & Zhao, J. (2020, December). Graph-based knowledge integration for question answering over dialogue. In Proceedings of the 28th International Conference on Computational Linguistics (pp. 2425-2435).

[Liu et al., 2023] Liu, Z., Wei, J., Li, R., & Zhou, J. (2023, October). SFusion: Self-attention based n-to-one multimodal fusion block. In International conference on medical image computing and computer-assisted intervention (pp. 159-169). Cham: Springer Nature Switzerland.

[Nundlall & Nagowah, 2022] Nundlall C, Nagowah SD. Task allocation and coordination process in distributed agile software development: an ontology based approach. Inf Technol Manag. 2022;23(3):167-192. https://doi.org/10.1007/s10799-022-00365-9. Epub 2022 May 10. PMID: 37521512; PMCID: PMC9086155.

[Smith et al., 2011] Smith, B. L., Tamma, V., & Wooldridge, M. (2011). An Ontology for Coordination. Applied Artificial Intelligence, 25(3), 235–265. https://doi.org/10.1080/08839514.2011.553376

[Wu et al., 2021] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. Yu. A Comprehensive Survey on Graph Neural Networks, IEEE Transactions on Neural Networks and Learning Systems, 32 (1): 4-24 (2021), https://doi.org/10.1109/TNNLS.2020.2978386 

[Yang et al., 2024] Yang, R., Yang, B., Feng, A., Ouyang, S., Blum, M., She, T., ... & Li, I. (2024). Graphusion: A rag framework for knowledge graph construction with a global perspective. arXiv preprint arXiv:2410.17600.

[Zhong et al., 2023] Lingfeng Zhong, Jia Wu, Qian Li, Hao Peng, and Xindong Wu. 2023. A Comprehensive Survey on Automatic Knowledge Graph Construction. ACM Comput. Surv. 56, 4, Article 94 (November 2023), 62 pages. https://doi.org/10.1145/3618295 

[Zhu et al., 2024] Zhu, Y., Wang, X., Chen, J., Qiao, S., Ou, Y., Yao, Y., ... & Zhang, N. (2024). LLMs for knowledge graph construction and reasoning: Recent capabilities and future opportunities. World Wide Web, 27(5), 58.
