A framework for low-resource bias detection and mitigation in Transformer-based AI models

Télécom SudParis

Theme

Data analytics & Artificial Intelligence

AI Fairness and Safety

Natural Language Processing

Low-resource Fairness

Bias Mitigation

Accessible AI Safety

Practical information

Thesis supervisor

Luca Benedetto

Supervisors

Luca Benedetto, Amel Bouzeghoub

Thesis supervisory team

ACMES Team, Samovar lab


Description

Modern Large Language Models (LLMs) are increasingly deployed across education [11, 2], healthcare [12], recommender systems [5, 20], and other domains, often by users with limited technical expertise. While this democratises AI access, ample research has documented that these models exhibit significant biases [7, 8, 13] that vary across tasks [15, 18, 19] and cultural contexts [1, 14]. Methods for bias detection and mitigation exist [9] but are computationally expensive [4, 6, 9, 16], creating a fairness divide: while AI access is democratised, access to AI fairness and safety is not, since many AI adopters lack the resources for comprehensive bias evaluation and mitigation.

This project directly targets this fairness divide, with the objective of developing low-resource bias evaluation and mitigation techniques that are usable by organisations with small budgets, thus working towards the democratisation of AI fairness. We define low-resource not by the availability of data for a specific language, but by a set of technical constraints: minimal computational cost, minimal model access, minimal model modifications, and minimal human labour. The innovative nature of this research lies in its focus on low-resource environments and in the three-way trade-off between mitigation cost, bias reduction, and task accuracy, a gap in current research. This project will: i) develop low-cost black-box evaluation methods, comparing them with more computationally expensive alternatives; ii) quantify the performance-per-compute-cost curve for bias mitigation across different tasks; iii) identify optimal trade-offs for low-cost mitigation techniques.

The primary outcome will be a low-resource fairness toolkit, providing AI adopters with the tools to perform low-resource bias evaluation and mitigation on their application domains and tasks. We will primarily focus on black-box models, using both commercial models (which represent the most common AI adoption scenario) and open-weight models. In this setting, there is no access to the internal states of the models, which is a significant obstacle for AI audits [3]. For bias detection, we aim to overcome this by taking inspiration from psychometrics: modelling exhibited biases as latent traits (analogous to the skills in Item Response Theory [10]) and measuring them with a strategic test (analogous to Computerized Adaptive Testing [17]). For mitigation, we will focus on training-free methods, implementable at pre-inference (input editing) or post-inference (output editing), aiming for a lightweight alternative to expensive alignment techniques.
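The psychometric detection idea can be sketched with a standard two-parameter logistic (2PL) model from Item Response Theory, combined with the usual maximum-information item selection rule from Computerized Adaptive Testing. This is a minimal illustration, not the project's method: the item bank values, function names, and the reading of theta as a "latent bias level" are all assumptions for the sake of the example.

```python
import math

def p_biased(theta, a, b):
    # 2PL item response function: probability that a model with latent
    # bias level theta gives the biased response to a probe item with
    # discrimination a and difficulty b (illustrative interpretation)
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    # Fisher information of a 2PL item at trait level theta:
    # I(theta) = a^2 * p * (1 - p)
    p = p_biased(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta_hat, item_bank, administered):
    # Greedy CAT step: pick the not-yet-administered item that is most
    # informative at the current estimate of the latent trait
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *item_bank[i]))

# hypothetical item bank of (discrimination, difficulty) pairs
bank = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0), (2.0, 0.5)]
print(next_item(0.0, bank, administered={1}))  # → 3
```

Selecting items this way is what makes the test "strategic": each probe is chosen to maximise the information gained about the latent trait, so fewer model queries (and hence less compute) are needed than with a fixed test of the same precision.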

Once completed, this toolkit will enable AI adopters to assess the safety of AI models in their domains with limited resources, supporting the democratisation of safe AI rather than unsafe AI.

Bibliography

[1] Badr Alkhamissi, Muhammad ElNokrashy, Mai Alkhamissi, and Mona Diab. Investigating cultural alignment of large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12404–12422, 2024.  
[2] Andrew Caines, Luca Benedetto, Shiva Taslimipoor, Christopher Davis, Yuan Gao, Øistein E Andersen, Zheng Yuan, Mark Elliott, Russell Moore, Christopher Bryant, et al. On the application of large language models for language teaching and assessment technology. In LLM@AIED, 2023.  
[3] Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, et al. Black-box access is insufficient for rigorous AI audits. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pages 2254–2272, 2024.  
[4] Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806, 2017.  
[5] Sunhao Dai, Ninglu Shao, Haiyuan Zhao, Weijie Yu, Zihua Si, Chen Xu, Zhongxiang Sun, Xiao Zhang, and Jun Xu. Uncovering ChatGPT’s capabilities in recommender systems. In Proceedings of the 17th ACM Conference on Recommender Systems, pages 1126–1132, 2023.  
[6] MaryBeth Defrance, Maarten Buyl, and Tijl De Bie. ABCFair: An adaptable benchmark approach for comparing fairness methods. Advances in Neural Information Processing Systems, 37:40145–40163, 2024.  
[7] Yashar Deldjoo. Understanding biases in ChatGPT-based recommender systems: Provider fairness, temporal stability, and recency. ACM Transactions on Recommender Systems, 2024.  
[8] Eve Fleisig, Genevieve Smith, Madeline Bossi, Ishita Rustagi, Xavier Yin, and Dan Klein. Linguistic bias in ChatGPT: Language models reinforce dialect discrimination. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 13541–13564, Miami, Florida, USA, November 2024. Association for Computational Linguistics.  
[9] Isabel O Gallegos, Ryan A Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, and Nesreen K Ahmed. Bias and fairness in large language models: A survey. Computational Linguistics, 50(3):1097–1179, 2024.  
[10] Ronald K Hambleton, Hariharan Swaminathan, and H Jane Rogers. Fundamentals of item response theory, volume 2. Sage, 1991.  
[11] Enkelejda Kasneci, Kathrin Seßler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, et al. ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103:102274, 2023.  
[12] Jianning Li, Amin Dada, Behrus Puladi, Jens Kleesiek, and Jan Egger. ChatGPT in healthcare: A taxonomy and systematic review. Computer Methods and Programs in Biomedicine, 245:108013, 2024.  
[13] Weicheng Ma, Brian Chiang, Tong Wu, Lili Wang, and Soroush Vosoughi. Intersectional stereotypes in large language models: Dataset and analysis. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 8589–8597, Singapore, December 2023. Association for Computational Linguistics.  
[14] Reem Masoud, Ziquan Liu, Martin Ferianc, Philip C Treleaven, and Miguel Rodrigues Rodrigues. Cultural alignment in large language models: An explanatory analysis based on Hofstede’s cultural dimensions. In Proceedings of the 31st International Conference on Computational Linguistics, pages 8474–8503, 2025.  
[15] Huy Nghiem, John Prindle, Jieyu Zhao, and Hal Daumé III. “You gotta be a doctor, Lin”: An investigation of name-based bias of large language models in employment recommendations. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 7268–7287, 2024.  
[16] Ricardo Trainotti Rabonato and Lilian Berton. A systematic review of fairness in machine learning. AI and Ethics, 5(3):1943–1954, 2025.  
[17] Mark D Reckase. Computerized adaptive testing: A good idea waiting for the right technology. 1988.  
[18] Iain Weissburg, Sathvika Anand, Sharon Levy, and Haewon Jeong. LLMs are biased teachers: Evaluating LLM bias in personalized education, February 2025.  
[19] Kyra Wilson and Aylin Caliskan. Gender, race, and intersectional bias in resume screening via language model retrieval, August 2024.  
[20] Zihuai Zhao, Wenqi Fan, Jiatong Li, Yunqing Liu, Xiaowei Mei, Yiqi Wang, Zhen Wen, Fei Wang, Xiangyu Zhao, Jiliang Tang, et al. Recommender systems in the era of large language models (LLMs). IEEE Transactions on Knowledge and Data Engineering, 36(11):6889–6907, 2024.