

ID 69548
Fulltext URL
fulltext.pdf 1.01 MB
Authors
Takeshita, Yohei Department of Oral and Maxillofacial Radiology, Faculty of Medicine, Dentistry and Pharmaceutical Sciences, Okayama University
Kawazu, Toshiyuki Department of Oral and Maxillofacial Radiology, Faculty of Medicine, Dentistry and Pharmaceutical Sciences, Okayama University
Hisatomi, Miki Department of Oral and Maxillofacial Radiology, Okayama University Hospital
Okada, Shunsuke Department of Oral and Maxillofacial Radiology, Okayama University Hospital
Fujikura, Mamiko Department of Oral and Maxillofacial Radiology, Okayama University Hospital
Namba, Yuri Department of Oral and Maxillofacial Radiology, Okayama University Hospital
Yoshida, Suzuka Department of Oral and Maxillofacial Radiology, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences
Yoshida, Saori Preliminary Examination Room, Okayama University Hospital
Yanagi, Yoshinobu Preliminary Examination Room, Okayama University Hospital
Asaumi, Junichi Department of Oral and Maxillofacial Radiology, Faculty of Medicine, Dentistry and Pharmaceutical Sciences, Okayama University
Abstract
The aim of this study was to assess the performance and utility of ChatGPT on the board qualification examination of the Japanese Society for Oral and Maxillofacial Radiology (JSOMR). We assessed ChatGPT responses to 149 multiple-choice questions, written in Japanese, from the JSOMR board qualification examinations for the three years 2020 to 2022. Each question was entered manually, one by one, as a prompt into the ChatGPT-3.5 and ChatGPT-4 models. Accuracy rates were calculated and classified by year, type of multiple-choice question, and level of intellectual ability, and significant differences were assessed. The overall accuracy rate of GPT-3.5 was 45.0% (51.0% for 2020, 34.0% for 2021, and 50.0% for 2022), while that of GPT-4 was 68.5% (73.5% for 2020, 62.0% for 2021, and 70.0% for 2022). GPT-4 had a significantly higher accuracy rate than GPT-3.5 in each year. When performance was classified by type of multiple-choice question, GPT-4 again performed significantly better than GPT-3.5. However, neither model performed well on questions that required interpretation or knowledge of Japanese law. The performance of GPT-4 was significantly superior to that of GPT-3.5 on the JSOMR board qualification examination, suggesting that ChatGPT, especially GPT-4, would be effective as a tool for learning and preparing for the examination.
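The reported overall rates imply roughly 67/149 correct answers for GPT-3.5 and 102/149 for GPT-4 (counts inferred here from the stated percentages, not taken from the paper). A minimal sketch of how such a difference in accuracy could be checked for significance with a pooled two-proportion z-test — an assumed stand-in, since the abstract does not name the statistical method used:

```python
import math

# Hypothetical counts reconstructed from the reported overall accuracy rates:
# 67/149 ~ 45.0% (GPT-3.5) and 102/149 ~ 68.5% (GPT-4).
N = 149
correct_gpt35 = 67
correct_gpt4 = 102

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p = two_proportion_z(correct_gpt35, N, correct_gpt4, N)
print(f"GPT-3.5 accuracy: {correct_gpt35 / N:.1%}")  # 45.0%
print(f"GPT-4   accuracy: {correct_gpt4 / N:.1%}")   # 68.5%
print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

With these reconstructed counts the difference is significant at conventional thresholds, consistent with the abstract's conclusion; the per-year and per-question-type comparisons in the paper would follow the same pattern with smaller subgroups.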
Keywords
ChatGPT
GPT-3.5
GPT-4
Generative AI
Large language model
Japanese Society for Oral and Maxillofacial Radiology
Publication Date
2025-08-07
Publication Title
Technology, Knowledge and Learning
Publisher
Springer Science and Business Media LLC
ISSN
2211-1662
Resource Type
Journal Article
Language
English
OAI-PMH Set
Okayama University
Copyright Holders
© The Author(s) 2025
Version
publisher
Related URL
isVersionOf https://doi.org/10.1007/s10758-025-09891-1
License
http://creativecommons.org/licenses/by/4.0/
Citation
Takeshita, Y., Kawazu, T., Hisatomi, M. et al. Performance Assessment of ChatGPT for the Board Qualification Examination of the Japanese Society for Oral and Maxillofacial Radiology. Tech Know Learn (2025). https://doi.org/10.1007/s10758-025-09891-1
Funding
Okayama University (国立大学法人岡山大学)