A Threshold Selection Method in Code Plagiarism Checking Function for Code Writing Problem in Java Programming Learning Assistant System Considering AI-Generated Codes

Permatasari, Perwira Annissa Dyah; Mentari, Mustika; Kinari, Safira Adine; Aung, Soe Thandar; Funabiki, Nobuo; Kyaw, Htoo Htoo Sandi; Wai, Khaing Hsu

doi:10.3390/analytics5010002

Permalink : https://ousar.lib.okayama-u.ac.jp/70077

ID	70077
Full-text URL	fulltext.pdf (602 KB)
Authors	Permatasari, Perwira Annissa Dyah (Graduate School of Environmental, Life, Natural Science and Technology, Okayama University); Mentari, Mustika (Graduate School of Environmental, Life, Natural Science and Technology, Okayama University); Kinari, Safira Adine (Graduate School of Environmental, Life, Natural Science and Technology, Okayama University); Aung, Soe Thandar (Graduate School of Environmental, Life, Natural Science and Technology, Okayama University); Funabiki, Nobuo (Graduate School of Environmental, Life, Natural Science and Technology, Okayama University); Kyaw, Htoo Htoo Sandi (Graduate School of Environmental, Life, Natural Science and Technology, Okayama University); Wai, Khaing Hsu (Graduate School of Engineering Science, Akita University)
Abstract	To support novice learners, the Java programming learning assistant system (JPLAS) has been developed with various features. Among them, the code writing problem (CWP) asks learners to write an answer code that passes a given test code. The correctness of an answer code is validated by running it on JUnit. In previous works, we implemented a code plagiarism checking function that calculates the similarity score for each pair of answer codes based on the Levenshtein distance. When the score is higher than a given threshold, the pair is regarded as plagiarism. However, a method for finding the proper threshold has not been studied. In addition, as generative AI has grown in popularity, AI-generated codes have become a plagiarism threat that should be investigated. In this paper, we propose a threshold selection method based on Tukey’s IQR fences. It uses a custom upper threshold derived from the statistical distribution of similarity scores for each assignment. To better accommodate skewed similarity distributions, the method introduces a simple percentile-based adjustment for determining the upper threshold. We also design prompts to generate answer codes using generative AI and apply them to four AI models. For evaluation, we used a total of 745 source codes from two datasets. The first dataset consists of 420 answer codes across 12 CWP instances from 35 first-year undergraduate students at the State Polytechnic of Malang, Indonesia (POLINEMA). The second dataset includes 325 answer codes across five CWP assignments from 65 third-year undergraduate students at Okayama University, Japan.
Applying our proposals yielded the following findings: (1) any pair of student codes whose score is higher than the selected threshold shows some evidence of plagiarism, (2) some student codes have a higher similarity than the threshold with AI-generated codes, indicating the use of generative AI, and (3) multiple AI models can generate code that resembles student-written code, despite adopting different implementations. These results confirm the validity of our proposal.
Keywords	Java programming learning; JPLAS; JUnit; code writing problem; plagiarism; Levenshtein distance; threshold; IQR; AI-generated
Publication Date	2025-12-26
Publication Title	Analytics
Volume	5
Issue	1
Publisher	MDPI AG
Start Page	2
ISSN	2813-2203
Content Type	Journal Article
Language	English
OAI-PMH Set	Okayama University
Copyright Holder	© 2025 by the authors.
Article Version	publisher
DOI	10.3390/analytics5010002
Related URL	isVersionOf https://doi.org/10.3390/analytics5010002
License	https://creativecommons.org/licenses/by/4.0/
Citation	Permatasari, P.A.D.; Mentari, M.; Kinari, S.A.; Aung, S.T.; Funabiki, N.; Kyaw, H.H.S.; Wai, K.H. A Threshold Selection Method in Code Plagiarism Checking Function for Code Writing Problem in Java Programming Learning Assistant System Considering AI-Generated Codes. Analytics 2026, 5, 2. https://doi.org/10.3390/analytics5010002
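
The two building blocks named in the abstract, a Levenshtein-based similarity score and an upper threshold taken from Tukey's IQR fences, can be illustrated with a minimal Java sketch. It is not the authors' implementation. The class and method names are hypothetical, the normalization of the distance into a score (1 minus distance divided by the longer length) is an assumption, and so is the quartile interpolation rule. The percentile-based adjustment for skewed distributions is not specified in this record and is therefore left out.

```java
import java.util.Arrays;

public class SimilarityThresholdSketch {

    /** Levenshtein distance via dynamic programming, keeping only two rows. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** ASSUMED normalization: 1 - distance / max(length), a score in [0, 1]. */
    public static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty codes are treated as identical
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    /** Percentile (p in [0, 1]) of an ascending, non-empty array,
     *  using linear interpolation between neighbours (ASSUMED rule). */
    public static double percentile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    /** Tukey's upper fence: Q3 + 1.5 * IQR, with IQR = Q3 - Q1. */
    public static double tukeyUpperFence(double[] scores) {
        double[] s = scores.clone();
        Arrays.sort(s);
        double q1 = percentile(s, 0.25);
        double q3 = percentile(s, 0.75);
        return q3 + 1.5 * (q3 - q1);
    }

    public static void main(String[] args) {
        // Made-up similarity scores for one assignment, for illustration only.
        double[] scores = {0.31, 0.35, 0.38, 0.40, 0.42, 0.44, 0.47, 0.95};
        double threshold = tukeyUpperFence(scores);
        for (double s : scores) {
            if (s > threshold) {
                System.out.println("Flag pair with score " + s);
            }
        }
    }
}
```

Because the fence is computed from each assignment's own score distribution, the threshold adapts to how similar honest answers to that assignment tend to be. Note that the plain fence can exceed the maximum possible score of 1.0 when the scores are widely spread, in which case this sketch flags nothing.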