Arabic English Cross-Lingual Plagiarism Detection Based on Keyphrases Extraction, Monolingual and Machine Learning Approach

Al-Suhaiqi, Mokhtar and Hazaa, Muneer A. S. and Albared, Mohammed (2019) Arabic English Cross-Lingual Plagiarism Detection Based on Keyphrases Extraction, Monolingual and Machine Learning Approach. Asian Journal of Research in Computer Science, 2 (3). pp. 1-12. ISSN 2581-8260

Abstract

Owing to the rapid growth of research articles in various languages, the problem of cross-lingual plagiarism detection has received increasing interest in recent years. Cross-lingual plagiarism detection is a more challenging task than monolingual plagiarism detection. This paper addresses cross-lingual plagiarism detection (CLPD) by proposing a method that combines keyphrase extraction, monolingual detection methods, and a machine learning approach. The research methodology used in this study made it possible to accomplish the objectives of designing, developing, and implementing an efficient Arabic-English cross-lingual plagiarism detection method.

This paper empirically evaluates five monolingual plagiarism detection methods: i) N-Grams Similarity, ii) Longest Common Subsequence, iii) Dice Coefficient, iv) Fingerprint-based Jaccard Similarity, and v) Fingerprint-based Containment Similarity. In addition, three machine learning classifiers, i) naïve Bayes, ii) Support Vector Machine (SVM), and iii) linear logistic regression, are used for Arabic-English cross-language plagiarism detection. Several experiments are conducted to evaluate the performance of the keyphrase extraction methods. Further experiments investigate the performance of the machine learning techniques in order to find the best method for Arabic-English cross-language plagiarism detection.
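
As a rough illustration of the five monolingual measures listed above, the following Python sketch computes one score per measure for a pair of texts that are already in the same language. It is not the authors' implementation: the abstract does not give the exact definitions, so the tokenization (lowercased whitespace split), the trigram size, the CRC32 fingerprint hash, the normalizations, and all function names are assumptions made for illustration.

```python
from zlib import crc32


def word_ngrams(tokens, n=3):
    """Return the set of word n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def dice(a, b):
    """Dice coefficient of two sets: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Share of A that also occurs in B: |A ∩ B| / |A|."""
    return len(a & b) / len(a) if a else 0.0


def lcs_length(x, y):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def fingerprint(tokens, n=3):
    """Hash every word n-gram to an integer (CRC32 is an assumed choice)."""
    return {crc32(" ".join(g).encode("utf-8")) for g in word_ngrams(tokens, n)}


def similarity_features(suspicious, source, n=3):
    """Compute the five scores for two same-language texts."""
    s, t = suspicious.lower().split(), source.lower().split()
    gs, gt = word_ngrams(s, n), word_ngrams(t, n)
    fs, ft = fingerprint(s, n), fingerprint(t, n)
    smaller = min(len(gs), len(gt))
    longer = max(len(s), len(t))
    return {
        # Assumed normalization: shared n-grams over the smaller n-gram set.
        "ngram": len(gs & gt) / smaller if smaller else 0.0,
        # Assumed normalization: LCS length over the longer token sequence.
        "lcs": lcs_length(s, t) / longer if longer else 0.0,
        # Dice coefficient over the two word sets.
        "dice": dice(set(s), set(t)),
        "fp_jaccard": jaccard(fs, ft),
        "fp_containment": containment(fs, ft),
    }


features = similarity_features(
    "the quick brown fox jumps", "the quick brown fox sleeps"
)
```

In a pipeline like the one the abstract describes, a feature dictionary of this kind could serve as the input vector for a classifier, but how the paper actually combines the scores is not stated here.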

In the experiments on Arabic-English cross-language plagiarism detection, the highest result was obtained with the SVM classifier, which reached an F-measure of 92%. In addition, all classifiers achieved their highest results when most of the monolingual plagiarism detection methods were used.
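
For readers unfamiliar with the reported metric, the sketch below shows how an F-measure is commonly derived from precision and recall. The abstract does not say which variant was used, so the balanced form (beta = 1) is an assumption, and the counts in the example are hypothetical values chosen for illustration, not results from the paper.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from true positives, false positives and false negatives.

    beta = 1 gives the balanced F1 score (an assumed default).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 plagiarized pairs found, 2 false alarms, 2 missed.
score = f_measure(tp=8, fp=2, fn=2)
```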

Item Type: Article
Subjects: Open Archive Press > Computer Science
Date Deposited: 06 May 2023 07:07
Last Modified: 17 Jun 2024 06:19
URI: http://library.2pressrelease.co.in/id/eprint/1069
