
Plagiarism continues to be a pervasive concern in academic publishing. Traditional detection systems focus primarily on textual similarity, yet low-similarity submissions may still contain significant academic misconduct, including paraphrasing, structural replication, translation-based reuse, and self-plagiarism. This study examines hidden plagiarism patterns, presents statistical evidence of their prevalence, and discusses the limitations of conventional detection tools. The findings highlight that low similarity scores cannot be equated with originality, emphasizing the need for comprehensive assessment methods to ensure academic integrity.

Introduction

Academic integrity forms the cornerstone of scholarly communication, ensuring that research maintains credibility and contributes meaningfully to knowledge development. Similarity-checking tools are widely employed to assess originality, comparing submissions against extensive databases to detect overlapping content. A prevailing assumption is that a low similarity percentage indicates genuine originality. However, recent evidence demonstrates that low-similarity texts may conceal significant misconduct. Subtle plagiarism involves not only paraphrasing but also the replication of ideas, methodology, or structural organization. Additionally, translation plagiarism, selective citation, and self-plagiarism often escape detection. Understanding the prevalence and patterns of these hidden forms of misconduct is essential for authors, reviewers, and institutions seeking to uphold ethical standards in research.

Statistical Trends in Plagiarism

Large-scale analyses indicate significant variation in plagiarism trends over the past decade. A PlagiarismSearch dataset of 69.89 million submissions from 2018 to 2024 shows that the average global similarity rate rose from 9.08% in 2018 to a peak of 18.79% in 2020 before declining to 16.36% in 2024 (Marketorium, 2024). These fluctuations coincided with external factors such as the shift to remote learning during the COVID-19 pandemic and the increasing availability of AI-based text generation. Regional variation is also notable: Ukraine reported one of the lowest average plagiarism rates in 2024, with only 6.3% of submissions flagged (Chasdiy, 2024).

Further evidence demonstrates that while high-similarity plagiarism may appear to decline, subtler forms are increasing. Between 2023 and 2024, a study documented a 76% rise in AI-generated content in student submissions, while measured similarity rates dropped by 51% (Copyleaks, 2024). These statistics suggest that low textual overlap does not necessarily indicate compliance with ethical standards, as ideas, argumentation, and methodology may still be improperly replicated.

A focused analysis of 310 COVID-19-related manuscripts published in infectious-disease journals revealed that 41.6% contained plagiarized content, despite 72% of these papers exhibiting similarity scores of 15% or lower (SpringerOpen, 2023). In another review of 936 articles published in hijacked or predatory journals, 66% were found to include elements of plagiarism, with 28% demonstrating similarity above 25% (PubMed, 2023). Studies of retracted publications indicate that plagiarism accounts for 9.8% of retractions, while misconduct of all kinds, rather than technical error, underlies 67.4% of retractions (Uzhnu, 2023). These statistics highlight the prevalence of hidden plagiarism and underscore that reliance on similarity percentages alone can be misleading: substantial academic misconduct may remain undetected if assessment focuses exclusively on textual overlap.

Why Low Similarity Does Not Guarantee Originality

The limitations of conventional detection systems are significant. Most similarity-checking tools operate using string-matching algorithms, identifying exact or near-exact sequences of words. Paraphrased or structurally altered content often bypasses these systems. Furthermore, databases used for comparison are inherently limited; unpublished works, materials behind paywalls, and content in less-accessible languages often remain undetected. Translation plagiarism is particularly problematic, as content translated from another language may be substantially reworded yet retain the original ideas. Extensive citation or quotation can dilute similarity scores, giving the false impression of originality.
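To see concretely why paraphrase evades string matching, consider a minimal Python sketch of word n-gram ("shingle") comparison, the family of techniques many checkers build on. The function names, the five-word shingle size, and the Jaccard measure are illustrative assumptions for this sketch, not any specific tool's implementation.

```python
# Minimal sketch of n-gram string matching, assuming a simple
# word-shingle + Jaccard approach; real checkers are more elaborate.

def shingles(text: str, n: int = 5) -> set:
    """Break text into overlapping word n-grams ("shingles")."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a: str, b: str, n: int = 5) -> float:
    """Jaccard overlap of shingles: 1.0 = identical, 0.0 = no shared word runs."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "low similarity scores cannot be equated with genuine originality in academic work"
paraphrase = "a small overlap percentage is no guarantee that academic writing is truly original"

print(similarity(original, original))    # 1.0 for verbatim copying
print(similarity(original, paraphrase))  # 0.0: the idea survives, the word runs do not
```

Because the paraphrase preserves the idea but shares no five-word run with the original, the checker reports essentially zero similarity; this is precisely the blind spot that idea-level, structural, and translation plagiarism exploit.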

Empirical evidence supports these concerns. In the study of COVID-19 manuscripts, manual review identified plagiarism in texts with similarity scores below 15% (SpringerOpen, 2023). This illustrates that automatic thresholds, such as 15% or 20%, are insufficient to assess originality, and low similarity does not guarantee ethical compliance.

Implications of Hidden Plagiarism

Undetected plagiarism has profound consequences for scholarly communication. Subtle forms of misconduct compromise research integrity, propagate misleading findings, and distort the scientific record. Analyses of retracted publications show that plagiarized papers often continue to receive citations after retraction, sometimes at rates 2.5 times higher than articles removed for other reasons (ArXiv, 2025). Low similarity scores at the submission stage create a false sense of security, allowing papers containing structural, idea-based, or translation plagiarism to remain in circulation. This not only undermines individual studies but also affects subsequent research, policy decisions, and public trust in science.

Recommendations for Academic Practice

Ensuring genuine originality requires a multidimensional approach. Authors should maintain detailed documentation of research processes, including draft histories and the development of ideas, and provide attribution not only for direct quotations but also for methodological and structural inspiration. Reviewers and editors must combine automated similarity detection with manual evaluation, critically assessing argumentation, methodological originality, and the organization of content. The use of multiple detection tools, including translation and stylometric analysis, improves the likelihood of identifying subtle misconduct. Academic institutions should implement policies addressing structural, idea-based, and translation plagiarism while emphasizing training programs that stress originality in both content and structure. Recognizing and addressing hidden plagiarism is essential for maintaining academic credibility and fostering an environment of ethical research practices.
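As one illustration of the stylometric component, the sketch below compares two texts on a handful of simple style features. It is a deliberately simplified example, assuming hand-picked features and a plain Euclidean distance; production stylometry relies on far richer feature sets and calibrated models.

```python
# Simplified stylometric comparison: flag passages whose style diverges
# from an author's known writing. Feature choices are illustrative only.
import re
from collections import Counter

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "is", "with"]

def style_features(text: str) -> list:
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    total = len(words) or 1
    counts = Counter(words)
    return [
        total / max(len(sentences), 1),               # mean sentence length
        len(counts) / total,                          # type-token ratio (lexical variety)
        *(counts[w] / total for w in FUNCTION_WORDS), # function-word rates
    ]

def style_distance(a: str, b: str) -> float:
    """Euclidean distance between feature vectors; larger = bigger style shift."""
    fa, fb = style_features(a), style_features(b)
    return sum((x - y) ** 2 for x, y in zip(fa, fb)) ** 0.5

# Usage: compare a suspect passage against the author's earlier writing;
# an unusually large distance is a cue for manual review, not proof of misconduct.
```

Used alongside string matching, such style signals can surface paraphrased or translated reuse that leaves no textual overlap, though they should prompt human scrutiny rather than automated verdicts.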

Conclusion

Hidden plagiarism represents a significant challenge in modern academia. Low similarity scores can mask substantial misconduct, including paraphrasing, structural replication, translation-based reuse, and self-plagiarism. Statistical evidence from tens of millions of submissions, analyses of COVID-19-related research, predatory journals, and retracted publications demonstrates that reliance on text similarity alone is insufficient. Upholding academic integrity requires comprehensive evaluation that integrates automated detection, manual review, and robust educational frameworks. True originality encompasses not only unique wording but also authentic intellectual contribution and methodological innovation. Identifying and mitigating hidden plagiarism is therefore critical to sustaining the credibility of scholarly communication.