Author Identification Using Machine Learning

Sikandar Ahmad Khan; Muhammad Asad; Haroon Asif; Amjad Ali; Muhammad Ahsan Jamil

Authors

Sikandar Ahmad Khan Department of Computer Science, National College of Business Administration & Economics Multan Campus, Multan, 60000, Pakistan.
Muhammad Asad Department of Computer Science, National College of Business Administration & Economics Multan Campus, Multan, 60000, Pakistan.
Haroon Asif Department of Computer Science, National College of Business Administration & Economics Multan Campus, Multan, 60000, Pakistan.
Amjad Ali Department of Information Technology, Bahauddin Zakariya University, Multan, 60000, Pakistan.
Muhammad Ahsan Jamil Institute of Computing, Muhammad Nawaz Sharif University of Agriculture, Multan, 60000, Pakistan.

Keywords:

Author identification, SVM, NLP, Machine Learning, Feature Extraction

Abstract

Author identification is the task of determining who wrote a piece of writing, whether anonymous or not, based solely on writing style rather than on content. Writing style, like speaking style, can be viewed as a set of habits of sentence construction, which can be evaluated through features such as vocabulary, sentence length, word sequences, vocabulary richness, and word frequency. The primary goal of this article is to examine and apply a variety of classification approaches to research articles in order to identify their authors, including texts whose authorship is in dispute. Earlier work by other researchers is also discussed and elaborated upon. Our experiments then show improved results. Given such feature spaces, the SVM technique is particularly well suited to this problem. Across all experiments, SVM was found to be effective at determining the authorship of research publications. In this study, SVM was used to classify two sets of data, referred to as Experiment A and Experiment B. Experiment A includes 500 research papers from conferences, most of them retrieved from Google Scholar. After data mining techniques were applied to the gathered dataset, a final set of 400 research papers remained. To improve performance, these 400 papers were further subdivided into three subsets: A, B, and C. Our model trained quickly and performed well on these limited datasets as a result of an increase in both the number of authors and the number of publications included in the dataset.
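To make the style features named in the abstract concrete, the following is a minimal sketch of how sentence length, vocabulary richness, word frequency, and word sequences might be computed for a single document. It is an illustration only, not the authors' feature set: the function name `stylometric_features`, the regular-expression tokenizer, the use of type-token ratio as a richness measure, and the sample text are all assumptions.

```python
import re
from collections import Counter


def stylometric_features(text, top_k=3):
    """Compute a few simple style features for one document (hypothetical helper)."""
    # Rough sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens made of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Adjacent word pairs as a simple stand-in for "sequence of words".
    bigrams = Counter(zip(words, words[1:]))
    return {
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        # Type-token ratio: distinct words divided by total words (assumed richness measure).
        "type_token_ratio": len(counts) / len(words) if words else 0.0,
        "top_words": counts.most_common(top_k),
        "distinct_bigrams": len(bigrams),
    }


sample = "The cat sat. The cat ran away."
features = stylometric_features(sample)
```

Numeric features of this kind could then be assembled into one vector per document and passed to an SVM classifier, which is the general approach the abstract describes.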

HJRS
