SPAM DETECTION IN ROMAN URDU REVIEWS USING SPAMMER BEHAVIOR FEATURES
Abstract
Reviews have recently emerged as the most important basis for deciding whether products and services on offer are good or bad. Customer reviews therefore concern sellers, because they can directly affect the growth of their businesses. Unfortunately, there is a growing trend of writing spam reviews to promote particular target products, a practice known as review spamming. Although spam review detection (SRD) has drawn much attention, existing SRD studies have worked on English, Chinese, or other-language datasets rather than on Roman Urdu. Urdu ranks 10th among the most spoken languages in the world, so there is a pressing need for a model that detects spam reviews typed in Roman Urdu. The aim of this research is therefore to classify Roman Urdu reviews as spam or not spam with several models that use linguistic and behavioral features. The proposed approach first pre-processes the data, then extracts features, and finally trains a classification model, thereby introducing new methods for spam detection in Roman Urdu reviews. The results can help reduce spam and build customer confidence in the service or product. For this work, we train CNN and LSTM models on a Roman Urdu review dataset collected from Daraz. The LSTM model achieves an accuracy of 97% and outperforms the CNN model, which has been used in previous work and is included here for comparison.
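The pre-processing and feature-extraction steps described above can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: the spelling-variant map, the reserved padding/out-of-vocabulary ids, and the rating-deviation behavioral feature are all hypothetical choices made for the example. The padded integer sequences it produces are the kind of input an embedding layer in front of an LSTM or CNN classifier would consume.

```python
import re
from collections import Counter

# HYPOTHETICAL normalization map: Roman Urdu has no standard spelling, so
# variants of the same word are folded to one form. The paper does not list
# its normalization rules; these entries are illustrative only.
SPELLING_VARIANTS = {
    "acha": "achha",
    "acchha": "achha",
    "bohat": "bohot",
    "bahut": "bohot",
}


def preprocess(review: str) -> list[str]:
    """Lowercase, replace non-alphanumeric characters with spaces, tokenize,
    and map known spelling variants to a canonical form."""
    text = re.sub(r"[^a-z0-9\s]", " ", review.lower())
    return [SPELLING_VARIANTS.get(tok, tok) for tok in text.split()]


def build_vocab(corpus: list[list[str]], min_freq: int = 1) -> dict[str, int]:
    """Assign integer ids by descending frequency; id 0 is reserved for
    padding and id 1 for out-of-vocabulary tokens."""
    counts = Counter(tok for doc in corpus for tok in doc)
    vocab = {"<pad>": 0, "<oov>": 1}
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Convert tokens to ids, then truncate or right-pad to max_len so every
    review has the fixed length a sequence model's input layer expects."""
    ids = [vocab.get(tok, vocab["<oov>"]) for tok in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def rating_deviation(rating: float, product_ratings: list[float]) -> float:
    """HYPOTHETICAL behavioral feature: distance between a review's star
    rating and the product's mean rating, scaled to [0, 1] on a 1-5 scale."""
    mean = sum(product_ratings) / len(product_ratings)
    return abs(rating - mean) / 4


if __name__ == "__main__":
    reviews = ["Bohat acha product hai!!", "Bahut bura, paise zaya"]
    corpus = [preprocess(r) for r in reviews]
    vocab = build_vocab(corpus)
    print([encode(doc, vocab, max_len=6) for doc in corpus])
    # [[2, 3, 4, 5, 0, 0], [2, 6, 7, 8, 0, 0]]
    print(rating_deviation(1, [5, 5, 4, 1]))
    # 0.6875
```

Reserving id 0 for padding follows a common convention for sequence models; for example, a Keras `Embedding` layer created with `mask_zero=True` treats index 0 as padding and masks it. A behavioral score such as the rating deviation would typically be concatenated with the text representation before the final classification layer, although the paper may combine its features differently.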