Sentiment Analysis of Hospital Service Satisfaction
Introduction: Measuring customer satisfaction is one of the most important tasks for any enterprise trying to improve its service quality, so collecting reviews is highly encouraged. Collecting this data is not sufficient on its own, however: an efficient and reliable automated system is also needed to analyze it and extract valuable information for further improvement. Given the scarcity of similar work in the health domain, especially in Turkish, this study aims to fill this gap by analyzing satisfaction with health services.
Methods: A total of 2,018 positive and 1,394 negative comments were collected from patients. The Binary List, Frequency List, Binary Words and Words Frequencies feature selection methods were used to train and test classifiers built with the machine learning algorithms Naïve Bayes (NB), Support Vector Machine (trained with SMO) and the J48 decision tree. More compact feature subsets were also formed by eliminating common, mostly irrelevant features from both or only one of the positive and negative feature lists. This elimination may increase the negative miss rate, which is an especially important measure in the health-review domain.
Results: The classifiers achieved high average prediction rates.
Discussion: The Binary Words feature selection method outperformed the others, with Naïve Bayes reaching the best average accuracy of 98%, while the poorest results were obtained with the Binary List method and the NB classifier. True and False Negative Rates (specificity and miss rate, respectively) were also evaluated to identify the best-performing combinations.
Conclusion: In general, the Words methods (both Binary and Frequency) are superior to the List methods, since they provide more detailed information for each comment. Frequency methods slightly outperform Binary methods in some cases, but because the comments are short, the difference is not very significant.
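The abstract does not spell out how each feature representation is built. As a minimal sketch, assuming that "Binary Words" means one presence/absence feature per vocabulary word and "Words Frequencies" means one occurrence-count feature per word, the Python below shows these two word-level encodings together with the elimination of words shared by the positive and negative lists. The function names, the `drop_from` parameter, the whitespace tokenizer and the toy English comments are all hypothetical illustrations, not the study's implementation. The List-based variants are not sketched because their exact definition is not given here.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplistic lowercase + whitespace split (an assumption). Real Turkish text
    # would need locale-aware lowercasing (I/ı, İ/i) and likely stemming.
    return text.lower().split()


def build_word_lists(pos_docs: list[str], neg_docs: list[str]) -> tuple[set[str], set[str]]:
    # Collect the words seen in positive comments and in negative comments.
    pos_words = {w for doc in pos_docs for w in tokenize(doc)}
    neg_words = {w for doc in neg_docs for w in tokenize(doc)}
    return pos_words, neg_words


def eliminate_common(
    pos_words: set[str], neg_words: set[str], drop_from: str = "both"
) -> tuple[set[str], set[str]]:
    # Words occurring in both lists carry little class information. Remove them
    # from both lists or from only one, chosen by the hypothetical drop_from flag.
    if drop_from not in ("both", "positive", "negative"):
        raise ValueError(f"unknown drop_from: {drop_from!r}")
    common = pos_words & neg_words
    new_pos = pos_words - common if drop_from in ("both", "positive") else set(pos_words)
    new_neg = neg_words - common if drop_from in ("both", "negative") else set(neg_words)
    return new_pos, new_neg


def word_features(doc: str, vocabulary: list[str], binary: bool) -> list[int]:
    # One feature per vocabulary word: presence (0/1) when binary is True,
    # raw occurrence count otherwise.
    counts = Counter(tokenize(doc))
    if binary:
        return [1 if counts[w] > 0 else 0 for w in vocabulary]
    return [counts[w] for w in vocabulary]


if __name__ == "__main__":
    # Toy comments invented for illustration only.
    pos, neg = build_word_lists(
        ["the nurse was kind", "kind and fast service"],
        ["the wait was long", "long wait and rude staff"],
    )
    pos, neg = eliminate_common(pos, neg, drop_from="both")
    vocab = sorted(pos | neg)
    print(vocab)  # ['fast', 'kind', 'long', 'nurse', 'rude', 'service', 'staff', 'wait']
    print(word_features("long long wait", vocab, binary=True))   # [0, 0, 1, 0, 0, 0, 0, 1]
    print(word_features("long long wait", vocab, binary=False))  # [0, 0, 2, 0, 0, 0, 0, 1]
```

In this toy case, dropping the shared words ("the", "was", "and") shrinks the vocabulary from 11 to 8 features, which illustrates how eliminating common features yields more compact subsets. The binary and frequency vectors differ only where a word repeats within a comment, which is consistent with the observation that short comments make the two encodings perform similarly.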
NB, which is very efficient in terms of running time, builds better classification models than SMO. J48, however, generally performs better on Frequency Lists than the other algorithms, and achieves the highest TNR, 99%, on Binary Lists when all features are used.
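The reported TNR and miss rate follow the standard confusion-matrix definitions, TNR = TN / (TN + FP) and FNR = FN / (FN + TP). Below is a minimal Python sketch, assuming positive comments are treated as the "positive" class (the abstract does not state which class plays that role) and using hypothetical "pos"/"neg" labels and a hypothetical `negative_rates` helper.

```python
def negative_rates(y_true: list[str], y_pred: list[str], positive: str = "pos") -> dict[str, float]:
    # Binary confusion-matrix counts with `positive` as the positive class
    # (an assumption: the abstract does not say which sentiment plays that role).
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Specificity: share of actual negatives predicted as negative.
        "tnr": tn / (tn + fp) if tn + fp else nan,
        # Miss rate: share of actual positives predicted as negative.
        "fnr": fn / (fn + tp) if fn + tp else nan,
    }


if __name__ == "__main__":
    # Invented labels for illustration only.
    y_true = ["pos", "pos", "pos", "neg", "neg"]
    y_pred = ["pos", "neg", "pos", "neg", "pos"]
    print(negative_rates(y_true, y_pred))  # accuracy 0.6, TNR 0.5, FNR 1/3
```

Under the opposite convention (calling the function with `positive="neg"`), the same predictions give TNR = 2/3 and FNR = 0.5: the new TNR equals one minus the old FNR, and vice versa. Stating the convention therefore matters when comparing TNR figures across studies.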