A Survey and Compare the Performance of IBM SPSS Modeler and Rapid Miner Software for Predicting Liver Disease by Using Various Data Mining Algorithms

Moloud ABDAR

Abstract


Today, with the development of industry and mechanized lifestyles, the prevalence of diseases is rising steadily. Meanwhile, the number of patients with liver diseases (such as fatty liver, cirrhosis, and liver cancer) is also rising. Because prevention is better than treatment and early diagnosis helps the treatment process, it is essential to develop methods for detecting high-risk individuals who are likely to develop liver disease, and to adopt appropriate approaches for early diagnosis and for starting treatment in the early stages of the disease. In this study, we applied common data mining techniques, which are now widely used for the diagnosis and treatment of various diseases, to the diagnosis and treatment of liver disease. For this purpose, we used the Rapid Miner and IBM SPSS Modeler data mining tools together. The accuracy of several data mining algorithms, including C5.0, C4.5, Decision Tree, and Neural Network, was examined with these two tools for predicting the prevalence of liver diseases or diagnosing them early. According to the results, the C4.5 and C5.0 algorithms, using the IBM SPSS Modeler and Rapid Miner tools, achieved accuracies of 72.37% and 87.91%, respectively. In addition, the Neural Network algorithm in Rapid Miner was able to show more detail.


Keywords


Data mining techniques, Liver diseases, Rapid Miner, IBM SPSS Modeler

