Designing a System for Trend Analysis of Users in Website Surfing in Iran Using Data Mining and Text Mining Algorithms

Authors
1 University of Tehran
2 Allameh Tabataba’i University
Abstract
Background and Aim: Web surfing has become part of the lifestyle of a large majority of people in society, and social and cultural policy making in this field needs to be more accurate. The authors therefore set out to analyze how users in Iranian society browse different websites, in order to support policymakers and practitioners.

Methods: This research uses the design science research method. The research sample consists of all available users who browse Iranian and foreign websites. To collect sufficient data from a range of active users, browser add-ons were designed and published.
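The abstract does not describe what the add-ons record. The following is a minimal sketch that assumes each add-on reports a hypothetical `VisitEvent` (user ID, URL, page title, timestamp). Under that assumption, raw events could be cleaned before mining by skipping immediate reloads and splitting each user's stream into sessions at a 30-minute inactivity gap. That gap is a common web-usage-mining heuristic, not a parameter taken from this study.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from urllib.parse import urlparse


@dataclass(frozen=True)
class VisitEvent:
    """One page view as a hypothetical browser add-on might report it."""
    user_id: str
    url: str
    title: str  # not used below; would feed the text-mining step
    timestamp: datetime


def sessionize(events, gap=timedelta(minutes=30)):
    """Split each user's visits into sessions of lower-cased domains.

    A new session starts when more than `gap` has passed since the user's
    previous event. Within a session, an immediate repeat of the same URL
    (e.g. a reload) is skipped but still extends the session.
    """
    by_user = {}
    for event in sorted(events, key=lambda e: (e.user_id, e.timestamp)):
        by_user.setdefault(event.user_id, []).append(event)

    sessions = {}
    for user, user_events in by_user.items():
        user_sessions = []
        last_time = last_url = None
        for event in user_events:
            if last_time is None or event.timestamp - last_time > gap:
                user_sessions.append([])  # inactivity gap: open a new session
            elif event.url == last_url:
                last_time = event.timestamp  # reload: keep session alive, skip duplicate
                continue
            user_sessions[-1].append(urlparse(event.url).netloc.lower())
            last_time, last_url = event.timestamp, event.url
        sessions[user] = user_sessions
    return sessions
```

Reducing each URL to its domain keeps the sketch short. A real pipeline would probably keep full URLs and titles so the text-mining step has page content to work with.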

Results: Text mining algorithms were used to distinguish the browsed web pages, and data mining algorithms were then used to categorize and interpret them.
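The abstract does not name the specific text mining or data mining algorithms. One common pairing is shown here purely as an illustrative substitute: each page is represented as a TF-IDF vector, and the vectors are grouped with spherical k-means (k-means under cosine similarity). Several details are assumptions of this sketch:

* the function names;
* the naive whitespace tokenizer;
* the smoothed idf;
* seeding the centroids with the first k pages.

Persian pages would need a proper tokenizer and stop-word list.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term -> weight), L2-normalized.

    Uses the smoothed idf log((1 + n) / (1 + df)) + 1, so every weight is
    positive and dot products between vectors are cosine similarities.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors, k, iterations=10):
    """Cluster unit vectors by cosine similarity; returns one label per vector.

    Centroids are seeded with the first k vectors (deterministic, for the sketch)
    and re-estimated as the normalized sum of their members each iteration.
    """
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    total.update(v)  # sums weights term by term
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm:  # leave an empty cluster's centroid unchanged
                centroids[c] = {t: w / norm for t, w in total.items()}
    return labels
```

Clustering needs no labeled pages. If a fixed category scheme were available, a supervised classifier over the same TF-IDF vectors would be the natural alternative.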

Conclusion: The outcome is a comprehensive system for analyzing internet users' web browsing trends. It covers both a data gathering phase and an innovative report preparation phase, and it can serve as an effective model for the analysis, design, and implementation of web-based analytical systems.
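The abstract does not specify the report format. The following is a minimal sketch of one possible trend measure. It assumes each categorized visit has already been reduced to a hypothetical (day, category) pair, and it computes each category's share of visits per ISO week. A shift in these shares from one week to the next is one simple way to surface a browsing trend.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_category_share(visits):
    """Share of visits per category in each ISO week.

    `visits` is an iterable of (day, category) pairs, where `day` is a
    datetime.date. Returns {(iso_year, iso_week): {category: share}},
    with weeks in chronological order and shares summing to 1 per week.
    """
    counts = defaultdict(Counter)
    for day, category in visits:
        iso_year, iso_week, _ = day.isocalendar()
        counts[(iso_year, iso_week)][category] += 1
    report = {}
    for week, counter in sorted(counts.items()):
        total = sum(counter.values())
        report[week] = {cat: n / total for cat, n in counter.items()}
    return report


if __name__ == "__main__":
    sample = [(date(2017, 1, 2), "news"), (date(2017, 1, 3), "news"),
              (date(2017, 1, 4), "sports"), (date(2017, 1, 9), "sports")]
    for week, shares in weekly_category_share(sample).items():
        print(week, shares)
```

The sketch uses shares rather than raw counts, so weeks with different traffic volumes remain comparable.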
