Analysis of user query refinement behavior based on semantic features: user log analysis of the Ganj database (IranDoc)

Authors
Iranian Research Institute for Information Science and Technology
Abstract
Background and Aim: Information systems cannot be designed or developed well without a clear understanding of users' needs and of how users seek and evaluate information. This study analyzes the query refinement behavior of users of Ganj (the database of the Iranian Research Institute for Information Science and Technology) through log analysis.

Methods: This study used log analysis to examine the query refinement behavior of Ganj users. User logs from a 3-month period between May and June 2016 were analyzed for semantic features, using a researcher-made checklist of semantic features as the study tool. The logs contained 10 million search records in total; these were limited to the information science domain, and about 106,641 records were selected for analysis.

Results: Semantic relationships (based on thesaurus relationships) were found between pairs of terms in users' searches. The results showed that users refined their searches along some of these semantic relationships.
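As an illustration only, the kind of pairwise check described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure used in the study: the toy thesaurus, the relationship codes (BT, NT, RT, UF), the label names, and the functions classify_pair and count_refinements are all hypothetical and are not taken from the Ganj thesaurus or the researchers' checklist.

```python
from collections import Counter

# Hypothetical toy thesaurus: each term lists its broader (BT), narrower (NT),
# related (RT) and equivalent (UF) terms. Not taken from the Ganj thesaurus.
THESAURUS = {
    "information retrieval": {
        "BT": {"information science"},
        "NT": {"query refinement"},
        "RT": {"information seeking"},
        "UF": {"ir"},
    },
}

# Assumed mapping from thesaurus codes to human-readable labels.
LABELS = {"BT": "broader", "NT": "narrower", "RT": "related", "UF": "equivalent"}


def classify_pair(first, second, thesaurus=THESAURUS):
    """Label how the query `second` relates to the query `first`."""
    first, second = first.strip().lower(), second.strip().lower()
    if first == second:
        return "repeat"
    entry = thesaurus.get(first, {})
    for code, label in LABELS.items():
        if second in entry.get(code, set()):
            return label
    return "none"  # no thesaurus relationship recorded for this pair


def count_refinements(session_queries):
    """Count relationship labels over consecutive query pairs in one session."""
    pairs = zip(session_queries, session_queries[1:])
    return Counter(classify_pair(a, b) for a, b in pairs)


# One made-up session of four successive queries.
session = ["information retrieval", "query refinement", "query refinement", "ontology"]
counts = count_refinements(session)
```

Here counts holds one tally per relationship label for the session. Aggregating such tallies over all sessions would indicate which thesaurus relationships users follow most often when they refine a query, which is the kind of evidence a term suggestion feature could draw on.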

Conclusion: The results of this research can be used to improve Ganj search results and to suggest terms to users, so that they are able to choose proper terms when several related terms are available.