Element Retrieval Using Namespace Based on Keyword Search over XML Documents
Yang WANG, Zhikui CHEN, Xiaodi HUANG
DOI: 10.4236/jsea.2010.31008


Keyword search over XML elements is steadily gaining popularity. Traditional similarity measures are widely used to retrieve XML documents effectively, and a number of authors have proposed similarity measures that take advantage of the structure and content of XML documents. However, these methods do not consider the similarity between the latent semantic information of element texts and that of the keywords in a query. Moreover, although many XML element search algorithms are available, some of them have high computational complexity because they search a huge number of elements. In this paper, we propose a new algorithm that uses the semantic similarity between elements rather than between entire XML documents, considering not only the structure and content of an XML document but also the semantic information of the namespaces in its elements. We compare our algorithm with three other algorithms on real datasets. The experiments demonstrate that the proposed method improves query accuracy and reduces running time.
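To make the idea of element-level, namespace-aware keyword search concrete, the following is a minimal Python sketch. It is not the algorithm proposed in the paper, whose scoring details are not given in this abstract. The function names, the term-frequency cosine scoring, the `ns_weight` parameter, and the sample document are all assumptions chosen for illustration. Each element is scored by combining the similarity between the query and the element text with the similarity between the query and the element's namespace URI and tag name.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", (text or "").lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def rank_elements(xml_text, keywords, ns_weight=0.3):
    """Rank individual elements (not whole documents) against the keywords.

    ns_weight is a hypothetical mixing parameter: the share of the score
    taken from the namespace URI and tag name rather than the element text.
    """
    query = Counter(tokenize(" ".join(keywords)))
    results = []
    for elem in ET.fromstring(xml_text).iter():
        uri, local = split_tag(elem.tag)
        content = Counter(tokenize(elem.text))
        label = Counter(tokenize(uri) + tokenize(local))
        score = (1 - ns_weight) * cosine(query, content) + ns_weight * cosine(
            query, label
        )
        if score > 0:
            results.append((score, local, (elem.text or "").strip()))
    return sorted(results, key=lambda r: r[0], reverse=True)


# Illustrative document: two <title> elements that differ only by namespace.
DOC = """<catalog xmlns:m="http://example.org/music" xmlns:b="http://example.org/books">
  <m:title>Blue Train</m:title>
  <b:title>Blue Highways</b:title>
  <m:artist>John Coltrane</m:artist>
</catalog>"""

ranked = rank_elements(DOC, ["music", "blue"])
for score, tag, text in ranked:
    print(f"{score:.3f}  {tag}: {text}")
```

In this toy example both `title` elements match the keyword "blue" equally on content, so the namespace term is what separates the music title from the book title. A realistic system would replace the raw term-frequency vectors with a proper semantic similarity measure and an index over elements.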

Share and Cite:

Wang, Y., Chen, Z. and Huang, X. (2010) Element Retrieval Using Namespace Based on Keyword Search over XML Documents. Journal of Software Engineering and Applications, 3, 65-72. doi: 10.4236/jsea.2010.31008.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.