Compressed XML Database and Query Evaluation over XML Databases

Vijay S. Gulhane, Dr. M. S. Ali


Extensible Markup Language (XML) [XML 1.0 (Second Edition) W3C Recommendation, October (2000)] was proposed as a standardized data format for specifying and exchanging data on the Web. With the proliferation in recent years of mobile devices, such as palmtop computers, as a means of communication, it is reasonable to expect that a massive amount of XML data will soon be generated and exchanged between applications in order to perform dynamic computations over the Web. However, XML is verbose by nature, since terseness in XML markup was not considered a pressing issue from the design perspective [Smith S. Nair et al]. In practice, XML documents are usually large because they often contain much redundant data. This size problem hinders the adoption of XML, since it substantially increases the costs of data processing, data storage, and data exchange over the Web. Because the common generic text compressors, such as Gzip, Bzip2, WinZip, PKZIP, or MPEG-7 (BiM), are not able to produce usable compressed XML data, many XML-specific compression technologies have recently been proposed. The essential idea of these technologies is to exploit the structure information exposed in the input XML document during compression in order to pursue two goals at the same time. First, they aim to achieve a good compression ratio and compression time compared to the generic text compressors mentioned above. Second, they aim to generate a compressed XML document that supports efficient evaluation of queries over the data. The aim of this paper is to introduce a system that can compress an XML document and retrieve the required information from the compressed version, decompressing only as much as each query requires.

The system first compresses the XML document using the proposed algorithm. The compressed file is then divided into different relational databases, so there is no need to decompress the complete file to retrieve the results of a query. Only the required information is decompressed and returned to the user. The average compression ratio of the designed compressor is competitive with that of other queriable XML compressors. In several experiments, the query processor was able to answer different kinds of queries that require retrieving information from several compressed XML documents.
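
The partitioning idea can be illustrated with a minimal sketch. It is not the authors' algorithm, which this abstract does not specify. As assumptions made only for illustration, it groups text values by element path, uses zlib as a stand-in compressor, and stores one compressed container per path in a single SQLite table; the names `build_store` and `query` are hypothetical.

```python
import json
import sqlite3
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_store(xml_text):
    """Group text values by element path and compress each group separately.

    Assumption: one zlib-compressed container per path, stored as a row
    in an in-memory SQLite table (a stand-in for the paper's databases).
    """
    root = ET.fromstring(xml_text)
    containers = defaultdict(list)

    def walk(node, parent_path):
        path = parent_path + "/" + node.tag
        text = (node.text or "").strip()
        if text:
            containers[path].append(text)
        for child in node:
            walk(child, path)

    walk(root, "")

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE container (path TEXT PRIMARY KEY, data BLOB)")
    for path, values in containers.items():
        blob = zlib.compress(json.dumps(values).encode("utf-8"))
        db.execute("INSERT INTO container VALUES (?, ?)", (path, blob))
    db.commit()
    return db


def query(db, path):
    """Decompress only the container that matches the requested path."""
    row = db.execute(
        "SELECT data FROM container WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        return []
    return json.loads(zlib.decompress(row[0]).decode("utf-8"))


if __name__ == "__main__":
    doc = (
        "<library>"
        "<book><title>XML Basics</title><year>2001</year></book>"
        "<book><title>Data Compression</title><year>1999</year></book>"
        "</library>"
    )
    store = build_store(doc)
    # Only the title container is decompressed; the year container stays compressed.
    print(query(store, "/library/book/title"))
```

In this sketch a query for one path touches a single row, so the containers for all other paths are never decompressed, which is the property the abstract attributes to the system.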


XML, compression ratio, compression time, decompression time


A. Arion, A. Bonifati, G. Costa, S. D'Aguanno, I. Manolescu, and A. Pugliese. Efficient Query Evaluation over Compressed XML Data. Proceedings of EDBT (2004).

A. Arion, A. Bonifati, G. Costa, S. D'Aguanno, I. Manolescu, and A. Pugliese. XQueC: Pushing Queries to Compressed XML Data. Proceedings of the 29th International Conference on Very Large Data Bases (VLDB'03), (2003).

Al-Hamadani, B. T., Alwan, R. F., Lu, J. & Yip, J. 2009. Vague Content and Structure (VCAS) Retrieval for XML Electronic Healthcare Records (EHR). Proceedings of the 2009 International Conference on Internet Computing, USA, pp. 241-246.

Al Hamadani, Baydaa (2011) Retrieving Information from Compressed XML Documents According to Vague Queries. Doctoral thesis, University of Huddersfield.

Augeri, C. J., Bulutoglu, D. A., Mullins, B. E., Baldwin, R. O. & Baird, L. C., III (2007). An analysis of XML compression efficiency. Proceedings of the 2007 workshop on Experimental computer science, ACM, San Diego, California.

Clarke J (2004) The Expat XML parser.

Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, October (2000).

G. Antoshenkov. Dictionary-Based Order-Preserving String Compression. VLDB Journal 6, pp. 26-39, (1997).

Gerlicher, A. R. S. (2007), Developing Collaborative XML Editing Systems, PhD thesis, University of the Arts London, London.

Groppe, J. (2008), Speeding Up XML Querying, PhD thesis, Zugl. Lübeck University, Berlin.

H. Liefke and D. Suciu. XMill: An Efficient Compressor for XML Data. Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 153-164 (2000).

Harrusi, S., Averbuch, A. & Yehudai, A. 2006. XML Syntax Conscious Compression. Proceedings of the Data Compression Conference (DCC'06).

J. Cheng and W. Ng. XQzip: Querying Compressed XML Using Structural Indexing. Proceedings of EDBT (2004).

J. Clark. XML Path Language (XPath), (1999).

J. Gailly and M. Adler. gzip 1.2.4.

J. K. Min, M. J. Park, and C. W. Chung. XPRESS: A Queriable Compression for XML Data. Proceedings of the ACM SIGMOD International Conference on Management of Data (2003).

J. M. Martinez. MPEG-7 Overview (version 9). http://www.

Liefke, H. & Suciu, D. 2000. XMill: an Efficient Compressor for XML Data. ACM.

Mark Nelson, Principles of Data Compression, pub. 1999.

Moro, M. M., Ale, P., Vagena, Z. & Tsotras, V. J. 2008. XML Structural Summaries. PVLDB '08, Auckland, New Zealand.

Ng, W., Lam, W.-Y. & Cheng, J. (2006) Comparative Analysis of XML Compression Technologies. World Wide Web: Internet and Web Information Systems, Vol. 9, pp. 5-33.

Norbert, F. & Kai, G. (2004) XIRQL: An XML query language based on information retrieval concepts. ACM Trans. Inf. Syst., 22, 313-356.

P. M. Tolani and J. R. Haritsa. XGRIND: A Query-friendly XML Compressor. IEEE Proceedings of the 18th International Conference on Data Engineering (2002).

pkzip.

S. Boag et al. XQuery 1.0: An XML Query Language, Nov. (2002).

Smith S. Nair. XML compression techniques: A survey. Department of Computer Science, University of Iowa, USA.

T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, John Wiley & Sons, Inc., New York, (1991).

The bzip2 and libbzip2 official home page.

Violleau, T. (2001) Java Technology and XML, ORACLE.

W. Y. Lam, W. Ng, P. T. Wood, and M. Levene. XCQ: XML Compression and Querying System. Poster Proceedings, 12th International World-Wide Web Conference (WWW2003), May (2003).




All Rights Reserved © 2012 IJARCSEE

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.