Parsing xml file using SAX file in python

Again I love simplicity of python. I had a xml file that sometimes can range from 1KB to 100MB so I can't use ElementTree. Python SAX seems similar to java minus the verbose nature of java.

My xml file looks something like
<restorezipmeta messageid="UUID" outputzippath="test.zip" resultendpoint="http://XXX.7080/rest/private/RestoreRestService/1.0" userid="1">

<restoreentry logicalpath="/Shared/kpatel/CustomRules.rtf" physicalpath="/home/kpatel/try/About_Ubuntu_[Russian].rtf"></restoreentry>

<restoreentry logicalpath="/Shared/kpatel/BrowserPAC.doc" physicalpath="/home/kpatel/try/Derivatives_of_Ubuntu.doc"></restoreentry>
</restorezipmeta>

and here is a small sample to parse the xml using SAX api and create the zip. All you need to do is to create the ContentHandler like you do in Java minus the verbose nature


import zipfile
import os, sys
import logging
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
logger = logging.getLogger("restore_consumer")

class RestoreXmlHandler(ContentHandler):
    def __init__ (self):
        print "init"
        
    def startElement(self, name, attrs):
        if name == "RestoreZipMeta":
            self.zipFilePath = attrs.get('outputZipPath')
            logger.info("Creating zip %s" % self.zipFilePath)
            self.restoreZip = zipfile.ZipFile(self.zipFilePath, "w", zipfile.ZIP_DEFLATED)
        elif name == "RestoreEntry":
            physicalPath = attrs.get('physicalPath')
            logicalPath = attrs.get('logicalPath')
            logger.debug("Adding physicalPath=%s to zip %s at logicalPath=%s" % (physicalPath, self.zipFilePath, logicalPath))
            self.restoreZip.write(physicalPath, logicalPath)

    def characters (self, ch):
        return

    def endElement(self, name):
        if name == "RestoreZipMeta":
            self.restoreZip.close()
            logger.info("closed zip %s" % self.zipFilePath)

def processMessage(restoreMetaFile):
    parser = make_parser()   
    curHandler = RestoreXmlHandler()
    parser.setContentHandler(curHandler)
    f = open(restoreMetaFile)
    parser.parse(f) 
    f.close()

if __name__ == '__main__':
    processMessage(sys.argv[1])

Programming fun at startup

Search This Blog

Parsing xml file using SAX file in python

Comments

Post a Comment

Popular posts from this blog

Killing a particular Tomcat thread

Adding Jitter to cache layer

Preparing for an interview after being employed 11 years at a startup