The xmlarch module contains an XML architectural forms processor written in Python. It lets you process XML architectural forms with any parser that supports the SAX interfaces, and it can process several architectures in a single parsing pass. The architectural document events for one architecture can even be broadcast to multiple DocumentHandlers (e.g. you can have two handlers for the RDF architecture, three for the XLink architecture and perhaps one for the HyTime architecture).
The architecture processor implements the SAX DocumentHandler interface, which means that you can register the architecture handler (ArchDocHandler) with any SAX 1.0 compliant parser. It currently does not process meta document type definitions (meta-DTDs). When a DTD parser module becomes available, the code will be modified to use it to process meta-DTD information. Please note that validating and non-validating parsers may report different SAX events when parsing the same document.
The xmlarch module contains six classes: ArchDocHandler, Architecture, ArchParseState, ArchException, AttributeParser and Normalizer.
Using the xmlarch module usually involves the following steps:
Python code:
# Import needed modules
from xml.sax import saxexts, saxlib, saxutils
import sys, xmlarch
# Create architecture processor handler
arch_handler = xmlarch.ArchDocHandler()
# Create parser and register architecture processor with it
parser = saxexts.XMLParserFactory.make_parser()
parser.setDocumentHandler(arch_handler)
# Add a document handler to process the html architecture
arch_handler.addArchDocumentHandler("html", xmlarch.Normalizer(sys.stdout))
# Parse (and process) the document
parser.parse("simple.xml")
A sample XML document:
<?xml version="1.0"?>
<?IS10744:arch name="html"?>
<doc>
<title html="h1">My first architectual document</title>
<author html="address">Geir Ove Gronmo, grove@infotek.no</author>
<para>This is the first paragraph in this document</para>
<para html="p">This is the second paragraph</para>
</doc>
The result:
<html>
<h1>My first architectual document</h1>
<address>Geir Ove Gronmo, grove@infotek.no</address>
<p>This is the second paragraph</p>
</html>
See also the files simple.py and simple.xml in the demo/arch directory of the Python/XML distribution. If you process the persons architecture in this document instead, you get the following output:
<persons>
<author>Geir Ove Grønmo</author><mentioned>Eliot Kimber</mentioned><mentioned>David Megginson</mentioned><mentioned>Lars Marius Garshol</mentioned>
</persons>
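The html mapping in the simple example can be illustrated without xmlarch. What follows is only a minimal sketch built on the standard-library xml.sax of current Python, not the module's actual implementation. The class ArchSketch and the function render are hypothetical names. The sketch assumes that the document element maps to an element named after the architecture (which matches the result shown above), that an element is renamed to the value of its html attribute, and that elements without that attribute are dropped together with their character data. It ignores meta-DTDs, attribute renaming and every other architectural forms feature.

```python
import io
import re
import xml.sax
from xml.sax.saxutils import escape

SAMPLE = b"""<?xml version="1.0"?>
<?IS10744:arch name="html"?>
<doc>
<title html="h1">My first architectual document</title>
<author html="address">Geir Ove Gronmo, grove@infotek.no</author>
<para>This is the first paragraph in this document</para>
<para html="p">This is the second paragraph</para>
</doc>"""


class ArchSketch(xml.sax.ContentHandler):
    """Hypothetical illustration only; not part of xmlarch."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.arch = None   # architecture name read from the IS10744:arch PI
        self.stack = []    # architectural name of each open element, or None

    def processingInstruction(self, target, data):
        # Naive pseudo-attribute parsing: only name="..." is recognised.
        if target == "IS10744:arch":
            match = re.search(r'name="([^"]+)"', data)
            if match:
                self.arch = match.group(1)

    def startElement(self, name, attrs):
        if self.arch is None:
            mapped = None
        elif not self.stack:
            # Assumption: the document element maps to an element
            # named after the architecture itself.
            mapped = self.arch
        else:
            # The attribute named after the architecture gives the new name.
            mapped = attrs.get(self.arch)
        self.stack.append(mapped)
        if mapped:
            self.out.write("<%s>" % mapped)

    def endElement(self, name):
        mapped = self.stack.pop()
        if mapped:
            self.out.write("</%s>" % mapped)

    def characters(self, content):
        # Character data is kept only inside architectural elements.
        if self.stack and self.stack[-1]:
            self.out.write(escape(content))


def render(xml_bytes):
    """Parse xml_bytes and return the architectural document as a string."""
    out = io.StringIO()
    xml.sax.parseString(xml_bytes, ArchSketch(out))
    return out.getvalue()


if __name__ == "__main__":
    print(render(SAMPLE))
```

Apart from whitespace, render(SAMPLE) yields the same elements as the result shown for the html architecture. The first para element has no html attribute, so it does not appear in the output.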
A more complex example:
Python code:
# Import needed modules
from xml.sax import saxexts, saxlib, saxutils
import sys, xmlarch
# create architecture processor handler
arch_handler = xmlarch.ArchDocHandler()
# Create parser and register architecture processor with it
parser = saxexts.XMLParserFactory.make_parser()
parser.setDocumentHandler(arch_handler)
# Add document handlers to process the html and biblio architectures
arch_handler.addArchDocumentHandler("html", xmlarch.Normalizer(open("html.out", "w")))
arch_handler.addArchDocumentHandler("biblio", saxutils.ESISDocHandler(open("biblio1.out", "w")))
arch_handler.addArchDocumentHandler("biblio", saxutils.Canonizer(open("biblio2.out", "w")))
# Register a default document handler that just passes through any incoming events
arch_handler.setDefaultDocumentHandler(xmlarch.Normalizer(sys.stdout))
# Parse (and process) the document
parser.parse("complex.xml")
Because this produces a lot of output, I have not included the XML document or the results here. Instead, see the files complex.py and complex.xml in the demo/xml directory of the Python/XML distribution and try it yourself.
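Registering two handlers for the biblio architecture, as in the example above, means that every architectural event is delivered to each of them. The following is a minimal sketch of that fan-out idea using the standard-library xml.sax of current Python. It is not how xmlarch implements it. Broadcaster and fan_out are hypothetical names, and XMLGenerator stands in for the Normalizer, ESISDocHandler and Canonizer handlers used above.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class Broadcaster(xml.sax.ContentHandler):
    """Hypothetical helper: forwards each SAX event to several handlers."""

    def __init__(self, handlers):
        super().__init__()
        self.handlers = list(handlers)

    def startDocument(self):
        for handler in self.handlers:
            handler.startDocument()

    def endDocument(self):
        for handler in self.handlers:
            handler.endDocument()

    def startElement(self, name, attrs):
        for handler in self.handlers:
            handler.startElement(name, attrs)

    def endElement(self, name):
        for handler in self.handlers:
            handler.endElement(name)

    def characters(self, content):
        for handler in self.handlers:
            handler.characters(content)


def fan_out(xml_bytes, count=2):
    """Parse xml_bytes once and return the output of `count` handlers."""
    outputs = [io.StringIO() for _ in range(count)]
    handlers = [XMLGenerator(out, encoding="utf-8") for out in outputs]
    xml.sax.parseString(xml_bytes, Broadcaster(handlers))
    return [out.getvalue() for out in outputs]


if __name__ == "__main__":
    for text in fan_out(b"<a><b>x</b></a>"):
        print(text)
```

The document is parsed only once. Each handler receives the same sequence of events and writes to its own output stream, just as the two biblio handlers write to biblio1.out and biblio2.out.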