The easiest way to get a DOM tree is to have it built for you. One of the modules in the xml.dom package is sax_builder.py, which provides a SaxBuilder class that will construct a DOM tree from its input. You must create a SaxBuilder instance and a SAX parser, associate the instance as the parser's document handler, and then retrieve the resulting tree.
import sys from xml.sax import saxexts from xml.dom.sax_builder import SaxBuilder # Create a SAX parser and a SaxBuilder instance p = saxexts.make_parser() dh = SaxBuilder() p.setDocumentHandler(dh) # Parse the input, and close the parser p.parseFile(sys.stdin) p.close() # Retrieve the DOM tree doc = dh.document
The SaxBuilder document handler makes the resulting DOM tree available as its document attribute.
An even easier way of creating a DOM tree is to use the FileReader class in the xml.dom.utils module. It
from xml.dom import utils
reader = utils.FileReader('quotations.xml')
doc = reader.document
FileReader can handle both XML and HTML input files, and can parse input from any file-like object. This can be used to produce DOM trees from documents retrieved through HTTP:
from xml.dom.utils import FileReader
import urllib
URL = 'http://localhost/index.html'
sock = urllib.urlopen(URL)
f=FileReader()
doc = f.readFile('index.html', sock)
You could also subclass FileReader to implement some specialized behaviour that you require.