4.1 Getting A DOM Tree

The easiest way to get a DOM tree is to have it built for you. One of the modules in the xml.dom package is sax_builder.py, which provides a SaxBuilder class that constructs a DOM tree from its input. You must create a SaxBuilder instance and a SAX parser, install the instance as the parser's document handler, parse the input, and then retrieve the resulting tree.

import sys
from xml.sax import saxexts
from xml.dom.sax_builder import SaxBuilder

# Create a SAX parser and a SaxBuilder instance
p = saxexts.make_parser()
dh = SaxBuilder()
p.setDocumentHandler(dh)

# Parse the input, and close the parser
p.parseFile(sys.stdin)
p.close()

# Retrieve the DOM tree
doc = dh.document

The SaxBuilder document handler makes the resulting DOM tree available as its document attribute.
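
To make the mechanism concrete, here is a minimal sketch of a SAX handler that assembles a DOM tree as events arrive. It is not the SaxBuilder implementation: it swaps in the standard library's xml.sax and xml.dom.minidom modules, and the TreeBuilder class and build_tree function are hypothetical names chosen for this illustration.

```python
import io
import xml.sax
from xml.dom import minidom


class TreeBuilder(xml.sax.ContentHandler):
    """Hypothetical stand-in for SaxBuilder, built on xml.dom.minidom."""

    def startDocument(self):
        self.document = minidom.Document()
        # Stack of open nodes; the innermost one is last
        self._stack = [self.document]

    def startElement(self, name, attrs):
        element = self.document.createElement(name)
        for key, value in attrs.items():
            element.setAttribute(key, value)
        self._stack[-1].appendChild(element)
        self._stack.append(element)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        # Only attach text while an element is open
        if len(self._stack) > 1:
            text = self.document.createTextNode(content)
            self._stack[-1].appendChild(text)


def build_tree(stream):
    """Parse a file-like object and return the finished Document node."""
    handler = TreeBuilder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(stream)
    return handler.document


sample = io.BytesIO(b"<quotations><quote id='q1'>Hello</quote></quotations>")
doc = build_tree(sample)
print(doc.documentElement.tagName)
```

As with SaxBuilder, the finished tree is read from the handler's document attribute once parsing is complete.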

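An even easier way of creating a DOM tree is to use the FileReader class in the xml.dom.utils module. Given a filename, it parses the file and makes the resulting tree available as its document attribute: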

from xml.dom import utils
reader = utils.FileReader('quotations.xml')
doc = reader.document
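
For comparison, the same one-step pattern can be sketched with the standard library's xml.dom.minidom module, assuming the legacy xml.dom.utils module is not installed. This is a different API from FileReader, and the sample file written here is made up so that the example is self-contained.

```python
import os
import tempfile
from xml.dom import minidom

# Write a small sample document; the contents are invented for illustration
path = os.path.join(tempfile.mkdtemp(), 'quotations.xml')
with open(path, 'w', encoding='utf-8') as out:
    out.write('<quotations><quote>To be or not to be</quote></quotations>')

# A single call opens the file, parses it, and returns the Document node
doc = minidom.parse(path)
print(doc.documentElement.tagName)
```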

FileReader can handle both XML and HTML input files, and can parse input from any file-like object. This can be used to produce DOM trees from documents retrieved through HTTP:

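from xml.dom.utils import FileReader
import urllib

# Open the URL; urlopen() returns a file-like object
URL = 'http://localhost/index.html'
sock = urllib.urlopen(URL)

# Parse the response into a DOM tree, then close the connection
f = FileReader()
doc = f.readFile('index.html', sock)
sock.close()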
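
The idea of parsing from any file-like object can also be tried without a network connection. The sketch below uses the standard library's xml.dom.minidom rather than FileReader, with an io.BytesIO buffer standing in for the HTTP response. Unlike FileReader, minidom accepts only well-formed XML, so the sample markup is XHTML-style.

```python
import io
from xml.dom import minidom

# io.BytesIO stands in for the object returned by an HTTP request;
# any object with a read() method that yields the document will do
sock = io.BytesIO(b'<html><body><p>Hello</p></body></html>')
try:
    doc = minidom.parse(sock)
finally:
    sock.close()

print(doc.getElementsByTagName('p')[0].firstChild.data)
```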

You could also subclass FileReader to implement any specialized behaviour you require.