Show ‘n Tell Thursday: Validating XML when parsing with NotesSAXParser (23 Mar 2006)

Whenever you parse XML for anything other than purely educational purposes, you really want to use a DTD (Document Type Definition) to make sure the document is “valid” in the eyes of the application. This is possible with both the DOM and the SAX parser, though the two differ in when you discover a violation of the DTD.

With DOM, a violation is discovered before you start walking the DOM tree, since the entire document is parsed and validated before control is handed back to you as the programmer. With SAX, a violation is not reported until the parser encounters it, which means you might already have parsed and processed most of the XML document, so beware… When using SAX with a DTD you really shouldn’t consider any data from the XML valid until the SAX_EndDocument event has been raised (in the case of LotusScript).
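For comparison, the DOM side of this could look roughly like the sketch below. It is only an illustration: it reuses the c:\names.xml file introduced further down, and it assumes that a DTD violation surfaces as a trappable error from Process, so that the lines after the call are only reached for a document that parsed cleanly.

```lotusscript
Sub Initialize
  Dim session As New NotesSession
  Dim stream_xml As NotesStream
  Dim parser As NotesDOMParser
  Dim root As NotesDOMElementNode

  Set stream_xml = session.CreateStream()
  Call stream_xml.Open("c:\names.xml")

  'create validating DOM parser
  Set parser = session.CreateDOMParser(stream_xml)
  parser.InputValidationOption = VALIDATE_ALWAYS

  'parse and validate the whole document in one go
  On Error Goto catch
  Call parser.Process()

  'assumption: we only get here if parsing and validation raised no error
  Set root = parser.Document.DocumentElement
  Print "Root element: " + root.NodeName
  Exit Sub
catch:
  Msgbox parser.Log
  Resume finally
finally:
End Sub
```

The point of the sketch is the ordering: nothing from the document is touched until Process has returned.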

To use a DTD with the SAX parser, the only thing you really need to know is that the path to the DTD should be specified using the file:// protocol, as shown in the example XML document below (notice the DOCTYPE declaration on the second line, which imports the DTD):

<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE names SYSTEM "file:///c:/names.dtd">
<names>
  <name>
    <first>Mikkel</first>
    <last>Heisterberg</last>
  </name>
</names>

The DTD for this example is very simple and states that:

  • there is an element called “names”
  • the “names”-element can have zero or more child elements called “name”
  • the “name”-element must have two children called “first” and “last” in that order
  • the contents of the “first”-element is parsable character data (#PCDATA)
  • the contents of the “last”-element is parsable character data (#PCDATA)

A caveat that most forget to mention is that the name of the top-level element must be given in the DTD import (<!DOCTYPE names SYSTEM…). The DTD itself, saved as c:\names.dtd, looks like this:

<!ELEMENT names (name*)>
<!ELEMENT name (first,last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>

All that is left is to tell the NotesSAXParser to validate the parsed XML document against the specified DTD using the InputValidationOption property (see the line right after the “create validating parser” comment).

Sub Initialize
  Dim session As New NotesSession
  Dim stream_xml As NotesStream
  Dim parser As NotesSAXParser

  Set stream_xml = session.CreateStream()
  Call stream_xml.Open("c:\names.xml")

  'create validating parser
  Set parser = session.CreateSAXParser(stream_xml)
  parser.InputValidationOption = VALIDATE_ALWAYS

  'declare events
  On Event SAX_StartElement From parser Call SAXStartElement

  'start parsing
  On Error Goto catch
  Call parser.Parse()
  Exit Sub
catch:
  Msgbox parser.Log
  Resume finally
finally:

End Sub

Public Sub SAXStartElement(Source As NotesSAXParser, Byval ElementName As String, Attributes As NotesSAXAttributeList)
  Print "Encountered start of element: " + ElementName
End Sub

The above code can be copied into an agent in Domino Designer if you want to play around with it. Try changing the XML (e.g. by adding multiple “name”-elements and making the last one invalid) to see how the violation isn’t reported until it is encountered. As the code is written, both the XML and the DTD go in the root of the C-drive.
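One way to act on the advice about SAX_EndDocument is to buffer whatever the handlers collect and only use it once that event fires. The following is a minimal sketch of that idea, not a definitive implementation. The module-level variables (pending, pendingCount, currentText, firstName, sawError) and the extra handlers are hypothetical names made up for this example, and the SAX_Error handler is there on the assumption that a DTD violation may also be reported through that event.

```lotusscript
'(Declarations) - hypothetical buffer, not part of the agent above
Dim pending List As String   'full names collected so far
Dim pendingCount As Integer
Dim currentText As String    'character data of the element being read
Dim firstName As String
Dim sawError As Boolean      'set if the parser reports an error

Sub Initialize
  Dim session As New NotesSession
  Dim stream_xml As NotesStream
  Dim parser As NotesSAXParser

  Set stream_xml = session.CreateStream()
  Call stream_xml.Open("c:\names.xml")

  'create validating parser
  Set parser = session.CreateSAXParser(stream_xml)
  parser.InputValidationOption = VALIDATE_ALWAYS

  'declare events
  On Event SAX_StartElement From parser Call SAXStartElement
  On Event SAX_Characters From parser Call SAXCharacters
  On Event SAX_EndElement From parser Call SAXEndElement
  On Event SAX_Error From parser Call SAXError
  On Event SAX_EndDocument From parser Call SAXEndDocument

  'start parsing
  On Error Goto catch
  Call parser.Parse()
  Exit Sub
catch:
  Msgbox parser.Log
  Resume finally
finally:
End Sub

Public Sub SAXStartElement(Source As NotesSAXParser, Byval ElementName As String, Attributes As NotesSAXAttributeList)
  currentText = ""   'start a fresh text buffer for every element
End Sub

Public Sub SAXCharacters(Source As NotesSAXParser, Byval Characters As String, Count As Long)
  currentText = currentText + Characters   'text may arrive in several chunks
End Sub

Public Sub SAXEndElement(Source As NotesSAXParser, Byval ElementName As String)
  Select Case ElementName
  Case "first"
    firstName = Trim(currentText)
  Case "last"
    'only buffer the name here, do not act on it yet
    pendingCount = pendingCount + 1
    pending(Cstr(pendingCount)) = firstName + " " + Trim(currentText)
  End Select
End Sub

Public Sub SAXError(Source As NotesSAXParser, ex As NotesSAXException)
  sawError = True
End Sub

Public Sub SAXEndDocument(Source As NotesSAXParser)
  'the whole document has been read, so the buffered data can be used
  If sawError Then Exit Sub
  Forall fullName In pending
    Print "Accepted: " + fullName
  End Forall
End Sub
```

The intent is that with an invalid last “name”-element nothing gets printed at all, even though the earlier elements had already been parsed when the violation was found.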

Happy parsing…