xpath – lekkimworld.com

Using Abdera XPath on the Lotus Connections service document

As always, namespaces and XPath/XSLT are “funny” to play around with. Tonight I have been messing around a little with the Atom feeds available in Lotus Connections and needed to use XPath, instead of the Abdera object model, to extract a URL from the service document. I didn’t find it altogether easy to figure out, so I’ll post it here in case it helps anyone. The key is to supply a java.util.Map with the two namespaces in use (atom and app) when evaluating the XPath (the “ns” variable), and to remember to use the matching namespace prefixes in the actual XPath string.

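// the service document, obtained elsewhere
Document<Service> doc = ...;
XPath x = Abdera.getNewXPath();

// map the prefixes used in the XPath string to their namespace URIs
Map<String, String> ns = new HashMap<String, String>();
ns.put("atom", "http://www.w3.org/2005/Atom");
ns.put("app", "http://www.w3.org/2007/app");

// evaluate the expression against the service document
String link = x.valueOf("/app:service/app:workspace" +
   "/atom:link[@rel='http://www.ibm.com/xmlns/prod/sn" +
   "/mv/theboard']/@href", doc, ns);
System.out.println("Link: " + link);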
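For anyone without Abdera on the classpath, the same idea (bind the XPath prefixes through a Map) can be sketched with the JDK’s own javax.xml.xpath API, where the Map has to be wrapped in a NamespaceContext. This is a swapped-in technique, not the Abdera API, and the service document below is a made-up minimal example (the href in particular is hypothetical). Note that the document uses app as its default namespace while the expression uses the app: prefix; what matters is the namespace URI, not the prefix.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ServiceDocXPath {

   // Hypothetical, minimal service document; the href is made up.
   static final String SAMPLE =
        "<service xmlns='http://www.w3.org/2007/app'"
      + " xmlns:atom='http://www.w3.org/2005/Atom'>"
      + "<workspace><atom:title>Example</atom:title>"
      + "<atom:link rel='http://www.ibm.com/xmlns/prod/sn/mv/theboard'"
      + " href='http://example.com/theboard'/>"
      + "</workspace></service>";

   /** Adapts a prefix-to-URI map to the JDK's NamespaceContext interface. */
   static NamespaceContext fromMap(final Map<String, String> ns) {
      return new NamespaceContext() {
         public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("null prefix");
            String uri = ns.get(prefix);
            return uri != null ? uri : XMLConstants.NULL_NS_URI;
         }
         public String getPrefix(String uri) {
            Iterator<String> it = getPrefixes(uri);
            return it.hasNext() ? it.next() : null;
         }
         public Iterator<String> getPrefixes(String uri) {
            List<String> prefixes = new ArrayList<String>();
            for (Map.Entry<String, String> e : ns.entrySet()) {
               if (e.getValue().equals(uri)) prefixes.add(e.getKey());
            }
            return prefixes.iterator();
         }
      };
   }

   /** Returns the href of the "theboard" link in the given service document. */
   static String boardLink(String xml) throws Exception {
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true); // without this, prefixed steps never match
      Document doc = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

      // the same two namespaces as in the Abdera example
      Map<String, String> ns = new HashMap<String, String>();
      ns.put("atom", "http://www.w3.org/2005/Atom");
      ns.put("app", "http://www.w3.org/2007/app");

      XPath x = XPathFactory.newInstance().newXPath();
      x.setNamespaceContext(fromMap(ns));
      return x.evaluate("/app:service/app:workspace"
            + "/atom:link[@rel='http://www.ibm.com/xmlns/prod/sn"
            + "/mv/theboard']/@href", doc);
   }

   public static void main(String[] args) throws Exception {
      System.out.println("Link: " + boardLink(SAMPLE));
   }
}
```

Forgetting setNamespaceAware(true) is the JDK equivalent of the namespace trouble described above: the parser then ignores the xmlns declarations and the prefixed expression silently returns an empty string.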

Tool of the day: XPather

A simple, powerful and easy-to-use Firefox extension for working with XPath expressions: XPather.

Building XPath expression from XML node

When programmatically dealing with large XML (or DXL) documents, it is often beneficial to be able to indicate, for logging or similar, which node the processing stopped at or where the “thing” you are logging was found. The simplest way to do this for XML is with XPath. The code below is from a library I wrote and constructs an XPath expression that addresses the org.w3c.dom.Node supplied to the method.

Consider an XML document like the one below together with the table below. The left column describes the node supplied to the method and the right column shows the XPath expression it returns. Notice how the method tries to use “known” attributes (id or name) to address the specific node, which makes the XPath more readable. If no “known” attribute is found, it falls back to the node’s position among siblings with the same name.

Supplied node	Returned XPath
Title node of “Harry Potter and the Chamber of Secrets”	bookstore/book[@id='2']/title[1]
Second tag node of “Harry Potter and the Prisoner of Azkaban”	bookstore/book[@id='3']/tags[1]/tag[2]

If you combine this with a good logging framework like log4j, you have a robust solution for reproducing parsing issues.

Use to your heart’s content…

<?xml version="1.0" encoding="iso-8859-1" ?>
<bookstore>
  <book id="1">
    <title>Harry Potter and the Philosopher's Stone</title>
    <isbn>0747532745</isbn>
    <tags>
      <tag>children</tag>
      <tag>stone</tag>
    </tags>
  </book>
  <book id="2">
    <title>Harry Potter and the Chamber of Secrets</title>
    <isbn>0747538484</isbn>
    <tags>
      <tag>children</tag>
      <tag>secrets</tag>
    </tags>
  </book>
  <book id="3">
    <title>Harry Potter and the Prisoner of Azkaban</title>
    <isbn>0747546290</isbn>
    <tags>
      <tag>children</tag>
      <tag>prisoner</tag>
    </tags>
  </book>
</bookstore>
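To check that the expressions in the table really select the intended nodes, they can be evaluated against the bookstore document with the JDK’s javax.xml.xpath API. This is a minimal sketch; the class and method names are made up for illustration, and the document is inlined as a string with the isbn elements omitted.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CheckTablePaths {

   // The bookstore document from above, inlined (isbn elements omitted).
   static final String XML =
        "<bookstore>"
      + "<book id='1'><title>Harry Potter and the Philosopher's Stone</title>"
      + "<tags><tag>children</tag><tag>stone</tag></tags></book>"
      + "<book id='2'><title>Harry Potter and the Chamber of Secrets</title>"
      + "<tags><tag>children</tag><tag>secrets</tag></tags></book>"
      + "<book id='3'><title>Harry Potter and the Prisoner of Azkaban</title>"
      + "<tags><tag>children</tag><tag>prisoner</tag></tags></book>"
      + "</bookstore>";

   /**
    * Evaluates expr with the document node as context node and returns the
    * text content of the first selected node, or null if nothing matches.
    */
   static String textAt(String expr) throws Exception {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
      Node n = (Node) XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODE);
      return (null == n) ? null : n.getTextContent();
   }

   public static void main(String[] args) throws Exception {
      // the two expressions from the table
      System.out.println(textAt("bookstore/book[@id='2']/title[1]"));
      System.out.println(textAt("bookstore/book[@id='3']/tags[1]/tag[2]"));
   }
}
```

Because the expressions have no leading slash they are relative paths; evaluated with the document node as context they work as-is, and prefixing them with “/” makes them absolute.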

/* *********************************************************************
 *                    *** DISCLAIMER ***
 * This code is covered by the Creative Commons Attribution 2.5 License
 * (http://creativecommons.org/licenses/by/2.5/).
 *
 * You may use this code in any way you see fit as long as you realize
 * that the code is provided AS IS without any warranties and confers
 * no rights whatsoever! The author cannot be held accountable for
 * any loss, direct or indirect, caused by using the code.
 *
 * *********************************************************************
 */

import java.util.Stack;

import org.w3c.dom.Element;
import org.w3c.dom.Node;

/**
 * Utility class for dealing with XML DOM elements.
 *
 *
 * @author Mikkel Heisterberg, lekkim@lsdoc.org
 */
public class ElementUtil {

   /**
    * Constructs an XPath expression that addresses the supplied node.
    * Elements with an "id" or "name" attribute are addressed by that
    * attribute, other elements by their position among same-named siblings.
    *
    * @param n the node to build an XPath expression for
    * @return the XPath expression (relative to the document node), or null
    *         if n is null
    */
   public static String getXPath(Node n) {
      // abort early
      if (null == n) return null;

      // declarations
      Node parent = null;
      Stack<Node> hierarchy = new Stack<Node>();
      StringBuilder buffer = new StringBuilder();

      // push element on stack
      hierarchy.push(n);

      // push all ancestors up to (but excluding) the document node
      parent = n.getParentNode();
      while (null != parent && parent.getNodeType() != Node.DOCUMENT_NODE) {
         // push on stack
         hierarchy.push(parent);

         // get parent of parent
         parent = parent.getParentNode();
      }

      // construct xpath, popping from the root element downwards
      while (!hierarchy.isEmpty()) {
         Node node = hierarchy.pop();
         boolean handled = false;

         // only consider elements
         if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element) node;
            String name = getName(node);

            // is this the root element?
            if (buffer.length() == 0) {
               // root element - simply append element name
               buffer.append(name);
            } else {
               // child element - append slash and element name
               buffer.append("/");
               buffer.append(name);

               // see if the element has an id or name attribute
               if (e.hasAttribute("id")) {
                  // id attribute found - use that
                  buffer.append("[@id='" + e.getAttribute("id") + "']");
                  handled = true;
               } else if (e.hasAttribute("name")) {
                  // name attribute found - use that
                  buffer.append("[@name='" + e.getAttribute("name") + "']");
                  handled = true;
               }

               if (!handled) {
                  // no known attribute we could use - count the preceding
                  // element siblings with the same name (XPath positions
                  // are 1-based and names are case-sensitive)
                  int position = 1;
                  Node prev_sibling = node.getPreviousSibling();
                  while (null != prev_sibling) {
                     if (prev_sibling.getNodeType() == Node.ELEMENT_NODE
                           && name.equals(getName(prev_sibling))) {
                        position++;
                     }
                     prev_sibling = prev_sibling.getPreviousSibling();
                  }
                  buffer.append("[" + position + "]");
               }
            }
         }
      }

      // return buffer
      return buffer.toString();
   }

   /**
    * Returns the local name of the node, falling back to the node name when
    * the document was parsed without namespace awareness (getLocalName()
    * then returns null).
    */
   private static String getName(Node node) {
      String name = node.getLocalName();
      return (null != name) ? name : node.getNodeName();
   }
}
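The method wraps id and name values in single quotes, so a value that itself contains an apostrophe (say id="O'Brien") would produce an invalid expression. XPath 1.0 string literals have no escape syntax; the usual workaround is to switch to double quotes, or to split the value and rejoin it with concat() when it contains both quote characters. A minimal sketch, with a hypothetical helper class name:

```java
public class XPathLiterals {

   /**
    * Quotes a string as an XPath 1.0 literal. XPath 1.0 has no escape
    * syntax, so a value containing both quote characters is split on the
    * apostrophes and glued back together with concat().
    */
   public static String literal(String value) {
      if (value.indexOf('\'') < 0) return "'" + value + "'";
      if (value.indexOf('"') < 0) return "\"" + value + "\"";

      StringBuilder sb = new StringBuilder("concat(");
      String[] parts = value.split("'", -1); // -1 keeps empty trailing parts
      for (int i = 0; i < parts.length; i++) {
         if (i > 0) sb.append(", \"'\", "); // re-insert the apostrophe
         sb.append('\'').append(parts[i]).append('\'');
      }
      return sb.append(')').toString();
   }

   public static void main(String[] args) {
      System.out.println(literal("2"));       // '2'
      System.out.println(literal("O'Brien")); // "O'Brien"
      System.out.println(literal("a'b\"c"));  // concat('a', "'", 'b"c')
   }
}
```

In getXPath the id predicate could then be built as "[@id=" + XPathLiterals.literal(e.getAttribute("id")) + "]", and likewise for name.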

Free on-line XPath tool

If you occasionally need to run an XPath query against an XML document and don’t want to pay for a professional tool to cover that need, take a look at the BIT-101 XPath Query Tool.

Ahhh – there is of course the Microsoft (proprietary) solution

As mentioned yesterday, I have been reinventing the wheel by (re)writing XPath 2.0 functions as named XSLT templates, since MSXML 3 in Internet Explorer 6 isn’t XPath 2.0 compliant. As always, however, there is a proprietary Microsoft solution: the urn:schemas-microsoft-com:xslt namespace. This namespace adds support for a number of utility functions, as described in the “Microsoft XPath Extension Functions” article on the Microsoft Developer Network.

However, this namespace only works with MSXML 4, which means that even the proprietary solution isn’t workable for me, since MSXML 3 is the default in Internet Explorer 6.

While researching this subject, I found that MSXML 3 (or 4) isn’t the newest version. There is also an MSXML 5 (used only with Office 2003) and an MSXML 6 (supplied with Visual Studio 2005). However, even the newest, MSXML 6, doesn’t support XPath 2.0. The standards supported by MSXML 6 are:

XML 1.0 (DOM & SAX2 APIs)
XML Schema (XSD) 1.0
XPath 1.0
XSLT 1.0

Come on already – please implement the standards!

How I hate reinventing the wheel

I’m doing quite a lot of work at the moment defining XML document “languages” and associated XML schemas (which is why I’m happy about the great XML and XML Schema (XSD) support in Callisto). In that connection I’m also writing XSLT stylesheets for end-user presentation in Internet Explorer (version 6 or higher). We have to make Internet Explorer a requirement for user presentation, since Firefox doesn’t support resolving entity references when using XSLT, which we need for content reuse.

Once this was settled, it was all well and good until yesterday, when I discovered that Internet Explorer 6 (and hence MSXML 3.0) doesn’t support XPath 2.0. This means that none of the nifty functions defined for XPath 2.0, such as the date/time functions, can be used in the stylesheets. Bummer!

So here I am, back to reinventing the wheel: rewriting all the date/time functions as named templates using the XPath 1.0 substring() function. Even more of a bummer!
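The templates themselves aren’t shown here, so the following is only a sketch of the idea under stated assumptions: a hypothetical named template called format-date that takes a lexical dateTime value (YYYY-MM-DDThh:mm:ss) and formats it as DD.MM.YYYY by slicing it with substring(). It is run with the JDK’s built-in XSLT 1.0 processor (javax.xml.transform) rather than MSXML 3, so it demonstrates the substring() technique, not Internet Explorer behavior.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DateTemplateDemo {

   static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:call-template name='format-date'>"
      + "<xsl:with-param name='dateTime' select='/event/@when'/>"
      + "</xsl:call-template>"
      + "</xsl:template>"
      // XPath 1.0 stand-in for the XPath 2.0 date functions: slice the
      // lexical YYYY-MM-DDThh:mm:ss form with substring() (1-based offsets).
      + "<xsl:template name='format-date'>"
      + "<xsl:param name='dateTime'/>"
      + "<xsl:value-of select=\"concat(substring($dateTime, 9, 2), '.',"
      + " substring($dateTime, 6, 2), '.', substring($dateTime, 1, 4))\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

   /** Runs the stylesheet over the given XML and returns the text output. */
   static String format(String xml) throws Exception {
      Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
      StringWriter out = new StringWriter();
      t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
      return out.toString();
   }

   public static void main(String[] args) throws Exception {
      System.out.println(format("<event when='2006-08-15T10:30:00'/>"));
   }
}
```

The approach only works because dateTime values have a fixed-width lexical form; time zone offsets or fractional seconds would need extra handling in the template.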