For our OnTime Group Calendar 2011 Notes user interface, which is based on Java plugins, we have seen some strange problem reports: customers report that the widget installs, but after installation an error message is displayed on screen (“The widgets below had issues during the installation. See the log for more details.”). The weird thing about the reports was that the widget (and hence the plugins) actually installed just fine and worked after a client restart. What’s even funnier is that the same plugins installed without error if the customer used HTTP and not NRPC as the update site access protocol. In the log (Help/Support/View Trace) the error messages below were shown (my highlighting):
Could not access digest on the site: no protocol: 0/8E4DFE3996CBDD94C1257A33003CE3A9/$file/digest.zip
[Fatal Error] :1:1: Content is not allowed in prolog.
org.eclipse.ui.WorkbenchException: Content is not allowed in prolog.
    at org.eclipse.ui.XMLMemento.createReadRoot(Unknown Source)
    at org.eclipse.ui.XMLMemento.createReadRoot(Unknown Source)
    at com.ibm.rcp.dynamic.extensions...
    at com.ibm.rcp.dynamic.extensions...
    at com.ibm.rcp.dynamic.extensions...
    at com.ibm.rcp.toolbox.internal.management...
    at com.ibm.rcp.toolbox.internal.management...
    at com.ibm.rcp.toolbox.internal.management...
    at com.ibm.rcp.toolbox.internal.management...
    at com.ibm.rcp.toolbox.internal.management...
    at org.eclipse.core.internal.jobs...
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl...
    ... 11 more
This really had me scratching my head.
After researching and Googling I found a post called “Java: “Content is not allowed in prolog” – causes of this XML processing error” which led me in the right direction.
When I opened the widget XML file in an editor and switched to HEX mode I saw this:
Three invisible bytes had been inserted at the start of the file. I copy/pasted the XML into a new file, which solved the issue. The three bytes turn out to be what’s called a byte order mark, or BOM for short. From the Wikipedia article I found out that the BOM for a UTF-8 file is those exact bytes, and I did save the file as UTF-8 in my editor.
The UTF-8 representation of the BOM is the byte sequence 0xEF,0xBB,0xBF. A text editor or web browser interpreting the text as ISO-8859-1 or CP1252 will display the characters ï»¿ for this.
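If you want to check a file for those three bytes without opening a hex editor, a few lines of Java will do. This is only a minimal sketch: the class name `BomCheck` and the method `hasUtf8Bom` are names I made up for illustration and are not part of any Notes or Eclipse API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class BomCheck {
    // The UTF-8 encoding of the byte order mark: 0xEF, 0xBB, 0xBF.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true if the data starts with the three UTF-8 BOM bytes. */
    static boolean hasUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // The file to check is passed as the first command line argument.
        Path file = Paths.get(args[0]);
        byte[] data = Files.readAllBytes(file);
        System.out.println(file + (hasUtf8Bom(data)
                ? ": starts with a UTF-8 BOM"
                : ": no UTF-8 BOM"));
    }
}
```

Run it with the path to a widget descriptor as the only argument to see whether that file carries the BOM.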
However, the bytes were not there in my editor, only in the file that I received from a customer. So how did the bytes get there?!
After more research and playing around I found out that the BOM is added by Notepad on Windows. But of course!! A lot of customers receive the XML file and change the update site URL to point to their own servers, and if they use Notepad for this, the BOM is added (see “The Notepad file encoding problem, redux” for more details).
But how to fix it? Telling customers not to use Notepad to edit the XML file would be a no-go, as many probably do not use another editor. And even if they did, the problem would probably still arise in some situations and we would spend time diagnosing it. The solution I’ve opted for is to change the encoding of the file to ASCII and change the encoding declaration in the XML file to ASCII as well. The content of the XML file is unchanged, and now customers can change it using Notepad without issues.
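For anyone who reads such XML files in their own Java code, another option is to drop a leading BOM before handing the bytes to the parser. To be clear, this is not the fix I shipped (that was the switch to ASCII described above), and it does not change how the Notes client itself reads widget descriptors. It is only a sketch of the idea, and `BomTolerantXml`, `stripUtf8Bom` and `parse` are hypothetical names of my own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class BomTolerantXml {
    /** Returns the data without a leading UTF-8 BOM; data without a BOM is returned as-is. */
    static byte[] stripUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    /** Parses XML bytes after removing a leading UTF-8 BOM, if there is one. */
    static Document parse(byte[] data) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(stripUtf8Bom(data)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Rethrow as unchecked so callers are not forced to handle three exception types.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Stripping only the exact three-byte sequence keeps the sketch from touching files that are fine already, which matters because most files will never have been near Notepad.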
I emailed a bit with IBM on this and the response is the following:
"All widget xml files are written out using UTF-8 (widget export, ’email to’, or publish to catalog) and read in using UTF-8 (widget install)."
IBM does however not consider it a high priority to support the UTF-8 BOM (which I agree with by the way) so for now be aware of the issue if you edit widget descriptors.