
Helping out a fellow blogger getting the actual bytes of an image resource - a lesson in the intricacies of DXL representation

This is a rather lengthy Friday post for those of you who have ever been haunted by a programming problem you simply couldn't let go of. I guess those of you into Java and Notes development will probably appreciate it as well... :-)

It all started Thursday night when I read a post called "I need some help, PLEASE" over at Jamie Price's blog. Jamie needed help extracting the bytes of an image resource in a Notes database using Java. Being the nice guy I am, and because it sounded pretty nerdy (direct hit again) and I didn't think it would be that difficult, I started messing around with it.

Easy? Well no, but I cracked it!

Thursday

I immediately thought "DXL" and started up Eclipse. I guessed that the image resource would probably be base64 encoded and thought it a small matter to solve Jamie's problem. I imagined 3 easy steps:

  • Get the DXL for the image resource,
  • Extract the base64 encoded image resource,
  • Decode the base64 string to a byte array
and I would be laughing... Well, not so easy.

I pretty quickly got at the base64-encoded image resource, but whatever I did I couldn't get the image properly converted. I put this down to my tired fingers, so I posted my idea and some code in a comment on Jamie's blog and went to bed.

As I read your post, you need some way to get a byte array of an image resource in the current database, right? You could use DXL to get at the base64-encoded image and then decode it into a byte array (here I find an image resource called "location.gif"):

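Session session = getSession();            // inside an AgentBase subclass
AgentContext ac = session.getAgentContext();
Database db = ac.getCurrentDatabase();

// collect all image resources, then remove those not called "location.gif"
NoteCollection nc = db.createNoteCollection(false);
nc.setSelectImageResources(true);
nc.buildCollection();
String noteid = nc.getFirstNoteID();
String noteid2 = null;
while (noteid.length() > 0) {
  // fetch the next id before removing the current note from the collection
  noteid2 = nc.getNextNoteID(noteid);
  Document doc = db.getDocumentByID(noteid);
  if (!doc.getItemValue("$TITLE").elementAt(0).equals("location.gif")) {
    nc.remove(doc);
  }
  noteid = noteid2;
}

// export the remaining note as DXL
DxlExporter exporter = session.createDxlExporter();
exporter.setForceNoteFormat(true);
String dxl = exporter.exportDxl(nc);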

If you look at the generated DXL, you will see the base64-encoded image in it. I didn't go further since I wanted to make sure that this is actually what you want to achieve.

<snip total_wrong="true">
I did however find out that I was unable to export design elements to DXL using Java from a local database under Notes 7.0.1 - I had to do it on a server... 
</snip>

/lekkim

Friday

Today, Friday, I received an e-mail from Jamie thanking me for looking into the issue. In hindsight I know he just smelled the blood... At that point, however, the e-mail just reminded me that I hadn't solved the problem, which really bothered me. I did some research and found evidence that something might be up with the base64-encoded image resource: a post I found on LDD suggested that the base64 data might contain some header and/or control information as well. Venturing further down this avenue, I compared the bytes of a decoded image resource with the bytes of the same image stored outside an image resource, and that revealed the secret.

The structure of an image resource

It turns out that an image resource has 66 bytes of header information followed by 10240 bytes of image data, 10 bytes of "control information", 10240 bytes of image data, 10 bytes of "control information" and so on... I felt I was on to something.
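Taken at face value, that pattern means the decoded size can be predicted from the size of the image itself, which makes a handy sanity check before stripping anything. A small sketch, assuming the 66/10240/10 figures above hold for every image resource and that only the last segment can be shorter than 10240 bytes (any filler byte is ignored); the class and method names are made up for illustration:

```java
public class ImageResourceLayout {
    // Figures observed in this post; not taken from any official documentation
    static final int HEADER_BYTES = 66;
    static final int SEGMENT_DATA_BYTES = 10240;
    static final int CONTROL_BYTES = 10;

    /** Number of 10240-byte data segments needed for an image of this size. */
    static int segmentCount(int imageSize) {
        return (imageSize + SEGMENT_DATA_BYTES - 1) / SEGMENT_DATA_BYTES;
    }

    /** Expected decoded length: header + image data + control blocks between segments. */
    static int expectedDecodedLength(int imageSize) {
        int gaps = Math.max(0, segmentCount(imageSize) - 1);
        return HEADER_BYTES + imageSize + CONTROL_BYTES * gaps;
    }

    public static void main(String[] args) {
        // 46282 bytes, the ImageDataSize of the JPEG in Ben Langhinrichs' comment
        System.out.println(segmentCount(46282));          // 5
        System.out.println(expectedDecodedLength(46282)); // 46388
    }
}
```

If the decoded bytes are noticeably longer or shorter than this, the layout assumption probably doesn't hold for that particular resource.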

I extracted the base64-encoded image resource from the DXL using cut-and-paste and wrote some code to process the image resource bytes as per the above pattern, and voilà! One perfect, viewable, valid image on my hard drive.


import java.util.ArrayList;
import java.util.Iterator;
import lotus.domino.NotesException;

public byte[] getImageBytes() throws NotesException {
   // get base64 encoded image
   String base64 = this.getBase64();
   
   // convert the base64 to byte array 
   // using the Base64 encoder/decoder by 
   // Robert Harder from http://iharder.net/base64
   byte[] b_source = Base64.decode(base64);
   
   // remove header and control information
   ArrayList list = new ArrayList();
   int j=0;   // image bytes copied from the current segment
   int k=0;   // control bytes skipped after the current segment
   int i=0;
   for (i=0; i<b_source.length; i++) {
      if (i<66) {
         // ignore (some kind of header)
      } else if (j < 10240) {
         // image data
         list.add(new Byte(b_source[i]));
         j++;
      } else if (k<9){
         // first 9 of the 10 bytes of control information
         k++;
      } else {
         // 10th control byte: skip it and start the next segment
         j=0;
         k=0;
      }
   }
   
   // copy the collected bytes into a byte array
   byte[] result = new byte[list.size()];
   i=0;
   for (Iterator ite=list.iterator(); ite.hasNext(); i++) {
      result[i] = ((Byte)ite.next()).byteValue();
   }
   
   // return result
   return result;
}
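The same fixed-layout stripping can also be written without boxing every byte into a Byte and copying it through an ArrayList. A sketch, assuming the 66/10240/10 layout described above holds; FixedLayoutStripper is a hypothetical name, and it takes the already decoded bytes rather than calling getBase64() itself:

```java
import java.io.ByteArrayOutputStream;

public class FixedLayoutStripper {
    // Layout observed in this post: a 66-byte header, then 10240-byte data
    // segments separated by 10 bytes of control information
    static final int HEADER_BYTES = 66;
    static final int SEGMENT_DATA_BYTES = 10240;
    static final int CONTROL_BYTES = 10;

    /** Returns the image data contained in already decoded image resource bytes. */
    static byte[] strip(byte[] source) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(source.length);
        int pos = HEADER_BYTES;
        while (pos < source.length) {
            // copy one segment (the last one may be shorter)
            int len = Math.min(SEGMENT_DATA_BYTES, source.length - pos);
            out.write(source, pos, len);
            // jump over the control block that follows it
            pos += len + CONTROL_BYTES;
        }
        return out.toByteArray();
    }
}
```

It skips exactly the same bytes as the loop in getImageBytes(), so `FixedLayoutStripper.strip(Base64.decode(getBase64()))` should return the same result, using whichever base64 decoder you have available.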

This was all well and good. The extracted bytes were easily written to disk using a java.io.FileOutputStream:

import java.io.FileOutputStream;

// write to a file on the file system
FileOutputStream out = new FileOutputStream("c:\\image_resource.gif");
out.write(my_acquired_image_bytes);
out.flush();
out.close();
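If write() throws, the stream above is never closed. A slightly more defensive variant is sketched below; note that try-with-resources needs Java 7 or later, so on the older JVMs this post targets you would use try/finally instead. ImageFileWriter and save() are made-up names:

```java
import java.io.FileOutputStream;
import java.io.IOException;

public class ImageFileWriter {
    /** Writes the bytes to fileName; try-with-resources closes the stream even on failure. */
    static void save(byte[] imageBytes, String fileName) throws IOException {
        try (FileOutputStream out = new FileOutputStream(fileName)) {
            out.write(imageBytes);
        }
    }
}
```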

Get at the base64 data from Java

Now came the time to get at the base64 data of any image resource. I chose to use a regular expression for the job.

A thing to bear in mind is that the base64 data of an image resource in DXL is sometimes split over two rawitemdata items. This seems to be file-size related, since an image resource of 17 KB is in one item while one of 35 KB is in two items (probably our dear 32 KB limit).
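Joining the base64 text of several items and decoding it in one go is only safe when every item but the last encodes a multiple of three bytes; otherwise "=" padding ends up in the middle of the string. I haven't checked whether DXL item boundaries always fall on such a multiple, so a more cautious option is to decode each item separately and join the bytes. A sketch, assuming Java 8 or later for java.util.Base64 (not available on the JVMs of the Notes versions discussed here) and the same `<rawitemdata type='1'>` markup the regular expression in getBase64() relies on; RawItemDataDecoder is a hypothetical name:

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawItemDataDecoder {
    // Same pattern as in getBase64(): the text of each <rawitemdata type='1'> element
    private static final Pattern RAW_ITEM = Pattern.compile(
            "<rawitemdata type='1'>([a-zA-Z0-9/=+\\s]*)</rawitemdata>");

    /** Decodes every matching item on its own and concatenates the resulting bytes. */
    static byte[] decodeAll(String dxl) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Matcher m = RAW_ITEM.matcher(dxl);
        while (m.find()) {
            // the MIME decoder ignores the line breaks found inside DXL base64 text
            byte[] part = Base64.getMimeDecoder().decode(m.group(1));
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }
}
```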

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import lotus.domino.*;

// pDb, pSession and pName are fields of the enclosing class
public String getBase64() throws NotesException {
   // build a note collection of image resources
   NoteCollection nc = this.pDb.createNoteCollection(false);
   nc.setSelectImageResources(true);
   nc.buildCollection();
   String noteid = nc.getFirstNoteID();
   String noteid2 = null;
   while (noteid.length() > 0) {
      noteid2 = nc.getNextNoteID(noteid);
      Document doc = this.pDb.getDocumentByID(noteid);
      if (!doc.getItemValue("$TITLE").elementAt(0).equals(this.pName)) {
         // not the one we are looking for
         nc.remove(doc);
      }
      
      noteid = noteid2;
   }
   
   // make sure we found exactly one image resource
   if (nc.getCount() != 1) {
      throw new RuntimeException("Unable to find the requested image resource");
   }
   
   // export our collection of the found image resource as DXL
   DxlExporter exporter = this.pSession.createDxlExporter();
   exporter.setForceNoteFormat(true);
   String dxl = exporter.exportDxl(nc);
   
   // use regexp to get the base64 encoded image resource;
   // the data may be split over several rawitemdata items, so append them all
   Pattern p = Pattern.compile("<rawitemdata type='1'>([a-zA-Z0-9/=+\\s]*)</rawitemdata>");
   Matcher m = p.matcher(dxl);
   StringBuffer buf = new StringBuffer();
   while (m.find()) {
      buf.append(m.group(1));
   }
   
   // return
   return buf.toString();
}

Please note: since I used the java.util.regex package, the code requires Notes 7.x to run, as the package was added in Java 1.4.x. It should, however, be fairly easy to get at the base64 data using plain string operations if you're on a pre-Notes 7 release.
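For the pre-Notes 7 case, a regex-free version of the extraction loop could look like the sketch below. It sticks to StringBuffer and String.indexOf so it should compile on older JVMs, but I haven't tried it on one; like the regular expression, it assumes the exporter writes the element exactly as `<rawitemdata type='1'>`. RawItemDataScanner is a made-up name:

```java
public class RawItemDataScanner {
    private static final String OPEN = "<rawitemdata type='1'>";
    private static final String CLOSE = "</rawitemdata>";

    /** Concatenates the text of every rawitemdata element of type 1, without java.util.regex. */
    static String extractBase64(String dxl) {
        StringBuffer buf = new StringBuffer();
        int from = 0;
        while (true) {
            int start = dxl.indexOf(OPEN, from);
            if (start < 0) {
                break; // no more items
            }
            start += OPEN.length();
            int end = dxl.indexOf(CLOSE, start);
            if (end < 0) {
                break; // unterminated element: stop here
            }
            buf.append(dxl.substring(start, end));
            from = end + CLOSE.length();
        }
        return buf.toString();
    }
}
```

It returns the same concatenated string as the last part of getBase64(), so the rest of the code can stay as it is.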

Conclusion

Well, it turned out not to be as easy as first expected, but at least I have peace of mind now that I cracked it. I sent the code to Jamie and he's trying it out as well, so we'll see how it goes. For those interested in the complete code, I have put together an example database available for download.

I hope Jamie can use the code. Happy Friday!

Ben Poole

Wow

That ROCKS! Excellent work.
Richard Schwartz

Re: Helping out a fellow blogger getting the actual bytes of an image resource - a lesson in the intricacies of DXL representation

Nice. That's weird, though. 10240 bytes in the initial data block, and then 10250 bytes in each of the successive blocks. Very strange. Hmmm... it looks like something got dropped from your code in getImageBytes(). There's an "else" clause hanging without an "if" right after the "for (i=0; i<66)".

Re: Helping out a fellow blogger getting the actual bytes of an image resource - a lesson in the intricacies of DXL representation

Richard, thanks. It was a case of HTML that wasn't escaped correctly. Now corrected.
Jamie Price

Re: Helping out a fellow blogger getting the actual bytes of an image resource - a lesson in the intricacies of DXL representation

I will be implementing this solution on Monday. Today I had to move along and make 5 other forms that will all use this technique once I've tested it. Thanks for doing this; you have no idea how big the dent in the wall next to my desk is from me beating my head against it. I'll update you when I've had the chance to implement it. FYI, I didn't smell the blood in the water: at the time I read his first email, the water was too cloudy for me to even think straight! Time for me to buy a book on Java.

Re: Helping out a fellow blogger getting the actual bytes of an image resource - a lesson in the intricacies of DXL representation

Do you know Guido Purper?

Re: Helping out a fellow blogger getting the actual bytes of an image resource - a lesson in the intricacies of DXL representation

Nope...

Re: Helping out a fellow blogger getting the actual bytes of an image resource - a lesson in the intricacies of DXL representation

I did, however, do a Google search now that you mention the name and found some code on OpenNTF. Guido's code, however, looks like it deals with attachments on documents, while mine deals with image resources, so I guess they complement each other just fine...
Ben Langhinrichs

Re: Helping out a fellow blogger getting the actual bytes of an image resource - a lesson in the intricacies of DXL representation

If it helps any, the logic is fairly simple with a knowledge of the C API.
The image resource starts with a SIG_CD_IMAGEHEADER record followed by a series of SIG_CD_IMAGESEGMENT records. For example, the following comes from my personal CD record dumper for a JPEG image in a rich text field, but the same format is used in the image resource:

005 Record type: CD_IMAGEHEADER
[125] Length: 28
ImageType: JPEG Width: 293 pixels
Height: 558 pixels
ImageDataSize: 46282
SegCount: 5
Flags: 0
Reserved: 0
006 Record type: CD_IMAGESEGMENT
[124] Length: 10250
DataSize: 10240
SegSize: 10240
007 Record type: CD_IMAGESEGMENT
[124] Length: 10250
DataSize: 10240
SegSize: 10240
008 Record type: CD_IMAGESEGMENT
[124] Length: 10250
DataSize: 10240
SegSize: 10240
009 Record type: CD_IMAGESEGMENT
[124] Length: 10250
DataSize: 10240
SegSize: 10240
010 Record type: CD_IMAGESEGMENT
[124] Length: 5332
DataSize: 5322
SegSize: 5322
The actual structures (from the C API editods.h header file) are:

typedef struct
{
   LSIG  Header;         /* Signature+Length */
   WORD  ImageType;      /* GIF, JPEG */
   WORD  Width;          /* in pixels */
   WORD  Height;         /* in pixels */
   DWORD ImageDataSize;  /* Size (in bytes) of the image data */
   DWORD SegCount;       /* Number of CDIMAGESEGMENT records expected to follow */
   DWORD Flags;          /* Flags (currently unused) */
   DWORD Reserved;       /* Reserved for future use */
} CDIMAGEHEADER;

typedef struct
{
   LSIG  Header;         /* Signature and Length */
   WORD  DataSize;       /* Actual Size of image bits in bytes, ignoring any filler */
   WORD  SegSize;        /* Size of segment, is equal to or larger than DataSize if filler byte added to maintain word boundary */
   /* Image bits for this segment */
} CDIMAGESEGMENT;

Hope that is of some interest to someone (besides me)
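With those structures in hand, the stripping can be driven by the records themselves instead of the fixed 66/10240/10 offsets. A minimal sketch, assuming the decoded rawitemdata is a plain sequence of CD records in little-endian (canonical) byte order, that record headers follow the C API's BSIG/WSIG/LSIG convention (the byte after the signature is 0x00 for an LSIG, 0xFF for a WSIG, otherwise a BSIG length), and that odd-length records are followed by one pad byte. It keeps DataSize bytes of every CDIMAGESEGMENT (signature 124 in the dump above) and skips every other record, including the CDIMAGEHEADER; CdImageRecordWalker is a hypothetical name:

```java
import java.io.ByteArrayOutputStream;

public class CdImageRecordWalker {
    // Signature of CD_IMAGESEGMENT as shown in the dump above
    static final int SIG_CD_IMAGESEGMENT = 124;

    // Little-endian WORD at off
    private static int word(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    // Little-endian DWORD at off
    private static long dword(byte[] b, int off) {
        return word(b, off) | ((long) word(b, off + 2) << 16);
    }

    /** Concatenates the image bits of every CDIMAGESEGMENT record, skipping all other records. */
    static byte[] extractImage(byte[] cd) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(cd.length);
        int pos = 0;
        while (pos + 2 <= cd.length) {
            int sig = cd[pos] & 0xFF;
            int high = cd[pos + 1] & 0xFF;
            // 0x00 = LSIG (6-byte header), 0xFF = WSIG (4 bytes), otherwise BSIG (2 bytes)
            int headerSize = high == 0x00 ? 6 : (high == 0xFF ? 4 : 2);
            if (pos + headerSize > cd.length) {
                throw new IllegalArgumentException("Truncated record header at offset " + pos);
            }
            long length = high == 0x00 ? dword(cd, pos + 2)
                        : high == 0xFF ? word(cd, pos + 2)
                        : high;
            if (length < headerSize || pos + length > cd.length) {
                throw new IllegalArgumentException("Bad record length at offset " + pos);
            }
            if (sig == SIG_CD_IMAGESEGMENT && high == 0x00) {
                // CDIMAGESEGMENT: LSIG (6) + DataSize (2) + SegSize (2), then the image bits
                if (length < 10 || 10 + word(cd, pos + 6) > length) {
                    throw new IllegalArgumentException("Bad CDIMAGESEGMENT at offset " + pos);
                }
                out.write(cd, pos + 10, word(cd, pos + 6));
            }
            pos += (int) length;
            if ((length & 1) == 1) {
                pos++; // odd-length records are followed by a pad byte
            }
        }
        return out.toByteArray();
    }
}
```

Because it copies DataSize rather than SegSize bytes, a filler byte at the end of an odd-sized final segment is dropped, whereas the fixed-offset loop in getImageBytes() would keep it.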

Re: Helping out a fellow blogger getting the actual bytes of an image resource - a lesson in the intricacies of DXL representation

Thanks for sharing! Just curious - does this structure also go for shared file resources? Do you know?

Ben Langhinrichs

Re: Helping out a fellow blogger getting the actual bytes of an image resource - a lesson in the intricacies of DXL representation

The file resource is a similar structure, but not identical. It has a FILE header followed by File segments, and the structures are:

typedef struct
{
   LSIG  Header;
   WORD  FileExtLen;     /* Length of file extension */
   DWORD FileDataSize;   /* Size (in bytes) of the file data */
   DWORD SegCount;       /* Number of CDFILESEGMENT records expected to follow */
   DWORD Flags;          /* Flags (currently unused) */
   DWORD Reserved;       /* Reserved for future use */
   /* Variable length string follows (not null terminated). This string is the file extension for the file. */
} CDFILEHEADER;


typedef struct
{
   LSIG  Header;
   WORD  DataSize;       /* Actual Size of image bits in bytes, ignoring any filler */
   WORD  SegSize;        /* Size of segment, is equal to or larger than DataSize if filler byte added to maintain word boundary */
   DWORD Flags;          /* currently unused */
   DWORD Reserved;       /* Reserved for future use */
   /* File bits for this segment */
} CDFILESEGMENT;


The trick for what you want to do is that you have to actually read the FileExtLen so that you can account for those bytes, since you could have .js or .png or .html, for example. This byte count will presumably be padded if it is an odd number. Here is the dump from a .png file resource in my system (I know all this stuff, incidentally, because our Midas Rich Text LSX allows you to create and manipulate both image resources and file resources, so I had to learn about it).

001 Record type: CD_FFILEHEADER [97] Length: 28
FileDataSize: 23302
FileExtLen: 3
SegCount: 3
Flags: 0
Reserved: 0
extension data: 'png'
002 Record type: CD_FFILESEGMENT [96] Length: 10258
DataSize: 10240
SegSize: 10240
Flags: 0
Reserved: 0
003 Record type: CD_FFILESEGMENT [96] Length: 10258
DataSize: 10240
SegSize: 10240
Flags: 0
Reserved: 0
004 Record type: CD_FFILESEGMENT [96] Length: 2840
DataSize: 2822
SegSize: 2822
Flags: 0
Reserved: 0
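A file resource could be walked in much the same way, as long as the extension and the larger segment header are accounted for, as Ben describes. A sketch, assuming the decoded data is a sequence of LSIG records in little-endian byte order, that odd-length records are followed by one pad byte, that the file header is 24 bytes before the extension and each segment header 18 bytes (which is what the structs above add up to, and the dump's 10258 = 18 + 10240 agrees), and that the extension is plain ASCII. Signatures 97 and 96 come from the dump; the class names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class CdFileRecordWalker {
    // Signatures of CD_FILEHEADER and CD_FILESEGMENT as shown in the dump above
    static final int SIG_CD_FILEHEADER = 97;
    static final int SIG_CD_FILESEGMENT = 96;

    /** The stored file extension plus the reassembled file bytes. */
    static final class FileResource {
        final String extension;
        final byte[] data;

        FileResource(String extension, byte[] data) {
            this.extension = extension;
            this.data = data;
        }
    }

    // Little-endian WORD at off
    private static int word(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    static FileResource extract(byte[] cd) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(cd.length);
        String extension = "";
        int pos = 0;
        while (pos + 6 <= cd.length) {
            int sig = cd[pos] & 0xFF;
            if ((cd[pos + 1] & 0xFF) != 0x00) {
                throw new IllegalArgumentException("Expected an LSIG record at offset " + pos);
            }
            long length = word(cd, pos + 2) | ((long) word(cd, pos + 4) << 16);
            if (length < 6 || pos + length > cd.length) {
                throw new IllegalArgumentException("Bad record length at offset " + pos);
            }
            if (sig == SIG_CD_FILEHEADER) {
                // 24-byte CDFILEHEADER, then FileExtLen bytes of extension (not null terminated)
                int extLen = length >= 24 ? word(cd, pos + 6) : -1;
                if (extLen < 0 || 24 + extLen > length) {
                    throw new IllegalArgumentException("Bad CDFILEHEADER at offset " + pos);
                }
                extension = new String(cd, pos + 24, extLen, StandardCharsets.US_ASCII);
            } else if (sig == SIG_CD_FILESEGMENT) {
                // 18-byte CDFILESEGMENT header, then DataSize bytes of file data
                int dataSize = length >= 18 ? word(cd, pos + 6) : -1;
                if (dataSize < 0 || 18 + dataSize > length) {
                    throw new IllegalArgumentException("Bad CDFILESEGMENT at offset " + pos);
                }
                out.write(cd, pos + 18, dataSize);
            }
            pos += (int) length;
            if ((length & 1) == 1) {
                pos++; // odd-length records are followed by a pad byte
            }
        }
        return new FileResource(extension, out.toByteArray());
    }
}
```

Whether the reported Length already includes the pad byte (Ben's dump shows 28 for a 27-byte header) doesn't matter here: an even length is taken as-is and an odd one is rounded up.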
Mark Jorgensen

Re: Helping out a fellow blogger getting the actual bytes of an image resource - a lesson in the intricacies of DXL representation

Hi Mikkel, I just tried out your code with Gif images in Domino 7.0.1 and had some other issues.
First of all, the Gif images were saved internally in Domino in notesbitmap format, but there's an option when extracting the DXL to make Domino convert them to GIF.
DxlExporter exporter = pSession.createDxlExporter();
exporter.setForceNoteFormat(true);
exporter.setConvertNotesBitmapsToGIF(true);
Then, once the GIF file was decoded from base64, there was no 66-byte header, nor 10240 bytes of image data followed by 10 bytes of "control information". Actually the decoded byte array was perfect; no need to tamper with it. I guess the setConvertNotesBitmapsToGIF property generates clean GIF files.
Cheers, Mark
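One way to cope with both situations is to look at the first few bytes after decoding: a clean GIF, JPEG or PNG starts with a well-known file signature, whereas the raw image resource data described in the post starts with header information. A hedged heuristic sketch; it only checks the GIF87a/GIF89a, JPEG (FF D8 FF) and PNG signatures, and I haven't verified what each export setting produces beyond Mark's report. ImageSignature is a made-up name:

```java
public class ImageSignature {
    /** True if the bytes start with a GIF, JPEG or PNG file signature. */
    static boolean looksLikePlainImage(byte[] b) {
        return startsWith(b, new int[] {'G', 'I', 'F', '8', '7', 'a'})
            || startsWith(b, new int[] {'G', 'I', 'F', '8', '9', 'a'})
            || startsWith(b, new int[] {0xFF, 0xD8, 0xFF})
            || startsWith(b, new int[] {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
    }

    // Compares unsigned byte values against the expected prefix
    private static boolean startsWith(byte[] b, int[] prefix) {
        if (b.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

If it returns true, the decoded bytes can presumably be written out as-is; otherwise fall back to stripping the header and control information as getImageBytes() does.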

Re: Helping out a fellow blogger getting the actual bytes of an image resource - a lesson in the intricacies of DXL representation

That's strange, since the format is fixed according to what Ben Langhinrichs writes... I could understand having only one "data segment" if the image is a GIF and therefore smaller, but I would think you'd always need the control information.

Anyways - thanks for sharing... :-)