As I’ve tweeted, I have spent the last couple of days (and the weekend) helping out a customer that exceeded the hard 64 GB database size limit in Lotus Domino. Before discussing how we solved the problem and got the customer back in business, I would like you to think about how situations like this could be avoided. Avoiding it is key, because once you exceed the size you’re doomed.
First — how and why would a database platform EVER allow a database to cross a file size that makes it break? Why doesn’t Domino start to complain at 50 GB and make the warnings progressively harder to ignore as the database gets closer to 64 GB? Why doesn’t it refuse data once it reaches 60 GB? I find it totally unacceptable that a software product allows a database to exceed a size it knows it cannot handle.
Now I know that there are considerations for such a warning and that it could be done in application code (e.g. database script, QueryOpen event), but it really isn’t something an application developer should have to think about. It would also have to be applied to backend logic, so it really doesn’t lend itself to a UI computation. I also know that DDM or similar could warn about it, but that still doesn’t change my stance. The 64 GB limit is a hard limit, and whether a database reaches and exceeds it shouldn’t depend on me configuring a specific piece of functionality.
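To make the idea concrete, here is a minimal sketch of the kind of escalating policy I have in mind. It is not how Domino behaves and it uses no Domino API. The function names are made up for illustration, and the thresholds are simply the 50, 60 and 64 GB figures from above.

```python
GB = 1024 ** 3

# Hypothetical thresholds, taken from the figures discussed in this post.
WARN_AT = 50 * GB      # start complaining
REFUSE_AT = 60 * GB    # stop accepting new data
HARD_LIMIT = 64 * GB   # the size the database must never reach


def size_policy(size_bytes: int) -> str:
    """Return 'ok', 'warn' or 'refuse' for a database of the given size."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes >= REFUSE_AT:
        return "refuse"
    if size_bytes >= WARN_AT:
        return "warn"
    return "ok"


def warning_level(size_bytes: int) -> float:
    """Urgency from 0.0 (at or below WARN_AT) to 1.0 (at or above HARD_LIMIT)."""
    span = HARD_LIMIT - WARN_AT
    return min(1.0, max(0.0, (size_bytes - WARN_AT) / span))
```

The point of the second function is that the warning gets louder as the database grows, instead of being a single message that is easy to dismiss once and then forget.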
Second — having the option of keeping the view index in a different location or file from the database would have helped. This has been brought up a number of times, including at Lotusphere Ask-The-Developers sessions. One could argue that externalizing the view index from the database would just have postponed the problem, but the view index takes up a substantial amount of disk space for databases of this size.
Now on to how we saved the data.
The bottom line in this is that the customer was lucky. VERY lucky. The customer uses Cisco IP telephones and keeps a replica of the database in question on a secondary server for phone number lookup using a Java servlet. Due to the way the servlet is written, only a single, very small view was built on the secondary server. This in turn meant that the database that had exceeded 64 GB on the primary server was “only” 55 GB on the secondary server. The database on the primary server was toast and gave out very interesting messages when we attempted to access or fixup the database:
**** DbMarkCorruptAgain(Both SB copies are corrupt)
So thank God they had the secondary server, because otherwise the outcome of the story would have been less pleasant. Using the secondary server we were able to:
- Take the database offline (restrict access using ACL)
- Purge all view indexes (using Ytria ViewEZ)
- Create a database design only copy to hold archived documents
- Delete all views to avoid them accidentally being built
- Build a very simple view to prepare for data archiving
- Write a LotusScript to archive documents (copy then delete) from the database
- Use Ytria ScanEZ to delete deletion stubs from the database (this works for them because the database isn’t replicated to user workstations or laptops)
- Do a compact to reclaim unused space
- Make the database available on the primary server
Whew! They are now back in business after building views in the database. They were lucky – VERY lucky. If they hadn’t had that secondary replica, the data would probably have been lost, to much distress for them and for me.
So what are the main takeaways from this?
- UI check — in the future all databases that I develop will have a database script check on the database size to try and prevent situations like this
- DAOS — enable DAOS for databases to keep attachments out of the database and keep the size down
- Monitoring — monitor databases either using DDM or other tools to try and prevent situations like this
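If you have nothing else in place, even a crude script run from a scheduler is better than no monitoring at all. Below is a minimal sketch, assuming the databases live as .nsf files under a data directory you can read from the operating system. The function name and the 50 GB default are my own choices, and it only looks at the file size on disk, not at anything Domino itself reports.

```python
from pathlib import Path

GB = 1024 ** 3


def find_large_databases(data_dir, threshold_bytes=50 * GB):
    """Return (path, size) pairs for .nsf files at or above the threshold.

    The result is sorted with the largest file first. The directory is
    searched recursively, and the match on '*.nsf' follows the case rules
    of the underlying file system.
    """
    hits = []
    for path in Path(data_dir).rglob("*.nsf"):
        if path.is_file():
            size = path.stat().st_size
            if size >= threshold_bytes:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Called with the path to your own data directory, it returns the databases that deserve attention. What you do with the list (mail it, log it, page someone) is up to you, as long as somebody actually sees it well before 64 GB.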
And so concludes a story from the field. Four days later, with my hair turned gray from watching copy/fixup/compact progress indicators, the customer is back in business and happy once again. Whew!!