{"id":516,"date":"2009-06-05T08:05:15","date_gmt":"2009-06-04T22:05:15","guid":{"rendered":"http:\/\/nsrd.wordpress.com\/?p=516"},"modified":"2009-06-05T08:05:15","modified_gmt":"2009-06-04T22:05:15","slug":"architectural-failings-of-monolithic-personal-mail-databases","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2009\/06\/05\/architectural-failings-of-monolithic-personal-mail-databases\/","title":{"rendered":"Architectural failings of monolithic personal mail databases"},"content":{"rendered":"<p>This article might be summarised as:<\/p>\n<p style=\"text-align:center;\"><em>I wish Microsoft would pull their heads in and design better email client software.<\/em><\/p>\n<p>I&#8217;ve used a lot of email clients over the years. That includes a wide variety of Unix text and graphical mail clients, <em>evolution<\/em> on Linux, the various Netscape-ish mail clients (Thunderbird, Mozilla Mail and Netscape Mail), Outlook, Groupwise, Lotus Notes and Entourage*.<\/p>\n<p>As a backup administrator, I think Outlook and Entourage represent a special kind of hell, due to the monolithic nature of the client storage database.<\/p>\n<p>As an example, here&#8217;s a <em>file<\/em> size breakdown of the current Entourage (2008) mail database on my laptop:<\/p>\n<pre>[Thu Jun 04 16:20:29]\npreston@archon ~\/Documents\/Microsoft User Data\/Office 2008 Identities\/Main Identity\n$ du -hs *\n5.7G\u00a0\u00a0 \u00a0Database\n 16K\u00a0\u00a0 \u00a0Mailing Lists\n4.0K\u00a0\u00a0 \u00a0My Day.plist\n 28K\u00a0\u00a0 \u00a0Rules\n304K\u00a0\u00a0 \u00a0Signatures<\/pre>\n<p>Note that they&#8217;re all <em>files<\/em>, not <em>directories<\/em>. That&#8217;s right, my Entourage mail database is currently 5.7GB.<\/p>\n<p>Every single email I&#8217;ve received in my current job is stored in a single, monolithic database. Obviously, there are copies on the central exchange server, with older copies shortcut via EmailXtender. 
However, I work remotely from the primary exchange server, so I really do rely on easy access via my local mail store.<\/p>\n<p>Now, I know I could choose not to back up this mail database, given the email is already on the server, but because I&#8217;m remote <em>all the time<\/em>, I don&#8217;t really want to either:<\/p>\n<p>(a) Have to resync the database in the event of a crash<\/p>\n<p>or<\/p>\n<p>(b) Pull old email out of EmailXtender <em>just because I had a crash and had to retrieve shortcuts<\/em>.<\/p>\n<p>So, needing to back up the database, I&#8217;m faced with a daily incremental backup that is nigh-on 6GB and growing, even if all I do is <em>mark a single email as read<\/em>.<\/p>\n<p>By contrast, my personal email, stored within Apple Mail, is now over 8GB, and daily incrementals for that are typically less than 100KB.<\/p>\n<p>I can think of no compelling architectural reason to keep all the mail in a single file other than a desire to keep individual messages off the filesystem, <em>and I no longer consider that a compelling reason<\/em>. Sure, my 8GB of mail stored as individual messages takes up a lot of inodes on the filesystem, but filesystems do certainly have a <em>lot<\/em> of inodes, so that&#8217;s not really a problem.<\/p>\n<p>Yes, having a lot of small files makes for a dense filesystem, but I&#8217;ll take a dense filesystem over a monolithic database with no backup tool any day for data storage. 
At least with the former you can still back it up incrementally, albeit slowly, as opposed to the latter, where you need to do a full backup every time.<\/p>\n<p>Various stabs have been made in the past, particularly for Outlook, at supporting incremental backups of the PST\/local data stores \u2013 or to be more accurate, supporting <em>delta<\/em> backups (i.e., changed blocks only).<\/p>\n<p>When Apple released Mac OS X 10.5 Leopard and its most important feature, Time Machine, I found it somewhat ironic that many users complained about Time Machine struggling with Entourage backups**. The expectation was that Apple was somehow responsible for the monolithic database structure of a third-party application.<\/p>\n<p>It&#8217;s not, in the same way that EMC isn&#8217;t responsible for the database structure of Oracle, Sybase, etc. In those cases, EMC is able to provide modules that support incremental backups because the various companies cooperate in making APIs and procedures available to one another. Further, for server-based application storage, which will frequently exceed client application storage by orders of magnitude, <em>this is entirely appropriate<\/em>.<\/p>\n<p>Bear in mind I&#8217;m not saying that Microsoft, say, has APIs but doesn&#8217;t release them, or has designed a product with no APIs at all. I don&#8217;t know what the state of API access for the Entourage and Outlook mail database formats is \u2013 and frankly, <em>I don&#8217;t care<\/em>. For client-side mail storage, there shouldn&#8217;t <em>need<\/em> to be an API to access the database, nor a licensed backup product to do anything more advanced than cold backups. <em>It&#8217;s just email<\/em>. 
It should be plain text and immediately accessible.<\/p>\n<p>Given the complexity of integration achieved by, say, Apple&#8217;s mail\/calendaring (particularly when including Apple&#8217;s server product) using an individual file structure for mail storage, by Domino using a series of much smaller databases, and by Groupwise using a series of much smaller databases, <em>there is no excuse<\/em> for Microsoft.<\/p>\n<p>Is this article a rant? Yes, you could perhaps argue that it is. Maybe I was standing on a soapbox the entire time I was writing it, but it is a rant grounded in some architectural reasoning: using monolithic database storage <strong>for client-side applications<\/strong> <em>when it is not required<\/em> and <em>when incremental backups would be highly desirable<\/em> is at best distastefully inelegant.<\/p>\n<p>&#8212;<br \/>\n* If you&#8217;ve not had exposure to it, Entourage is the &#8220;Outlook&#8221;<em>ish<\/em> mail client for the Macintosh.<\/p>\n<p>** I don&#8217;t, because I exclude Entourage from Time Machine backups.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article might be summarised as: I wish Microsoft would pull their heads in and design better email client 
software.&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3,4,5],"tags":[364,713],"class_list":["post-516","post","type-post","status-publish","format-standard","hentry","category-architecture","category-aside","category-backup-theory","tag-entourage","tag-outlook"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-8k","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/516","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=516"}],"version-history":[{"count":0,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/516\/revisions"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=516"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/categories?post=516"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags
?post=516"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}