{"id":3380,"date":"2012-01-22T19:18:07","date_gmt":"2012-01-22T09:18:07","guid":{"rendered":"http:\/\/nsrd.info\/blog\/?p=3380"},"modified":"2018-12-11T14:51:24","modified_gmt":"2018-12-11T04:51:24","slug":"dark-data","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2012\/01\/22\/dark-data\/","title":{"rendered":"Dark Data"},"content":{"rendered":"<p><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/12\/dark-data_Snapseed.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3510\" title=\"Dark Data\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/12\/dark-data_Snapseed.jpg\" alt=\"Dark Data\" width=\"600\" height=\"400\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/12\/dark-data_Snapseed.jpg 600w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/12\/dark-data_Snapseed-300x200.jpg 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/12\/dark-data_Snapseed-450x300.jpg 450w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>We&#8217;ve all heard the term <em>Big Data<\/em>&nbsp;&#8211; it&#8217;s something the vendors have been ramming down our throats with the same level of enthusiasm as <em>Cloud<\/em>. Personally, I think <em>Big Data<\/em>&nbsp;is a problem that shouldn&#8217;t exist: for me, it serves as a stark criticism of OS, Application, Storage and Software companies for failing to anticipate the high end of the data growth arena and to develop suitable mechanisms for dealing with it as part of the regular tool sets. 
After all, why should the end user have to ask him\/herself: &#8220;Hmmm, do I have <em>data<\/em>&nbsp;or <em>big data<\/em>?&#8221;<\/p>\n<p>Moving right along, another term has recently started to pop up, and it&#8217;s a far more interesting \u2013 and legitimate \u2013 problem.<\/p>\n<p>It&#8217;s <em>dark data<\/em>.<\/p>\n<p>If you haven&#8217;t heard of the term, I&#8217;m betting that you&#8217;ve either guessed the meaning or have a bit of an idea about it.<\/p>\n<p>Dark data refers to all those bits and pieces of data you&#8217;ve got floating around in your environment that <em>aren&#8217;t<\/em>&nbsp;fully accounted for, such as:<\/p>\n<ul>\n<li>All those user PST files on desktops and notebooks;<\/li>\n<li>That server a small workgroup deployed for testing purposes that&#8217;s not centrally managed or <em>officially<\/em>&nbsp;known about;<\/li>\n<li>That research data an academic is storing on a 2TB USB drive connected to her laptop;<\/li>\n<li>That offline copy of a chunk of the fileserver someone grabbed before going overseas that&#8217;s now significantly different from the real content of the fileserver;<\/li>\n<li><em>and so on<\/em>.<\/li>\n<\/ul>\n<p>Dark data is a real issue within the business environment, because there&#8217;s potentially a large amount of critical information &#8220;out there&#8221; in the business but not necessarily under the control of the IT department.<\/p>\n<p>You might call it <em>decentralised data<\/em>.<\/p>\n<p>As we know from data protection, decentralised backups are particularly dangerous; they increase the cost of control and maintenance, they decrease the reliability of the process, and they can be a security nightmare. 
It&#8217;s exactly the same for dark data \u2013 in fact, worse, because by the very nature of the definition, it&#8217;s also data that&#8217;s unlikely to be backed up.<\/p>\n<p>To try to control the spread of dark data, some companies will institute rigorous local storage policies, but these often create more headaches than they&#8217;re worth. For instance, locking down user desktops to make local storage not writeable isn&#8217;t always successful, and the added network load from shifting user profiles across to fileservers can be painful. Further, pushing these files across to centralised storage can make for extremely dense filesystems (or at least contribute towards them), trading one problem for another. Finally, it introduces new risk to the business, making users <em>extremely<\/em>&nbsp;unproductive if there are network or central storage issues.<\/p>\n<p>There are a few things a business can do about <em>dark data<\/em> to reduce the headaches and challenges it creates. These are <em>acceptance<\/em>, <em>anticipation<\/em>, and&nbsp;<em>discovery<\/em>.<\/p>\n<ol>\n<li><strong>Acceptance<\/strong> \u2013 Acknowledge that dark data will find its way into the organisation. Keeping the corporate head in the sand over the existence of dark data, or blindly adhering to the (false) notion that rigorous security policies will prevent storage of data anywhere in the organisation except centrally, is foolish. Now, this doesn&#8217;t mean that you have to <em>accept<\/em>&nbsp;that data will <em>become<\/em>&nbsp;dark. Instead, acknowledging that there <em>will<\/em>&nbsp;be dark data out there will keep it as a known issue. What&#8217;s more, because it&#8217;s actually acknowledged by the business, it can be <em>discussed<\/em>&nbsp;by the business. 
Discussion will facilitate two key outcomes: keeping users aware of the dangers of dark data, and encouraging users to report dark data.<\/li>\n<li><strong>Anticipation<\/strong> \u2013 Accepting that dark data exists is one thing; anticipating what can be done about it, and how it might be found, allows a company to actually start <em>dealing<\/em>&nbsp;with dark data. Anticipating dark data can&#8217;t happen unless someone is <em>responsible<\/em>&nbsp;for it. Now, I&#8217;m not suggesting that being <em>responsible<\/em>&nbsp;for dark data means getting in trouble if there are issues with unprotected dark data going missing \u2013 if that were the case, not a single person in a company would want to be responsible for it. (And any person who <em>did<\/em>&nbsp;want to be responsible under those circumstances would likely not understand the scope of the issue.) The obvious person for this responsibility is the <em>Data Protection Advisor<\/em>. (See <a title=\"What don't you backup?\" href=\"https:\/\/nsrd.info\/blog\/2011\/08\/23\/what-dont-you-backup\/\" target=\"_blank\">here<\/a> and <a title=\"But where does the DPA fit in?\" href=\"https:\/\/nsrd.info\/blog\/2011\/08\/24\/but-where-does-the-dpa-fit-in\/\" target=\"_blank\">here<\/a>.) You might argue that the dark data problem explicitly points out the need for one or more DPAs at every business.<\/li>\n<li><strong>Discovery<\/strong> \u2013 No discovery process for dark data will be fully automated. Some level of automation can be achieved via indexing and search engines deployed from central IT, but given dark data may be on systems which are only intermittently connected, or outside of the domain authority of IT, there will be a human element as well. 
This will involve the DPA(s), end users, and team leaders, viz:<\/li>\n<ul>\n<li><strong>The DPA<\/strong> will not only be tasked with periodic visual inspections of his\/her area of responsibility, but will also be responsible for issuing periodic reminders to staff, requesting notification of any local data storage.<\/li>\n<li><strong>End users<\/strong> should be aware (via induction and company policies) of the need to avoid, as much as possible, the creation of data outside of the control and management of central IT. But they should equally be aware that in situations where this happens, a policy can be followed to notify IT to ensure that the data is protected or reviewed.<\/li>\n<li><strong>Team leaders<\/strong> should equally be aware of the potential for dark data creation, as per end users, but should also be tasked with liaising with IT to ensure dark data, once discovered, is appropriately classified, managed and protected. This may sometimes necessitate moving the data under IT control, but it may also at times be an acknowledgement that the data is best left local, with appropriate protection measures implemented and agreed upon.<\/li>\n<\/ul>\n<\/ol>\n<p>Dark data is a real problem that will exist in practically every business; however, it doesn&#8217;t have to be a <em>serious<\/em>&nbsp;problem when carefully dealt with. 
The above three rules \u2013 acceptance, anticipation, and discovery \u2013 will ensure it stays managed.<\/p>\n<p><strong>[2012-01-27 Addendum]<\/strong><\/p>\n<p>There&#8217;s now a follow-up to this article &#8211; &#8220;<a title=\"Data Awareness Distribution in the Enterprise\" href=\"https:\/\/nsrd.info\/blog\/2012\/01\/27\/data-awareness-distribution-in-the-enterprise\/\" target=\"_blank\">Data Awareness Distribution in the Enterprise<\/a>&#8221;.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We&#8217;ve all heard the term Big Data&nbsp;&#8211; it&#8217;s something the vendors have been ramming down our throats with the same&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[5],"tags":[163,270,271,272,940],"class_list":["post-3380","post","type-post","status-publish","format-standard","hentry","category-backup-theory","tag-big-data","tag-dark-data","tag-data","tag-data-architecture","tag-storage-architecture"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-Sw","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/3380","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsr
d.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=3380"}],"version-history":[{"count":1,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/3380\/revisions"}],"predecessor-version":[{"id":7490,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/3380\/revisions\/7490"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=3380"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/categories?post=3380"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=3380"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}