{"id":10677,"date":"2021-10-31T06:29:31","date_gmt":"2021-10-30T20:29:31","guid":{"rendered":"https:\/\/nsrd.info\/blog\/?p=10677"},"modified":"2021-10-31T06:29:35","modified_gmt":"2021-10-30T20:29:35","slug":"crunching-networker-deduplication-stats","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2021\/10\/31\/crunching-networker-deduplication-stats\/","title":{"rendered":"Crunching NetWorker Deduplication Stats"},"content":{"rendered":"\n<p>If you use NetWorker with Data Domain, you&#8217;ve probably sometimes wanted to know which of your clients have the best deduplication \u2013 or perhaps more correctly, you&#8217;ve probably wanted to occasionally drill into clients delivering lower deduplication levels.<\/p>\n\n\n\n<p>There are NMC and DPA reports that&#8217;ll pull this data for you, but likewise, you can get information directly from <em>mminfo<\/em> that&#8217;ll tell you details about deduplication for Data Domain Boost devices.<\/p>\n\n\n\n<p>You&#8217;ll find the information you&#8217;re after within the output of <em>mminfo -S<\/em>. The -S option in mminfo provides a veritable treasure trove of extended details about individual savesets. To give you an example, let&#8217;s look at just one saveset in -S format. 
First, below, you&#8217;ll see the mminfo command to identify a saveset ID, then the -S option invoked against a single saveset.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[Sun Oct 24 13:43:02]\n\n## ~  \n## root@orilla \n\n$ <strong>mminfo -q \"name=\/nsr\/scripts,savetime&gt;=4 hours ago\" -r \"ssid,savetime(20)\"<\/strong>\n ssid          date     time\n3816067119    24\/10\/21 09:43:58\n3665074337    24\/10\/21 10:20:00\n\n[Sun Oct 24 13:43:10]\n\n## ~  \n## root@orilla \n\n$ <strong>mminfo -q \"ssid=3665074337\" -S<\/strong>\nssid=3665074337 savetime=24\/10\/21 10:20:00 (1635031200) orilla.turbamentis.int:\/nsr\/scripts\n  level=manual sflags=vF       size=2453183044   files=18         insert=24\/10\/21\n  create=24\/10\/21 complete=24\/10\/21 browse=24\/11\/21 10:20:00 retent=24\/11\/21 23:59:59\n  clientid=c6fb4ece-00000004-5fdabaf1-5fdabaf0-00019ed8-a41317f3\n          *backup start time: 1635031200;\n*ss data domain backup cloneid: 1635031201;\n*ss data domain dedup statistics: \"v1:1635031201:2459622768:395458878:151768627\";\n                       group: NAS_PMdG;\n            saveset features: CLIENT_SAVETIME;\n  Clone #1: cloneid=1635031201  time=24\/10\/21 10:20:01    retent=24\/11\/21  flags=\n    frag@         0 volid=3923434705 file\/rec=       0\/0     rn=0 last=24\/10\/21<\/pre>\n\n\n\n<p>There&#8217;s a support article that explains how to review this information, <strong><a href=\"https:\/\/www.dell.com\/support\/kbdoc\/en-au\/000023290\/networker-how-to-read-data-domain-statistics-for-a-networker-save-set?lang=en\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a><\/strong>. 
The line you&#8217;re looking for in particular, though, is the &#8220;*ss data domain dedup statistics&#8221; \u2013&nbsp;outlined in the aforementioned support article. In the example above, &#8220;v1:1635031201:2459622768:395458878:151768627&#8221; breaks down as a version tag, the clone ID (1635031201), then the original (2,459,622,768 bytes), pre-local-compression (395,458,878 bytes) and post-local-compression (151,768,627 bytes) sizes \u2013&nbsp;a reduction of roughly 16.2:1.<\/p>\n\n\n\n<p>I recently wanted to review some deduplication details, and while DPA and NMC are options, I needed to analyse some data over the weekend<span id='easy-footnote-1-10677' class='easy-footnote-margin-adjust'><\/span><span class='easy-footnote'><a href='https:\/\/nsrd.info\/blog\/2021\/10\/31\/crunching-networker-deduplication-stats\/#easy-footnote-bottom-1-10677' title='There are, unfortunately, only so many hours in a busy work-day.'><sup>1<\/sup><\/a><\/span>, and as such didn&#8217;t have access to the live DPA or NMC host for a particular environment. So, since I&#8217;m not doing anything social while Melbourne&#8217;s COVID case numbers remain so high, I tasked myself with writing a script to analyse deduplication statistics from raw <em>mminfo -S<\/em> output.<\/p>\n\n\n\n<p>The script I wrote (in Perl, of course) can either be run on a NetWorker server via a query option, or against the saved output of <em>mminfo -S<\/em>. 
The usage syntax is this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/dedupe-analysis-syntax.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1718\" height=\"1292\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/dedupe-analysis-syntax.png\" alt=\"\" class=\"wp-image-10681\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/dedupe-analysis-syntax.png 1718w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/dedupe-analysis-syntax-300x226.png 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/dedupe-analysis-syntax-1024x770.png 1024w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/dedupe-analysis-syntax-768x578.png 768w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/dedupe-analysis-syntax-1536x1155.png 1536w\" sizes=\"auto, (max-width: 1718px) 100vw, 1718px\" \/><\/a><figcaption>Usage Options for dedupe-analysis<\/figcaption><\/figure>\n\n\n\n<p>The script pulls the data and outputs at least three key data files, all in CSV format:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>An &#8220;all&#8221; file that contains deduplication details for each saveset<\/li><li>A &#8220;client&#8221; file that contains summarised deduplication details for each client<\/li><li>A &#8220;type&#8221; file that contains summarised deduplication details for each client and backup type within the client (e.g., SQL, SAP, Oracle, Exchange, etc.)<\/li><\/ul>\n\n\n\n<p>Since a saveset, via cloning, can live on more than one Data Domain, the volume IDs are included in the output \u2013&nbsp;and the latter two files provide their breakdowns first by volume ID \u2013&nbsp;so you can see stats on a per-Data Domain basis. 
There&#8217;s also an option to anonymise the host names in the output, and if you invoke that, a CSV will also be written containing the original hostname to anonymised hostname conversion<span id='easy-footnote-2-10677' class='easy-footnote-margin-adjust'><\/span><span class='easy-footnote'><a href='https:\/\/nsrd.info\/blog\/2021\/10\/31\/crunching-networker-deduplication-stats\/#easy-footnote-bottom-2-10677' title='The &amp;#8220;all&amp;#8221; output does not attempt to anonymise saveset names \u2013 though it will anonymise virtual machine names'><sup>2<\/sup><\/a><\/span>. I.e., this allows you to send the anonymised version to someone for discussion, while privately looking up the real host in any subsequent conversation.<\/p>\n\n\n\n<p>There&#8217;s an additional output option, which can be handy if you&#8217;re analysing millions of savesets: creating one output file per client. You still get the rollup data, but you&#8217;ll also get a per-client CSV file so you can deep-drill into individual client results with a better chance of avoiding Excel&#8217;s 1,048,576-row limit.<\/p>\n\n\n\n<p>Here&#8217;s an example of an analysis run against my lab environment, with anonymisation turned on:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[Sat Oct 30 11:09:12]\n\n## \/nsr\/scripts  \n## root@orilla \n\n$ <strong>.\/dedupe-analysis.pl -q \"savetime&gt;=20 years ago\" -i -a -o PrestonLab<\/strong>\n\nProcessing saveset details\n   ...processed 0 savesets\n   ...processed 261 savesets total\n\nWriting PrestonLab-all.csv\n   ...written 0 saveset details\n   ...wrote 261 saveset details\nWritten per-saveset details to PrestonLab-all.csv\nIndividual (per-client) file results requested. 
Processing.\n   ...Writing noumenon.turbamentis.int data to PrestonLab\/host-00000000.csv\n   ...Writing orilla.turbamentis.int data to PrestonLab\/host-00000001.csv\n\nWriting PrestonLab-client.csv\nWritten per-client details to PrestonLab-client.csv\n\nWriting PrestonLab-type.csv\nWritten per-client\/type details to PrestonLab-type.csv\n\nWriting host anonymisation mappings for private reference - do not distribute.\n...anonymisation mappings written to PrestonLab-anonmap.csv.<\/pre>\n\n\n\n<p>So what sort of output does it generate? For some output examples, I&#8217;ve imported the generated CSV files into Excel and converted them to a table. Here&#8217;s an example of the &#8220;per-saveset&#8221; data:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-All.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1294\" height=\"815\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-All.png\" alt=\"\" class=\"wp-image-10689\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-All.png 1294w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-All-300x189.png 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-All-1024x645.png 1024w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-All-768x484.png 768w\" sizes=\"auto, (max-width: 1294px) 100vw, 1294px\" \/><\/a><figcaption>Deduplicated Details by Individual Saveset Instance<\/figcaption><\/figure>\n\n\n\n<p>When going by client rollup you&#8217;ll see content such as the following:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-Client.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"425\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-Client-1024x425.png\" alt=\"\" class=\"wp-image-10690\" 
srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-Client-1024x425.png 1024w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-Client-300x124.png 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-Client-768x319.png 768w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-Client.png 1294w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>Deduplicated Details by Volume ID and Client<\/figcaption><\/figure>\n\n\n\n<p>And the backup type output looks like the following:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-Type.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1294\" height=\"537\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-Type.png\" alt=\"\" class=\"wp-image-10691\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-Type.png 1294w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-Type-300x124.png 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-Type-1024x425.png 1024w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/Lab-Type-768x319.png 768w\" sizes=\"auto, (max-width: 1294px) 100vw, 1294px\" \/><\/a><figcaption>Deduplicated Details by Volume ID, Client and BackupType<\/figcaption><\/figure>\n\n\n\n<p>Now, they&#8217;re just lab environments there &#8212; and while accurate, they&#8217;re hardly edifying. What I was working towards was the analysis of a production environment. 
With host anonymisation turned on, here&#8217;s an example of that output:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/CustData-By-Client-and-Type.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1134\" height=\"838\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/CustData-By-Client-and-Type.png\" alt=\"\" class=\"wp-image-10692\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/CustData-By-Client-and-Type.png 1134w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/CustData-By-Client-and-Type-300x222.png 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/CustData-By-Client-and-Type-1024x757.png 1024w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/10\/CustData-By-Client-and-Type-768x568.png 768w\" sizes=\"auto, (max-width: 1134px) 100vw, 1134px\" \/><\/a><figcaption>Deduplication data by Volume ID, Client and Backup Type from a Production Environment<\/figcaption><\/figure>\n\n\n\n<p>The only additional thing I&#8217;ve done in the example output there is to set number formatting on the Original\/Post-Comp\/Average Reduction columns.<\/p>\n\n\n\n<p>If you&#8217;re interested in being able to run this against your own environment (or mminfo -S output, in general), here&#8217;s the script:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\">#!\/usr\/bin\/perl -w\n\n###########################################################################\n# Modules\n###########################################################################\nuse strict;\nuse File::Basename;\nuse Getopt::Std;\nuse Sys::Hostname;\n\n\n###########################################################################\n# Subroutines\n###########################################################################\n\n# in_list($elem,@list) returns true iff $elem appears in @list.\nsub in_list {\n\treturn 0 if (@_+0 &lt; 2);\n\t\n\tmy $element 
= $_[0];\n\treturn 0 if (!defined($element));\n\tshift @_;\n\t\n\tmy @list = @_;\n\treturn 0 if (!@_ || @_+0 == 0);\n\t\n\tmy $foundCount = 0;\n\tmy $e = quotemeta($element);\n\tforeach my $item (@list) {\n\t\tmy $i = quotemeta($item);\n\t\t$foundCount++ if ($e eq $i);\n\t}\n\t\n\treturn $foundCount;\n}\n\n# show_help([@messages]) shows help, any additional messages, then exits.\nsub show_help {\n\tmy $self = basename($0);\n\tprint &lt;&lt;EOF;\nUsage:\n\n$self [-h|-?] [-d] [-a] {-q query | -f file} -o file\n\nWhere:\n\n-h | -?     Prints this help and exits.\n-d          Enables debug mode for additional output.\n-q query    Run nominated mminfo 'query' and analyse against results.\n-f file     Run analysis against a file containing mminfo results.\n-o file     File to write results data to. Do NOT include a file extension.\n-a          Anonymise hostnames in output data.\n-i          Write the 'all' savesets data as an individual CSV per client.\n\nAnalyses Data Domain deduplication statistics held in mminfo -S output\nto build information about deduplication results on a per client and\nworkload type-basis.\n\nEOF\n\n\tif (@_+0 > 0) {\n\t\tmy @messages = @_;\n\t\tforeach my $message (@messages) {\n\t\t\tmy $tmp = $message;\n\t\t\tchomp $tmp;\n\t\t\tprint \"$tmp\\n\";\n\t\t}\n\t}\n\tdie \"\\n\";\n}\n\n# get_backup_type($savesetname) returns the guessed backup type based on the saveset name.\nsub get_backup_type {\n\treturn \"Other\" if (@_+0 != 1);\n\t\n\tmy $saveset = $_[0];\n\tmy $backupType = \"\";\n\t\n\tif ($saveset =~ \/^RMAN\/) {\n\t\t$backupType = \"Oracle\";\n\t} elsif ($saveset =~ \/^backint\/) {\n\t\t$backupType = \"SAP\";\n\t} elsif ($saveset =~ \/^SAPHANA\/) {\n\t\t$backupType = \"SAP HANA\";\n\t} elsif ($saveset =~ \/^VM\/) {\n\t\t$backupType = \"Virtual Machine\";\n\t} elsif ($saveset =~ \/^MSSQL\/) {\n\t\t$backupType = \"MSSQL\";\n\t} elsif ($saveset =~ \/^\\\/\/ ||\n\t\t\t $saveset =~ \/^\\&lt;\\d+\\>\\\/\/) {\n\t\t$backupType = \"Unix 
Filesystem\";\n\t} elsif ($saveset =~ \/^APPLICATIONS.*Exchange\/) {\n\t\t$backupType = \"Exchange\";\n\t} elsif ($saveset =~ \/^[A-Z]\\:\/ || \n\t\t\t $saveset =~ \/SYSTEM\/ ||\n\t\t\t $saveset =~ \/DISASTER\/ ||\n\t\t\t $saveset =~ \/WINDOWS ROLES\/ ||\n\t\t\t $saveset =~ \/\\\\VOLUME\\{\/ ||\n\t\t\t $saveset =~ \/\\\\\\?\\\\GLOBALROOT\/) { \t# Might need more test cases here\n\t\t$backupType = \"Windows Filesystem\";\n\t} elsif ($saveset =~ \/^index\/ || $saveset =~ \/^bootstrap\/ ) {\n\t\t$backupType = \"NetWorker\";\n\t} else {\n\t\t$backupType = \"Other\";\n\t}\n\treturn $backupType;\n}\n\n\n###########################################################################\n# Globals &amp; main.\n###########################################################################\nmy %opts = ();\nmy $version = \"1.1\";\nmy $method = \"query\";\t# Default method is to seek a query from the command line.\nmy $query = \"\";\t\t\t\nmy $file = \"\";\t\t\t\nmy $debug = 0;\t\t\t# Change to 1 for lots of messy debug output.\nmy $outFile = \"\";\nmy $baseOut = \"\";\nmy $byClientOut = \"\";\nmy $byTypeOut = \"\";\nmy $anonOut = \"\";\nmy $anonHosts = 0;\nmy %hostMap = ();\t# Used for mapping hostnames to anonymised names.\nmy %vmMap = ();\t\t# Used for mapping VM names to anonymised names.\nmy $hostCount = 0;\t# Used for iterating anonymised hostnames.\nmy $vmCount = 0;\t# Used for iterating anonymised VM names.\nmy $individualFile = 0;\nmy %clientElems = ();\t# If we're doing individual file out, use this to speedily iterate through gathered data.\n\n\n# Capture command line arguments.\nif (getopts('h?vq:f:o:adi',\\%opts)) {\n\tshow_help() if (defined($opts{'h'}) || defined($opts{'?'}));\n\tshow_help(\"This release: v$version\") if (defined($opts{'v'}));\n\t\n\tif (defined($opts{'q'})) {\n\t\t$query = $opts{'q'};\n\t} elsif (defined($opts{'f'})) {\n\t\t$method = \"file\";\n\t\t$file = $opts{'f'};\n\t\tif (! 
-f $file) {\n\t\t\tshow_help(\"Nominated file, '$file' does not exist or cannot be accessed.\");\n\t\t}\n\t} else {\n\t\tshow_help(\"You must specify a -q 'query' or -f 'file' option.\")\n\t}\n\n\t$anonHosts = 1 if (defined($opts{'a'}));\n\t$individualFile = 1 if (defined($opts{'i'}));\n\t$debug = 1 if (defined($opts{'d'}));\n\t\n\tif (defined($opts{'o'})) {\n\t\t$outFile = $opts{'o'};\n\t\tif ($outFile =~ \/\\.\/) {\n\t\t\tshow_help(\"Please don't give a file extension for the output base filename.\");\n\t\t}\n\t\t$baseOut = $outFile . \"-all.csv\";\n\t\t$byClientOut = $outFile . \"-client.csv\";\n\t\t$byTypeOut = $outFile . \"-type.csv\";\n\t\t$anonOut = $outFile . \"-anonmap.csv\";\n\t\t\n\t\tif (-f $outFile || -f $baseOut || -f $byClientOut || -f $byTypeOut || -f $anonOut) {\n\t\t\tshow_help(\"Output file exists. Pick another name.\\nSelected output files are:\\n\" . join(\"\\n\",($baseOut,$byClientOut,$byTypeOut,$anonOut)));\n\t\t}\n\t\tif ($individualFile &amp;&amp; -d $outFile) {\n\t\t\tshow_help(\"Individual\/per-client output specified but directory $outFile already exists.\");\n\t\t}\n\t} else {\n\t\tshow_help(\"You have not specified an output file.\");\n\t}\n}\n\nmy %dataSets = ();\nmy @rawData = ();\nmy $ssidCount = 0;\nmy $ddSavesets = 0;\nmy %volumeIDsbyCloneID = ();\nif ($method eq \"query\") {\n\tmy $queryCommand = \"mminfo -q \\'$query\\' -S\";\n\tif (open(MMI,\"$queryCommand 2>&amp;1 |\")) {\n\t\twhile (&lt;MMI>) {\n\t\t\tmy $line = $_;\n\t\t\tchomp $line;\n\t\t\tpush(@rawData,$line);\n\t\t\t\n\t\t\tif ($line =~ \/ss data domain dedup statistics\/) {\n\t\t\t\t$ddSavesets++;\n\t\t\t}\n\t\t\t\n\t\t\tif ($line =~ \/^ssid=\\d+\/) {\n\t\t\t\t$ssidCount++;\n\t\t\t\tif ($ssidCount % 100000 == 0) {\n\t\t\t\t\tprint \"...read $ssidCount saveset details ($ddSavesets on Data Domain)\\n\";\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t}\n\tclose(MMI);\n} elsif ($method eq \"file\") {\n\tif (open(FILE,$file)) {\n\t\twhile (&lt;FILE>) {\n\t\t\tmy $line = $_;\n\t\t\tchomp 
$line;\n\t\t\tpush(@rawData,$line);\n\t\t\t\n\t\t\tif ($line =~ \/ss data domain dedup statistics\/) {\n\t\t\t\t$ddSavesets++;\n\t\t\t}\n\t\t\t\n\t\t\tif ($line =~ \/^ssid=\\d+\/) {\n\t\t\t\t$ssidCount++;\n\t\t\t\tif ($ssidCount % 100000 == 0) {\n\t\t\t\t\tprint \"...read $ssidCount saveset details ($ddSavesets on Data Domain)\\n\";\n\t\t\t\t}\n\t\t\t}\n\t\t\t\n\t\t}\n\t}\n\tclose(FILE);\n}\n\nif ($ssidCount == 0) {\n\tdie (\"Did not find any savesets in an expected format in the $method.\\n\");\n}\nif ($ddSavesets == 0) {\n\tdie (\"Did not find any savesets on a Data Domain in the $method.\\n\");\n}\n\n# Now step through and discard all savesets that aren't on Data Domain devices.\nmy $count = 0;\nmy @dataSeg = ();\nmy $foundADDBoostSS = 0;\nfor (my $index = 0; $index &lt; (@rawData+0); $index++) {\n\tmy $line = $rawData[$index];\n\t\n\tif ($line =~ \/^ssid\/) {\n\t\tif ($index != 0) {\n\t\t\t# If we're at index 0 we're at the start of the file. We'll see\n\t\t\t# a line starting with 'ssid' but we won't have a prior record\n\t\t\t# to add to our dataSets.\n\t\t\tif ($foundADDBoostSS) {\n\t\t\t\t$debug &amp;&amp; print \"DEBUG: Boost saveset processing at index $index\\n\";\n\n\t\t\t\t$dataSets{$count} = join(\"\\n\",@dataSeg);\n\t\t\t\t@dataSeg = ();\n\t\t\t\t\n\t\t\t\t$foundADDBoostSS = 0;\n\t\t\t\t$count++;\n\t\t\t\t$debug &amp;&amp; print \"DEBUG: Incremented count to $count\\n\";\n\t\t\t}\n\t\t}\n\t\t@dataSeg = ($line);\n\t} else {\n\t\tpush (@dataSeg,$line);\n\t}\n\t\n\tif ($line =~ \/ss data domain dedup statistics\/) {\n\t\t$foundADDBoostSS = 1;\n\t}\n}\n# Explicitly free up the initially read data. This is useful if you have >1M savesets\n# in reducing the runtime footprint. 
(E.g., on sample of 1.7M savesets it uses 1\/3 less\n# memory over runtime.)\n$debug &amp;&amp; print \"DEBUG: Deallocating raw data.\\n\";\nundef @rawData;\t\n\nmy %procData = ();\nmy $datum = 0;\n\nmy $procCount = 0;\nprint \"\\n\\nProcessing saveset details\\n\";\nforeach my $key (sort {$a &lt;=> $b} keys %dataSets) {\n\tmy $block = $dataSets{$key};\n\tmy @block = split(\/\\n\/,$block);\n\t\n\tmy $ssid = \"\";\n\tmy $savetime = \"\";\n\tmy $nsavetime = \"\";\n\tmy $client = \"\";\n\tmy $saveset = \"\";\n\tmy $level = \"\";\n\tmy $ssflags = \"\";\n\tmy $totalsize = 0;\n\tmy $retention = \"\";\n\tmy $dedupeStats = \"\";\n\tmy $vmname = \"\";\n\t\n\tif ($procCount % 100000 == 0) {\n\t\tprint \"   ...processed $procCount savesets\\n\";\n\t}\n\t$procCount++;\n\t\n\tmy $vm = 0;\n\tmy $blockIndex = 0;\n\tforeach my $item (@block) {\n\t\tif ($item =~ \/^ssid=(\\d+) savetime=(.*) \\((\\d+)\\) ([^:]*):(.*)\/) {\n\t\t\n\t\t\t$ssid = $1;\n\t\t\t$savetime = $2;\n\t\t\t$nsavetime = $3;\n\t\t\t$client = $4;\n\t\t\t$saveset = $5;\n\t\t\t\n\t\t\t$procData{$datum}{ssid} = $ssid;\n\t\t\t$procData{$datum}{savetime} = $savetime;\n\t\t\t$procData{$datum}{client} = $client;\n\t\t\t$procData{$datum}{nsavetime} = $nsavetime;\n\t\t\t$procData{$datum}{saveset} = $saveset;\n\t\t\t\n\t\t\tif ($anonHosts) {\n\t\t\t\tif (%hostMap &amp;&amp; defined($hostMap{$client})) {\n\t\t\t\t\t# Nothing to do here.\n\t\t\t\t} else {\n\t\t\t\t\t$hostMap{$client} = sprintf(\"host-%08d\",$hostCount);\n\t\t\t\t\t$hostCount++;\n\t\t\t\t}\n\t\t\t}\n\t\t\t\n\t\t\t$debug &amp;&amp; print \"\\nDEBUG: START $ssid ($savetime - $nsavetime): $client|$saveset\\n\";\n\t\t}\n\t\t\n\t\tif ($item =~ \/^\\s*Clone \\#\\d+: cloneid=(\\d+)\/) {\n\t\t\tmy $cloneID = $1;\n\t\t\tmy $volID = 0;\n\t\t\t# Here we assume any DD saveset only has a single frag to keep\n\t\t\t# things simple. 
This may break if someone clones something out\n\t\t\t# to tape.\n\t\t\tmy $nextLine = $block[$blockIndex+1];\n\t\t\tif (!defined($nextLine)) {\n\t\t\t\t# Orphaned saveset\/clone instance without a volume. Set VolID to Zero.\n\t\t\t\t$volID = 0;\n\t\t\t} else {\n\t\t\t\tif ($nextLine =~ \/.*volid=\\s*(\\d+) \/) {\n\t\t\t\t\t$volID = $1;\n\t\t\t\t} else {\n\t\t\t\t\t# Something odd going on here. Set VolID to Zero.\n\t\t\t\t\t$volID = 0;\n\t\t\t\t}\n\t\t\t}\n\t\t\t$volumeIDsbyCloneID{$ssid}{$cloneID} = $volID;\n\t\t}\n\t\t\n\t\tif ($item =~ \/vcenter_hostname\/) {\n\t\t\t$vm = 1;\n\t\t}\n\t\tif ($vm == 1 &amp;&amp; $item =~ \/^\\s+\\\\\"name\\\\\": \\\\\"(.*)\\\\\"\/) {\n\t\t\t$vmname = $1;\n\t\t\tif ($anonHosts) {\n\t\t\t\tif (%vmMap &amp;&amp; defined($vmMap{$vmname})) {\n\t\t\t\t\t$procData{$datum}{vmname} = $vmMap{$vmname};\n\t\t\t\t} else {\n\t\t\t\t\t$vmMap{$vmname} = sprintf(\"vm-%09d\",$vmCount);\n\t\t\t\t\t$vmCount++;\n\t\t\t\t\t$procData{$datum}{vmname} = $vmMap{$vmname};\n\t\t\t\t}\n\t\t\t} else {\n\t\t\t\t$procData{$datum}{vmname} = $vmname;\n\t\t\t}\n\t\n\t\t\t$vm = 0;\n\t\t}\n\t\t\n\t\tif ($item =~ \/^\\s+level=([^\\s]*)\\s+sflags=([^\\s]*)\\s+size=(\\d+).*\/) {\n\t\t\t$level = $1;\n\t\t\t$ssflags = $2;\n\t\t\t$totalsize = $3;\n\t\t\t\n\t\t\t$procData{$datum}{level} = $level;\n\t\t\t$procData{$datum}{ssflags} = $ssflags;\n\t\t\t$procData{$datum}{totalsize} = $totalsize;\n\t\t\t\n\t\t\t$debug &amp;&amp; print \"DEBUG: ----> Level = $level, ssflags = $ssflags, totalsize = $totalsize\\n\";\n\t\t}\n\t\t\n\t\tif ($item =~ \/create=.* complete=.* browse=.* retent=(.*)$\/) {\n\t\t\t$retention = $1;\n\t\t\t\n\t\t\t$procData{$datum}{retention} = $retention;\n\t\t\t\n\t\t\t$debug &amp;&amp; print \"DEBUG: ----> Retention = $retention\\n\";\n\t\t}\n\t\t\n\t\tif ($item =~ \/^\\*ss data domain dedup statistics: \"(.*)\"(\\,|;)\\s*$\/) {\n\t\t\t$dedupeStats = $1;\n\t\t\tmy $eol = $2;\n\t\t\tif ($eol eq \",\") {\n\t\t\t\tmy $stop = 0;\n\t\t\t\tmy $lookahead = 
$blockIndex + 1;\n\t\t\t\twhile (!$stop) {\n\t\t\t\t\tmy $tmpLine = $block[$lookahead];\n\t\t\t\t\tif ($tmpLine =~ \/^\\s*\"(.*)\"(\\,|;)\\s*$\/) {\n\t\t\t\t\t\tmy $tmpstat = $1;\n\t\t\t\t\t\tmy $tmpset = $2;\n\t\t\t\t\t\t$dedupeStats .= \"\\n\" . $tmpstat;\n\t\t\t\t\t\t$stop = 1 if ($tmpset eq \";\");\n\t\t\t\t\t}\n\t\t\t\t\t$lookahead++;\n\t\t\t\t}\n\t\t\t}\n\t\t\t\n\t\t\t$procData{$datum}{dedupe_stats} = $dedupeStats;\n\t\t\t\n\t\t\t$debug &amp;&amp; print \"DEBUG: ----> Dedupe stats: \" . join(\"|||\",split(\/\\n\/,$dedupeStats)) . \"\\n\";\n\t\t} elsif ($item =~ \/^\\*ss data domain dedup statistics: \\\\\\s*$\/) {\n\t\t\tmy $stop = 0;\n\t\t\tmy $lookahead = $blockIndex + 1;\n\t\t\twhile (!$stop) {\n\t\t\t\tmy $tmpLine = $block[$lookahead];\n\t\t\t\tif ($tmpLine =~ \/^\\s*\"(.*)\"(\\,|;)\\s*$\/) {\n\t\t\t\t\tmy $tmpstat = $1;\n\t\t\t\t\tmy $tmpset = $2;\n\t\t\t\t\t$dedupeStats .= \"\\n\" . $tmpstat;\n\t\t\t\t\t$stop = 1 if ($tmpset eq \";\");\n\t\t\t\t}\n\t\t\t\t$lookahead++;\n\t\t\t}\n\n\t\t\t$dedupeStats =~ s\/^\\n(.*)\/$1\/s;\t\t\t\n\t\t\t$procData{$datum}{dedupe_stats} = $dedupeStats;\n\t\t\t\n\t\t\t$debug &amp;&amp; print \"DEBUG: ----> Dedupe stats: \" . join(\"|||\",split(\/\\n\/,$dedupeStats)) . \"\\n\";\t\t\n\t\t}\n\t\t\n\t\t$blockIndex++;\n\t}\n\t$datum++;\n}\nprint (\"   ...processed $procCount savesets total\\n\");\n\n# Now dump $dataSets to save memory. This doesn't save as much\n# as the previous @rawData drop but retaining in case it's useful\n# for someone on a low-memory system.\n$debug &amp;&amp; print \"DEBUG: Deallocating pre-parsed datasets.\\n\";\nundef %dataSets;\n\n# Now do a quick post-process through the clients if we have host anonymisation turned on\n# to catch any index savesets and adjust them. 
It should be safe to do so here.\nif ($anonHosts) {\n\tforeach my $elem (keys %procData) {\n\t\tif ($procData{$elem}{saveset} =~ \/^index:(.*)\/) {\n\t\t\tmy $indexHostname = $1;\n\t\t\tif (%hostMap &amp;&amp; defined($hostMap{$indexHostname})) {\n\t\t\t\t$procData{$elem}{saveset} = \"index:\" . $hostMap{$indexHostname};\n\t\t\t} else {\n\t\t\t\t$hostMap{$indexHostname} = sprintf(\"host-%08d\",$hostCount);\n\t\t\t\t$hostCount++;\n\t\t\t\t$procData{$elem}{saveset} = \"index:\" . $hostMap{$indexHostname};\n\t\t\t}\n\t\t}\n\t}\n}\n\nmy %clientData = ();\n\n# As we write out the base file, assemble the client rollup data.\nmy @clients = ();\nprint (\"\\n\\nWriting $baseOut\\n\");\nmy $countOut = 0;\nif (open(OUTP,\">$baseOut\")) {\n\tprint OUTP (\"SSID,CloneID,VolumeID,Client,Savetime,Level,Name,OriginalMB,PreLCompMB,PostLCompMB,Reduction(:1)\\n\");\n\tforeach my $elem (keys %procData) {\n\t\tmy $dedupeStats = $procData{$elem}{dedupe_stats};\n\t\tmy @dedupeStats = split(\"\\n\",$dedupeStats);\n\t\tif ($countOut % 100000 == 0) {\n\t\t\tprint \"   ...written $countOut saveset details\\n\";\n\t\t}\n\t\t$countOut++;\n\t\n\t\tforeach my $stat (@dedupeStats) {\n\t\t\tif ($stat =~ \/^v1:(\\d+):(\\d+):(\\d+):(\\d+)\/) {\n\t\t\t\tmy $clID = $1;\n\t\t\t\tmy $original = $2;\n\t\t\t\tmy $precomp = $3;\n\t\t\t\tmy $postcomp = $4;\n\t\t\t\n\t\t\t\tmy $client = $procData{$elem}{client};\n\t\t\t\tmy $saveset = $procData{$elem}{saveset};\n\t\t\t\tmy $savetime = $procData{$elem}{savetime};\n\t\t\t\tmy $ssid = $procData{$elem}{ssid};\n\t\t\t\tmy $level = $procData{$elem}{level};\n\t\t\t\tmy $vmName = (defined($procData{$elem}{vmname})) ? 
$procData{$elem}{vmname} : \"\";\n\t\t\t\tmy $volumeID = $volumeIDsbyCloneID{$ssid}{$clID};\n\t\t\t\t\n\t\t\t\t# Only build up the client&lt;>element mappings if we have to output\n\t\t\t\t# individual clients.\n\t\t\t\tif ($individualFile) {\n\t\t\t\t\tif (in_list($client,@clients)) {\n\t\t\t\t\t\tpush(@{$clientElems{$client}},$elem);\n\t\t\t\t\t} else {\n\t\t\t\t\t\tpush (@clients,$client);\n\t\t\t\t\t\t@{$clientElems{$client}} = ($elem);\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t\t\n\t\t\t\tif ($vmName ne \"\") {\n\t\t\t\t\t$saveset = \"VM:$vmName\";\n\t\t\t\t}\n\t\t\t\t$original = $original \/ 1024 \/ 1024;\t#MB\n\t\t\t\t$precomp = $precomp \/ 1024 \/ 1024;\t\t#MB\n\t\t\t\t$postcomp = $postcomp \/ 1024 \/ 1024;\t#MB\n\t\t\t\t# Guard against a zero post-comp size (e.g., an empty saveset).\n\t\t\t\tmy $reduction = ($postcomp > 0) ? ($original \/ $postcomp) : 0;\t#:1\n\t\t\t\t\n\t\t\t\tmy $backupType = get_backup_type($saveset);\n\t\t\t\t\n\t\t\t\t$procData{$elem}{backuptype} = $backupType;\t# Store this in case we need it for individuals.\n\t\t\t\t\n\t\t\t\tif (%clientData &amp;&amp;\n\t\t\t\t\tdefined($clientData{by_client}) &amp;&amp;\n\t\t\t\t\tdefined($clientData{by_client}{$volumeID}) &amp;&amp;\n\t\t\t\t\tdefined($clientData{by_client}{$volumeID}{$client})) {\n\t\t\t\t\t$clientData{by_client}{$volumeID}{$client}{original} += $original \/ 1024;\n\t\t\t\t\t$clientData{by_client}{$volumeID}{$client}{postcomp} += $postcomp \/ 1024;\n\t\t\t\t\t$clientData{by_client}{$volumeID}{$client}{count}++;\n\t\t\t\t} else {\n\t\t\t\t\t$clientData{by_client}{$volumeID}{$client}{original} = $original \/ 1024; \t# Store in GB\n\t\t\t\t\t$clientData{by_client}{$volumeID}{$client}{postcomp} = $postcomp \/ 1024;\t# Store in GB\t\t\t\n\t\t\t\t\t$clientData{by_client}{$volumeID}{$client}{count} = 1;\n\t\t\t\t}\n\t\t\t\t\n\t\t\t\tif (%clientData &amp;&amp;\n\t\t\t\t\tdefined($clientData{by_type}) &amp;&amp;\n\t\t\t\t\tdefined($clientData{by_type}{$volumeID}) &amp;&amp;\n\t\t\t\t\tdefined($clientData{by_type}{$volumeID}{$client}) 
&amp;&amp;\n\t\t\t\t\tdefined($clientData{by_type}{$volumeID}{$client}{$backupType})) {\n\t\t\t\t\t\n\t\t\t\t\t$clientData{by_type}{$volumeID}{$client}{$backupType}{original} += $original \/ 1024;\t# Store in GB\n\t\t\t\t\t$clientData{by_type}{$volumeID}{$client}{$backupType}{postcomp} += $postcomp \/ 1024;\t# Store in GB\n\t\t\t\t\t$clientData{by_type}{$volumeID}{$client}{$backupType}{count}++;\n\t\t\t\t} else {\n\t\t\t\t\t$clientData{by_type}{$volumeID}{$client}{$backupType}{original} = $original \/ 1024;\t# Store in GB\n\t\t\t\t\t$clientData{by_type}{$volumeID}{$client}{$backupType}{postcomp} = $postcomp \/ 1024;\t# Store in GB\n\t\t\t\t\t$clientData{by_type}{$volumeID}{$client}{$backupType}{count} = 1;\n\t\t\t\t}\n\t\t\t\t\n\t\t\t\tmy $finalClientName = ($anonHosts == 1) ? $hostMap{$client} : $client;\n\t\t\t\t\n\t\t\t\tprint OUTP (\"$ssid,$clID,$volumeID,$finalClientName,$savetime,$level,$saveset,$original,$precomp,$postcomp,$reduction\\n\");\n\t\t\t\n\t\t\t\tif ($debug) {\t\t\t\n\t\t\t\t\tprint (\"DEBUG: \" . $client . \",\" . $saveset . \" (ClID $clID)\\n\");\n\n\t\t\t\t\tprintf (\"DEBUG: Original Size: %.2f MB\\n\",$original);\n\t\t\t\t\tprintf (\"DEBUG:     Pre-LComp: %0.2f MB\\n\",$precomp);\n\t\t\t\t\tprintf (\"DEBUG:    Post-LComp: %0.2f MB\\n\",$postcomp);\n\t\t\t\n\t\t\t\t\tprintf (\"Debug:     Reduction: %0.2f:1\\n\",$reduction);\n\t\t\t\t\tprint \"DEBUG: \\n\";\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t}\n}\nclose(OUTP);\nprint \"   ...wrote $countOut saveset details\\n\";\nprint \"Written per-saveset details to $baseOut\\n\";\n\n# If we need to output per-client details, do that now.\nif ($individualFile) {\n\tprint \"Individual (per-client) file results requested. 
Processing.\\n\";\n\tmkdir(\"$outFile\");\n\tif (-d $outFile) {\n\t\tforeach my $client (sort {$a cmp $b} @clients) {\n\t\t\tmy $filename = $client;\n\t\t\t# Override here.\n\t\t\tif ($anonHosts) {\n\t\t\t\t$filename = $hostMap{$client};\n\t\t\t}\n\t\t\t$filename =~ s\/\\.\/_\/g;\n\t\t\t$filename = $outFile . \"\/\" . $filename . \".csv\";\n\t\t\tif (open(OUTP,\">$filename\")) {\n\t\t\t\tprint \"   ...Writing $client data to $filename\\n\";\n\t\t\t\tprint OUTP (\"SSID,CloneID,VolumeID,Client,VMName,Savetime,Level,Name,OriginalMB,PreLCompMB,PostLCompMB,Reduction(:1)\\n\");\n\t\t\t\tmy @elemList = @{$clientElems{$client}};\n\t\t\t\tforeach my $elem (@elemList) {\n\t\t\t\t\tnext if ($procData{$elem}{client} ne $client);\n\t\t\t\t\n\t\t\t\t\t# Else...\n\t\t\t\t\tmy $dedupeStats = $procData{$elem}{dedupe_stats};\n\t\t\t\t\tmy @dedupeStats = split(\"\\n\",$dedupeStats);\n\t\t\t\t\tforeach my $stat (@dedupeStats) {\n\t\t\t\t\t\tif ($stat =~ \/^v1:(\\d+):(\\d+):(\\d+):(\\d+)\/) {\n\t\t\t\t\t\t\tmy $clID = $1;\n\t\t\t\t\t\t\tmy $original = $2;\n\t\t\t\t\t\t\tmy $precomp = $3;\n\t\t\t\t\t\t\tmy $postcomp = $4;\n\t\t\n\t\t\t\t\t\t\tmy $saveset = $procData{$elem}{saveset};\n\t\t\t\t\t\t\tmy $savetime = $procData{$elem}{savetime};\n\t\t\t\t\t\t\tmy $ssid = $procData{$elem}{ssid};\n\t\t\t\t\t\t\tmy $level = $procData{$elem}{level};\n\t\t\t\t\t\t\tmy $vmName = (defined($procData{$elem}{vmname})) ? 
$procData{$elem}{vmname} : \"\";\n\t\t\t\t\t\t\tmy $volumeID = $volumeIDsbyCloneID{$ssid}{$clID};\n\t\t\t\t\n\t\t\t\t\t\t\tpush (@clients,$client) if (!in_list($client,@clients));\n\t\t\t\t\n\t\t\t\t\t\t\tif ($vmName ne \"\") {\n\t\t\t\t\t\t\t\t$saveset = \"VM:$vmName\";\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t$original = $original \/ 1024 \/ 1024;\t#MB\n\t\t\t\t\t\t\t$precomp = $precomp \/ 1024 \/ 1024;\t\t#MB\n\t\t\t\t\t\t\t$postcomp = $postcomp \/ 1024 \/ 1024;\t#MB\n\t\t\t\t\t\t\tmy $reduction = $original \/ $postcomp;\t#:1\n\t\t\t\t\t\t\tmy $backupType = $procData{$elem}{backuptype};\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\tmy $finalClient = ($anonHosts == 1) ? $hostMap{$client} : $client;\n\t\t\t\t\t\t\tmy $finalSaveset = \"\";\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\tif ($anonHosts) {\n\t\t\t\t\t\t\t\tif ($vmName ne \"\") {\n\t\t\t\t\t\t\t\t\t$finalSaveset = $vmName;\n\t\t\t\t\t\t\t\t} else {\n\t\t\t\t\t\t\t\t\t$finalSaveset = $saveset;\n\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t} else {\n\t\t\t\t\t\t\t\t$finalSaveset = $saveset;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\tprint OUTP (\"$ssid,$clID,$volumeID,$finalClient,$vmName,$savetime,$level,$finalSaveset,$original,$precomp,$postcomp,$reduction\\n\");\n\t\t\t\t\t\t}\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t}\n\t\t\tclose(OUTP);\n\t\t}\n\t} else {\n\t\tdie \"Unable to create directory $outFile\\n\";\n\t}\n}\n\n# Write file: Summary by VolumeID and Client.\nprint(\"\\n\\nWriting $byClientOut\\n\");\nif (open(OUTP,\">$byClientOut\")) {\n\tprint OUTP (\"VolumeID,Client,Total Original (GB),Total Post-Comp (GB),Average Reduction\\n\");\n\tforeach my $volumeID (keys %{$clientData{by_client}}) {\n\t\tforeach my $client (sort {$a cmp $b} keys %{$clientData{by_client}{$volumeID}}) {\n\t\t\tmy $finalClientName = ($anonHosts == 1) ? 
$hostMap{$client} : $client;\n\t\t\tprintf OUTP (\"%s,%s,%.8f,%.8f,%.8f\\n\",\n\t\t\t\t\t\t\t$volumeID,\n\t\t\t\t\t\t\t$finalClientName,\n\t\t\t\t\t\t\t$clientData{by_client}{$volumeID}{$client}{original},\n\t\t\t\t\t\t\t$clientData{by_client}{$volumeID}{$client}{postcomp},\n\t\t\t\t\t\t\t$clientData{by_client}{$volumeID}{$client}{original} \/ $clientData{by_client}{$volumeID}{$client}{postcomp});\n\t\t}\n\t}\n}\nclose(OUTP);\nprint \"Written per-client details to $byClientOut\\n\";\n\n# Write file: Summary by VolumeID, Client and BackupType.\nprint (\"\\n\\nWriting $byTypeOut\\n\");\nif (open(OUTP,\">$byTypeOut\")) {\n\tprint OUTP (\"VolumeID,Client,Backup Type,Total Original (GB),Total Post-Comp (GB),Average Reduction\\n\");\n\tforeach my $volumeID (sort {$a cmp $b} keys %{$clientData{by_type}}) {\n\t\tforeach my $client (sort {$a cmp $b} keys %{$clientData{by_type}{$volumeID}}) {\n\t\t\tforeach my $type (sort {$a cmp $b} keys %{$clientData{by_type}{$volumeID}{$client}}) {\n\t\t\t\tmy $finalClientName = ($anonHosts == 1) ? 
$hostMap{$client} : $client;\n\t\t\t\tprintf OUTP (\"%s,%s,%s,%.8f,%.8f,%.8f\\n\",\n\t\t\t\t\t\t\t\t$volumeID,\n\t\t\t\t\t\t\t\t$finalClientName,\n\t\t\t\t\t\t\t\t$type,\n\t\t\t\t\t\t\t\t$clientData{by_type}{$volumeID}{$client}{$type}{original},\n\t\t\t\t\t\t\t\t$clientData{by_type}{$volumeID}{$client}{$type}{postcomp},\n\t\t\t\t\t\t\t\t$clientData{by_type}{$volumeID}{$client}{$type}{original} \/ $clientData{by_type}{$volumeID}{$client}{$type}{postcomp});\n\t\t\t}\n\t\t}\n\t}\n}\nclose(OUTP);\nprint \"Written per-client\/type details to $byTypeOut\\n\";\n\n# If we're anonymising hosts, write data that can be kept private to map\n# anonymised hostnames to real hostnames.\nif ($anonHosts) {\n\tprint \"\\n\\nWriting host anonymisation mappings for private reference - do not distribute.\\n\";\n\tif (open(ANONMAP,\">$anonOut\")) {\n\t\tprint ANONMAP \"Client Name,Anonymous Mapping\\n\";\n\t\tforeach my $client (sort {$a cmp $b} keys %hostMap) {\n\t\t\tprint ANONMAP \"$client,$hostMap{$client}\\n\";\n\t\t}\n\t\t\n\t\tprint ANONMAP \"\\n\";\n\t\tprint ANONMAP \"Virtual Machine,Anonymous Mapping\\n\";\n\t\tforeach my $vm (sort {$a cmp $b} keys %vmMap) {\n\t\t\tprint ANONMAP \"$vm,$vmMap{$vm}\\n\";\n\t\t}\n\t\tclose (ANONMAP);\n\t}\n\tprint \"...anonymisation mappings written to $anonOut.\\n\";\n}\n<\/code><\/pre>\n\n\n\n<p>One thing to note in the script &#8212; the breakdown of backup types depends on the saveset information I had available to me. So while it covers the likes of Windows and Unix filesystems, Oracle, SAP and MSSQL, it doesn&#8217;t identify Lotus Notes, DB2, and so on. 
(There&#8217;s a subroutine, get_backup_type, that interprets the backup type from the saveset name; you&#8217;d need to modify it if you want additional types.)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you use NetWorker with Data Domain, you&#8217;ve probably sometimes wanted to know which of your clients have the best&hellip;<\/p>\n","protected":false},"author":1,"featured_media":10139,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1181,16,20],"tags":[301,594],"class_list":["post-10677","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-domain-2","category-networker","category-scripting","tag-deduplication","tag-mminfo"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2021\/03\/bigStock-Command-Line.jpg","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-2Md","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/10677","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-jso
n\/wp\/v2\/comments?post=10677"}],"version-history":[{"count":5,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/10677\/revisions"}],"predecessor-version":[{"id":10695,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/10677\/revisions\/10695"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media\/10139"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=10677"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/categories?post=10677"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=10677"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}