Jan 24 2017

In 2013 I undertook the endeavour to revisit some of the topics from my first book, “Enterprise Systems Backup and Recovery: A Corporate Insurance Policy”, and expand it based on the changes that had happened in the industry since the publication of the original in 2008.

A lot had happened since that time. At the point I was writing my first book, deduplication was an emerging trend, but tape was still entrenched in the datacentre. While backup to disk was an increasingly common scenario, it was (for the most part) mainly used as a staging activity (“disk to disk to tape”), and backup to disk use was either dumb filesystems or Virtual Tape Libraries (VTL).

The Cloud, seemingly ubiquitous now, was still emerging. Many (myself included) struggled to see how the Cloud was any different from outsourcing with a bit of someone else’s hardware thrown in. Now, the core tenets of Cloud computing that made it so popular (e.g., agility and scalability) have been well and truly adopted as essential features of the modern datacentre as well. Indeed, to compete against Cloud, on-premises IT has increasingly focused on delivering a private-Cloud or hybrid-Cloud experience to the business.

When I started as a Unix System Administrator in 1996, at least in Australia, SANs were relatively new. In fact, I remember around 1998 or 1999 having a couple of sales executives from this company called EMC come in to talk about their Symmetrix arrays. At the time the datacentre I worked in was mostly DAS with a little JBOD and just the start of very, very basic SANs.

When I was writing my first book the pinnacle of storage performance was the 15,000 RPM drive, and flash memory storage was something you (primarily) used in digital cameras only, with storage capacities measured in the hundreds of megabytes more than gigabytes (or now, terabytes).

When the first book was published, x86 virtualisation was well and truly growing into the datacentre, but traditional Unix platforms were still heavily used. Their decline and fall started when Oracle acquired Sun and killed low-cost Unix, with Linux and Windows gaining the ascendency – with virtualisation a significant driving force by adding an economy of scale that couldn’t be found in the old model. (Ironically, it had been found in an older model – the mainframe. Guess what folks, mainframe won.)

When the first book was published, we were still thinking of silo-like infrastructure within IT. Networking, compute, storage, security and data protection were all separate functions – separately administered functions. But business, having spent a decade or two hammering into IT the need for governance and process, became hamstrung by IT governance and process and needed things done faster, cheaper, more efficiently. Cloud was one approach – hyperconvergence in particular was another: switch to a more commodity, unit-based approach, using software to virtualise and automate everything.

Where are we now?

Cloud. Virtualisation. Big Data. Converged and hyperconverged systems. Automation everywhere (guess what? Unix system administrators won, too). The need to drive costs down – IT is no longer allowed to be a sunk cost for the business, but has to deliver innovation and, for many businesses, profit too. Flash systems are now offering significantly more IOPS than a traditional array could – Dell EMC for instance can now drop a 5RU system into your datacentre capable of delivering 10,000,000+ IOPS. To achieve ten million IOPS on a traditional spinning-disk array you’d need … I don’t even want to think about how many disks, rack units, racks and kilowatts of power you’d need.

The old model of backup and recovery can’t cut it in the modern environment.

The old model of backup and recovery is dead. Sort of. It’s dead as a standalone topic. When we plan or think about data protection any more, we don’t have the luxury of thinking of backup and recovery alone. We need holistic data protection strategies and a whole-of-infrastructure approach to achieving data continuity.

And that, my friends, is where Data Protection: Ensuring Data Availability was born. It’s not just backup and recovery any more. It’s not just replication and snapshots, or continuous data protection. It’s all the technology married with business awareness, data lifecycle management and the recognition that Professor Moody in Harry Potter was right, too: “constant vigilance!”

Data Protection: Ensuring Data Availability

This isn’t a book about just backup and recovery because that’s just not enough any more. You need other data protection functions deployed holistically with a business focus and an eye on data management in order to truly have an effective data protection strategy for your business.

To give you an idea of the topics I’m covering in this book, here’s the chapter list:

  1. Introduction
  2. Contextualizing Data Protection
  3. Data Lifecycle
  4. Elements of a Protection System
  5. IT Governance and Data Protection
  6. Monitoring and Reporting
  7. Business Continuity
  8. Data Discovery
  9. Continuous Availability and Replication
  10. Snapshots
  11. Backup and Recovery
  12. The Cloud
  13. Deduplication
  14. Protecting Virtual Infrastructure
  15. Big Data
  16. Data Storage Protection
  17. Tape
  18. Converged Infrastructure
  19. Data Protection Service Catalogues
  20. Holistic Data Protection Strategies
  21. Data Recovery
  22. Choosing Protection Infrastructure
  23. The Impact of Flash on Data Protection
  24. In Closing

There’s a lot there – you’ll see the first eight chapters are not about technology, and for a good reason: you must have a grasp on the other bits before you can start considering everything else, otherwise you’re just doing point-solutions, and eventually just doing point-solutions will cost you more in time, money and risk than they give you in return.

I’m pleased to say that Data Protection: Ensuring Data Availability is released next month. You can find out more and order direct from the publisher, CRC Press, or order from Amazon, too. I hope you find it enjoyable.

Sampling device performance

Aug 03 2015

Data Protection Advisor is an excellent tool for producing information about your backup environment, but not everyone has it in their environment. So if you’re needing to go back to basics to monitor device performance unattended without DPA in your environment, you need to look at nsradmin.

High Performance

Of course, if you’ve got realtime access to the NetWorker environment you can simply run nsrwatch or NMC. In either of those systems, you’ll see device performance information such as, say:

writing at 154 MB/s, 819 MB

It’s that same information that you can get by running nsradmin. At its most basic, the command will look like the following:

nsradmin> show name:; message:
nsradmin> print type: NSR device

Now, nsradmin itself isn’t intended to be a full scripting language like bash, Perl, PowerShell or even (heaven forbid) the DOS batch processing system. So if you’re going to gather monitoring details about device performance from your NetWorker server, you’ll need to wrap your own local operating system scripting skills around the process.

You start with your nsradmin script. For easy recognition, I always name them with a .nsri extension. I saved mine at /tmp/monitor.nsri, and it looked like the following:

show name:; message:
print type: NSR device

I then created a basic bash script. Now, the thing to be aware of here is that you shouldn’t run this sort of script too regularly. While NetWorker can sustain a lot of interactions with administrators while it’s running without an issue, why add to it by polling too frequently? My general feeling is that polling every 5 minutes is more than enough to get a view of how devices are performing overnight.

If I wanted to monitor for 12 hours with a five minute pause between checks, that would be 12 checks an hour – 144 checks overall. To accomplish this, I’d use a bash script like the following:

for i in `/usr/bin/seq 1 144`
do
        # Stamp each sample so it can be matched to a time later
        /bin/date
        /usr/sbin/nsradmin -i /tmp/monitor.nsri
        /bin/sleep 300
done >> /tmp/monitor.log

You’ll note from the commands above that I’m writing to a file called /tmp/monitor.log, using >> to append to the file each time.

When executed, this will produce output like the following:

Sun Aug 02 10:40:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
                        name: Clone;
                     message: "writing at 94 MB/s, 812 MB";
Sun Aug 02 10:45:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
                        name: Clone;
                     message: "writing at 22 MB/s, 411 MB";
Sun Aug 02 10:50:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
                        name: Clone;
                     message: "writing at 38 MB/s, 81 MB";
Sun Aug 02 10:55:02 AEST 2015
                        name: Clone;
                     message: "writing at 8396 KB/s, 758 MB";
                        name: Backup;
                     message: "reading, data ";

There you have it. In actual fact, this was the easy bit. The next challenge you’ll have will be to extract the data from the log file. That’s scriptable too, but I’ll leave that to you.
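As a starting point for that extraction, here is a minimal sketch rather than a finished tool. It assumes the log looks exactly like the sample above: a date stamp line, then name/message pairs. The function name parse_monitor_log and the CSV layout are my own inventions for illustration, and the conversion of 1 MB/s to 1024 KB/s is an assumption about how the rates are reported. Messages that carry no rate (such as "reading, data ") are skipped, as are any units other than KB/s and MB/s.

```shell
#!/bin/bash
# Hypothetical helper: turn the monitor log into CSV lines of the form
#   timestamp,device,activity,rate_in_KB_per_s
parse_monitor_log() {
    awk '
        # Date stamp lines start with a three-letter day name
        /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) / { stamp = $0; next }

        # Remember the device name, minus the attribute label and trailing ";"
        /^[ \t]*name:/ {
            dev = $0
            sub(/^[ \t]*name:[ \t]*/, "", dev)
            sub(/;[ \t]*$/, "", dev)
            next
        }

        # Only messages such as "writing at 94 MB/s, ..." carry a rate
        /^[ \t]*message:/ {
            if (match($0, /(reading|writing) at [0-9]+ [KM]B\/s/)) {
                split(substr($0, RSTART, RLENGTH), f, " ")
                rate = f[3] + 0
                # Assumption: 1 MB/s is treated as 1024 KB/s
                if (f[4] == "MB/s") rate *= 1024
                printf "%s,%s,%s,%d\n", stamp, dev, f[1], rate
            }
        }
    ' "$1"
}
```

Running parse_monitor_log /tmp/monitor.log gives one line per sample that reported a rate, which is easy to load into a spreadsheet or feed to a graphing tool.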

May 07 2012

How much time do your staff take to monitor backups?

The answer should be: very little.

Not because they don’t care, or because you’re not tasking someone with the responsibility, but because your system should be designed such that your staff can see a “big picture” overview of all backups in a very short period of time. Assuming you do all your full backups on the weekend, and your staff don’t arrive until 08.55 and spend the first 10 minutes grabbing a coffee, chatting, logging on, firing up email, browsers, etc., then if your staff can’t tell you by 09.15 what your percentage success rate for weekend backups was, you’re monitoring backups wrong.

Don’t get this confused with troubleshooting. If backups encountered problems, troubleshooting may take considerably longer.

What unfortunately happens all too regularly is that monitoring and troubleshooting are seen as the same activity, or worse, they occupy the same amount of time. Nothing should be further from the truth.

nsrwatch, the most missing tool under NetWorker for Windows

Feb 02 2009

I started administering NetWorker servers in 1996. At the time I was working with Solstice Backup, the Sun OEM rebadged version of NetWorker, but the product was essentially the same. I think the main difference between the two products was that a search and replace was done on the NetWorker source code replacing Legato NetWorker with Solstice Backup.

At the time, many of the NSR/SBU servers I administered were remote – really remote. I also had very low bandwidth connections to them – as low as 4KB/s that was shared with email links, etc. This meant it was necessary to be incredibly economical with administrative commands*.

As such, I learned nsradmin faster than I learned the GUI. I still feel more comfortable making most configuration changes via nsradmin rather than the GUI, though NMC does at least occasionally tempt me to run it.

I also learned the simple elegance of nsrwatch, the command line monitor for NetWorker that in a simple terminal window showed all of the following:

  1. Server summary details – number of backups, number of restores, etc.
  2. All devices, and their current activity.
  3. All currently running sessions.
  4. Current server messages.
  5. Pending alerts.

Back in the days of smaller environments, this literally gave you a complete view of everything on the NetWorker server in an 80×25 terminal window.

I was a dedicated Unix system administrator at that time and it wasn’t until I moved into consulting in 2000 that I first had to administer a NetWorker server on Windows. I was rather shocked to find nsrwatch missing on Windows.

To this day, I still find it frustrating that nsrwatch is missing on Windows. I have to say, I feel sorry for Windows NetWorker administrators (particularly in a Windows only environment) who have to run up a big GUI to show details that could be shown in such an economical amount of space.

The nsrwatch tool has also been very important when the NetWorker server is operating under load. The old Windows NetWorker GUI for instance used to hammer the NetWorker server for detail requests, and get to the point where the server and the GUI wouldn’t communicate with each other under heavy load, resulting in operators randomly rebooting backup servers in the middle of the night just because it looked like NetWorker had hung.

Even to this day, while NMC responds faster and is less interruptive to NetWorker, it still doesn’t show all those details in one easy screen. Thus, I’m still not aware of a single NetWorker administrator on Unix platforms who doesn’t still run nsrwatch, even if they also use NMC for day to day operations and administration.

These days nsrwatch seems to get only token updates to ensure it continues to work with current releases of NetWorker. It’s a shame – it needs more attention; it needs to be enhanced so that it, say, supports dynamic drive sharing (only showing the active instance of a drive), and it needs to be ported to Windows.

It really, really needs to be ported to Windows.

* Nothing in those days was worse than running up the visual Veritas Volume Manager GUI. Bringing up a GUI that visually represented plexes, disks, volumes, etc., across a very low bandwidth link was about as much fun as being poked in the eye with a burnt stick. Thankfully, Volume Manager has far more economical GUIs, and better command line options these days.