With LTO-5 now just starting to go mainstream, it’s reassuring to see that the Ultrium roadmap has been expanded with another 2 generations, taking the mapping out to 8 generations in total. Linking to the roadmap image, we see:

LTO Ultrium Roadmap (Image copyright the LTO Consortium.)

LTO6 had been roadmapped a while ago, and presents slightly more than double the native capacity of LTO5 at 3.2TB. Generations 7 and 8 are currently mapped to double the capacity of each previous generation. Interestingly, there are predictions of higher increases in tape streaming speed. One would hope these are managed carefully; it was a real relief to see LTO5 not do the conventional doubling of streaming speed, giving backup networks and infrastructure generational time to catch up.
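As a rough illustration of what that doubling implies, here is a minimal Python sketch. It assumes the 3.2TB LTO6 figure above and a strict doubling for generations 7 and 8; the 1.5TB figure for LTO5 and the function name project_roadmap are not taken from the roadmap image, and the real roadmap numbers may be revised by the LTO Consortium.

```python
# Native (uncompressed) capacities in TB.
# Assumptions: LTO-5 is 1.5TB native, LTO-6 is roadmapped at 3.2TB,
# and each later generation simply doubles its predecessor.
def project_roadmap(lto6_tb=3.2, last_generation=8):
    capacities = {5: 1.5, 6: lto6_tb}
    for generation in range(7, last_generation + 1):
        capacities[generation] = capacities[generation - 1] * 2
    return capacities


if __name__ == "__main__":
    for generation, terabytes in project_roadmap().items():
        print(f"LTO-{generation}: {terabytes:.1f}TB native")
```

Under those assumptions generation 8 lands at four times the LTO6 capacity, which is the sort of growth worth bearing in mind when sizing libraries today.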

It’s pretty clear that an investment in LTO5 today is an investment in a well roadmapped future that has been consistently delivered on thus far. Sure, the use of tape within backup is evolving – we’re going to see it moved more into the role of long-term backup storage in larger sites, and clones-only in smaller sites, but with a healthy roadmap ahead of us and LTO5 just now starting to ramp up into mainstream, tape continues to show it’ll be around for a while to come.

 

I joined Twitter about 6 months ago, having resisted it for some time. My only regret now is that I didn’t join it earlier, but suffice it to say I think it has great potential as a knowledge collective tool, and as a means of people from disparate places over the world staying in touch without the formality of email or the one-on-one nature of instant messaging. It’s not always perfect for every situation, but it’s definitely useful.

I follow four main types of people:

  • Vendor employees
  • End users
  • Other IT workers (resellers, journalists, etc.)
  • Thinkers*

Of course, there’s the odd celebrity thrown in there, but not the ones you might normally think of. As to vendor employees, and where I refer to “vendors” in the following, I’m not exclusively talking of EMC. NetApp, Compellent, Symantec, Xiotech, 3Par, EMC, Emulex, etc., all fall into the general “vendor” clump for what I’m talking about in this post.

Where was I? Ah yes, I need to bring this back to something to do with backup – or at least IT. Rest assured, that was my intent from the start. I am typing from a position of a nasty cold and food poisoning, so my mind wanders more easily than Ross Noble at the moment.

If I look at the people in technical fields that I follow, I notice something really interesting about the use of industry analysts. You know, the “respected” heavyweights such as Gartner, ICT, etc. I can sum this up with a single sentence:

The only people you can be guaranteed aren’t talking about the industry analysts are the end users.

The incredibly ironic thing about this of course is that if the analysts were doing things right, the people who would be most interested in their reports would be the end users.

In less charitable moments over the years I’ve used the term “circle jerk” to collectively define groups of industry analysts, in much the way that you might describe a “pride of lions”. What is telling about the lack of end-users talking about the industry analysts is that I’m not alone in those less-than-charitable thoughts.

Why is this the case? I think there are three key reasons – and the final one only came to me in the last few days. These are:

  • Perceived lack of independence. Rightly or wrongly, end-users frequently don’t trust that the analysts are independent. I’d like to think better of them and suggest that this is a result of how analyst findings are used. Vendors tend to cherry-pick analyst reports, quotes and findings. This of course is entirely logical and from a business perspective, entirely reasonable. If you’ve got a message to tell customers (and the market in general), you pick the details that help tell that message. So of course, given that the average person only sees analyst data that’s been released by vendors (because individual user access to analyst reports is just so damn cost-prohibitive) it creates the appearance that the analysts are working hand-in-hand with the vendors. It’s a very big case of chicken-and-egg.
  • Facts vs fluffy statements. Most analyst statements that get out into the (tech) mainstream that aren’t vendor-released tend to be full of rather fluffy statements rather than strong fact. I’m betting that most techos don’t really have much time for astrology, and unfortunately the very fluffy reports about “By 2016 X PB of data will be produced monthly…” sort of statements are about as exciting as reading your daily horoscope.
  • The lack of why. This is the one that came to me in the last few days. I recently watched Simon Sinek’s presentation for TED, “How great leaders inspire action”, and as a consequence I’ve started to read his book, Start With Why. If there’s a core lesson in Simon’s work it seems to be this most basic premise: “People don’t buy what you do, they buy why you do it.”

When I saw another vendor tweet about another analyst report, it suddenly occurred to me that maybe the reason why end-users don’t talk about (and are mostly uninterested in) analyst reports is because they’re not seeing a why to buy into. They see what the analysts do, but they just don’t see the why. Without the why, there’s no trust, and there’s no reason to engage.

The only ones that can fix that issue are the analysts themselves. This though leads to the next question: who are the real customers of the analysts? End-users (i.e., IT generally), or vendors?


* “Thinkers” is a broad category of basically people who I find interesting, and they range from @DerrenBrown to @StephenFry to @Zephoria to @TheUngayGuy with a whole bunch of coverage in-between. (You might say that they’re probably most of all the people who I’d be extremely chuffed about being able to sit down for a coffee and a lengthy chat with.)

 

For the release of both Mac OS X Tiger (10.4 – 2005) and Mac OS X Leopard (10.5 – 2007), Apple had various mocking campaigns and posters for the preceding conferences with slogans along the lines of:

Redmond, start your photocopiers

This was a very public and very open jibe from Apple regarding Microsoft’s reputation for simply copying features from Mac OS X. Now, I don’t want to really get into the “you’re a fanboy – no, you’re a fanboy!” style argument, but I do want to suggest that given the recent debacle that’s started to surface over the abysmal performance of the Windows 7 backup process, Microsoft appears to be cutting their noses off to spite their faces.

Back on 6 March 2009, I covered just how amazing Time Machine was as an OS-integrated backup product. I never said it was something that would replace enterprise products like NetWorker, but I did say:

This, quite honestly, is the epitome of simplicity. Going beyond standard backup and recovery operations, Time Machine is also an excellent disaster recovery tool – if you have serious enough issues that you need to rebuild your machine, the Mac OS X installer actually has the option of doing a rebuild and recovery from Time Machine backups.

To be blunt – as a backup utility for end users, Time Machine is an ace in the hole, and one of the most underrated features of Mac OS X.

Sure, Time Machine doesn’t do everything that every user wants it to do – but then again, no product ever will. Yet I’ve backed up a significant number of TB (as far as desktops go) using Time Machine, and recently I was highly pleased to be able to recover 18 months of my father’s hard work with no effort at all. This was from a machine where I’d set up Time Machine and had not had a chance to visit since – nor check remotely, since my parents don’t use the internet.

So frankly, on behalf of Windows users, I’m somewhat horrified at the experiences being felt with Microsoft’s Windows 7 backup utility – and their use case scenarios!

As documented over at The Register, “Windows 7 Backup Gets Users’ Backs Up”, there’s a litany of issues being reported:

Jon Hell posted on April 23 that he is backing up 900GB of data on a quad core PC with 7GB of RAM; “After twenty four hours Windows Backup had managed to complete 18 per cent of the backup, but after forty eight hours, it had got even slower, and had only reached 23 per cent of the full backup.”

And:

John Dougrez-Lewis was the first poster, and wrote that he could use file copy to move 250GB of file data to an external eSATA drive in an hour at a speed of 72MB/sec. When he did the same job using Windows 7 RTM Backup it took 14 hours, roughly 5MB/sec – more than 14 times slower.

If these were isolated experiences it could be understood – after all, no product will work perfectly for every single person.
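The figures in the second report are easy to sanity-check. The sketch below is my own arithmetic rather than anything from The Register or the forum thread, and it assumes decimal units (1GB = 1000MB); the function name is purely illustrative.

```python
def throughput_mb_per_sec(gigabytes, hours):
    """Average throughput in MB/sec, assuming decimal units (1GB = 1000MB)."""
    return gigabytes * 1000 / (hours * 3600)


# Figures taken from the quoted report: 250GB in 1 hour vs 250GB in 14 hours.
file_copy = throughput_mb_per_sec(250, 1)
windows_backup = throughput_mb_per_sec(250, 14)
slowdown = file_copy / windows_backup

if __name__ == "__main__":
    print(f"File copy:      {file_copy:.1f}MB/sec")
    print(f"Windows Backup: {windows_backup:.1f}MB/sec")
    print(f"Slowdown:       {slowdown:.1f}x")
```

With decimal units the file copy works out to roughly 69MB/sec rather than the quoted 72MB/sec (the poster may well have been counting in binary units), but either way the backup comes in at about 5MB/sec, around 14 times slower.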

The actual Microsoft forum regarding the issues is directly available via this link. We also see an article from Microsoft, Backing up large data set on Windows 7:

Windows Backup is optimized to help home users protect their important data on their PCs and this is typically expected to be 200GB of data on average. On a PC that contains significantly larger data size, Windows Backup’s performance may degrade. If you need to back up more than 400GB of data, we recommend that you backup your PC using a system image.

Sorry to say, but this “meh” attitude towards backup turns my stomach. If this were an article published a decade ago about an OS-included backup utility it might be understandable – after all, a decade ago, 400GB of data was a big amount!

The article goes on to provide instructions for setting up a scheduled system image. Sure, the average techo will look at the instructions provided and punch through them in a couple of minutes at most, but with instructions like the following, you’re guaranteed to (a) turn most average users off and (b) definitely provide a terrible user experience:

If you have a separate data drive, you will need to create a task in Task Scheduler to create the system image:

a.      Open an elevated command prompt

b.      Type the following command:

SCHTASKS /Create /SC <Frequency> /TN <TaskName> /RL HIGHEST /ST <StartTime> /TR "WBADMIN START Backup -backupTarget:<target> -include:<source> -quiet"

This goes to the heart of why Time Machine is so successful – Apple recognised that the only way to get users to back up is to make it painless and easy. Microsoft’s approach to end-user backup seems to be diametrically opposed to that of Apple – and as a result of it, I know which backup mechanism will save more consumer data, even given the hugely different market shares of the platforms.

When it comes to backup, Microsoft would do well to “start their photocopiers”.

 

The classic NetWorker install will see:

  • A bunch of clients
  • Optional storage nodes and/or dedicated storage nodes
  • The NetWorker server
  • The NetWorker Management Console server running on the NetWorker server

Architecturally, there’s no reason why you have to have the NetWorker Management Console server running on the backup server itself. Both logically and architecturally, there are good reasons why you would choose to keep these separate. Let’s start by using a diagram to show how the alternate architecture looks:

Divorcing NetWorker Management Console Server from Backup Server

So, what are the advantages of this sort of layout? There are three distinct advantages:

  • Feature access – in my experience the vast majority of backup administrators are conservative in their approach to the technology in use. This means that there’s a slow-ramping process for adoption of new backup server software. While some users will hop on the bandwagon straight away, others will wait for a while. The momentum eventually builds up, but it takes a while to get there. In the meantime though, we periodically encounter situations where the features in the latest version of NMC are highly desirable. For instance, the unified monitoring provided in the version of NMC that comes with NetWorker 7.6 should appeal to just about every NetWorker administrator out there. If the NMC server and the NetWorker server are one and the same machine, it makes rolling out a new version of NMC while keeping the old version of NetWorker practically impossible. On the other hand, if the NMC server and the NetWorker server aren’t the same machine, it’s trivial to upgrade a single client to the latest version of NetWorker and NMC.
  • Performance – in small environments, the footprint of the NMC server creates negligible additional load on the backup server. As the number of clients and simultaneously active savesets ramps up though, the impact of running the NetWorker Management Console server on the backup server – particularly with multiple consoles accessing it – can be observed. By keeping these hosts separate, the problem does not happen.
  • Protection – the NMC server has become considerably more stable over its lifetime, but like all software, there are no guarantees that it is crash proof. If the NMC server isn’t running on the same host as the backup server, then it gives you the advantage of being able to reboot the NMC server should there be an issue with monitoring, without impacting the actual backup server itself. In actual fact, keeping systems separate that don’t need to be together gives you better options for fault handling, upgrades and scheduled maintenance.

Assuming you want to run the NMC server as a separate host to the NetWorker server, it’s really quite easy:

  • Using either nsradmin or the existing NMC install on the backup server, modify NetWorker’s Administrator user group to include administrators from the NMC server.
  • Install the NMC server and NetWorker client software on the intended host. (If on Unix, I always recommend also installing the NetWorker man pages. You never know when you’ll need them.) Be sure to allow NetWorker to set up the NMC backup instance if you want your database backed up and aren’t sure how to configure this manually.
  • Shut down NMC on your backup server and configure it to not automatically start up. If necessary you can start it later to retrieve historical reports – otherwise you can leave it there installed, but not running, to avoid confusion.
 

Today my site hosting service suffered an outage of over 12 hours.

While this was the first major outage for the NetWorker Blog, I have two sites hosted with the service, and have suffered several maddening and unexplained outages on the other site. Each time I’ve requested assistance regarding the other site my support inquiries have taken over 24 hours to be answered, by which time the site has been available again for hours and I get a drone’s response: “Your website appears to be working.”

As a consequence I’ll be starting to look for a hosting provider that actually has a real 24×7 support team and communicates well with its customers. If anyone out there has recommendations for hosting services, preferably Unix/Linux/Mac based, that support WordPress and PHP with custom scripts and have decent bandwidth/storage limits, I’d be grateful to hear from you.

 

Introduction

When I entered the work force, for the first few months I trained as a MIMS consultant, but was then seconded to a system administration team on the other side of the country for 3 months (which became 6 months). Shortly after I returned from that, I joined the BHP IT Unix System Administration team in Newcastle and spent 4 years there. I built a lot of technical knowledge in that group, but what I got most out of that group was an understanding about what makes an excellent system administrator.

I was by no means an excellent system administrator when I joined that team – I was wet behind the ears, somewhat naïve, and probably too opinionated. The team that I joined taught me what makes an excellent system administrator, but in doing so also gave me an excellent foundation in some of the core requirements to be a good consultant too, and I thought it was time I shared these.

So here are what I’d call the 7 rules for system administrators, distilled from the experience of working with the best system administration team I’ve ever worked with. (While I’m at it – hello to Dave, Scott, Andrew, John, Jason and Russell.)

Knowledge Centric Approach

It wasn’t until after I stopped working with the Newcastle Unix team that I realised (to my horror) there were other ways system administration groups could run. There are two distinct approaches:

  • Knowledge centric approach – everyone knows a little about what everyone else is doing, and while any one person will be an expert on certain things, everyone is capable of getting involved with anything.
  • Person centric approach – each system, application or function is administered by one or two people in the group at most, and the ability of the group to maintain those systems without the individuals being around is negligible at best.

My absolute belief is that any system administration team built around a person centric approach has it wrong. They do their users and the business a disservice.

Paranoia

While sometimes some of the people I’ve worked with have taken paranoia and security to extremes I find overboard, paranoia is a trait that should be considered a healthy mental attitude for system administrators. Paranoia in this case means not being overly trusting – having an idea of what processes should be running, requiring empirical evidence that the system is functional, and not making dangerous assumptions.

Testing

If you want to avoid testing, assume it doesn’t work. This is the mentality of a good system administrator. Since assuming everything doesn’t work means you have to assume that everything needs to be fixed, the alternative – having a testing regime and ensuring that changes don’t go into production without appropriate testing – seems much easier.

Documentation

Documentation is vital to good systems operation and system administration. That covers the full gamut – system build documentation, procedural documentation, change control, etc. Why? Quite simply, if your systems and processes aren’t documented, then it means that you’re slipping into a person-centric approach to a system administration team.

Being Lazy

A good sysadmin is a lazy sysadmin. Lazy system administration is about automation. If a task that you perform requires you to run three commands, taking the output of each prior command and using it as input to the next command, you should be automating it. Every time you do repetitive, mundane tasks that can be scripted, you’re wasting your own time and company time. In my experience system administrators that religiously avoid scripting repetitive tasks lose up to an hour a day in mundane tasks that could be better spent elsewhere – self training, research, etc. (Of course, every bit of that automation needs good documentation!)
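To make the three-command case concrete, here is a minimal Python sketch of chaining commands so that each one’s output feeds the next. The helper name run_chain and the three stand-in steps (list some clients, keep the failed ones, count them) are hypothetical and only there so the example runs anywhere Python does; substitute whatever commands you actually run by hand.

```python
import subprocess
import sys


def run_chain(commands, first_input=""):
    """Run each command in turn, feeding the previous stdout to the next stdin."""
    data = first_input
    for argv in commands:
        # check=True stops the chain at the first command that fails.
        result = subprocess.run(
            argv, input=data, capture_output=True, text=True, check=True
        )
        data = result.stdout
    return data


# Hypothetical three-step task; each step is a tiny Python one-liner
# standing in for a real command-line tool.
PY = sys.executable
steps = [
    [PY, "-c", "print('client1 ok'); print('client2 failed'); print('client3 failed')"],
    [PY, "-c", "import sys; sys.stdout.writelines(l for l in sys.stdin if 'failed' in l)"],
    [PY, "-c", "import sys; print(sum(1 for _ in sys.stdin))"],
]

if __name__ == "__main__":
    print("Failed clients:", run_chain(steps).strip())
```

Once the chain is wrapped up like this, it can be scheduled, logged and documented, rather than retyped every morning.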

Only make a mistake once

We all make mistakes. Demanding people make no mistakes doesn’t account for how people learn. The trick of course is ensuring that we learn from our mistakes. That means that you should acknowledge that you’ll periodically make mistakes, but be ever determined to not make the same mistake twice.

Ask questions, listen to the answer

I used to say that the only stupid question is the one you don’t ask. This remains partially true, but it could equally be said that a stupid question is one that you ask without listening to the answer.

All system administrators should be prepared to ask one another questions (again, coming back to the knowledge-centric approach to system administration) – no one person in the team will have the answers to every single situation. But asking the question is only the first part of it. In fact, it’s probably only the first 30% at most.

The larger part – 70% of the effort – is taking the time to listen to the answer and making sure you understand the answer. In many cases that probably means asking some follow-up questions: question TLAs, question terms you don’t understand in the answer, and if the answer itself still doesn’t make sense, ask for more clarification. Sometimes you’ll have it explained to you, and sometimes you may be told that you need to do some research yourself. But don’t pretend to understand the answer when inside you’re just as confused.

In conclusion

While I’ve couched this from the perspective of rules for system administration, the techniques equally apply to just about any IT endeavor – backup administration, application administration, database administration, etc. All of these disciplines and more can follow the above principles and achieve an approach which is more satisfying – to the business, as well as from both a personal and professional perspective to the individual.

 

While there’s no native NetWorker management app for the iPad (or iPod Touch/iPhone), there are some management options available for you. On the Windows front, there are RDP clients that I’m told work quite well, though I’ve never got around to buying them myself. On the Unix front, if you’ve got an iPad and a NetWorker server, you should make sure to invest in iSSH. iSSH is a fantastic tool that I bought ages ago for the iPhone and it has continually evolved and added full iPad support for no extra charge.

Using it I can obviously get full ssh access to a Unix NetWorker server, meaning I can run any command I want – including nsrwatch:

nsrwatch running on an iPad

Additionally though, if you’re prepared to set up a VNC server – either on your own computer (as I did with my laptop) or on an appropriate server – you can also run NMC remotely:

NMC login via VNC on the iPad

NMC console via VNC on the iPad

It’s not entirely elegant, but lacking an actual management app, it’s a useful stop-gap measure.

Incidentally, if you’re looking for my general thoughts on the iPad, you can find them here on my personal blog.

 

In previous articles I’ve discussed the need for zero error policies. This was covered first in What is a Zero Error Policy?, and followed up in Zero Error Policy Management. (If you’ve not read those articles, you really should before continuing.)

Key to ensuring a zero error policy is not only adopted, but also achieved, is a good understanding of the error lifecycle. That’s right – errors have a lifecycle, which is not only well defined, but actually helps us to keep them under control. An error lifecycle will resemble the following:

The error lifecycle

The start of the lifecycle is our Test and Detect loop:

  • Detect – An error is determined to have happened either as a result of a significant fault, or as a result of routine monitoring and analysis.
  • Test – An error is determined to have happened as a result of actual testing (formal or informal).

Once it’s determined that an error has happened, we then move into the resolution cycle, which consists of:

  • Diagnose – Determine the nature of the error – i.e., the root cause. If you don’t understand the actual cause, you can’t be certain that any solution you come up with is complete.
  • Rectify – Having understood the error, it’s time to resolve it. There are two standard resolution techniques: complete resolution or workaround. Either is acceptable, so long as the resolution technique chosen is acceptable to the business and appropriate to the error.
  • Document – Once an error is solved, it needs to be documented. As has been said on numerous occasions, “Those who don’t learn from history are doomed to repeat it.” One of the worst possible error situations for instance is one where you’ve solved it in the past, but you can’t remember what you did and thus have to repeat the entire process. At minimum, documentation requires 3 components: (a) what led to the error, (b) how the error manifests/is detected, and (c) how the error was resolved.

The error lifecycle doesn’t stop there though, as indicated by the diagram; instead, we add that error into a test and detection register – having encountered it, we should be able to more easily be on the lookout for another instance. This is hopefully where the error finishes: being monitored for, but never again recurring. In the event that it does recur, though, the diagnosis, rectification and documentation process should be simpler.
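One way to picture the register is as a small, searchable list of records, each holding the three documentation components. The following Python sketch is only an illustration under that assumption; the class names, fields and the sample error are hypothetical, and in practice the register might just as easily live in a wiki or a ticketing system.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorRecord:
    cause: str                 # (a) what led to the error
    detection: str             # (b) how the error manifests / is detected
    resolution: str            # (c) how the error was resolved
    workaround: bool = False   # True if resolved by workaround, not a complete fix


@dataclass
class ErrorRegister:
    records: list = field(default_factory=list)

    def document(self, record):
        """Add a resolved error to the test and detection register."""
        self.records.append(record)

    def lookup(self, symptom):
        """Return documented errors whose detection notes mention the symptom."""
        needle = symptom.lower()
        return [r for r in self.records if needle in r.detection.lower()]


if __name__ == "__main__":
    register = ErrorRegister()
    # Hypothetical entry, purely for illustration.
    register.document(ErrorRecord(
        cause="Client hostname changed without updating the backup configuration",
        detection="Nightly report shows 'unknown host' for the client",
        resolution="Updated the client resource and added an alias",
    ))
    for match in register.lookup("unknown host"):
        print(match.resolution)
```

When the same symptom turns up again, a lookup against the register goes straight to the documented cause and resolution rather than restarting the diagnosis from scratch.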

There you have it – the error lifecycle. Knowing it allows you to manage errors, rather than errors managing you.

 

Once upon a time, if you said to someone “do you have a test environment?” there was at least a 70 to 80% chance that the answer would be one of the following:

  • Only some very old systems that we decommissioned from production years ago
  • No, management say it’s too expensive

I’d like to suggest that these days, with virtualisation so easy, there are few reasons why the average site can’t have a reasonably well configured backup and recovery test environment. This would allow the following sorts of tests to be readily conducted:

  • Disaster recovery of hosts and databases
  • Disaster recovery of the backup server
  • Testing new versions of operating systems, databases and applications with the backup software
  • Testing new versions of the backup software

Focusing on the Intel/x86/x86_64 world, we see where this is immediately achievable. Remember, for the average set of tests that you run, speed is not necessarily going to be the issue. Let’s focus on non-speed functionality testing, and think of what would be required to have a test environment that would suit many businesses, regardless of size:

  1. Virtualisation server – obviously VMware ESXi springs to mind here, if cost is a driving factor.
  2. Cheap storage – if performance is not an issue for testing (i.e., you’re after functionality not speed testing), there’s no reason why you can’t use cheap storage. A few 2TB SATA drives in a RAID-5 configuration will give you oodles of space if you need any level of redundancy, while a RAID-0 stripe will give you capacity and performance without redundancy. Optionally present storage via iSCSI if it’s available.
  3. Tiny footprint – previously test environments were disqualified in a lot of organisations, particularly those at locations where space was at a premium. Allocating room for say, 15 machines to simulate part of the production network took up tangible space – particularly when it was common for test environments to not be built using rackable equipment.
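For the cheap storage option, the trade-off between RAID-5 and RAID-0 is simple to work out. The sketch below uses the textbook formulas (RAID-5 gives up one drive’s worth of space to parity, RAID-0 gives up nothing) and ignores filesystem and controller overhead; the function name and the four-drive example are my own assumptions.

```python
def usable_capacity_tb(drive_count, drive_size_tb, raid_level):
    """Approximate usable capacity in TB, ignoring filesystem overhead."""
    if raid_level == 0:
        if drive_count < 2:
            raise ValueError("RAID-0 striping needs at least 2 drives")
        return drive_count * drive_size_tb
    if raid_level == 5:
        if drive_count < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        # One drive's worth of space is consumed by parity.
        return (drive_count - 1) * drive_size_tb
    raise ValueError("only RAID-0 and RAID-5 are handled in this sketch")


if __name__ == "__main__":
    # Assumed example: four 2TB SATA drives.
    for level in (5, 0):
        print(f"RAID-{level}: {usable_capacity_tb(4, 2, level)}TB usable")
```

With four 2TB drives that is 6TB usable with single-drive redundancy, or 8TB with none, which is plenty for functional testing either way.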

In the 2000s, there was much excitement over the notion of supercomputers at your desk – for example, remember when Orion released a 96-CPU capable system? The notion of that much CPU horsepower under your desk for single tasks may be appealing to some, but let’s look at more practical applications flowing from multi-core/multi-CPU systems – a mini datacentre under your desk. Or in that spare cubicle. Or just in a 3U rack enclosure somewhere within your datacentre itself.

Gone are the days when backup and recovery test environments are cost prohibitive. You’re from a small organisation? Maybe 10-20 production servers at most? Well that simply means your requirements will be smaller and you can probably get away with just VMware Workstation, VMware Fusion, Parallels or VirtualBox running on a suitably powerful desktop machine.

For companies already running virtualised environments, it’s more than likely the case that you can even use a production virtualisation server due for replacement as a host to the test environment, so long as it can still virtualise a subset of the production systems you’d need to test with. During budgetary planning this can make the process even more painless.

This sort of test environment obviously doesn’t suit every single organisation or every single test requirement – however, no single solution ever does. If it does suit your organisation though, it can remove a lot of the traditional objections to dedicated test environments.

© 2012 The NetWorker Blog