While much of business has moved into the twenty-first century, one thing that continues to demonstrate a profound stuck-in-the-80s mentality is the demand for on-site work. To be sure, there are some levels of work that have to be conducted on-site. It’s somewhat challenging to do hardware work remotely, for instance, unless you have a protein-based robot at your beck and call on-site, in which case you may as well be there yourself in many instances.

Software work? Consulting? Let’s start being smarter about that. Let’s consider the following impasse:

  • Companies regularly require local presence when there is no need.
  • Companies regularly cite security concerns as a reason why people have to come on-site.
  • Companies want people to attend site 5 minutes ago when they have an issue.

This is the 21st century. We can get around these problems and help the environment by not requiring people to travel everywhere to do some work. (Red tape and bureaucracy alone should never be a reason to deny remote work – honestly, if it is used as a reason with a straight face, there’s something seriously wrong with a company.)

Despite what some would have you believe, I’d estimate that in 90% or more of the cases where security is cited as a reason to prevent remote access, security is not a sufficient reason. Let’s consider options here:

  • Legally binding non-disclosure agreements.
  • Isolated virtual machines, audited by both companies.
  • Physically isolated machines (e.g., incoming connection machine only plugged into network when necessary).
  • Fixed IP addresses for access isolation.
  • Audited, or even remotely monitored sessions.
  • One time passwords for login.
  • One time passwords on the remote access machine (at the consultant’s company) that are only provided when connectivity is required.
  • Encrypted traffic.
  • Encrypted traffic over VPNs. (I.e., doubly-encrypted traffic.)
  • Automatic lock-out of inactive accounts (e.g., not used in 5 days – lock out).
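
As a rough sketch of the automatic lock-out idea in the last item above, the check amounts to comparing each account's last login against a cut-off. The account names and the shape of the `last_logins` mapping are hypothetical; only the five-day figure comes from the example in the list.

```python
from datetime import datetime, timedelta

# Threshold from the "not used in 5 days" example; adjust to suit.
INACTIVITY_LIMIT = timedelta(days=5)


def accounts_to_lock(last_logins, now, limit=INACTIVITY_LIMIT):
    """Return the names of accounts whose last login is older than `limit`.

    `last_logins` maps an account name to the datetime of its last login
    (a hypothetical structure; a real system would read this from its
    authentication logs or directory service).
    """
    return sorted(
        name for name, last_seen in last_logins.items()
        if now - last_seen > limit
    )


if __name__ == "__main__":
    now = datetime(2010, 1, 15, 9, 0)
    last_logins = {
        "consultant-a": datetime(2010, 1, 14, 17, 30),  # used yesterday
        "consultant-b": datetime(2010, 1, 4, 8, 0),     # idle for 11 days
    }
    print(accounts_to_lock(last_logins, now))
```

What happens to the returned accounts (disable, expire the password, notify someone) is a policy decision this sketch deliberately leaves out.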

All of these and more will resolve all security issues in (again, I’d estimate) more than 90% of cases, and the remaining cases are as much as anything based on legal or military obligations. (Any case where it doesn’t resolve it due to red tape or bureaucracy is shameful.)
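
To make the one-time password option a little more concrete, here is a minimal sketch of a time-based one-time password generator following RFC 4226 (HOTP) and RFC 6238 (TOTP), using only the Python standard library. It is an illustration of the technique rather than a recommendation of any particular product, and the shared secret shown is the published RFC test value, not something to use in practice.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time-step counter."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


if __name__ == "__main__":
    # Test secret and timestamp from RFC 6238, Appendix B.
    print(totp(b"12345678901234567890", at=59, digits=8))  # 94287082
```

Both ends share the secret once, out of band; after that a password captured in transit is useless once its 30-second window has passed.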

Let’s be frank:

  • We have too many cars/motorbikes on the road;
  • Diesel trains still consume an inordinate amount of fuel per passenger, regardless of whether that’s less or more than cars (this is frequently debated);
  • Electric trains have to get their power from somewhere, and for many countries that’s fossil fuel power-stations anyway;
  • Airplanes use large amounts of fuel too.

Sure, we have to acknowledge that when we use computers and network/internet infrastructure, we’re using power which in turn may be coming from fossil fuel power stations as well. But even a quick search reveals a plethora of studies that show telecommuting uses significantly less power/fossil fuel resources than regular commuting. Telecommuting doesn’t, of course, have to be only about employees; it can apply to contractors and other “on-hire” staff too. Not only that, companies that are reluctant to try telecommuting with their own employees can dip their toes into the water by making provisions for remote work from contractors, support suppliers, etc.

Now I’ll mention my biases here:

  • I do remote work
  • I do support work
  • I do on-site consulting work
  • I’ve done my fair share of lengthy travel for remote work
  • Lengthy travel doesn’t enthrall me.
  • Lengthy stays in hotel rooms don’t enthrall me.

But let’s also be honest – IT consulting is not a family-friendly environment. Long hours of both travel and on-site work can at times actually detract from the experience both for the person doing the consulting and the company engaging the consultants. Particularly when the work is running late, or out of hours, happy people work better than unhappy people.

Bottom-line/numbers person? Then let’s think about project costs. If you think that any company doesn’t build staff travel and accommodation costs into their prices, think again. (Actually, depressingly I do have personal experience in one major PC company that continually demands such stupidity from companies which they buy contracting services from – and they continue to leave a string of collapsed contractor companies and dejected, failed ex-company owners behind them who thought they could manage or hide those costs.)

In all honesty, continuing to insist on local site attendance for activities that can be done cheaper, more immediately and more comfortably for all involved is just ongoing collective business insanity.

If you want to do something for the environment in 2010 – and make business cheaper without losing profitability, how about you:

  • Suggest to your customer, if you’re a consultant, that an on-site activity they’d normally get you to do is one that you can do remotely (of course, only if it can be done remotely!)
  • Ask your consultants, if you’re a customer, whether they can do the work you want done remotely.

Of course, there are always going to be situations where on-site work is required, but let’s start bringing consulting into the 21st century.

 

So you’re a busy backup administrator and you’re getting ready to go on leave. It’s 4pm on your final day before the holiday, you’ve finally got everything off your plate, and you think to yourself, “Now I’ve finally got the time, I’ll just quickly upgrade NetWorker before I leave.”

This unfortunately is a variant of that Friday change rule violation known as POETS.

There are three distinctly wrong things with this scenario:

  • Infrastructure upgrade done without change control.
  • Infrastructure upgrade done at the last minute.
  • Infrastructure upgrade done without follow-up monitoring.

Any one of those scenarios is enough to cause a nightmare situation – either for yourself, getting call-outs when you’re meant to be on holidays, or for your colleagues, left in the lurch after you switch your phone off for two weeks and go on a holiday to the East Islands.

All three though? That’s just asking for trouble.

(This lesson doesn’t actually just apply to NetWorker – it applies across the board for system, application and storage administration. Don’t modify the system just before going away for a while.)

Just before this holiday season, I had a customer upgrade* their NetWorker server from 7.3.x to 7.5 before going on leave. Not 7.5.1, not 7.5.1.8, 7.5. This didn’t go so well, and a few days later when the fill-in administrators noticed the issue**, there was a bit of work to rectify the various issues and some backups during that time didn’t work.

This however is by no means unique. Following Twitter I noticed one on-call person suffer a hideous xmas day and following day working on a call-out from what appeared to be an untested change done by someone else before that other person went on holiday.

And non-betting man that I am, I’d bet a considerable wad of money (and win) that this fellow’s experience wasn’t unique for IT workers over xmas 2009.

In short: choosing to do an untested/uncontrolled upgrade just before going on holidays can be either self-destructive or selfish (or even both) – it may lose you your holiday, depending on the level of the fail and the backup (or lack thereof) within your company, or it may cause a colleague to have an insufferably unpleasant time. (Alternatively, if you can be reached, it may result in you having a bad time on your holiday in order to help out a colleague having a bad time as well.)

The problem with rushing through upgrades at the last minute is that they tend to be poorly done, even if they seem simple enough. Even if change control is being followed, if that change control has been rushed through (as it can sometimes be done as a “last minute” activity), then it provides no guarantee that the change will work smoothly. And don’t forget: Murphy’s Law works in the datacentre as well. Something that looks easy, that you should be able to do with your eyes closed, when done as a rush job at the last minute can come unstuck quite easily.

So please – for your sake, for your colleagues’ sake, for NetWorker’s sake and for the sake of your company: please don’t upgrade just before you go on holidays.


* upgrade = “update” in NetWorker speak

** Which should serve as a reminder that you should never only have one backup administrator.

 

Part of my goal in moving the blog from WordPress to a dedicated site was to be able to start adding more material that doesn’t necessarily fit within the confines of a standard blog.

Starting today, I’ve launched the NetWorker Information Hub, which I hope to build as a repository of knowledge about EMC NetWorker, with a variety of information sources, advanced topics, and comprehensive links to other NetWorker information sites across the internet, both official EMC sites and other enthusiast sites.

Bookmark the site and make sure to check it out – and if you’ve got ideas on content or resources that it should link to, let me know.

 

You may recall that in an earlier article, “(In)securing your logs using nsr_render_log -z“, I pointed out that the “-z” option, advertised as capable of obfuscating host and user details to make the log truly anonymous did, at best, an extremely poor job of doing so and should be considered untrustworthy.

As a result of the discussions I had with EMC support over this, NetWorker 7.6 has seen the “-z” option removed from the man page as an option. Disappointingly, it remains available as a command line option, meaning you can still run:

# nsr_render_log -z /nsr/logs/daemon.raw

Why is this disappointing? Because it’s still entirely insecure. For example, after running it against my daemon.raw file on a lab server, I’ve got lines like:

...host1 nsrjobd SYSTEM error: remote exec problem for command `nsrcheckbackup.sh -s nox 
-g archon -c archon / /Volumes/TARDIS/Yojimbo /Volumes/Yu': No route to host
...host1 savegrp archon: error occured during probing; could not execute probe job
...host1 nsrd savegroup failure alert: archon: error occured during probing; could not execute probe job
...host1 nsrd runq: NSR group archon exited with return code 21.
...host1 nsrd savegroup info: aralathan is probing

Furthermore, NetWorker startups will still reveal hostnames in the licensed host list, etc.

As such, despite the fact that the -z option is still available within nsr_render_log, my original recommendation remains: don’t use it, don’t rely on it, and if you need to secure (obfuscate) your daemon log, do it manually.
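
For what it's worth, a minimal sketch of doing that obfuscation manually might look like the following. It assumes you already have the log rendered as plain text and that you maintain your own list of hostnames and user names to hide; the `obfuscate` function and the `redactedN` placeholder scheme are my own illustration, not anything NetWorker provides.

```python
import re


def obfuscate(text, sensitive_names):
    """Replace each sensitive name with a stable placeholder (redacted1, ...).

    Matching is case-insensitive and limited to whole words, and longer
    names are tried first so a short name cannot split a longer one.
    Assumes every name starts and ends with a letter or digit.
    """
    placeholders = {}
    for name in sensitive_names:
        placeholders.setdefault(name.lower(), "redacted%d" % (len(placeholders) + 1))
    if not placeholders:
        return text
    alternatives = "|".join(
        re.escape(name) for name in sorted(placeholders, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:%s)\b" % alternatives, re.IGNORECASE)
    return pattern.sub(lambda match: placeholders[match.group(0).lower()], text)


if __name__ == "__main__":
    line = "nsrd runq: NSR group archon exited with return code 21."
    print(obfuscate(line, ["archon", "aralathan", "nox"]))
```

The obvious limitation is that it only hides what you list: paths, volume names and anything else identifying in the log still need to be added by hand, which is rather the point of not trusting an automatic option.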

 

In the spirit of this site, I hope that:

  • You’re all able to take a break;
  • All your backups run smoothly while you’re on that break;
  • No-one needs a recovery while you’re gone, and if they do –
  • Someone else knows how to do it.

For the next week I’ll be concentrating on getting the rest of nsrd.info up and running.

Be safe and happy both in and outside of the datacentre for this holiday season.

 

As the year draws to a close, I thought I’d spend a few minutes jotting down the major (technology) related lessons I’ve learnt this year. I’m not talking about training lessons, but rather, real world lessons that have real implications for my day to day work and what I’d recommend to customers (or anyone else who takes the time to listen to me).

It seems fitting that this post is not only one of the first ones on this new site, but is also the 300th post on the blog.

So, without further introduction, here are my key technology lessons from this year.

ext3 sucks

I’m still kicking myself that it took so long to actually get around to starting to use XFS on Linux. Since that point, I won’t go back. Regardless of whether it’s for regular filesystems or disk backup filesystems, I’m not likely to go back to the ext tree, and it’s equally likely that I won’t try another Linux filesystem until btrfs comes out. I can’t believe it’s not offered by default on Red Hat (and therefore CentOS).

I really like VTLs

I’ll be the first to admit that I still rail against the principle of VTLs: to be perfectly blunt, we should not need them in an ideal world. But regardless of whatever backup product you’re currently using, there are practical limitations with the way conventional disk backup is implemented. The level of “suck” will vary depending on the product, but the fact remains that we need VTLs (and I’m growing to like VTLs) not because they represent a compelling evolutionary architectural step in data protection, but because they’re frequently better than the alternative of conventional backup to disk.

When it comes to tape, logic struggles to defeat stupidity

Every year we see a plethora of stories enthusiastically predicting the death of tape. Despite all the logical arguments to the contrary, these stories just keep on coming out again, and again, and again, and again. It continues to be tiresome.

My long term avoidance of working with MySQL seems well founded

In terms of administration, MySQL seems sadly lacking in comparison to PostgreSQL. (Obviously this entry is biased in favour of PostgreSQL, but some of the arguments are compelling.) I just have to say, I really find that the MySQL way of doing things comes across as very counter-intuitive. Maybe it’s because before I used PostgreSQL, the databases I most used were Oracle and Ingres, but PostgreSQL just seems to make more sense to me. I’ll continue to use MySQL where I need to, but PostgreSQL will remain my favourite open source database.

Be prepared for slapfests when reading vendor blogs or twitter postings

You have to take the bad with the good, it seems. When they’re good (i.e., detailing their products and how they envisage the market moving forward), they’re very good. But when they’re engaged in petty point scoring slapfests with one another, it’s cringeworthy school-ground name calling at best, and downright unprofessional at worst.

There are actually merits to Twitter

This one took me completely by surprise. I had long railed against Twitter as being incomprehensibly useless. I hereby admit I was entirely wrong. I’m starting to grok that it’s a remarkably useful tool for train of thought collaboration in a way email could never achieve (particularly with verbose people like myself – impose a character limit and it forces a rethink of how to get your ideas across).

Everyone likes Drobo

For a period in November it seemed like the technical world was ablaze with lust for Drobo devices. I’m already starting to be afraid of my next electricity bill, given that “intelligent” metering is being introduced – and I do have an awful lot of drives attached to my Mac Pro. Whether I buy a Drobo or not will largely depend on whether my electricity bill goes up high enough that there’s a three-quarter cost-neutrality between the two costs :-)

Mac Pros are about the most awesome computers you could ever possibly hope to own

No debate, no discussion, throw out every other desktop computer/tower you’ve ever had. If you want sheer performance and scalability, do yourself the favour and go buy a Mac Pro now. Seriously, now – stop reading, and go buy one now. That’s me talking from the previous generation Mac Pro, not even the Nehalem series, which is practically twice as fast again.

(Why, just why EMC doesn’t at least port the NetWorker storage node to Mac OS X is also beyond me. My Mac Pro, only moderately specced, smashes the performance capabilities of many of my customers’ backup servers.)

People don’t always do everything I tell them to

Otherwise you’d be too busy buying a Mac Pro to read this point, hmmm? I guess I’d better come up with a few more points then…

Leopards do change their spots

Moving from Mac OS X 10.5 (Leopard) to 10.6 (Snow Leopard) was one of the biggest performance boosts I’ve had in years on computers.

A big name does not guarantee a reliable product

Think Cloud, then think Microsoft/Danger and the T-Mobile sidekick debacle.

Actually, this didn’t really take me by surprise, but it did seem to take quite a few other people by surprise, so maybe what took me by surprise was that it took others by surprise.

Apparently you can just keep rolling the same tired old turd in glitter

Think Microsoft and Live Search, oops I mean Windows Live Search, oops I mean MSN Search, oops I mean Bing. There must obviously come a point where all you see is the glitter and suddenly it’s good (at least that’s how it seems to go in the media).

More recently, Microsoft seems to have decided that in order to get more people to use Bing, they need to come up with exclusivity arrangements for search forms on mobile devices. It would seem that Microsoft are still trying the same tired old arrangements of buying market share: the problem is that people are getting increasingly aware of this trick and will rebel rather than silently put up with it.

The problem with Microsoft is Ballmer

People may think that I’m just anti-Microsoft through and through. I used to, as well. These days I’ve determined I’m anti-Ballmer, and think the best thing that company could do would be to replace him and actually put someone in charge who can drive the company up rather than down. This was after all the decade that Microsoft missed. (Also check this article over at CNET.)

Apparently Avatar is only good because of NetApp

If you believe the incessant twittering, blogging and raving from various folks from NetApp, Avatar is the most amazing movie of all time not really because of the quality of the effects, rendering, story and acting, but because the files for Avatar were stored on NetApp fileservers. (Sorry NetApp folks, I don’t buy this. You don’t have any right to hitch your commercial wagon to this. The type of storage used shouldn’t make or break a movie so long as it’s been appropriately planned and deployed, and going on like a broken record gets a little tiresome.)

That being said, Avatar is, in my opinion, the greatest blockbuster of all time.

2009 was the year DeDupe entered mainstream

Sure, deduplication has been around for a while, but in whispers and bleeding edge implementations. I still maintain that in its current state, deduplication is mostly bleeding edge, but it’s definitely entered the mainstream. Nothing emphasised this more than the outright bidding war between EMC and NetApp for Data Domain.

Data deduplication can no longer be ignored, and given the logical challenges of having a total data lifecycle data deduplication process that keeps the data deduplicated throughout, 2010 will be the year when companies start to pick (in backup) whether they need to implement source or target deduplication. (My preference for a variety of reasons remains target based deduplication, for what it’s worth.)

Michael Dell doesn’t get the computer industry

When Apple purchased NeXT, Michael Dell was famously quoted as saying that as far as he was concerned, Apple should have been closed and the money given back to shareholders.

Perhaps, given that Apple’s market capitalisation massively exceeds that of Dell, they have almost double the assets of Dell, significantly higher net income, yet almost half the employees, Michael Dell should forget what he learnt at school and try some fresh business approaches. Unfortunately for Dell, they made their name on shipping a similar product, but being cheaper and faster. Now that their competitors have finally pulled their pants up and are able to negate those advantages, Dell has continually struggled to find true competitive advantages. In short, Dell has forgotten the crucial business lesson that everyone can imitate: in order to succeed you have to learn how to stay unique.

I’m losing my tolerance for Solaris

It used to be that when customers said to me “should I deploy NetWorker on Solaris or Linux?” I’d 100% of the time recommend Solaris.

I’m struggling to reach that same conclusion any more. This isn’t due to costs, or potential issues to do with Oracle’s acquisition of Sun, but due to incredibly weird timeout issues that occur on Solaris compared to Linux.

Take for instance an issue I’ve been working on for the last several months which is now “resolved” in NetWorker 7.5.1.8 on Solaris. Well, I say “resolved” because I think it’s more that weird Solaris timeouts prevent it from happening – even though it does still happen on Linux.

The scenario is that you’re staging from a disk backup unit on a storage node to a server, and extremely flaky communications between the two hosts lead to the following sequence in NetWorker:

  • Detect the storage node nsrmmd’s have stopped responding
  • Attempt to restart them
  • Fail
  • Timeout and attempt to restart them
  • Comms are restored
  • Restart nsrmmd’s

Here’s the odd thing: in this scenario it appears that from the time that the nsrmmd’s are restarted, it takes approximately 1 hour and 45 minutes for NetWorker on Solaris to finally kill the staging operation. On Linux? 3 minutes or less.

This is on top of a variety of other timeout issues – inactivity timeouts being ignored when certain network failures occur, etc.

So when someone asks me now if they should deploy NetWorker on Solaris or Linux, I’m likely to counsel fibre channel device connectivity and Linux all the way.

Coming to this conclusion having used Solaris in various forms since 1992 is greatly disappointing.

I prefer the cumulative patch cluster package release style

When EMC started offering cumulative patch clusters as whole new packages/installers rather than just issuing hot fixes, I railed against it, feeling that it wasn’t all that logical an idea. Now, I’m rather sold on it. I do however continue to think that not easily offering the cumulative patch builds within PowerLink is a tad painful.

dbgcommand

For some time, dbgcommand was a mysterious and pseudo-magical command that was whispered about and occasionally handed out by EMC engineering. Since we managed to get it included in the standard distribution, I’ve grown to absolutely love this command, and think that all NetWorker administrators should ensure they’re acquainted with it.

The RFE system works

Many would argue that EMC’s RFE system, like that of all other vendors, sucks. Certainly there remains room for improvement, but 2009 was a record year for me in getting bugs and RFEs alike in NetWorker actually fixed and integrated into new releases. I’m very impressed with the efforts that EMC has gone to in order to improve the NetWorker RFE system, and I’d encourage those who have previously given up on it to revisit it. (Alternatively, in case I’m having more luck because of this blog, contact me and convince me to make the argument for a bug fix or RFE resolution. Obviously I can’t promise anything, but something is working for me.)

You can’t use compress and encryption directives

Earlier in the year I’d blogged to say that you could. It turned out that NetWorker silently ignored both directives in this case, and I learnt this some time later.

I like Wizards

I don’t mean the Harry Potter kind. (Well, I like Harry Potter, but that’s not my point). I’d long scoffed at Wizards in NetWorker, but the updates in NetWorker 7.5 and Oracle Module 5 led to the first time I’ve found something easier and more efficient to do in the GUI than I normally do from the command line.

Pseudo-disorganised tools can help people anal retentive about filing

I love Yojimbo. I rarely, if ever go a day without using it. End of point.

Don’t always trust the man pages

I refer to nsr_render_log and the lack of security in its security feature (now removed in 7.6).

In summary

When I was at University, a philosophy lecturer remarked that University is the last place you learn for the sake of learning. After that, you learn for work or learn to survive. I thought at the time this sounded like sage advice. Now I think while he may have been technically accurate, he wasn’t correct. People who only learn because they need to for their job don’t really learn, they assimilate the required information and nothing more – and rarely the comprehension required to use it outside the exact reason they learnt it for.

It’s possible to learn for the sake of learning for your job as well. That’s called being enthusiastic, and being passionate about what you do.

I’d like to think that my blog, started in January this year, helped others learn for the sake of learning.

 

The Register reports that Overland Storage is on the ropes, as far as the Nasdaq is concerned. While delisting from the Nasdaq wouldn’t automatically be a killer sentence for Overland, it does point out the ongoing trend in storage – the consolidation of the marketplace to fewer vendors with larger market capitalisations able to survive the troughs of the current economic environment more readily.

This is a NetWorker blog, primarily, which means I’m somewhat affiliated in skill set and activities to EMC, but I continue to maintain that in storage, competition is King. Here’s hoping Overland can either turn themselves around, or that they’re not swallowed in such a way that their capabilities are lost forever.

 

I’d like to welcome everyone to the new site. I’ve transferred across all previous posts and comments and I’m pleased to say the process went remarkably smoothly.

I’m still sorting out email subscriptions on a non-WordPress hosted blog, and may choose a slightly different option for this – I’ll have an update on this topic in the next day or so.

In the meantime, I’d invite you to update your bookmarks for the NetWorker blog to point to this, the new site, nsrd.info/blog.

 

I’d like to take a moment to wish all the regular readers – and any new visitors – a very safe and happy holiday season. There are too many of you to send cards out to everyone (even if I knew all your contact details) so I thought I’d give season’s greetings through a suitably NetWorker way:

Season’s greetings from the NetWorker Blog

 

As a long-term Unix admin, it’s frustrating when there are commands on my systems for which there aren’t man pages. As a long-term NetWorker user, it’s equally frustrating when there aren’t man pages for particular NetWorker commands.

When I’ve discussed this in the past, I’ve usually had a response of “that’s because you shouldn’t be running that command”. That’s a bad response. The correct response should be something along the lines of “oops, we’ll write a man page for the next release that states:

That command is for internal NetWorker use only. It does X. It should not be run manually.”

Having undocumented commands that give no output, hang or produce strange results is just inviting frustration. Of just the nsr-prefixed commands, on my current 7.6 lab server, the following commands are undocumented:

  • nsravamar
  • nsravtar
  • nsrbmr
  • nsrcatconfig
  • nsr_cp_install
  • nsrdmpix
  • nsrdsa_recover
  • nsrdsa_save
  • nsrfile
  • nsrfsra
  • nsrlmc
  • nsrndmp_2fh
  • nsrrcopy
  • nsrrcopy2
  • nsrvcbserv_tool

So out of the 55 nsr-prefixed commands I have on my server, 15 (or 27%) are undocumented.
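
If you want to run the same tally on your own server, something along these lines would do it. This is a sketch under assumptions: it relies on a `man` implementation where `man -w` prints the location of a page and exits non-zero when there isn't one (man-db on Linux behaves this way; other platforms may differ), and the directory you pass to `nsr_commands` is whatever location your NetWorker binaries live in.

```python
import os
import subprocess


def nsr_commands(directory):
    """List the nsr-prefixed entries found in `directory`."""
    return sorted(name for name in os.listdir(directory) if name.startswith("nsr"))


def has_man_page(command):
    """Return True if `man -w` can locate a page for `command`.

    Assumes `man -w` only looks the page up; check your platform's man(1).
    """
    result = subprocess.run(
        ["man", "-w", command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def undocumented(commands, documented=has_man_page):
    """Return the commands for which `documented` reports no man page."""
    return sorted(command for command in commands if not documented(command))


def percent(part, whole):
    """Whole-number percentage of `part` in `whole` (0 if `whole` is 0)."""
    return round(100 * part / whole) if whole else 0
```

Calling `undocumented(nsr_commands(directory))` gives the list, and `percent(15, 55)` reproduces the 27% figure above.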

Note to EMC: This does not produce a healthy level of trust. Please – get some documentation on these commands, even if that documentation gives us a one line overview of where they’re used and tells us not to run them ourselves.

© 2012 The NetWorker Blog