[Update]

False positives – that’s the update following the investigation by Samsung and others. I’m personally very glad to hear that – it effectively seems to have come down to malware detection software making the wrong call, and then a supervisor at Samsung making a really wrong call. Anyway, we can all relax now. Privacy – yep, Samsung believes in that, clearly. (Thanks to adrift_in_space for pointing out the correction on the Samsung article over at Network World.)

[Original article below]

It’s articles like this that make me glad that the only Samsung product I have in my house is a TV – and it’s not connected to the internet. It’s downright spooky and creepy.

Samsung responds to installation of keylogger on its laptop computers.

Honestly, you have to read it to believe it. Apparently a Samsung technical support supervisor admits that Samsung knowingly (i.e., deliberately) installed keyboard logging software on their laptops before they were sold so as to:

“monitor the performance of the machine and to find out how it is being used.”

If this is true, I see a big and nasty class action lawsuit looming for Samsung. And they’d totally deserve it.

 

It struck me recently while working on a report that there are 7 distinct challenges in data protection, and that we can only address those challenges when we’re completely across them.

Most sites with enterprise backup will be aware of a few of these challenges, but as soon as you lose sight of some of them, you’ve lost focus on the goal.

They are:

  1. Budget
  2. Communication
  3. Regulatory Compliance
  4. Age
  5. Volume
  6. Search
  7. Formalisation

Each of these on its own represents a particular obstacle or hurdle that needs to be overcome. I should also stress – these are issues for data protection as a whole, and that’s not necessarily limited just to backup and recovery.

What’s even more important is that, when you look at that list, it’s clear that any issues your site is having are not unique. Every company has to deal with the same challenges, and therefore you don’t have to feel that your solution must be unique. It simply has to fit.

And there’s a world of difference – and cost – between “unique” and “fit”.

Let’s look at each of those challenges individually and explain what I mean.

Budget

Something I mention a bit in my book, and when I run training courses, is that I could take the entire budget, for an entire organisation, spend it solely on data protection activities, and still not come up with a solution that is 100% proof positive against any form of data loss or contingency that may happen. There’s always another contingency or potential problem looming around the corner. Sure, it might end up being something like “asteroid hits the earth” or “pandemic kills 99% of the human population”, but the net fact is: you can’t pre-emptively deal with every single possible scenario that may occur.

So it all becomes a game of “risk vs cost”. What’s the risk of it happening? What’s the cost of preparing for it? What’s the cost of it happening and not being prepared? What’s the risk that there’s nothing you can do about it?

As soon as you can start boiling everything down to “risk vs cost” you can actually prepare your data protection needs appropriately.
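To make the “risk vs cost” questions concrete, here is a minimal sketch that compares the expected yearly loss of doing nothing against the cost of preparing. The scenario names, probabilities and dollar figures are all made up for illustration, and the simple “probability times loss” model is an assumption rather than a formal risk methodology.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    annual_probability: float  # chance of the event in any one year (illustrative)
    loss_if_unprepared: float  # cost of it happening and not being prepared
    mitigation_cost: float     # yearly cost of preparing for it
    residual_loss: float       # loss still suffered even when prepared


def expected_annual_loss(probability: float, loss: float) -> float:
    """Expected yearly loss: how likely it is, times how much it hurts."""
    return probability * loss


def worth_mitigating(s: Scenario) -> bool:
    """True when preparing costs less per year than the expected loss of not preparing."""
    unprepared = expected_annual_loss(s.annual_probability, s.loss_if_unprepared)
    prepared = s.mitigation_cost + expected_annual_loss(s.annual_probability, s.residual_loss)
    return prepared < unprepared


# Hypothetical scenarios; every number here is invented for the example.
scenarios = [
    Scenario("disk array failure", 0.10, 500_000, 20_000, 10_000),
    Scenario("asteroid hits the earth", 0.000001, 50_000_000, 1_000_000, 50_000_000),
]

for s in scenarios:
    verdict = "prepare for it" if worth_mitigating(s) else "accept the risk"
    print(f"{s.name}: {verdict}")
```

The asteroid case also shows the last question in the list: when the residual loss equals the unprepared loss, there is nothing you can do about it, so any money spent on it is wasted.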

Communication

Except in the smallest of businesses, there’ll be different departments. And as soon as you have different departments, you have to factor in communications between those departments. Effectively, at this point, we’re talking about IS – Information Services – rather than IT (Information Technology) getting involved. You need to have clear and effective communication between the various departments within the business and the IT group in order to ensure that everyone understands the data protection requirements. In fact, you need to have that communication for pretty much everything to work. (Otherwise you end up in a situation where people think the muck described by the 37 Signals essay is a realistic portrayal of IT.)

To form effective communication, you need a bridge between a department and IT. That bridge is IS; the IS people may actually be the same people as the IT people, but the fact remains that the communication must be held at the policy level rather than the technical level. It’s not the role of someone in department X to understand how Y is done. It’s the role of IS to take their requirements, take IT options, and present strategy and requirements to the business.

Or if you want to phrase it another way – imagine someone prancing around stage like a monkey with bad flop sweat screaming out “Communicate! Communicate! Communicate!”

It’s that important.

Regulatory Compliance

Like it or not, we’re in an age where there is regulatory compliance attached to a lot of data protection. How long should information be kept for? Does it need to be destroyed at the end of that lifetime, or can it just be kept ‘forever’ if that’s easier?

Someone, somewhere in the company, needs to be aware of the regulatory compliance requirements that affect the company. You might say this is part of communication, but usually there’s somewhat of a gulf between how long departments want to retain data for, and what they’re required to keep data for. As to which one is longer: well, flip a coin. You need to know both.
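As a rough sketch of why you need to know both numbers, the function below picks a retention period from what a department wants and what regulation requires. The function name, the parameters and the optional mandatory-destruction limit are all hypothetical; real compliance rules are rarely this simple.

```python
from typing import Optional


def retention_years(wanted: int, required: int, destroy_by: Optional[int] = None) -> int:
    """Pick a retention period that satisfies both the department and the regulator.

    wanted     -- how long the department would like to keep the data
    required   -- the minimum the regulations say it must be kept
    destroy_by -- optional maximum, for data that must be destroyed (assumed rule)
    """
    keep = max(wanted, required)  # whichever is longer wins
    if destroy_by is not None:
        if required > destroy_by:
            raise ValueError("required retention exceeds the mandatory destruction limit")
        keep = min(keep, destroy_by)  # cannot keep it 'forever' if it must be destroyed
    return keep


print(retention_years(3, 7))
print(retention_years(10, 7))
print(retention_years(10, 7, destroy_by=8))
```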

Age

Go to a museum or library. Find an old book in your language, pick it up, open it to a random page, and I bet you’ll still be able to mostly grasp what was written. As an example, I’ve read Leviathan (Thomas Hobbes) several times. It’s not necessarily easy going, but you can do it.

Can you confidently say that a document written by someone in say, WordStar 1.1, hanging around in a tired old directory on a fileserver somewhere within your environment is still readable?

While age presents particular problems to paper based record keeping, it’s never been easier to preserve and replicate such information. Grab it early enough, and you photocopy the original, or scan/OCR it. Suddenly you’ve got the information all over again, in relatively pristine format. It might be from several hundred years ago even, if not longer. There are fictional works out there going back 2000+ years that people just casually read, for instance.

But age presents a particular problem to data protection in a digital age: it doesn’t matter squat if you can recover, or keep online a document going back 5, 10, 15 years, if you can’t actually retrieve the data within it.

So age becomes a significant planning factor. How do you ensure that not only can you retrieve a file or chunk of data from 7 years ago, or 10 years ago, but that it is actually still meaningful to someone?

Volume

Without a doubt, the amount of data we’re storing each year grows at a fantastic rate. Data is somewhere between air and liquid – it seems to want to expand to fill whatever storage is available, within reason. The explosion in digital media is just further exacerbating this. I’d suggest that we’re moving from the first digital age into the second at the moment; the first digital age was where data was almost naturally structured – databases are a classic example. Now though, the second digital age is all about unstructured data. Educational facilities for instance are increasingly making every lecture done by every academic available – not as a bunch of PowerPoint slides, but the actual presentation, as a video file, and often as a separate audio file, to assist people with disabilities, or distant students.

That data growth is not slowing down. I don’t see it slowing down or plateauing any time soon – and nor does most of the storage industry.

Search

It used to be that finding data stored ‘somewhere’ was akin to finding a needle in a haystack. Now, it’s a case of finding a needle in dozens or hundreds of haystacks.

It doesn’t matter how much data you store online, or retain in backups, archive, etc., if you can’t find it when you need it. It’s the sister problem to the ‘age’ issue – there’s far more than just storage involved here.

Search is big business. We see that with Google every day, but let’s consider a prime example – it used to be that filesystem/OS search tools were primarily around filename search. “Tell me part of the file name, and I’ll have a hunt around for it”, was the old approach. Now, it’s “tell me something that’s in the file, and I’ll have a hunt around for it.” I use it every day. If anything, tools like Apple’s Spotlight, for instance, have devolved my previously anal retentive approach to file storage because I don’t have to rely so much on structure any longer. I can search by content.
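The difference between the old and new approaches can be sketched in a few lines. This is only a toy illustration, assuming plain-text files and a brute-force scan; real tools such as Spotlight use an index rather than reading every file, and the function names here are hypothetical.

```python
from pathlib import Path


def search_by_name(root, fragment):
    """Old approach: match on part of the file name only."""
    fragment = fragment.lower()
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and fragment in p.name.lower()
    )


def search_by_content(root, phrase):
    """Newer approach: match on something that is inside the file."""
    phrase = phrase.lower()
    hits = []
    for p in Path(root).rglob("*"):
        if not p.is_file():
            continue
        try:
            text = p.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        if phrase in text.lower():
            hits.append(p)
    return sorted(hits)


if __name__ == "__main__":
    # Search the current directory; the search terms are arbitrary examples.
    print(search_by_name(".", "report"))
    print(search_by_content(".", "backup"))
```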

That works for text. What’s coming next is searching by content for complex data and media. For instance, you can already search for audio – point your iPhone at a speaker, turn on Shazam, capture 11 seconds or so of a song and voilà, you’ve suddenly found a song based on a snippet. I imagine in 10 years’ time people who have some sense of pitch will be able to hum, sing or whistle a few bars and do the same thing. Image search is a growing area too – you can upload an image to some websites and find copies of it online – even to the point of say, finding larger, higher resolution copies of it online, etc.

Video? Undoubtedly coming.

The first vs second digital age analogy works well here too, I think. Search was able to be relatively simple when data was mostly structured. However, with that move to unstructured data, search becomes vitally important.

Make sure you have a search strategy.

(Finally) Formalisation

Most IT departments have grown from ad-hoc, informal processes within the average company. Start with a few people hired to keep systems running, and eventually as the company grows you’ve suddenly got a team of IT staff in a full time department.

What often doesn’t grow is the formality of the documentation and processes. It’s only natural that people will want to keep these as informal as possible, and I’m not suggesting that they need to be miracles of modern communication, but the simple fact remains: if it’s not written down, it doesn’t get done.

There comes a point in any organisation where you have to be prepared to bite the bullet and admit “we have to take a more formal approach to things”. Implementing change control is a classic example; most big businesses take this for granted – yet most small businesses will start out with almost no change control process at all. Eventually though the business will hit a critical size and it becomes vitally important to actually have a real change control process.

That same jump from informal to formal is required on every level. You need formal documentation about how the network hangs together, you need formal documentation about creating new user accounts, etc. And you definitely need formal documentation about how data protection is handled within the company.

Summarising

Coming back to the original list, I can reiterate that the challenges faced in data protection are:

  1. Budget
  2. Communication
  3. Regulatory Compliance
  4. Age
  5. Volume
  6. Search
  7. Formalisation

None of those, individually, should be any surprise to anyone. Again, they’re not unique to anyone either. We all have these same issues, regardless of whether we’re a customer, an integrator, a vendor, a whatever.

As soon as you acknowledge the challenges though, you can plan to overcome them.

 

When people are just starting to get into NetWorker, a common situation is that they get confused about the difference between cloning and staging. (This isn’t helped given NetWorker can report in-progress staging operations as cloning – a perennial source of annoyance.)

So, what’s the difference?

  • A clone operation is where NetWorker duplicates a saveset. It makes a registered copy of the saveset, and at the conclusion of the operation is aware that it has an additional copy.
  • A stage operation is where NetWorker moves a saveset. It first makes a registered copy of the saveset, and then at the conclusion of the operation removes reference to the instance that it copied from.

Typically when we talk about staging, we talk about moving from a disk (media type ‘FILE’ or media type ‘ADV_FILE’) volume to tape. In such a situation where NetWorker stages from an actual FILE/ADV_FILE volume, it not only removes reference to the original saveset, but it actually removes it from the source volume as well. That’s to be expected – it’s a real disk filesystem that NetWorker is accessing, and removing a saveset is as simple as just running an operating system ‘delete’ command.

While it’s not often done, NetWorker does support staging from tape – but obviously when it’s done reading from the source volume, it can’t then selectively erase chunks of savesets from the source tape. Instead, all it does in that situation is delete, from the media database, the reference to the saveset having been on the source volume.

(In case you’re new to NetWorker and intend to now run off and try some staging – make sure, please, before you do, to read the second article ever posted on the NetWorker Blog – “Instantiating Savesets”. It has some very cautionary information about staging operations.)

One final thing – while I said that a clone or stage operation is done against a saveset, it’s not always as simple as that. If you tell NetWorker to clone or stage an individual saveset/saveset instance, that’s exactly what it will do. However, if you tell NetWorker to clone or stage a NetWorker tape volume, it will clone/stage the entire volume, with multiplexing left intact.

Regardless of those caveats though, remember the simple rule – a clone is a copy operation, and a stage is a move operation.
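The copy versus move rule can be illustrated with a toy model. This is not how NetWorker's media database is implemented; the dictionary, the saveset ID and the volume names below are all hypothetical, and the sketch only tracks which volumes hold a registered instance of a saveset.

```python
def clone(media_db, ssid, source, target):
    """Copy: register an additional instance of the saveset on the target volume."""
    if source not in media_db.get(ssid, set()):
        raise KeyError(f"saveset {ssid} has no instance on {source}")
    media_db[ssid].add(target)


def stage(media_db, ssid, source, target):
    """Move: make the copy first, then drop the reference to the source instance."""
    if source == target:
        raise ValueError("cannot stage a saveset onto its own volume")
    clone(media_db, ssid, source, target)
    media_db[ssid].remove(source)


# Toy media database: saveset ID -> set of volumes holding an instance.
media_db = {"4021": {"disk.001"}}

clone(media_db, "4021", "disk.001", "tape.A")
print(sorted(media_db["4021"]))  # the disk instance is still registered

stage(media_db, "4021", "disk.001", "tape.B")
print(sorted(media_db["4021"]))  # the disk instance is no longer referenced
```

Note that the toy `stage` only removes the reference, which matches the staging-from-tape case described above; when the source is a real disk volume, the saveset data is deleted from the volume as well.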

 

There’s a report over at iTWire that has two highly pertinent details. (iTWire – Aussie storage growth above average: Gartner.)

The article is about how Australian spending on storage is growing faster than the rest of the world (IMHO that’s just further proof of how helpful the government stimulus package was), and has two particular points of interest.

First:

The big winner was EMC, which saw its revenue from the region grow from $US533.9 million to $US716.0 million. Most other vendors also saw improved revenues…

That doesn’t surprise me. As an employee of an EMC partner, I know EMC have been very strongly pushing in the Australian market over the last 12 months. I fully believe that other vendors have been pushing hard and (for the most part) achieving good results, but EMC has had a really solid story during this spending cycle, and it’s been paying off – time and time again.

What really didn’t surprise me though was the “but” following that above quote:

…but the biggest loser was Oracle. In 2009, Sun had $US134.4 million revenue in 2009. Now part of Oracle, it only recorded $US82.1 million revenue in 2010

Since the Oracle acquisition of Sun, every single one of my customers who had previously been a large Sun customer has either been resolutely turning away from the vendor, or eyeing them with firm displeasure. Why? Oracle’s higher prices for maintenance and product have had a significant impact on the budgetary options available to one of Sun’s biggest previous customer bases – the educational market. (This, for what it’s worth, is why I penned the article last year, “RIP Solaris”.)

While I’m not normally one to put much stock in analyst reports, this one seems to gel with what I’ve been seeing for the past 12 months.

 

In a previous blog post, I discussed how much I liked the scheduled cloning operations introduced in NetWorker 7.6 SP1. Since then, I’ve had several people comment on it saying that while they’re able to manually start scheduled cloning operations, they’re not able to stop scheduled cloning operations in NMC – regardless of whether they were manually or automatically started.

Now I thought I’d been able to manually stop a scheduled cloning operation via NMC during beta testing, but I may have confused myself with something else, and when I noticed the same issue, it led me to think – can I stop this some other way, maybe from the command line? (For what it’s worth, the inability to stop a scheduled clone from NMC is a known issue, and there’s an EMC request running for it.)

It turns out that, with NMC not an option, the command line is how you stop a scheduled cloning operation, and it’s fairly simple in the end. To do so, you use jobquery and jobkill.

First, use jobquery to identify the scheduled clone job you want:

# jobquery
jobquery> show name:; job id:; job state:
jobquery> print type: clone job; job state: SESSION ACTIVE:
                      job id: 64002;
                   job state: SESSION ACTIVE;
                        name: clone.linux clones;

Once you’ve got that job ID, all you have to do is quit jobquery, and run:

# jobkill -j jobID

In this case – it would be:

# jobkill -j 64002
Terminating job 64002

That’s it – that’s how you stop a scheduled clone job.
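If you wanted to script the lookup, a rough sketch might parse the jobquery output shown above and build the matching jobkill command. The parser assumes the output keeps the “attribute: value;” layout from the example and that records are separated by blank lines, which is my assumption; the function names are hypothetical, and the sketch only builds the command rather than running it.

```python
# Sample text copied from the jobquery output in the example above.
SAMPLE = """\
                      job id: 64002;
                   job state: SESSION ACTIVE;
                        name: clone.linux clones;
"""


def parse_records(text):
    """Turn 'attribute: value;' lines into a list of dicts, one per record."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Assumed record separator: a blank line.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not an attribute line
        current[key.strip()] = value.strip().rstrip(";").strip()
    if current:
        records.append(current)
    return records


def kill_command(record):
    """Build (but do not run) the jobkill command for one job record."""
    return ["jobkill", "-j", record["job id"]]


for job in parse_records(SAMPLE):
    print(job["name"], "->", " ".join(kill_command(job)))
```

Building the command as a list keeps the step of actually terminating a job a deliberate one; check the job name before handing anything to jobkill.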

 

I have to say, I’m really liking NetWorker’s ability to work with persistent binding on tape libraries now.

If you’re not aware of persistent binding, it exists to resolve a problem whereby on some platforms (such as Windows and Linux), device re-ordering can happen across reboots. For a long time there have been ways of preventing this from being an issue for filesystems/LUNs – for instance, on Linux most filesystem types support mounting a filesystem via a unique label or UUID. For example, this is a typical entry from /etc/fstab on a CentOS install:

LABEL=/boot     /boot       ext3    defaults        1 2

This allows the OS to mount the /boot filesystem regardless of whether it’s on /dev/sda, /dev/sdb, /dev/whatever.

Persistent binding allows us to do the same thing with tape as the above does with disk. The advantage of this is obvious: when NetWorker configures a tape library, it maps element order to device paths –

[root@linuxvtl ~]# sjisn 3.0.0
Serial Number data for 3.0.0 (SPECTRA  PYTHON):
	Library:
		Serial Number: XYZZY_A   
		SCSI-3 Device Identifiers:
			ATNN=SPECTRA PYTHON
			WWNN=50223344AB000000
	Drive at element address 1:
		SCSI-3 Device Identifiers:
			ATNN=IBM     ULT3580-TD1  XYZZY_A1  
	Drive at element address 2:
		SCSI-3 Device Identifiers:
			ATNN=IBM     ULT3580-TD1  XYZZY_A2  
	Drive at element address 3:
		SCSI-3 Device Identifiers:
			ATNN=IBM     ULT3580-TD1  XYZZY_A3  
	Drive at element address 4:
		SCSI-3 Device Identifiers:
			ATNN=IBM     ULT3580-TD1  XYZZY_A4

If you can’t see the mappings there, don’t worry – we can see that via inquire – for example:

scsidev@3.1.0:IBM   ULT3580-TD1   550V|Tape, /dev/nst0
	S/N:	XYZZY_A1  
	ATNN=IBM     ULT3580-TD1     XYZZY_A1  
	WWNN=50223344AB000100

That information works well for conventional situations where there’s no risk of a device re-ordering.

When there’s a risk of re-ordering however, the above style of configuration doesn’t work – or if it does, it doesn’t work across reboots. To avoid it in its entirety, we instead do a configuration that uses more exact identification details. Typically, this is in a fibre-channel scenario, and that means WWNs.

We can access device details referencing WWNs via the persistent binding mode – inquire -p, and jbconfig -p.

Let’s look first at inquire -p:

[root@linuxvtl ~]# inquire -p
scsidev@3.0.0:SPECTRA PYTHON    |Autochanger (Jukebox),
/dev/tape/by-id/scsi-350223344ab000000
		S/N:	XYZZY_A   
		ATNN=SPECTRA PYTHON
		WWNN=50223344AB000000
scsidev@3.1.0:IBM   ULT3580-TD1 550V|Tape,
/dev/tape/by-id/scsi-350223344ab000100-nst
		S/N:	XYZZY_A1  
		ATNN=IBM     ULT3580-TD1     XYZZY_A1  
		WWNN=50223344AB000100

If we run jbconfig -p, the output and run-scenario looks a little different, because it’s referencing WWN-based paths rather than standard /dev/nst* paths:

[root@linuxvtl ~]# jbconfig -p

Jbconfig is running on host linuxvtl (Linux 2.6.18-128.el5),
  and is using linuxvtl as the NetWorker server.

	 1) Configure an AlphaStor Library.
	 2) Configure an Autodetected SCSI Jukebox.
	 3) Configure an Autodetected NDMP SCSI Jukebox.
	 4) Configure an SJI Jukebox.
	 5) Configure an STL Silo.

What kind of Jukebox are you configuring? [1] 2
14484:jbconfig: Scanning SCSI buses; this may take a
while ... 
These are the SCSI Jukeboxes currently attached to your
system:
  1) 350223344ab000000: Spectralogic
  2) 350223344ab000800: Spectralogic
Which one do you want to install? 1
Installing 'Spectralogic' jukebox - 350223344ab000000.

What name do you want to assign to this jukebox device? VTL1
15814:jbconfig: Attempting to detect serial numbers on the
jukebox and drives ...

15815:jbconfig: Will try to use SCSI information returned by
jukebox to configure drives.

Turn NetWorker auto-cleaning on (yes / no) [yes]? no

The following drive(s) can be auto-configured in this
jukebox:
 1> LTO Ultrium @ 3.1.0 ==>
/dev/tape/by-id/scsi-350223344ab000100-nst
 2> LTO Ultrium @ 3.2.0 ==>
/dev/tape/by-id/scsi-350223344ab000200-nst
 3> LTO Ultrium @ 3.3.0 ==>
/dev/tape/by-id/scsi-350223344ab000300-nst
 4> LTO Ultrium @ 3.4.0 ==>
/dev/tape/by-id/scsi-350223344ab000400-nst
These are all the drives that this jukebox has reported.

To change the drive model(s) or configure them as
shared or NDMP drives, 
 you need to bypass auto-configure. Bypass
auto-configure? (yes / no) [no] no

Jukebox has been added successfully
...

Once a library has been configured with persistent binding, the device access paths logically become different. On Windows, you’ll get device path names of \\.\TapeX where X starts at something along the lines of 2^31-X; on Linux, the paths will vary depending on the install – for instance, CentOS may give a different result than Oracle Unbreakable Linux, etc. The device paths on Linux will also explicitly reference the WWNs:

[root@linuxvtl ~]# nsrjb -v
<snip>
drive 1 (/dev/tape/by-id/scsi-350223344ab000100-nst) slot :   
drive 2 (/dev/tape/by-id/scsi-350223344ab000200-nst) slot :   
drive 3 (/dev/tape/by-id/scsi-350223344ab000300-nst) slot :   
drive 4 (/dev/tape/by-id/scsi-350223344ab000400-nst) slot :

While this makes referencing individual tape drives a little more fiddly, it has the distinct advantage that across multiple reboots, the library remains fully operable and all devices accessible – a very, very small price to pay. There is, indeed, virtue in persistence. If you want to read more about NetWorker and persistent binding, check out the whitepaper about it available on PowerLink.
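Since the persistent paths are fiddly to read, a small helper can pull the WWN back out of one and compare it with the WWNN that inquire reports. This is a sketch based purely on the sample output in this post: it assumes the by-id name is “scsi-3” followed by the 16 hex digits of the WWN, with an optional “-nst” suffix for tape drives. Other distributions may name the paths differently, and the function name is hypothetical.

```python
import re
from typing import Optional

# Pattern inferred from the sample paths in this post; an assumption, not a standard.
_BY_ID = re.compile(r"scsi-3([0-9a-fA-F]{16})(?:-nst)?$")


def wwn_from_path(path: str) -> Optional[str]:
    """Return the WWN embedded in a /dev/tape/by-id path, or None if it does not match."""
    match = _BY_ID.search(path)
    return match.group(1).upper() if match else None


print(wwn_from_path("/dev/tape/by-id/scsi-350223344ab000100-nst"))
print(wwn_from_path("/dev/nst0"))
```

A conventional path such as /dev/nst0 carries no WWN at all, which is exactly why it cannot survive device re-ordering.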
 

I have to admit, I have great personal reservations towards virtualising backup servers. There’s a simple, fundamental reason for this: the backup server should have as few dependencies as possible in an environment. Therefore to me it seems completely counter-intuitive to make the backup server dependent on an entire virtualisation layer existing before it can be used.

For this reason I also have some niggling concerns with running a backup server as a blade server.

Personally, at this point in time, I would never willingly advocate deploying a NetWorker server as a virtual machine (except in a lab situation) – even when running in director mode.

Let me qualify: I consider ‘director’ mode to be where the NetWorker server acts almost like a dedicated storage node – it only backs up its own index/bootstrap information; with all other backups in the datazone being sent to storage nodes. Hence, as much as possible, all it is doing is ‘directing’ the backups.

But I’m keen to understand your thoughts on the matter.

This survey has now closed.

 

As a consultant, you get attuned to (or as some would have it, “cynical” about) certain key phrases and statements when you’re in meetings. Sometimes these statements are innocent and exactly what the person says, but usually they set the alarm bells ringing.

As a bit of winding down after a hectic 7 days, I thought I’d share the top 15 statements that cause me to start immediately trying to get deep qualification of what I’ve just been told…

| What they say... | What I worry it means... |
|---|---|
| "Our backup results get filed automatically and someone reviews them." | "We have a server that hasn't successfully backed up for 6 months, but no-one's been checking the notifications." |
| "All our backups fit on a single tape" | "We upgrade our hardware every time this isn't the case." |
| "We're very selective about what we back up." | "We have critical production systems we forgot to add to our schedule." |
| "We don't want to get backup notifications." | "Backup? Meh." |
| "Our DBAs do their own backups." | "The DBAs don't believe in enterprise backup software and think dumps are better" ... OR ... "The backup administrators have lost control of the system and it's spiralling out of control." |
| "We don't have SLAs" | "No one wants ownership of establishing SLAs" |
| "We don't need SLAs" | "We trust in luck, and hope we don't ever need SLAs" |
| "Our users are responsible for backing up their laptops" | "Every day we're losing critical data that may be legally or fiscally required by the company." |
| "We don't have to do monthly backups." | "Even though we know we SHOULD do monthly backups, until someone puts it in writing, we're not going to." |
| "We've been asked to shrink our backup budget..." | "The business has this crazy idea that backup is an IT function and problem." |
| "Tape is dead" | "Someone with a vested interest in selling lots of HDD storage has visited lately." |
| "We do per-incident support." | "We have an Icarus support contract." |
| "It's too busy here to do capacity planning." | "We're wasting money as fast as we can get the budget for it." |
| "We don't need to {clone or otherwise duplicate} our backups." | "We're going to suffer a critical data loss situation." |
| "We only back up production data." | "A lot of people's work within the company is unprotected." |

 

The folks over at 37 Signals published a little piece of what I would have to describe as crazy fiction, about how the combination of cloud and more technically savvy users means that we’re now seeing the end of the IT department.

I thought long and hard about writing a rebuttal here, but quite frankly, their lack of logic made me too mad to publish the article on my main blog, where I try to be a little more polite.

So, if you don’t mind a few strong words and want to read a rebuttal to 37 Signals, check out my response here.

© 2012 The NetWorker Blog