While NetWorker 7.6 is not available for download as of the time I write this, the documentation is available on PowerLink. For those of you chomping at the bit to at least read up on NetWorker 7.6, now is the time to wander over to PowerLink delve into the documentation.

The last couple of releases of NetWorker have been interesting for me when it comes to beta testing. In particular, I’ve let colleagues delve into VCB functionality, etc., and I’ve stuck to “niggly” things – e.g., checking for bugs that have caused us and our customers problems in earlier versions, focusing on the command line, etc.

For 7.6 I also decided to revisit the documentation, particularly in light of some of the comments that regularly appear on the NetWorker mailing list about the sorry state of the Performance Tuning and Optimisation Guide.

It’s pleasing, now that the documentation is out, to read the revised and up to date version of the Performance Tuning Guide. Regularly critics of the guide for instance will be pleased to note that FDDI does not appear once. Not once.

Does it contain every possible useful piece of information that you might use when trying to optimise your environment? No, of course not – nor should it. Everyone’s environment will differ in a multitude of ways. Any random system patch can affect performance. A single dodgy NIC can affect performance. A single misconfigured LUN or SAN port can affect performance.

Instead, the document now focuses on providing a high level overview of performance optimisation techniques.

Additionally, recommendations and figures have been updated to support current technology. For instance:

  • There’s a plethora of information on PCI-X vs PCIeXpress.
  • RAM guidelines for the server based on the number of clients has been updated.
  • NMC finally gets a mention as a resource hog! (Obviously, that’s not the words used, but it’s the implication for larger environments. I’ve been increasingly encouraging larger customers to put NMC on a separate host for this reason.)
  • There’s a whole chunk on client parallelism optimisation, both for the clients and the backup server itself.

I don’t think this document is perfect, but if we’re looking at the old document vs the new, and the old document scored a 1 out of 10 on the relevancy front, this at least scores a 7 or so, which is a vast improvement.

Oh, one final point – with the documentation now explicitly stating:

The best approach for client parallelism values is:

– For regular clients, use the lowest possible parallelism settings to best balance between the number of save sets and throughput.

– For the backup server, set highest possible client parallelism to ensure that index backups are not delayed. This ensures that groups complete as they should.

Often backup delays occur when client parallelism is set too low for the NetWorker server. The best approach to optimize NetWorker client performance is to eliminate client parallelism, reduce it to 1, and increase the parallelism based on client hardware and data configuration.

(My emphasis)

Isn’t it time that the default client parallelism value were decreased from the ridiculously high 12 to 1, and we got everyone to actually think about performance tuning? I was overjoyed when I’d originally heard that the (previous) default parallelism value of 4 was going to be changed, then horrified when I found out it was being revised up, to 12, rather than down to 1.

Anyway, if you’ve previously dismissed the Performance Tuning Guide as being hopelessly out of date, it’s time to go back and re-read it. You might like the changes.

 

When I first started in system administration, I asked a colleague who was talking about SNMP what the ETLA* stood for.

His response: Simply Not My Problem.

SNMP – The Simple Network Management Protocol, is something that a lot of people get hung up over in relation to NetWorker. There’s usually two complaints:

  1. Why do I have to pay for SNMP integration?
  2. Where’s the MIBs?

The first question is more easily answered after answering the second; there are no MIBs (well, no official MIBs) for NetWorker. NetWorker doesn’t support being controlled by SNMP; instead, its SNMP “module” simply supports sending SNMP traps – broadcasts onto the network.

So now we return to the first question – why do I have to pay for SNMP integration?

The short answer is you don’t. I wouldn’t. If I were back to being a regular system administrator and someone walked in with a quote for NetWorker and included the SNMP module in it, I’d grab the quote and draw a big red line through it as being a complete waste of money.

Why pay for being able to send SNMP traps when there are free tools available for practically every operating system that allow you to do just that?

As to whether you should get hung up about NetWorker not natively supporting SNMP traps without any additional licensing – honestly, it’s not worth getting worked up about. NetWorker has a very extensible notification system where you can assign any action (read: executable with any command line actions) to practically any event within the system. With that much flexibility you can whack a free SNMP utility on the system and go to town.


* ETLA – Extended Three Letter Acronym

 

So The Register has a story about how Microsoft is edging closer to delivering it’s cloud based system, Azure.

It seems inept that through the entire article, there wasn’t a single mention of the Sidekick Debacle. As you may remember, that debacle was sponsored by ‘Danger’, a Microsoft subsidiary. If you think Microsoft weren’t involved because Danger was a subsidiary, think again.

If we can learn anything from this, it’s that too many people like to close one eye and half shut the other one to make sure they don’t see all those dark and dangerous storm clouds racing around their silver linings.

Based on Microsoft’s track record, I wouldn’t trust Azure for a minute with a KB of my data even if they were paying me. Not until there’s an industry-wide alliance for certifying cloud based solutions and ensuring vendors actually treat customer data as if it were their own most sensitive and important data. Not until Microsoft are a gold member of that alliance and have come out of their first two audits with shining covers.

Until then when it comes to Azure, all I see are dark Clouds with no silver linings.

 

It’s fair to say that no one backup product can be all things to all people. More generally, it’s fair to say that no product can be all things to all people.

Security has had a somewhat interesting past in NetWorker; much of the attention to security for a lot of the time has been to with (a) defining administrators, (b) ensuring clients are who they say they are and (c) providing access controls for directed recoveries.

There’s a bunch of areas though that have remained somewhat lacking in NetWorker for security. Not 100% lacking, just not complete. For instance, user accounts that are accessed for the purposes of module backup and recovery frequently need higher levels of authority than standard users. Equally so, some sites want their <X> admins to be able to control as much as possible of the <X> backups, but not to be able to have any administrator privileges over the <Y> backups. I’d like to propose an idea that, if implemented, would both improve security and make NetWorker more flexible.

The change would be to allow the definition of administrator zones. An “administrator zone” would be a subset of a datazone. It would consist of:

  1. User groups:
    • A nominated “administrator” user group.
    • A nominated “user” user group.
    • Any other number of nominated groups with intermediate privileges.
  2. A collection of the following:
    • Clients
    • Groups
    • Pools
    • Devices
    • Schedules
    • Policies
    • Directives
    • etc

These obviously would still be accessible in the global datazone for anyone who is a datazone administrator. Conceptually, this would look like the following:

Datazone with subset "administrator" zonesThe first thing this should point out to you is that administrator zones could, if desired, overlap. For instance, in the above diagram we have:

  1. Minor overlap between Windows and Unix admin zones (e.g., they might both have administrative rights over tape libraries).
  2. Overlap between Unix and Oracle admin zones.
  3. Overlap between Windows and Oracle admin zones.
  4. Overlap between Windows and Exchange admin zones.
  5. Overlap between Windows and MSSQL admin zones.

Notably though, the DMZ Admin zone indicates that you can have some zones that have no overlap/commonality with other zones.

There’d need to be a few rules established in order to make this work. These would be:

  1. Only the global datazone can support “<x>@*” user or group definitions in a user group.
  2. If there is overlap between two zones, then the user will inherit the rights of the highest authority they belong to. I.e., if a user is editing a shared feature between the Windows and Unix admin zones, and is declared an admin in the Unix zone, but only an end-user in the Windows zone, then the user will edit that shared feature with the rights of an admin.
  3. Similarly to the above, if there’s overlap between privileges at the global datazone level and a local administrator zone, the highest privileges will “win” for the local resource.
  4. Resources can only be created and deleted by someone with data zone administrator privileges.
  5. Updates for resources that are shared between multiple administrator zones need to be “approved” by an administrator from each administrator zone that overlaps or a datazone administrator.

Would this be perfect? Not entirely – for instance, it would still require a datazone administrator to create the resources that are then allocated to an administrator zone for control. However, this would prevent a situation occurring where an unprivileged user with “create” options could go ahead and create resources they wouldn’t have authority over. Equally, in an environment that permits overlapping zones, it’s not appropriate for someone from one administrator zone to delete a resource shared by multiple administrator zones. Thus, for safety’s sake, administrator zones should only concern themselves with updating existing resources.

How would the approval process work for edits of resources that are shared by overlapping zones? To start with, the resource that has been updated would continue to function “as is”, and a “copy” would be created (think of it as a temporary resource), with a notification used to trigger a message to the datazone administrators and the other, overlapping administrators. Once the appropriate approval has been done (e.g., an “edit” process in the temporary resource), then the original resource would be overwritten with the temporary resource, and the temporary resource removed.

So what sort of extra resources would we need to establish this? Well, we’ve already got user groups, which is a starting point. The next step is to define an “admin zone” resource, which has fields for:

  1. Administrator user group.
  2. Standard user group.
  3. “Other” user groups.
  4. Clients
  5. Groups
  6. Pools
  7. Schedules
  8. Policies
  9. Directives
  10. Probes
  11. Lockboxes
  12. Notifications
  13. Labels
  14. Staging Policies
  15. Devices
  16. Autochangers
  17. etc.

In fact, pretty much every resource except for the server resource itself, and licenses, should be eligible for inclusion into a localised admin group. In it’s most basic, you might expect to see the following:

nsradmin> print type: NSR admin zone; name: Oracle
type: NSR admin zone;
name: Oracle;
administrators: Oracle Admins;
users: Oracle All Users;
other user groups: ;
clients: delphi, pythia;
groups: Daily Oracle FS, Monthly Oracle FS,
Daily Oracle DB, Monthly Oracle DB;
pools: ;
schedules: Daily Oracle, Monthly Oracle;
policies: Oracle Daily, Oracle Monthly;
directives: pythia exclude oracle, delphi exclude oracle;
...etc...

To date, NetWorker’s administration focus has been far more global. If you’re an administrator, you can do anything to any resource. If you’re a user, you can’t do much with any resource. If you’ve been given a subset of privileges, you can use those privileges against all resources touched by those privileges.

An architecture that worked along these lines would allow for much more flexibility in terms of partial administrative privileges in NetWorker – zones of resources and local administrators for those resources would allow for more granular control of configuration and backup functionality, while still keeping NetWorker configuration maintained at the central server.

 

For what it’s worth, I believe that the continuing lack of support for renaming clients as a function within NMC, (as opposed to the current, highly manual process), represents an annoying and non-trivial gap in functionality that causes administrators headaches and undue work.

For me, this was highlighted most recently when a customer of mine needed to shift their primary domain, and all clients had been created using the fully qualified domain name. All 500 clients. Not 5, not 50, but 500.

The current mechanisms for renaming clients may be “OK” if you only rename one client a year, but more and more often I’m seeing sites renaming up to 5 clients a year as a regular course of action. If most of my customers are doing it, surely they’re not unique.

Renaming clients in NetWorker is a pain. And I don’t mean a “oops I just trod on a pin” style pain, but a “oh no, I just impaled my foot on a 6 inch rusty nail” style pain. It typically involves:

  • Taking care to note client ID
  • Recording the client configuration for all instances of the client
  • Deleting all instances of the client
  • Rename the index directory
  • Recreate all instances of the client, being sure on first instance creation to include the original client ID

(If the client is explicitly named in pool resources, they have to be updated as well, first clearing the client from those pools and then re-adding the newly “renamed” client.)

This is not fun stuff. Further, the chance for human error in the above list is substantial, and when we’re playing with indices, human error can result in situations where it becomes very problematic to either facilitate restores or ensure that backup dependencies have appropriate continuity.

Now, I know that facilitating a client rename from within a GUI isn’t easy, particularly since the NMC server may not be on the same host as the NetWorker server. There’s a bunch of (potential pool changes), client resource changes, filesystem changes and the need to put in appropriate rollback code so that if the system aborts half-way through it can revert at least to the old client name.

As I’ve argued in the past though, just because something isn’t easy doesn’t mean it shouldn’t be done.

 

Some time ago, I posted a blog entry titled Carry a Jukebox with you, if you’re using Linux, which referred to using linuxvtl with NetWorker. The linuxvtl project is run by my friend Mark Harvey, who has been working with enterprise backup products as long as me.

At the time I blogged, the key problem with the LinuxVTL implementation was that NetWorker didn’t recognise the alternate device IDs generated by the code – it relied on WWNN’s, which were the same for each device.

I was over the moon when I received an email from Mark a short while ago saying he’s now got multiple devices working in a way that is compatible with NetWorker. This is a huge step forward for Linux VTL.

So, what’s changed?

While I’ve not had confirmation from Mark, I’m working on the basis that you do need the latest source code (mhvtl-2009-11-10.tgz as of the time of writing).

The next step, to quote Mark, is that we need to step away from StorageTek and define the library as SpectraLogic:

p.s. The “fix” is to define the robot as a Spectralogic NOT an L700.
The STK L700 does not follow the SMC standards too well. It looks like
NetWorker uses the ‘L700′ version and not the standards.
The Spectralogic follows the SMC standards (or at least their
interruption is the same as mine :) )

The final part is to update the configuration files to include details that allow the VTL code to generate unique WWNNs for NetWorker’s use.

Starting out with just 2 devices, here’s what my inquire output now looks like:

[root@tara ~]# inquire -l

-l flag found: searching all LUNs, which may take over 10 minutes per adapter
	for some fibre channel adapters.  Please be patient.

scsidev@0.0.0:SPECTRA PYTHON    5500|Autochanger (Jukebox), /dev/sg2
			        S/N:	XYZZY
			        ATNN=SPECTRA PYTHON          XYZZY
			        WWNN=11223344ABCDEF00
scsidev@0.1.0:QUANTUM SDLT600   5500|Tape, /dev/nst0
			        S/N:	ZF7584364
			        ATNN=QUANTUM SDLT600         ZF7584364
			        WWNN=11223344ABCDEF01
scsidev@0.2.0:QUANTUM SDLT600   5500|Tape, /dev/nst1
			        S/N:	ZF7584366
			        ATNN=QUANTUM SDLT600         ZF7584366
			        WWNN=11223344ABCDEF02

As you can see – each device has a different WWNN now, which is instrumental for NetWorker. (Note, I have adjusted the spacing slightly to make sure it fits in.)

Finally, here’s what my /etc/mhvtl/device.conf and /etc/mhvtl/library_contents files now look like:

[root@tara mhvtl]# cat device.conf

VERSION: 2

# VPD page format:
# <page #> <Length> <x> <x+1>... <x+n>

# NOTE: The order of records is IMPORTANT...
# The 'Unit serial number:' should be last (except for VPD data)
# i.e.
# Order is : Vendor ID, Product ID, Product Rev and serial number finally
# Zero, one or more VPD entries.
#
# Each 'record' is sperated by one (or more) blank lines.
# Each 'record' starts at column 1

Library: 0 CHANNEL: 0 TARGET: 0 LUN: 0
 Vendor identification: SPECTRA
 Product identification: PYTHON
 Product revision level: 5500
 Unit serial number: XYZZY
 NAA: 11:22:33:44:ab:cd:ef:00

Drive: 1 CHANNEL: 0 TARGET: 1 LUN: 0
 Vendor identification: QUANTUM
 Product identification: SDLT600
 Product revision level: 5500
 Max density: 0x46
 NAA: 11:22:33:44:ab:cd:ef:01
 Unit serial number: ZF7584364
 VPD: b0 04 00 02 01 00

Drive: 2 CHANNEL: 0 TARGET: 2 LUN: 0
 Vendor identification: QUANTUM
 Product identification: SDLT600
 Product revision level: 5500
 Max density: 0x46
 NAA: 11:22:33:44:ab:cd:ef:02
 Unit serial number: ZF7584366
 VPD: b0 04 00 02 01 00

[root@tara mhvtl]# cat library_contents
# Define how many tape drives you want in the vtl..
# The ‘XYZZY_…’ is the serial number assigned to
# this tape device.
Drive 1: ZF7584364
Drive 2: ZF7584366
# Place holder for the robotic arm. Not really used.
Picker 1:
# Media Access Port
# (mailslots, Cartridge Access Port, <insert your favourate name here>)
# Again, define how many MAPs this vtl will contain.
MAP 1:
MAP 2:
MAP 3:
MAP 4:
# And the ‘big’ on, define your media and in which slot contains media.
# When the rc script is started, all media listed here will be created
# using the default media capacity.
Slot 1: 800843S3
Slot 2: 800844S3
Slot 3: 800845S3
Slot 4: 800846S3
Slot 5: 800847S3
Slot 6: 800848S3
Slot 7: 800849S3
Slot 8: 800850S3
Slot 9: 800851S3
Slot 10: 800852S3
Slot 11: 800853S3
Slot 12: 800854S3
Slot 13: 800855S3
Slot 14: 800856S3
Slot 15: 800857S3
Slot 16: 800858S3
Slot 17: 800859S3
Slot 18: 800860S3
Slot 19: 800861S3
Slot 20: 800862S3
Slot 21: BIG990S3
Slot 22: BIG991S3
Slot 23: BIG992S3
Slot 24: BIG993S3
Slot 25: BIG994S3
Slot 26: BIG995S3
Slot 27: BIG996S3
Slot 28: BIG997S3
Slot 29: BIG998S3
Slot 30: BIG999S3
Slot 31: CLN001L1
Slot 32: CLN002L1

NOTE in the “device.conf” file the NAA entries – these are key!

With these changes done, jbconfig worked without missing a beat, and suddenly I had a 2 drive VTL running.

Great going, Mark!

While I’ve not yet tested, I suspect this fix will also ensure that the VTL can be configured on multiple storage nodes, which will be a fantastic improvement for library support work as well.

[Edit, 2009-11-18]

I’m pleased to say that the changes that have been made allow for the VTL to be created on more than one storage node. This presents excellent opportunities for debugging, testing and training:

LinuxVTL on server and storage node

 

So yesterday I got an excited email from Virgin’s Velocity Rewards Program*.

It enthusiastically read:

Preston, surprise! You’ve turned gold!

This was news to me. I looked at my skin and indeed, I was my normal complexion – I had not suddenly morphed into a victim in a bad James Bond movie.

Reading on, I found out:

We’ve got a treat for you – a free upgrade to Velocity Gold! Given you came so close to making it on your own, we wanted to say thanks so much for your ongoing commitment to the Virgin Blue Group, we really love having you around.

I initially thought that Virgin had lowered their standards somewhat on their Velocity rewards program – after all, I hadn’t flown with them since February 2006. However, I decided after reading the rather spartan programme of goodies that one got with Gold Velocity rewards that it was just some marketing stunt to get me to go fly Virgin again after such a long absence. “Aww….”, I thought to myself, “Some programmer has developed an algorithm that puts me into the ‘needs to fly with us again’ bucket.”

About 3 hours later though, the explanation came through – it was just an error. Fortunately though, there was an explanation … it was caused by Friday the 13th:

friday the 13th strikes

Oops! Due to an error you’ve received our previous email by mistake. Please disregard the free upgrade communication as unfortunately you do not qualify for that upgrade.

We apologise for any inconvenience caused.

Blaming it on Friday the 13th explains everything. Way to go, Virgin. A pity really, I was half considering booking a flight with the Gold Rewards for a Friday afternoon sometime, just so I could hear the cabin crew doing their funny safety messages. (The best I ever heard was on a return trip from Melbourne, “In the event of your aeroplane becoming a cruise ship, there is a life jacket under your seat … your life jacket has a light, and a whistle for attracting sailors.”)


* Note – I said the email was excited. It didn’t really excite me all that much.

 

Something that continues to periodically come up is the need to remind people running manual staging to ensure they specify both the SSID and the Clone ID when they stage. I did some initial coverage of this when I first started the blog, but I wanted to revisit and demonstrate exactly why this is necessary.

The short version of why is simple: If you stage by SSID alone, NetWorker will delete/purge all instances of the saveset other than the one you just created. This is Not A Good Thing for 99.999% of what we do within NetWorker.

So to demonstrate, here’s a session where I:

  1. Generate a backup
  2. Clone the backup to tape
  3. Stage the saveset only to tape

In between each step, I’ll run mminfo to get a dump of what the media database says about saveset availability.

Part 1 – Generate the Backup

Here’s a very simple backup for the purposes of this demonstration, and the subsequent mminfo command to find out about the backup:

[root@tara ~]# save -b Default -LL -q /etc
save: /etc  106 MB 00:00:07   2122 files
completed savetime=1258093549

[root@tara ~]# mminfo -q "client=tara.pmdg.lab,name=/etc" -r volume,ssid,cloneid,
savetime
 volume        ssid          clone id  date
Default.001    2600270829  1258093549 11/13/2009
Default.001.RO 2600270829  1258093548 11/13/2009

There’s nothing out of the ordinary here, so we’ll move onto the next step.

Part 2 – Clone the Backup

We’ll just do a manual clone to the Default Clone pool. Here we’ll specify the saveset ID alone, which is fine for cloning – but is often what leads people to being in the habit of not specifying a particular saveset instance. I’m using very small VTL tapes, so don’t be worried that in this case I’ve got a clone of /etc spanning 3 volumes:

[root@tara ~]# nsrclone -b "Default Clone" -S 2600270829
[root@tara ~]# mminfo -q "client=tara.pmdg.lab,name=/etc" -r volume,ssid,cloneid,
savetime
 volume        ssid          clone id  date
800843S3       2600270829  1258094164 11/13/2009
800844S3       2600270829  1258094164 11/13/2009
800845S3       2600270829  1258094164 11/13/2009
Default.001    2600270829  1258093549 11/13/2009
Default.001.RO 2600270829  1258093548 11/13/2009

As you can see there, it’s all looking fairly ordinary at this point – nothing surprising is going on at all.

Part 3 – Stage by Saveset ID Only

In this next step, I’m going to stage by saveset ID alone rather than specifying the saveset ID/clone ID, which is the correct way of staging, so as to demonstrate what happens at the conclusion of the staging. I’ll be staging to a pool called “Big”:

[root@tara ~]# nsrstage -b Big -v -m -S 2600270829
Obtaining media database information on server tara.pmdg.lab
Parsing save set id(s)
Migrating the following save sets (ids):
 2600270829
5874:nsrstage: Automatically copying save sets(s) to other volume(s)

Starting migration operation...
Nov 13 17:34:00 tara logger: NetWorker media: (waiting) Waiting for 1 writable
volume(s) to backup pool 'Big' disk(s) or tape(s) on tara.pmdg.lab
5884:nsrstage: Successfully cloned all requested save sets
5886:nsrstage: Clones were written to the following volume(s):
 BIG991S3
6359:nsrstage: Deleting the successfully cloned save set 2600270829
Successfully deleted original clone 1258093548 of save set 2600270829
from media database.
Successfully deleted AFTD's companion clone 1258093549 of save set 2600270829
from media database with 0 retries.
Successfully deleted original clone 1258094164 of save set 2600270829
from media database.
Recovering space from volume 4294740163 failed with the error
'Cannot access volume 800844S3, please mount the volume or verify its label.'.
Refer to the NetWorker log for details.
6330:nsrstage: Cannot access volume 800844S3, please mount the volume
or verify its label.
Completed recover space operation for volume 4177299774
Refer to the NetWorker log for any failures.
Recovering space from volume 4277962971 failed with the error
'Cannot access volume 800845S3, please mount the volume or verify its label.'.
Refer to the NetWorker log for details.
6330:nsrstage: Cannot access volume 800845S3, please mount the volume
or verify its label.
Recovering space from volume 16550059 failed with the error
'Cannot access volume 800843S3, please mount the volume or verify its label.'.
Refer to the NetWorker log for details.
6330:nsrstage: Cannot access volume 800843S3, please mount the volume
or verify its label.

You’ll note there’s a bunch of output there about being unable to access the clone volumes the saveset was previously cloned to. When we then check mminfo, we see the consequences of the staging operation though:

[root@tara ~]# mminfo -q "client=tara.pmdg.lab,name=/etc" -r volume,ssid,cloneid,
savetime
 volume        ssid          clone id  date
BIG991S3       2600270829  1258095244 11/13/2009

As you can see – no reference to the clone volumes at all!

Now, has the clone data been erased? No, but it has been removed from the media database, meaning you’d have to manually scan the volumes back in order to be able to use them again. Worse, if those volumes only contained clone data that was subsequently removed from the media database, they may become eligible for recycling and get re-used before you notice what has gone wrong!

Wrapping Up

Hopefully the above session will have demonstrated the danger of staging by saveset ID alone. If instead of staging by saveset ID we staged by saveset ID and clone ID, we’d have had a much more desirable outcome. Here’s a (short) example of that:

[root@tara ~]# save -b Default -LL -q /tmp
save: /tmp  2352 KB 00:00:01     67 files
completed savetime=1258094378
[root@tara ~]# mminfo -q "name=/tmp" -r volume,ssid,cloneid
 volume        ssid          clone id
Default.001    2583494442  1258094378
Default.001.RO 2583494442  1258094377
[root@tara ~]# nsrclone -b "Default Clone" -S 2583494442

[root@tara ~]# mminfo -q "name=/tmp" -r volume,ssid,cloneid
 volume        ssid          clone id
800845S3       2583494442  1258095244
Default.001    2583494442  1258094378
Default.001.RO 2583494442  1258094377
[root@tara ~]# nsrstage -b Big -v -m -S 2583494442/1258094377
Obtaining media database information on server tara.pmdg.lab
Parsing save set id(s)
Migrating the following save sets (ids):
 2583494442
5874:nsrstage: Automatically copying save sets(s) to other volume(s)

Starting migration operation...

5886:nsrstage: Clones were written to the following volume(s):
 BIG991S3
6359:nsrstage: Deleting the successfully cloned save set 2583494442
Successfully deleted original clone 1258094377 of save set 2583494442 from
media database.
Successfully deleted AFTD's companion clone 1258094378 of save set 2583494442
from media database with 0 retries.
Completed recover space operation for volume 4177299774
Refer to the NetWorker log for any failures.

[root@tara ~]# mminfo -q "name=/tmp" -r volume,ssid,cloneid
 volume        ssid          clone id
800845S3       2583494442  1258095244
BIG991S3       2583494442  1258096324

The recommendation that I always make is that you forget about using saveset IDs alone unless you absolutely have to. Instead, get yourself into the habit of always specifying a particular instance of a saveset ID via the “ssid/cloneid” option. That way, if you do any manual staging, you won’t wipe out access to data!

 

I’ve been using Parallels Desktop for the Mac for several years now – in fact, when I originally started using it, VMware were only talking about doing a desktop virtualisation product for the Mac.

That’s partly why I stay loyal to Parallels – they supported the platform sooner. The other reason is that after years of hearing tripe from VMware employees about why you couldn’t mix windows from both operating systems, Parallels went ahead and did it with Coherence. (Yes, VMware Fusion’s Unity functionality went there too, around the same time, but the amount of times I heard it wasn’t a feature they were interested in within Workstation, as an example, drove me nuts.)

So when Parallels Desktop v5 for the Mac came out, I jumped on the upgrade bandwagon. It’s given me one major positive – I can now run Solaris 10 AMD 64-bit on my Mac Pro; that was the one remaining OS I absolutely need to periodically run that I was being blocked from doing previously. After about 3 days of installing, reinstalling, downloading more recent versions of Solaris 10 AMD, I even finally managed to get networking operational within the VM too, which meant it was useful.

Other than that, Parallels v5 has been a bit of a disappointment. You see, currently for me, Coherence mode only works if I’ve got a single monitor attached to my machine. The only time that happens is when I’m using my laptop away from my desk, or when I’m traveling – and those are times I’m less likely to run VMs. Since Coherence doesn’t work for me 99% of the time, that means I can’t really try out the Crystal mode – though I’ll admit, I’m unlikely to use it heavily; I dislike the “shared apps” approach offered by Parallels, and the new Crystal view mode seems to rely heavily on this.

It does continue to annoy me that I don’t have the option of virtually turning the monitor off – why can’t I close the console for a running VM? Surely that’s not so huge a thing that it can only be limited to enterprise class virtualisation – aka ESX, VMware Server or Parallels Server. When you have 10+ VMs running at once, even minimised all those consoles start to get annoying.

While overall I’m pretty happy with Parallels performance in terms of memory and CPU, recent support cases have highlighted to me that when it comes to translated IO, Parallels struggles – it seems to peak at about 20MB/s per VM, regardless of the throughput capabilities of the attached device. I first noticed that under v4, and was unhappy to see no change in v5.

If you’re currently a Parallels Desktop for Mac user, and you haven’t yet upgraded, I’d suggest holding off until the next build of v5, rather than jumping into the initial build released. Hopefully by then they’ll at least have Coherence reliably working again.

[Edit, 2009-11-14]

I’d like to take back my comments about Parallels 5 still having mediocre IO performance. I just realised, a short while ago, that one of the VMs I’d been testing with had failed in its VMware Tools update. Now, with both VMs in this particular test config updated to Parallels Tools v5, instead of getting 14-17MB/s transfer speed between them as I’d been getting under Paralles v4, I’m now getting 45-57MB/s. Now that’s a performance improvement.

 

It goes without a doubt that we have to get smarter about storage. While I’m probably somewhat excessive in my personal storage requirements, I currently have 13TB of storage attached to my desktop machine alone. If I can do that at the desktop, think of what it means at the server level…

As disk capacities continue to increase, we have to work more towards intelligent use of storage rather than continuing the practice of just bolting on extra TBs whenever we want because it’s “easier”.

One of the things that we can do to more intelligently manage storage requirements for either operational or support production systems is to deploy deduplication where it makes sense.

That being said, the real merits of target based deduplication become most apparent when we compare it to source based deduplication, which is where the majority of this article will now take us.

A lot of people are really excited about source level deduplication, but like so many areas in backup, it’s not a magic bullet. In particular, I see proponents of source based deduplication start waving magic wands consisting of:

  1. “It will reduce the amount of data you transmit across the network!”
  2. “It’s good for WAN backups!”
  3. “Your total backup storage is much smaller!”

While each of these facts are true, they all come with big buts. From the outset, I don’t want it said that I’m vehemently opposed to source based deduplication; however, I will say that target based deduplication often has greater merits.

For the first item, this shouldn’t always be seen as a glowing recommendation. Indeed, it should only come into play if the network is a primary bottleneck – and that’s more likely going to be the case if doing WAN based backups as opposed to regular backups.

In regular backups while there may be some benefit to reducing the amount of data transmitted, what you’re often not told is that this reduction comes at a cost – that being increased processor and/or memory load on the clients. Source based deduplication naturally has to shift some of the processing load back across to the client – otherwise the data will be transmitted and thrown away. (And otherwise proponents wouldn’t argue that you’ll transmit less data by using source based backup.)

So number one, if someone is blithely telling you that you’ll push less data across your network, ask yourself the following questions:

(a) Do I really need to push less data across the network? (I.e., is the network the bottleneck at all?)

(b) Can my clients sustain a 10% to 15% load increase in processing requirements during backup activities?

This makes the first advantage of source based deduplication somewhat less tangible than it normally comes across as.

Onto the second proposed advantage of source based deduplication – faster WAN based backups. Undoubtedly, this is true, since we don’t have to ship anywhere near as much data across the network. However, consider that we backup in order to recover. You may be able to reduce the amount of data you send across the WAN to backup, but unless you plan very carefully you may put yourself into a situation where recoveries aren’t all that useful. That is – you need to be careful to avoid trickle based recoveries. This often means that it’s necessary to put a source based deduplication node in each WAN connected site, with those nodes replicating to a central location. What’s the problem with this? Well, none from a recovery perspective – but it can considerably blow out the cost. Again, informed decisions are very important to counter-balance source based deduplication hyperbole.

Finally – “your total backup storage is much smaller!”. This is true, but it’s equally an advantage of target based deduplication as well; while the rates may have some variance the savings are still great regardless.

Now let’s look at a couple of other factors of source based deduplication that aren’t always discussed:

  1. Depending on the product you choose, you may get less OS and database support than you’re getting from your current backup product.
  2. The backup processes and clients will change. Sometimes quite considerably, depending on whether your vendor supports integration of deduplication backup with your current backup environment, or whether you need to change the product entirely.

When we look at those above two concerns is when target based deduplication really starts to shine. You still get deduplication, but with significantly less interruption to your environment and your processes.

Regardless of whether target based deduplication is integrated into the backup environment as a VTL, or whether it’s integrated as a traditional backup to disk device, you’re not changing how the clients work. That means whatever operating systems and databases you’re currently backing up you’ll be able to continue to backup, and you won’t end up in the (rather unpleasant) situation of having different products for different parts of your backup environment. That’s hardly a holistic approach. It may also be the case that the hosts where you’d get the most out of deduplication aren’t eligible for it – again, something that won’t happen with target based deduplication.

The changes for integrating target based deduplication in your environment are quite small –  you just change where you’re sending your backups to, and let the device(s) handle the deduplication, regardless of what operating system or database or application or type of data is being sent. Now that’s seamless.

Equally so, you don’t need to change your backup processes for your current clients – if it’s not broken, don’t fix it, as the saying goes. While this can be seen by some as an argument for stagnation, it’s not; change for the sake of change is not always appropriate, whereas predictability and reliability are very important factors to consider in a data protection environment.

Overall, I prefer target based deduplication. It integrates better with existing backup products, reduces the number of changes required, and does not place restrictions on the data you’re currently backing up.

© 2012 The NetWorker Blog Suffusion theme by Sayontan Sinha