How important is it to clone?

This post has now been moved to the Enterprise Systems Backup Blog. You can find it here.

11 thoughts on “How important is it to clone?”

  1. Hi Preston,

    Your first two points hit the mark but the argument about backup to another site hasn’t convinced me.

    First, it sounds like you’re using “historical backups” in the sense of an archive. While backup and archive both make copies of data, they have independent lifecycles and purposes. It’s comparing apples and oranges – both of which are good 😉

    Second, saying “not good” when you’ve had a site failure isn’t exactly relevant to the point being discussed. It’s equivalent to saying “not good” when all your backups are destroyed. True enough but it says nothing about how well protected the “other copies” are.

  2. Hi Mike,

    You raised two concerns about the argument over backing up to another site not being sufficient reason to avoid cloning. I’ll cover both off.

    In the first, when I said “historical backups”, I didn’t mean archives – backups and archives are indeed not the same thing. What I was referring to is long-term retention backups – say, monthly backups that are kept for years rather than weeks or days.

    Typically when you back up to another site, you’re not backing up to standalone tape drives – the vast majority of the time it’s some form of mass storage system, whether that’s a PTL, VTL, or disk backup unit (DBU). This means you’ll have more than just the most recent backup stored “online” or “nearline” at the disaster recovery/business continuity site. (Let’s refer to it as the BCS.)

    A good backup system should keep a certain percentage of backup history online – the actual amount will depend on the frequency of recovery requests* – so if the BCS, say, burns to the ground and you don’t clone, there’s a high likelihood you’ll have lost backups prior to the most recent one as well, and that more than likely means losing data you’ll want to recover at some point. Hence, unless you remove every tape from the BCS as soon as each backup completes, you’re going to lose data in a BCS failure in this scenario.

    However, that still doesn’t protect you from media failure. Say you do remove every piece of media from the BCS as soon as you’ve finished writing to it. What happens when you recall that media from its offsite storage location to the BCS, load it into a drive to recover data that, say, the financial regulators require you to recover, and the drive chews the tape – or the tape was dropped by the storage people and fails to mount?

    This brings us to my second point, the “not good” point when it comes to dealing with a production site failure. Forgive my Australian tendency to understate problems. “Terrifying beyond all possible belief” is, I guess, the expression I would use if I were being completely open – that’s to describe the notion of having my production site disappear and having to rely on any of the following scenarios:

    (a) A cold BCS that needs to be bootstrapped from backups where there is only one copy

    (b) A ‘warm’ BCS that has been kept mostly in sync but still needs some recoveries to be done where there is only one copy

    (c) A ‘hot’ BCS that has been kept fully in sync but _suddenly_ I’m in a position where all my eggs are in the one basket, particularly if there are regulatory reasons why backups and originals can’t be kept together. (E.g., a common reason to back up prod to the BCS is to get data “immediately offsite”. If you don’t have any provision in your backup solution for cloning, and you’re suddenly running off your BCS with no production site to back up to, you may have a solution that is not looked upon too favorably by the powers that be.)

    The ultimate problem with failing to clone, though, is the risk of cascading failures: the potential for the media you’re trying to recover from to suffer a fault mid-recovery, invalidating the media and preventing the recovery from succeeding.


    * Something I cover in my book, “Enterprise Systems Backup and Recovery: A Corporate Insurance Policy.”

  3. The thing is, the cloning code in NetWorker is really limited. Unless you have a really, really simple setup, the built-in “clone at group completion” isn’t going to get it done for you. Anyone who is serious about creating duplicates ends up scripting it.

    What really stinks is that it is ridiculously hard to get parallel cloning working. I have it at our site because we had EMC engaged and they provided custom scripts to do it. It works great and we could not get our cloning done without it. That said, cloning from multiple source drives to multiple target drives should just be built into the product.

    At least, it should be if cloning is so very crucial.

    1. Scott, you raise valid points – cloning in NetWorker isn’t always as admin-friendly as it could be. I do think that at least some of the time this happens because the solution hasn’t been sized/scoped correctly. That is, it’s not uncommon to see, say, physical tape libraries purchased with only just enough drives to facilitate backup, and maybe one extra drive that is meant to somehow enable cloning to work successfully. Please note I’m not saying this is your experience, just that it is a common one.

      While I’m not a big fan of VTLs, I do think that when looking at the hardware aspects of solutions, having a VTL + PTL, where you back up first to the VTL then clone to the PTL, really does give cloning a boost in NetWorker, so long as the VTL is configured optimally. (By optimally I mean lots of virtual drives, and lots of small virtual tapes.)

      But looking at the software side of it, the NetWorker side, yes, there’s still a lot to be desired. Your experience – that group cloning wasn’t sufficient – isn’t uncommon. It best suits smaller sites, or sites where no groups overlap. Many sites don’t use group cloning at all, and do all cloning via scripts (there’s a rough sketch of that style of scripting at the end of this reply); indeed, the company that I work for produces software to assist in this. If nothing else, this points to the value of having a framework-based backup product such as NetWorker that allows for site customisation to such a level.

      That being said, NetWorker 7.5 has introduced some additional features into nsrclone, such as the ability to specify how many copies you want made, etc., which will (to a degree) make scripting of cloning operations easier. It would be nice if some of these additional features made it into the group cloning criteria, but I’m not sure whether that will happen or not.

      Being able to generate multiple copies of a backup simultaneously is something that NetWorker is sadly lacking at the moment. If you feel strongly about this, I suggest you send an email to networker_usability@emc.com – this address really is monitored actively by the product managers, and feedback to it is vital. Having periodically talked to several of the product managers, and having known former EMC (and Legato) NetWorker product managers, I know that the best way to get features added, or planned features expedited, is real customer prompting.
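      To make the scripted approach a little more concrete, here’s a minimal sketch of the sort of wrapper many sites end up writing. It’s illustrative only – the server name, clone pool and query window are placeholder assumptions rather than recommendations – but it shows the basic mminfo-to-nsrclone flow:

      #!/bin/sh
      # Minimal sketch of script-driven cloning. The server name, pool name
      # and query window are placeholders; a real script would also skip
      # savesets that already have clones, log its output, handle errors, etc.

      SERVER=nsrserv            # hypothetical NetWorker server name
      CLONE_POOL="Daily Clone"  # hypothetical destination clone pool

      # Ask mminfo for the saveset IDs written since yesterday; keep only the
      # lines containing numeric IDs (dropping the report header) and dedupe.
      SSIDS=`mminfo -s "$SERVER" -q "savetime>=yesterday" -r ssid | grep '[0-9]' | sort -u`

      # Nothing new to clone means nothing to do.
      [ -z "$SSIDS" ] && exit 0

      # Hand the whole list to nsrclone: -S says the remaining arguments are
      # saveset IDs, and -b names the destination clone pool.
      nsrclone -s "$SERVER" -b "$CLONE_POOL" -S $SSIDS

      Run from cron once the overnight groups have finished, that’s essentially all most home-grown cloning wrappers boil down to – the value of scripting is in the site-specific logic you layer on top.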

  4. The main thing nsrclone is lacking is the ability to clone from multiple tapes simultaneously. E.g. I have a list of tapes (or, frankly, savesets generated via mminfo) to clone: XXX YYY ZZZ, and I have 6 available tape drives for cloning operations. Without (very) extraordinary measures, NetWorker will use 1 source device and 1 target device and clone serially. It will not use 3 sources and 3 targets if those resources are available.

    Obviously, I have this functionality in my environment. But it should be built into the product. If I buy enough tape drives to complete backups in my backup window, the rest of the day and all of those tape drives can be used for cloning operations (reserving a few drives for ad-hoc backups and recoveries, of course!)

    And yes, I agree with your thoughts on VTL as a first backup target – particularly if you have really LARGE datasets and you care about recovery time. You really don’t want the high multiplexing ratio when you back up that you would normally have with high-speed/high-density tape media.

  5. Hi Scott,

    It’s probably important to note, however, that nsrclone is designed to be single-threaded, for want of a better term. If you want multiple cloning operations to run simultaneously, you can run multiple nsrclone operations, which I gather is how the custom scripts designed for your site work (a rough sketch of that approach is below).

    Similarly, you can achieve multi-drive cloning out of standard post-group cloning by ensuring that you have multiple groups running, and potentially different pools. That perhaps isn’t so easy to achieve, but it is still an option.
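    To illustrate, here’s a rough sketch of one way to fan a saveset list out across several concurrent nsrclone instances. It’s an assumption-laden illustration rather than a recipe – the server, pool and stream count are placeholders – and, as the rest of this thread shows, whether the streams genuinely run in parallel still depends on having enough free source and destination devices:

    #!/bin/sh
    # Rough sketch: split a saveset list across several concurrent nsrclone
    # instances. The server, pool and stream count below are placeholders.

    SERVER=nsrserv
    CLONE_POOL="Daily Clone"
    STREAMS=3                 # desired number of concurrent clone streams

    # ssids.txt holds one saveset ID per line (e.g. generated via mminfo).
    LINES=`wc -l < ssids.txt | tr -d ' '`
    [ "$LINES" -eq 0 ] && exit 0

    # Divide the list into roughly equal chunks, one per stream.
    rm -f chunk.*
    CHUNK=`expr \( $LINES + $STREAMS - 1 \) / $STREAMS`
    split -l "$CHUNK" ssids.txt chunk.

    # Launch one nsrclone per chunk in the background...
    for f in chunk.*
    do
        nsrclone -s "$SERVER" -b "$CLONE_POOL" -S `cat "$f"` &
    done

    # ...and wait until every cloning stream has finished.
    wait

    The same idea can obviously be wrapped up far more robustly (and that’s where the custom per-site scripts come in), but the core of it is simply multiple concurrent nsrclone invocations against different device sets.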

  6. If you run nsrclone manually and you launch a second instance, cloning from different sources to a different pool (or even the same target pool), you WILL NOT get parallel cloning. The second instance will wait for the first to finish. There is much more to it than that.

    EMC’s internal critical account team has developed a solution that must be customized on a per-customer basis. nsrclone by design is meant to be run serially. Getting it to do otherwise takes substantial effort.

    The second idea doesn’t work either.

    By default, NetWorker will queue up subsequent nsrclone operations, whether launched from the CLI or via the automated mechanism built into group completion.

    Been there. Done it. Do not have the t-shirt.

    Parallel cloning should be a feature that is easily enabled within the product. If cloning matters to you and you have a large, busy environment, you can’t get the job done without it.

  7. Hi Scott,

    What you’re describing – nsrclone being totally linear when multiple instances run with separate resource requirements – is totally contrary to my 12+ years of using NetWorker.

    Even running up a quick test environment in my lab, I was immediately able to have 2 nsrclone processes simultaneously reading from disk backup units and writing out to alternate media. If nsrclone were serialised in the way you describe, you wouldn’t be able to have multiple groups cloning simultaneously.

    Maybe if you set up multiple cloning jobs from within NMC as manual clones this serialisation happens, but my question is: why use a GUI to initiate a command-line activity?

    If you’re experiencing this sort of issue, there’s some bug or misconfiguration at play in your environment.
