{"id":3355,"date":"2011-10-14T06:16:56","date_gmt":"2011-10-13T20:16:56","guid":{"rendered":"http:\/\/nsrd.info\/blog\/?p=3355"},"modified":"2011-10-14T06:16:56","modified_gmt":"2011-10-13T20:16:56","slug":"where-rim-went-wrong","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2011\/10\/14\/where-rim-went-wrong\/","title":{"rendered":"Where RIM went wrong"},"content":{"rendered":"<p>(Quick note: I posted this on my personal blog &#8211; insufficient coffees thus far this morning, and decided to repost here.)<\/p>\n<p>In case it\u2019s not been immediately obvious to anyone, I\u2019ve done some simple diagrams to explain where RIM went wrong in this catastrophic outage they\u2019ve been suffering.<\/p>\n<p>You see, most companies implement what we call\u00a0<em>redundant infrastructure<\/em>. In systems that require high availability, this is often accomplished with something as simple as clustered (either LAN or WAN) hardware and communications. Sometimes it\u2019s designed so that both components run at the same time, sharing the load, and if one fails, the other takes over and carries the full load. 
In simple terms, it looks like this:<\/p>\n<p><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-hardware.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3356\" title=\"Active\/Active Cluster\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-hardware.jpg\" alt=\"Active\/Active Cluster\" width=\"553\" height=\"238\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-hardware.jpg 553w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-hardware-300x129.jpg 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-hardware-500x215.jpg 500w\" sizes=\"auto, (max-width: 553px) 100vw, 553px\" \/><\/a><\/p>\n<p>That all makes sense, right?<\/p>\n<p>Unfortunately, RIM seemed more focused on having failover capabilities for upper-level management, so it instead clustered its CEOs:<\/p>\n<p><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-CEOS.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3357\" title=\"Active\/Active CEOs\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-CEOS.jpg\" alt=\"Active\/Active CEOs\" width=\"402\" height=\"181\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-CEOS.jpg 402w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-CEOS-300x135.jpg 300w\" sizes=\"auto, (max-width: 402px) 100vw, 402px\" \/><\/a><\/p>\n<p>The supposed theory behind this is that the two CEOs, working in an active\/active arrangement, could handle load better and get the job done better than a single CEO \u2013 and provide resiliency!<\/p>\n<p>&nbsp;<\/p>\n<p>Unfortunately, the hardware resiliency wasn\u2019t up to scratch, and when it started to fail, RIM suffered a catastrophic outage.<\/p>\n<p>&nbsp;<\/p>\n<p>Now, you may have expected the active\/active CEO cluster to step in and help at that point. 
Unfortunately though, they\u2019ve barely been heard from. So, in cluster terms, we have to assume a sort of reversed split-brain situation has occurred, where both components of the cluster think the other component is still running:<\/p>\n<p><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-splitbrain.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3358\" title=\"RIM-splitbrain\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-splitbrain.jpg\" alt=\"RIM-splitbrain\" width=\"480\" height=\"180\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-splitbrain.jpg 480w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2011\/10\/RIM-splitbrain-300x112.jpg 300w\" sizes=\"auto, (max-width: 480px) 100vw, 480px\" \/><\/a><\/p>\n<p>And there you have it \u2013 why RIM is having its current outage.<\/p>\n<p>&nbsp;<\/p>\n<p>It\u2019s also a lesson for all you other companies out there: you need\u00a0<em>fault-tolerant<\/em>\u00a0infrastructure as well as\u00a0<em>CEOs<\/em>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(Quick note: I posted this on my personal blog &#8211; insufficient coffees thus far this morning, and decided to 
repost&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3,4,5,12,13],"tags":[133,165,207,232,380,711,835],"class_list":["post-3355","post","type-post","status-publish","format-standard","hentry","category-architecture","category-aside","category-backup-theory","category-general-technology","category-general-thoughts","tag-availability","tag-blackberry","tag-ceos","tag-cluster","tag-fault-tolerance","tag-outage","tag-rim"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-S7","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/3355","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=3355"}],"version-history":[{"count":0,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/3355\/revisions"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=3355"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"h
ttps:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/categories?post=3355"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=3355"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}