{"id":4760,"date":"2013-04-08T07:43:07","date_gmt":"2013-04-07T21:43:07","guid":{"rendered":"http:\/\/nsrd.info\/blog\/?p=4760"},"modified":"2018-12-11T14:13:35","modified_gmt":"2018-12-11T04:13:35","slug":"debugging-device-daemons-on-a-storage-node","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2013\/04\/08\/debugging-device-daemons-on-a-storage-node\/","title":{"rendered":"Debugging &#8211; device daemons on a storage node"},"content":{"rendered":"<p>In one of my lab environments I recently had a problem where the devices (AFTD) on a firewalled storage node suddenly stopped working.<\/p>\n<p>This manifested in a fairly odd way:<\/p>\n<ul>\n<li>NetWorker was still running on the storage node;<\/li>\n<li><strong>nsrports<\/strong> was still perfectly functional in both directions;<\/li>\n<li><strong>echo &#8220;print type: NSRLA&#8221; | nsradmin -p 390113 -s storageNode -i<\/strong> also worked perfectly normally.<\/li>\n<\/ul>\n<p>The storage node could even initiate backups &#8230; it just couldn&#8217;t write them, because the device daemons weren&#8217;t running and it was configured to use itself for backups.<\/p>\n<p>So what was going wrong?<\/p>\n<p>After stopping and restarting the services a couple of times while checking firewall rules, I decided it&nbsp;<em>couldn&#8217;t<\/em> be the firewall \u2013 I was in full control of it, and while I&#8217;m not by any means an expert on firewalls, I was 100% certain that there&#8217;d been no changes to firewall rules at all in the last few months.<\/p>\n<p>The next step was the logs, which revealed a particularly odd error, one I&#8217;d not seen before:<\/p>\n<pre>42503 06\/04\/13 18:03:55 4 2 12 1103501632 3915 0 mondas nsrmmd RPC severe Remote system error - No route to host \n57925 06\/04\/13 18:03:57 2 0 0 1103501632 3915 0 mondas nsrmmd NSR warning Exiting idle nsrmmd #8 after 3 unsuccessful remap attempts to \/nsr\/tmp\/snmd_mmf.map memory map file. 
\n83447 06\/04\/13 18:03:57 2 0 0 1103501632 3915 0 mondas nsrmmd NSR warning shutdown nsrmmd 8 with pid 3915. \n42503 06\/04\/13 18:03:55 4 2 12 2396641600 3916 0 mondas nsrmmd RPC severe Remote system error - No route to host \n57925 06\/04\/13 18:03:58 2 0 0 2396641600 3916 0 mondas nsrmmd NSR warning Exiting idle nsrmmd #9 after 3 unsuccessful remap attempts to \/nsr\/tmp\/snmd_mmf.map memory map file. \n83447 06\/04\/13 18:03:58 2 0 0 2396641600 3916 0 mondas nsrmmd NSR warning shutdown nsrmmd 9 with pid 3916. \n33638 06\/04\/13 18:04:01 1 5 0 1013503696 3915 0 mondas nsrmmd NSR notice Shutting down nsrmmd #8, with PID 3915, at HOST mondas \n33638 06\/04\/13 18:04:01 1 5 0 2306643664 3916 0 mondas nsrmmd NSR notice Shutting down nsrmmd #9, with PID 3916, at HOST mondas \n42503 06\/04\/13 18:04:03 4 2 12 4123437376 3909 0 mondas nsrmmd RPC severe Remote system error - No route to host \n57925 06\/04\/13 18:04:05 2 0 0 4123437376 3909 0 mondas nsrmmd NSR warning Exiting idle nsrmmd #2 after 3 unsuccessful remap attempts to \/nsr\/tmp\/snmd_mmf.map memory map file.<\/pre>\n<p>That file is new in NetWorker 8 and belongs to the <em>nsrsnmd<\/em> daemon, a new process that controls the device daemons on each individual storage node, relieving nsrd on the server itself of that control work. The file remains on disk between restarts of the daemons.<\/p>\n<p>A search of&nbsp;<em>support.emc.com<\/em> yielded nothing for this particular error, which is not unusual for a fairly esoteric-looking error on a reasonably new release of NetWorker. Without many other options to try immediately, I fell back on the technique so disliked by engineering \u2013 I shut down NetWorker and removed the \/nsr\/tmp directory. Upon restart &#8230; Voil\u00e0! 
All device daemons started up.<\/p>\n<p>I&#8217;m guessing somehow the file became corrupted; looking back over my lab logs, backup failures had started about 6 hours after a power outage, and while I&#8217;d managed to perform a controlled shutdown on UPS, the only thing I can think of is that the power&nbsp;<em>surge<\/em> before the outage may have caused a minor glitch. I doubt I&#8217;ll ever know the exact cause.<\/p>\n<p>Not fully knowing the purpose of the \/nsr\/tmp\/snmd_mmf.map file, I don&#8217;t know why it isn&#8217;t deleted on initial daemon startup, but there&#8217;s likely a reason behind it.<\/p>\n<p>In the meantime, if suddenly all your daemons stop working on a storage node under NetWorker 8, that file may be a candidate for removal.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>On one of my lab environments I recently had a problem where the devices (AFTD) on a firewalled storage node&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[16],"tags":[295,630],"class_list":["post-4760","post","type-post","status-publish","format-standard","hentry","category-networker","tag-debugging","tag-networker-8"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-1eM","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"s
elf":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/4760","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=4760"}],"version-history":[{"count":9,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/4760\/revisions"}],"predecessor-version":[{"id":7467,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/4760\/revisions\/7467"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=4760"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/categories?post=4760"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=4760"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}