When we notice our cloud has stopped raining, it’s time to take a look under the hood to see what happened. Or is there a better place to look before we raise the hood? A few questions to ask:
1) Was it something I did?
2) Was it something that happened inside one of the Azure instances?
3) Did the application run out of work?
4) Where can I look to see what was going on when it stopped?
Only you can answer the first question. If some of your tests weren’t passing and you promoted something to a production instance anyway, you might be able to answer this one fairly easily.
The second question assumes you can get to your management portal and look at the analytics surfaced by Azure. There might have been, or might still be, a problem with one or more of your instances restarting. I’ve never seen either of my instances stay down after a restart unless there was an unhandled exception getting tossed around. Usually I find these problems in the local dev fabric before I promote. Sometimes I don’t, though, so on a few occasions, even though my tests were passing, I had missed some critical piece of configuration that my local configuration had and the cloud config was missing. I call this PIBKAC – problem is between keyboard and chair. Usually the analytics are enough to tell you whether there were problems, and from there you can fix the configuration if needed, or restart your instances or whatever other Azure feature you’ve got tied to the application.
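That local-vs-cloud configuration gap is easy to catch mechanically. Here’s a quick sketch of the idea in Python (the names and sample keys are just illustration, not my actual settings): diff the key sets and anything present locally but missing in the cloud is a candidate for the restart loop.

```python
def missing_keys(local_cfg, cloud_cfg):
    """Return config keys present locally but absent from the cloud configuration."""
    return sorted(set(local_cfg) - set(cloud_cfg))

# Hypothetical settings for illustration only
local = {"ConnectionString": "...", "QueueName": "work", "RetryCount": "3"}
cloud = {"ConnectionString": "...", "QueueName": "work"}

print(missing_keys(local, cloud))  # → ['RetryCount']
```

A check like this before promoting would have saved me a PIBKAC moment or two.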
The third question is kind of a sunny-day scenario where the solution is doing what it’s supposed to, in a very performant way. However, sometimes ports can get ignored because of a configuration issue like the one I mentioned earlier. If you’ve been storing your own health monitoring points, you can probably tell whether your application has stopped listening for new requests, or simply can’t process anything.
The fourth question is about having something that’s looking around the instance(s) and capturing some of your system health points: how many messages am I receiving and trying to process; how quickly am I processing the incoming messages; are there any logs that can tell me what was going on when it stopped raining?
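To make those health points concrete, here’s a minimal sketch, written in Python for brevity since the shape is the same in any language (the `HealthPoints` class and its fields are my illustration, not an actual library):

```python
from collections import deque

class HealthPoints:
    """Tracks how many messages arrive and how quickly they get processed."""

    def __init__(self):
        self.received = 0
        self.processed = 0
        self._durations = deque(maxlen=100)  # recent processing times, seconds

    def message_received(self):
        self.received += 1

    def message_processed(self, seconds):
        self.processed += 1
        self._durations.append(seconds)

    def snapshot(self):
        avg = sum(self._durations) / len(self._durations) if self._durations else 0.0
        return {
            "received": self.received,
            "processed": self.processed,
            "backlog": self.received - self.processed,
            "avg_seconds": avg,
        }
```

Reading the snapshot tells you which failure you have: a growing backlog with a healthy average time suggests you’ve stopped pulling new work; a climbing average time suggests you can’t keep up with it.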
I’ve been using Enterprise Library from the PnP team for more than six years and I still love the amount of heavy lifting it does for me. The wire-ups are usually easy and straightforward, and the support behind each library drop is constant and focused. Enterprise Library 6 recently dropped with a bit of an overhaul to target .NET 4.5, among other things, and here’s a blog post by Soma that discusses a few of the changes at a high level.
I’ve used the Data and Logging Application Blocks, as well as Unity, successfully. I had recently started wiring my solution to use the Azure Diagnostics listener to capture some of the diagnostic events, particularly instance restarts from configuration changes. Now I think (and hope) I can use the Logging Application Block to wire up all of my logging events and push them to something simple like blob or table storage.
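The wiring I have in mind looks roughly like this. It’s a Python sketch of the listener pattern, not the actual Logging Application Block API: `TableStore` is a stand-in for Azure table storage (real code would go through the storage SDK), and `StorageListener` plays the role a trace listener would in EntLib.

```python
import datetime

class TableStore:
    """Stand-in for Azure table storage; real code would use the storage SDK."""
    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

class StorageListener:
    """Receives log entries, like a trace listener, and persists each one."""
    def __init__(self, store):
        self.store = store

    def write(self, severity, message):
        self.store.insert({
            "timestamp": datetime.datetime.utcnow().isoformat(),
            "severity": severity,
            "message": message,
        })

store = TableStore()
log = StorageListener(store)
log.write("Critical", "Role instance restarting: configuration change")
```

The appeal of table storage here is that every entry lands as a queryable row instead of a line buried in a file, which matters for the next step: finding the bad news quickly.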
I’ve never liked a UI that I have to open up and look through; it makes my eyes tired and it’s annoying. I’d like something a little easier for looking up fatal and critical logs first and then going from there. PowerShell (PS) looks cool and fitting for something like this, and I can probably do something quick and dirty from my desktop to pull down critical, fatal, or warning logs, but I’m not a PS junkie. It would make for an interesting exercise to get some PS on me, though. Oh, on a side note, I picked up this book to (re)start my PS journey, and so far it’s been worth the price I paid. Some of the EntLib docs mentioned pushing data to Azure storage, so I may just start there to see if this can work.
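The quick-and-dirty lookup I’m after has the same shape whatever you script it in; here it is sketched in Python (the severity names and sample entries are mine, for illustration): keep only the severities I care about and sort so the worst news comes first.

```python
def worst_first(entries):
    """Keep only the severities we care about, worst first."""
    order = {"Critical": 0, "Error": 1, "Warning": 2}
    return sorted(
        (e for e in entries if e["severity"] in order),
        key=lambda e: order[e["severity"]],
    )

logs = [
    {"severity": "Warning", "message": "queue length growing"},
    {"severity": "Critical", "message": "unhandled exception"},
    {"severity": "Verbose", "message": "heartbeat"},
]

for entry in worst_first(logs):
    print(entry["severity"], "-", entry["message"])
```

A PS version against table storage would just swap the list for a storage query; the filter-and-sort is the whole trick.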
Here’s the doc and code downloads if you want to take a look around.