Using the BPPM Baseline to Look Back at Monitoring History
You’ve probably heard the quote, “Those who cannot remember the past are condemned to repeat it.” Who would argue with that logic? When it comes to managing your IT assets, observing history serves us in another useful way. In the context of monitoring, you could say: in order to know how to act on something in the future, you must know how it has behaved in the past. That is precisely what the BPPM Baseline takes care of. Application Performance Management (APM) is going through an evolutionary advance that is completely changing the way we monitor and act on data, and the BPPM Baseline is a major part of that advancement.
Recently we discussed and showed you “Understanding BPPM Analytics”. In that presentation we covered BPPM Analytics and some of the technical components of the BPPM 9.5 software suite, terms you’ve probably heard numerous times but may not fully understand. Be sure to watch that presentation if you have questions before or after reading this; it is complementary to this information.
In this presentation we are going to discuss just one of those items: the BPPM Baseline, how it works, and why you should be using it right now.
I’ve been involved with Enterprise Systems Management (ESM) or Business Service Management (BSM), and now APM, pick your label, for almost 20 years. One item has plagued monitoring that entire time: the false alert. Even if the monitoring is accurate the vast majority of the time, it only takes a few wrong alerts waking people up in the middle of the night to spoil all of your efforts and credibility. Until just a few years ago, it was a problem without a legitimate solution. False alerts were simply a part of the job, and you did what you could to minimize them. No more! With BMC Analytics and the BPPM Baseline monitoring capability you can say good-bye to those days forever. Now let me explain why.
The Dawn of Monitoring and the Creation of the Threshold WAG
At the dawn of IT monitoring, people understood that you had to collect data and, with that data, determine when something was wrong. People jumped in and specified what they thought the point or range was at which things would become problematic. At that moment we had the creation of the monitoring Wild Assumed Guess (WAG). And yes, I knowingly added a few letters to one of those words.
As the person responsible for setting up the monitoring, I relied on the subject matter expert or IT asset owner to tell me what they thought the alarm ranges should be. This was always the safest first step to take to try to avoid false alerts. After all, they were the subject matter experts on their components, not me. Most of the time, I was given a WAG that would eventually prove erroneous. If they deferred to me, my immediate inclination was to let the monitoring collect data and circle back over time to determine what the ranges should be. During that collection period, if there was any type of issue or abnormality, I could use it to guide my alert range specifications even further. I could look at the values collected just prior to the issue and use that information to set up proactive alert settings. Initially this worked for the most part. However, it only takes a few false alerts to cement bad feelings with the people being woken up wrongly.
Until the last few years there was almost nothing that could be done about this issue. The only way to really avoid false alerts was to watch the monitoring around the clock and look back in time to see whether the patterns were normal or abnormal. That is extremely difficult for any person or operations staff to do with just a few hundred parameters, and absolutely impossible with today’s IT infrastructures, which consist of thousands upon thousands of monitored parameters. There had to be a way to take the data collection and analyze it automatically.
Monitoring Necessity Leads to Evolutionary Advance
I remember going to a BMC seminar around 2005 and seeing a booth called ProactiveNet. The pitch they gave sounded like music to my ears. Use their software and replace all of your absolute thresholds. Let the history of the data collection dictate what is normal or abnormal and let it set the alert ranges. This was totally in line with my approach of collecting data and then using the past behavior to set the alert ranges. It made perfect sense to me! My individual approach, however, was limited by when I made the observation and by how much monitoring data I had collected. If I only had two weeks’ worth of collected data, then my ability to determine normalcy was limited to just two weeks. The ProactiveNet software performed this analysis around the clock, continuously. This solution sounded perfect to me.
In June of 2007, BMC Software purchased the ProactiveNet solution. The press release stated, “BMC Software today announced that it has completed the acquisition of ProactiveNet, Inc., the most advanced ‘early warning system’ for IT on the market”.
Over the last seven years, BMC has taken that one product and made it the flagship of its performance and availability monitoring space. It has evolved steadily, and with the recent release of BPPM 9.5 in January of this year, it has matured into an incredible product. This is where BPPM Analytics and BPPM Baselines come in.
Looking Back and Into the Future Proactively
We now understand that in order to determine whether something is acting abnormally, we must first determine what is normal. To do that, we have to look backward at the historical data. This is precisely how BPPM Analytics works to establish each of your key parameters’ “Baselines”. It takes the prior history of the parameter and, with a BMC black-box algorithm, creates the baseline ranges. So a baseline is simply a backward observation, used going forward to determine normal or abnormal behavior. And that is the key: using the historical baseline, you can easily determine when something is behaving abnormally. The BPPM software now does this automatically. No more WAG! No more finger pointing about who specified what wrongly and when. The software is capable of performing this action without any human involvement, and with the size of most IT enterprises today, it couldn’t have come at a better time.
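To make the idea concrete, here is a minimal sketch of the concept. BMC’s actual baseline algorithm is a black box, so the function names and the mean-plus-standard-deviation math below are purely illustrative assumptions, not BMC’s implementation:

```python
from statistics import mean, stdev

def baseline_range(history, band_width=3.0):
    """Derive a 'normal' band from historical samples.

    Illustrative only: BMC's real baseline algorithm is proprietary.
    Here the band is simply mean +/- band_width standard deviations.
    """
    mu = mean(history)
    sigma = stdev(history)
    return (mu - band_width * sigma, mu + band_width * sigma)

def is_abnormal(value, history, band_width=3.0):
    """Flag a new sample that falls outside the historical band."""
    low, high = baseline_range(history, band_width)
    return value < low or value > high

# Two weeks of CPU samples hovering around 60% utilization
history = [58, 62, 61, 59, 63, 60, 57, 64, 61, 60, 59, 62]
print(is_abnormal(61, history))   # → False (inside the normal band)
print(is_abnormal(95, history))   # → True (well above past behavior)
```

Notice that no one had to guess a threshold: the alert range falls out of the collected history, which is exactly the shift away from the WAG described above.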
But wait, there’s more! If you were thinking of a baseline as a single range component, you would be wrong. We can thank the out-of-date absolute threshold for our initial perceptions of what a baseline is. A monitoring baseline is more than a single range component. Let me explain by providing an absolute threshold example everyone is familiar with.
If you wanted to specify an alert range for CPU utilization, how would you do it? Almost everyone would say something like this: if the CPU utilization falls between 95 and 97%, generate a WARNING, and if it goes over 97%, generate an ALARM. This seems perfectly logical and very normal; however, something is missing.
The specifications in the example above run in one direction: UP. We have been trained to think almost entirely in one direction with alert settings. What if the CPU falls to 10% utilization, or even lower, to 1%? If this is a production server, its CPU should always be busy processing the business service. After all, if it is being underutilized, that might mean there is an application or database issue needing attention, just as much as if it were overutilized.
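The one-directional blind spot is easy to see in code. This hypothetical function encodes the classic absolute threshold from the example above; note what it says about an almost-idle server:

```python
def absolute_threshold_status(cpu_pct):
    """Classic one-directional absolute threshold: it only looks UP."""
    if cpu_pct > 97:
        return "ALARM"
    if cpu_pct >= 95:
        return "WARNING"
    return "OK"

print(absolute_threshold_status(96))  # → WARNING
print(absolute_threshold_status(98))  # → ALARM
print(absolute_threshold_status(1))   # → OK — an idle production server raises no alert at all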
This is where a BPPM Baseline far exceeds absolute thresholds. You can alert on both an upper and a lower baseline deviation, or abnormality. That is exactly what you need today. Using a BPPM Baseline, we can now determine whether the CPU is being overutilized or underutilized, based on past behavior. Having that check performed on a minute-by-minute basis with BPPM Baselines is evolutionary at the least, and revolutionary at best. Having it performed on thousands of parameters every minute, without anyone required to assess normal and abnormal behavior, is brilliant.
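Contrast that with a two-sided baseline check. The band values and function below are illustrative assumptions (a real BPPM Baseline derives the band from history automatically), but they show how a deviation in either direction gets flagged:

```python
def baseline_status(value, low, high):
    """Two-sided baseline check: flags deviations above OR below
    the historically normal band (band values are illustrative)."""
    if value > high:
        return "ABNORMAL: above baseline"
    if value < low:
        return "ABNORMAL: below baseline"
    return "NORMAL"

# Suppose history shows this server normally runs 40-80% CPU
print(baseline_status(60, 40, 80))  # → NORMAL
print(baseline_status(95, 40, 80))  # → ABNORMAL: above baseline
print(baseline_status(5, 40, 80))   # → ABNORMAL: below baseline (missed by absolute thresholds)
```

The 5% case is the one an upward-only absolute threshold silently ignores; a baseline with a lower bound catches it.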
Going Beyond the Basics of Baselines
You should now have an understanding of why the BPPM Baseline is so different in its use compared to the old way of monitoring, and how it can aid you in determining what is normal and abnormal based on past behavior. By using this automatic abnormality detection, you can move beyond and replace your absolute thresholds and get rid of false alerts. Next week, in part 2, I will cover the types of BPPM Baselines and how to use each of them.
Like I said, there isn’t just one Baseline specification. BMC has provided more than one, and I’ll explain how each BPPM Baseline type can be used in different ways. For now, it’s important that people start to really understand just what a BPPM Baseline is, and then get into the practice of using them. Let’s get rid of false alerts once and for all! That is absolutely fine with me.
In the meantime, be sure to watch our previous “Understanding BPPM Analytics” video presentation. In it we explain Key Performance Indicators (KPIs), Signature Thresholds, and Intelligent Thresholds, and how they all work together with the BPPM Baseline. It’s only a few minutes long, and its sole purpose is to help you understand all the jargon that’s flying around.
Continue with our other BMC BPPM Analytics and Baselines Posts…
Understanding The Different BMC BPPM Baselines – What are the 5 user specific types of BPPM Baselines and what makes them different
Setting up BMC BPPM Intelligent Thresholds – Convert problematic static absolute thresholds into dynamic thresholds. See specific instructions on how to set up BMC BPPM Intelligent Thresholds
Are you using the BMC BPPM 9.5 advantages in your infrastructure yet? Let us get your BPPM 9.5 architectural design started today and get rid of your false alerts once and for all! Why wait?