If you’ve been working with BMC ProactiveNet Performance Manager (BPPM) for any length of time, you may have noticed that the performance of the tool can degrade, frustrating users. There are a number of ways performance can go sideways:
- Server startup takes too long
- Delays during or immediately after user login
- Slow response time
Wringing optimal performance out of your BPPM deployment is a deep subject that will take a while to cover fully, but there are some basics that you can look at right away.
First, we’ll look at cell design. You probably know that BPPM’s event processing is done by components called cells. Individual cells are capable of handling large volumes of event traffic, though large shops and even shops with moderate event traffic may want to consider breaking up the workload into multiple cells.
BPPM cells are single-threaded processes, so competing workloads have to wait in line for processing. Because the console displays are populated by queries that the cell itself processes, display processing competes with event processing as new events arrive. Consequently, event storms or other periods of peak event flow can dramatically slow down event views in the console.
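To get a feel for why this matters, here is a toy model of a single-threaded worker that handles queued work strictly in arrival order. It is only a sketch: the function names and the per-item costs (2 ms per event, 50 ms per console query) are made-up illustrative values, not measurements or APIs from BPPM.

```python
from collections import deque


def simulate(work_items):
    """Process (name, cost_ms) items one at a time, in FIFO order.

    Returns a dict mapping each item name to its completion time in ms.
    """
    queue = deque(work_items)
    clock = 0.0
    finished = {}
    while queue:
        name, cost_ms = queue.popleft()
        clock += cost_ms  # a single thread: nothing else runs meanwhile
        finished[name] = clock
    return finished


def query_latency(events_ahead, event_cost_ms=2.0, query_cost_ms=50.0):
    """Time until a console query completes when it is queued behind events.

    All costs are hypothetical numbers chosen for illustration.
    """
    items = [(f"event-{i}", event_cost_ms) for i in range(events_ahead)]
    items.append(("console-query", query_cost_ms))
    return simulate(items)["console-query"]


quiet = query_latency(events_ahead=10)    # light event traffic
storm = query_latency(events_ahead=5000)  # event storm
print(f"quiet: {quiet} ms, storm: {storm} ms")
```

In this model the query itself costs the same in both cases; the extra delay during the storm comes entirely from waiting behind queued events, which is the effect a dedicated display cell is meant to avoid.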
We like to create a dedicated cell just for console views and user interaction. We do this for a couple of reasons. First, by eliminating competition between user views and event processing, it speeds response times for the users. Second, it means that changes to event processing rules that require the cell to be restarted can be completed without disrupting users.
Generally, we deploy the cells in three tiers as described below:
- Event Processor – Monitoring sources send events to the bottom-tier cells for refinement, low-level correlation such as de-duplication, and filtering.
- Worker – Events that pass this stage are propagated to the worker cell at the next tier. The worker cell handles any global correlations, applies global blackouts, and drives integrations such as Remedy Incident Management.
- Display – Finally, any actionable alerts are propagated to the top-tier cell for operator attention. Service Impact Management should also be handled in the top-tier cell for user visualization.
There are other valid approaches, but we’ve found this to work well in most cases.
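The flow through the three tiers can be sketched in a few lines of Python. This is a conceptual model only: real cells are configured through BPPM's own rules and propagation settings, and every class name, field, and filtering rule below (for example, dropping `INFO` events or blacking out by host name) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    host: str
    message: str
    severity: str  # e.g. "INFO", "WARNING", "CRITICAL"


@dataclass
class DisplayCell:
    """Top tier: holds only the alerts operators should see."""
    visible: list = field(default_factory=list)

    def receive(self, event):
        self.visible.append(event)


@dataclass
class WorkerCell:
    """Middle tier: global logic such as blackouts (integrations omitted)."""
    upstream: DisplayCell
    blackout_hosts: set = field(default_factory=set)

    def receive(self, event):
        if event.host in self.blackout_hosts:
            return  # host is in a global blackout; do not propagate
        self.upstream.receive(event)


@dataclass
class EventProcessorCell:
    """Bottom tier: filtering and de-duplication close to the source."""
    upstream: WorkerCell
    seen: set = field(default_factory=set)

    def receive(self, event):
        if event.severity == "INFO":
            return  # assumed filter: informational events are not actionable
        key = (event.host, event.message)
        if key in self.seen:
            return  # duplicate of an event already propagated
        self.seen.add(key)
        self.upstream.receive(event)


display = DisplayCell()
worker = WorkerCell(upstream=display, blackout_hosts={"db02"})
processor = EventProcessorCell(upstream=worker)

incoming = [
    Event("web01", "disk full", "CRITICAL"),
    Event("web01", "disk full", "CRITICAL"),  # duplicate
    Event("db02", "cpu high", "WARNING"),     # host in blackout
    Event("app03", "heartbeat", "INFO"),      # filtered out
]
for event in incoming:
    processor.receive(event)

shown = [(e.host, e.message) for e in display.visible]
print(shown)
```

Of the four incoming events, only the first reaches the display tier, so the cell that serves operator views sees a small fraction of the raw event volume.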