
Proactive Alerts Management

PCQ Bureau

Downtime is perhaps the one word IT managers dread the most. It's to avoid downtime that they invest heavily in management software, services and the like. In today's complex IT infrastructure, hundreds of different servers are running, and many of them are interdependent. So if one goes down, it won't bring down just one service, but several.


Imagine a scenario where your application server goes offline due to a minor fault. This would affect other business applications that depend on it, resulting not only in downtime but also in business losses. Bringing the server back up might be a half-a-minute job, but finding out that it is down, locating the right server in the data center and fixing it can take much longer. If you run an online business portal, even this much time can translate into huge financial losses. So what do you do?

Direct Hit!
Applies to: IT managers
USP: A platform that allows centralized management of distributed IT infrastructure
Links: http://tinyurl.com/mnecx
Google keywords: HP OpenView Operation

That's where a good management solution comes in: one that not only reports services that are down, but also takes proactive action to correct the problem. We'll take you through one such product, HP OpenView Operations (HPOVO) Manager 7.5, which lets you proactively manage a distributed, heterogeneous IT infrastructure from a central point.


Key functions of HPOVO



The OpenView Operations Manager provides centralized operations management by detecting problem events, automatically acting on some of them and sending the rest to the operator console for handling. You can set predefined policy criteria on critical events so that the system takes automatic action. For example, if the IIS server stops for some reason, an event is generated and passed to OpenView Operations Manager, where it is translated into an event code and the predefined policy rules are executed to restart the IIS service. Events are the key data that tell us about small and big problems in the infrastructure. OpenView Operations gives you a framework to bring this data into a centralized repository, so that immediate action can be taken, automatically or manually, before a problem affects business processes. With it, you can manage operating systems, applications, middleware (such as databases) and services, while operators work collaboratively to troubleshoot problems.
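The policy mechanism described above can be pictured as a simple match-and-act loop. The Python below is a minimal sketch, not HPOVO's policy engine or API: the `Event` and `Policy` classes, the `handle` function and the matching rule are all hypothetical, and the action is returned as a command string rather than executed. `net start w3svc` is the standard Windows command for starting the IIS World Wide Web Publishing service.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical event record as received from a managed node."""
    node: str
    source: str
    text: str


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: a match condition plus an optional automatic action."""
    name: str
    matches: Callable[[Event], bool]
    action: Optional[str] = None  # command to run on the node; None = leave to operators


def handle(event: Event, policies: list[Policy]) -> tuple[str, Optional[str]]:
    """Return ("auto", command) when a policy with an action matches,
    otherwise ("console", None) so the event goes to the operator console."""
    for policy in policies:
        if policy.matches(event) and policy.action is not None:
            return "auto", policy.action
    return "console", None


# Illustrative rule: restart IIS when its web service reports that it stopped.
POLICIES = [
    Policy(
        name="restart-iis",
        matches=lambda e: "World Wide Web Publishing" in e.text and "stopped" in e.text,
        action="net start w3svc",
    ),
]

iis_down = Event("web01", "Service Control Manager",
                 "The World Wide Web Publishing Service service entered the stopped state.")
disk_low = Event("db01", "srv", "Disk D: is nearly full.")

print(handle(iis_down, POLICIES))  # ('auto', 'net start w3svc')
print(handle(disk_low, POLICIES))  # ('console', None)
```

Keeping the action as data makes it easy to see which events are fixed automatically and which are left to operators; in HPOVO the equivalent logic lives in its policies.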

Here, you can see detailed information about the error. You are also given a remedy to troubleshoot it.

How it works



Before implementing HP OpenView Operations Manager in a live environment, it's necessary to understand how it works. This helps the IT manager deploy the system smoothly and gives an idea of which services and applications can be managed effectively with it. The software is a distributed, intelligent automation management system that also provides fault management and workflow. The key steps it follows are:


1. Collecting data from event logs, general system messages and SNMP traps: Intelligent agents are installed on all managed servers. These detect failures and performance degradation of any component on the managed system. They monitor system and application log files, general system messages, SNMP traps and variables, hardware components (such as disks and CPUs) and custom variables from any application. Events continue to be collected even when the network connection to the central management station is down.

2. Processing the collected data: Collected events are converted into a standard internal format, regardless of their original source. Irrelevant and duplicate events are then filtered out, and the remaining events are stored in a central repository. Events can also trigger predefined automatic actions, including sending messages to the operator console.

Processing also includes adding status information, such as whether an event is important or critical, and grouping events into categories such as 'security' or 'OS.' Using the built-in notification service, events can be forwarded automatically to other applications, for example to send SMS alerts.

You can run commands or scripts to resolve the problem from the same window. This saves time and reduces downtime.

3. Presenting event data to the operator console: Event data is presented to the operator console in a consistent format, using six color-coded severity states that clearly indicate how serious a failure or performance degradation is.

The operator can drill down to the available actions and annotations attached to a message. Event-specific instructions also guide the operator through the problem-resolution process, so that the problem can be resolved quickly.


4. Taking action: HP OpenView Operations Manager provides flexible mechanisms to trigger a response to every critical event. As mentioned earlier, predefined actions can fire automatically when an event occurs. To facilitate troubleshooting, operators can also launch predefined tool actions with a single mouse click, either to fix a problem or to gather additional data, such as the services running on the managed system.

All information resulting from action execution is stored in a central database, which helps automate the resolution of problems over time. Operators can also own and acknowledge events, or escalate them to other operators and applications. Acknowledged events are not discarded; they remain in the system for later reference.
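The four steps can be condensed into a small model. This is a minimal Python sketch under stated assumptions, not how HPOVO is implemented: the raw event fields (`kind`, `log`, `type`, `message`, `oid`), the severity and category mapping, the `Agent` and `Server` classes and the `notify` callback (standing in for, say, an SMS gateway) are all hypothetical. It shows agent-side buffering while the link is down, conversion to one internal format, duplicate suppression, notification on critical events, and acknowledgment that keeps the message on record.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    """Hypothetical internal format every event is converted into (step 2)."""
    node: str
    category: str          # e.g. "OS", "Security", "Network"
    severity: str          # e.g. "critical", "major", "warning"
    text: str
    count: int = 1         # duplicates folded into this message
    state: str = "open"    # "open" or "acknowledged"


class Agent:
    """Step 1: collect events on the node and buffer them while the server is unreachable."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.buffer: deque[dict] = deque()

    def collect(self, raw: dict) -> None:
        self.buffer.append({**raw, "node": self.node})

    def flush(self, link_up: bool, send: Callable[[dict], object]) -> int:
        """Forward buffered events only if the link is up; return how many were sent."""
        sent = 0
        while link_up and self.buffer:
            send(self.buffer.popleft())
            sent += 1
        return sent


def normalize(raw: dict) -> Message:
    """Step 2: map source-specific fields onto one format (assumed field names and mapping)."""
    if raw["kind"] == "eventlog":
        category = "Security" if raw["log"] == "Security" else "OS"
        severity = {"Error": "critical", "Warning": "warning"}.get(raw["type"], "normal")
        return Message(raw["node"], category, severity, raw["message"])
    if raw["kind"] == "snmp":
        return Message(raw["node"], "Network", "major", f"SNMP trap {raw['oid']}")
    return Message(raw["node"], "Misc", "unknown", str(raw))


class Server:
    """Steps 2 to 4: deduplicate, store, notify and track operator state."""

    def __init__(self, notify: Callable[[Message], None]) -> None:
        self.repository: dict[tuple[str, str, str], Message] = {}
        self.notify = notify

    def receive(self, raw: dict) -> Message:
        msg = normalize(raw)
        key = (msg.node, msg.category, msg.text)
        if key in self.repository:       # duplicate: count it instead of storing it again
            self.repository[key].count += 1
            return self.repository[key]
        self.repository[key] = msg
        if msg.severity == "critical":   # forward to another application, e.g. SMS
            self.notify(msg)
        return msg

    def acknowledge(self, msg: Message) -> None:
        msg.state = "acknowledged"       # stays in the repository as history


# Usage: the same disk error is reported twice while the network is down.
sms_outbox: list[str] = []
server = Server(notify=lambda m: sms_outbox.append(f"{m.node}: {m.text}"))
agent = Agent("db01")
raw = {"kind": "eventlog", "log": "System", "type": "Error", "message": "Disk C: failure"}
agent.collect(raw)
agent.collect(raw)
held = agent.flush(link_up=False, send=server.receive)       # nothing sent yet
delivered = agent.flush(link_up=True, send=server.receive)   # both events delivered
(msg,) = server.repository.values()
server.acknowledge(msg)
print(held, delivered, msg.count, msg.state, sms_outbox)
# 0 2 2 acknowledged ['db01: Disk C: failure']
```

Only one message reaches the repository and only one SMS goes out, even though the fault was reported twice; that is the point of filtering duplicates before presenting events to operators.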

Key benefits



A key benefit of this product is its support for a wide range of OS platforms, including Windows, UNIX and Linux. As a central point of control for the network, servers, operating systems, applications and services, the software makes it easy for a system manager to collect and manage all the IT infrastructure components of a business service.


It has out-of-the-box, policy-based management intelligence, which can be extended with application-specific Smart Plug-Ins for OpenView Operations for Windows. The system scales to heterogeneous environments of all sizes, from 10 to more than 1,000 servers, and it can manage both Microsoft .NET and J2EE applications.

As soon as an error or event is generated on a managed node, it is transferred to the HPOVO console.

Implementation and use



The system is fairly straightforward to deploy. All you need is a Pentium 4 machine with at least 512 MB of RAM, 10 GB of hard disk space and Windows 2000/2003 Server running the DNS server. The product has two components: the server and the client agent.


First install HP OpenView Operations on the setup described above, then install the agent from the standalone agent CD on each client machine you want to manage. Both installations are straightforward, using the tool's wizard. The system uses a SQL Server database to store information: you can either use its own standalone database server or point it to a remote SQL Server already running on your network. In the latter case, you have to provide the server name and SA password to the wizard so that it can create the database structure. Once setup is over, reboot the system, and you are ready to use it.

On the server, launch the HPOVO console, which opens as an MMC console divided into two panes. The left pane contains five components: services, nodes, tools, reports and graphs, and policy management. The right pane shows details of the component selected in the left pane. Start by adding the managed nodes (machines with an agent running) to the system so that you can monitor them. To do so, select 'Node' in the left pane and right-click it.

After seeing the alert color on the left side, the operator can view fault details by double-clicking the managed node.

From the context menu, select Configure > Node. This opens another window divided into two parts: the left part shows all the available networks, while the right part shows all the nodes managed by HP OpenView Operations. As we used only Windows machines, select 'Microsoft Windows Networks' in the left part and drag and drop the machines you want to monitor to the right part. Once the machines are added, HP OpenView Operations collects all events raised on the managed nodes and shows them on the console.
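The node colors on the console come from the six severity states described earlier. As a minimal Python sketch, assuming the six states are Critical, Major, Minor, Warning, Normal and Unknown, and assuming the color for each (check the product documentation for the actual scheme), a node or a group of nodes can take the color of its worst open message. The `Severity` enum, the `COLORS` table and both functions are hypothetical.

```python
from __future__ import annotations

from enum import IntEnum


class Severity(IntEnum):
    """Assumed severity names and ordering (higher value = more serious)."""
    UNKNOWN = 0
    NORMAL = 1
    WARNING = 2
    MINOR = 3
    MAJOR = 4
    CRITICAL = 5


# Assumed color scheme, for illustration only.
COLORS = {
    Severity.UNKNOWN: "blue",
    Severity.NORMAL: "green",
    Severity.WARNING: "cyan",
    Severity.MINOR: "yellow",
    Severity.MAJOR: "orange",
    Severity.CRITICAL: "red",
}


def node_color(open_messages: list[Severity]) -> str:
    """A node shows the color of its worst open message, or normal if it has none."""
    return COLORS[max(open_messages, default=Severity.NORMAL)]


def group_color(nodes: dict[str, list[Severity]]) -> str:
    """A group of nodes (e.g. a network) shows the worst severity among its members."""
    all_open = [sev for sevs in nodes.values() for sev in sevs]
    return node_color(all_open)


network = {
    "dns01": [Severity.WARNING],          # e.g. the stopped DNS service
    "web01": [],
    "db01": [Severity.MINOR, Severity.WARNING],
}
print({name: node_color(sevs) for name, sevs in network.items()})
# {'dns01': 'cyan', 'web01': 'green', 'db01': 'yellow'}
print(group_color(network))  # yellow
```

Rolling up to the worst severity means a single serious problem is never hidden behind many healthy nodes, which is what lets an operator spot trouble from the left pane at a glance.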

Here, we will show how you can troubleshoot a problem remotely using HP OpenView Operations. Let's take a very simple example: a DNS service running on one of the managed servers. We deliberately killed the DNS.EXE process, which raised an alert on the HPOVO console.

Double-clicking the warning alert shows the error messages. In the same window, the Instructions tab shows a remedy for troubleshooting the problem. To execute the remedy from this window, click the Commands section, where you will see the command that restarts the DNS service.

Just click the Start button to execute the remedy on the target machine remotely. As this is a known case, a predefined solution is already available. In a real scenario, however, with different applications running, the errors generated may be known only to the systems manager, who also knows how to work around them.

In such a case, you can write the course of action for that error into the policy rules, using VBScript, Perl, WMI scripts, batch files or a combination of them. This reduces overall troubleshooting time and downtime, and makes the whole system more transparent: the IT manager can track changes at each level of the IT infrastructure. This was a small example to show how it works; you can create far more complex scripts.
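The article names VBScript, Perl, WMI and batch files for such remedies; the same check-and-restart logic is sketched below in Python purely for illustration, and it would need a Python interpreter on the managed node. It relies on the standard Windows commands `sc query DNS` (to read the state of the DNS Server service) and `net start DNS` (to start it). The function names are hypothetical, and how such a script is attached to an HPOVO policy is not shown. The `run` parameter lets the logic be tried without Windows by passing a fake command runner.

```python
import re
import subprocess
from typing import Callable, List

# A runner takes a command (list of arguments) and returns its standard output.
Runner = Callable[[List[str]], str]


def run_command(cmd: List[str]) -> str:
    """Run a Windows command and return its stdout (no exception on a non-zero exit)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


def service_state(sc_output: str) -> str:
    """Extract the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"STATE\s+:\s+\d+\s+(\w+)", sc_output)
    return match.group(1) if match else "UNKNOWN"


def ensure_running(service: str, run: Runner = run_command) -> str:
    """Ask Windows to start `service` if it is not running; report what was done."""
    state = service_state(run(["sc", "query", service]))
    if state == "RUNNING":
        return "already running"
    run(["net", "start", service])  # result not checked in this sketch
    return f"start requested (was {state})"


# Dry run with a fake runner that mimics `sc query` output for a stopped service.
calls: List[List[str]] = []


def fake_run(cmd: List[str]) -> str:
    calls.append(cmd)
    if cmd[:2] == ["sc", "query"]:
        return "SERVICE_NAME: DNS\n        STATE              : 1  STOPPED\n"
    return ""


print(ensure_running("DNS", run=fake_run))  # start requested (was STOPPED)
print(calls[-1])                            # ['net', 'start', 'DNS']
# On the managed Windows node itself: ensure_running("DNS")
```

A production remedy would also confirm the service actually came up (for example by querying it again) before reporting success back to the console.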
