Operational
Management covers non-functional production
capabilities (monitoring, provisioning, etc.) found in
an SOA environment. Operational Management in a
service-oriented environment is primarily concerned
with the following challenges:
- Deployment,
which focuses on the ability to manage a multitude of
services, from a centralized console, in a consistent
manner throughout the enterprise. Managing deployment
includes the tasks of configuring the services,
deploying them to servers, and displaying the
status of all the services on all the servers.
- Versioning,
which focuses on the ability to maintain backward
compatibility by ensuring that requests from older
consumer versions are served by the matching older
versions of service instances. It also allows rollout
of newer versions to a limited user group prior to a
full-blown release, thereby reducing the overall risk
of exposure to a new version.
- A Service Level
Agreement (SLA) is a collection of service-level
objectives (SLOs) agreed upon by a service provider
and a consumer. An SLO is a proposed acceptable range
for a single verifiable measurement, such as
request processing time, that is important to
the consumer. For example, an SLO might state that
request processing time must not exceed 30 milliseconds
for requests with fewer than 100 data elements.
- Root
Cause Analysis is the ability to diagnose and
correct problems. This is one of the primary goals of
an SOA management system. Monitoring determines that
there is or soon will be a problem. Beyond that, the
management system should offer tools to narrow down
the cause of the problem.
- Virtualization
is an umbrella category for a set of capabilities that
are primarily concerned with insulating service
consumers from change, and with providing service
providers with implementation and deployment
flexibility. Chief among these capabilities are
transformation and routing.
- Logging
and Auditing focus on the ability to trace the life
cycle of the service call. Logging and auditing
typically require disk I/O (unless retention of the
entries does not need to be guaranteed) and
therefore are expensive tasks that should be held to
the minimum necessary to meet the non-functional
requirements. Services need to be able to perform
role-based logging "on-demand" or "on-error".
"On-demand" logging is the ability to turn logging on
or off from a management console without the need to
restart the service. "On-error" logging is a feature
by which the application logs only errors, and does
so in a very descriptive mode.
- Availability
Monitoring determines if a service is up and running.
It can be implemented by a "ping" mechanism that
periodically executes a dummy request, or by a "push"
mechanism built into the service that periodically
generates "heartbeat" event messages that
can be monitored. Asynchronous push mechanisms work
better in practice, as they minimize polling, and the
system can be designed to perform a "health check"
before publishing the heartbeat.
- Accessibility
Monitoring determines if a service can be used. Just
because a service is "available" does not mean it is
"accessible." The lack of accessibility may be due to
reasons such as an insufficient number of worker
threads to handle the request under high load
conditions, unavailable resources like a database, or
inability to gain the cooperation of other requisite
services.
- Performance
Monitoring profiles the execution of a service call
and provides operational statistics. Its numbers
measure both throughput and latency. Throughput
measures how heavily the service is used and
helps determine scalability requirements. Latency is a
measure of the round-trip time and can help identify
bottleneck subcomponents or resources.
- Resource
Monitoring is the ability to monitor and record the
usage of various consumable system resources under
load, such as memory use or concurrent request
counts.
- Fault
Monitoring is the ability to recognize and notify an
operator when an application component has failed
during request processing.
- Notification
is the ability to alert an operator to a problem that
was discovered as a result of monitoring.
Notification can be as simple as e-mail, or as
complex as custom integration with a third-party
network management system (NMS).
- Probing
is an active management component that initiates
synthetic requests to trigger performance and
availability monitoring. This often lets the system
manager discover problems before users encounter or
report them.
- Analytics
and Reporting is responsible for the collection of
metrics from the individual managed resources, the
computation of trends and other analytics, and the
presentation of the resulting analysis and raw data
to interested parties.
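
Two of the items above, the SLO example and the "push" heartbeat with a health check, can be illustrated with a minimal Python sketch. All names here (Slo, violates_slo, Heartbeat, make_heartbeat, classify) are hypothetical and chosen only for illustration; the 30 millisecond and 100 element figures come from the SLO example in the text, while the timeout value and the three status labels are assumptions. A real SOA management product would expose its own interfaces for this.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Slo:
    """One service-level objective: a ceiling on request processing time."""
    max_latency_ms: float  # e.g. 30 ms, as in the SLO example in the text
    max_elements: int      # the SLO covers requests with fewer elements than this


def violates_slo(slo: Slo, latency_ms: float, element_count: int) -> bool:
    """Return True if a measured request breaks the SLO."""
    if element_count >= slo.max_elements:
        return False  # request falls outside the scope of this SLO
    return latency_ms > slo.max_latency_ms


@dataclass(frozen=True)
class Heartbeat:
    """Event a service would push to the management system (hypothetical shape)."""
    service: str
    timestamp: float
    healthy: bool


def make_heartbeat(service: str,
                   health_check: Callable[[], bool],
                   clock: Callable[[], float] = time.time) -> Heartbeat:
    """Run the health check first, then build the heartbeat to publish."""
    try:
        healthy = bool(health_check())
    except Exception:
        # A failing check (e.g. database unreachable) is reported, not raised.
        healthy = False
    return Heartbeat(service, clock(), healthy)


def classify(last: Optional[Heartbeat], now: float, timeout_s: float) -> str:
    """Monitor side: separate 'not running' from 'running but not usable'."""
    if last is None or now - last.timestamp > timeout_s:
        return "unavailable"   # no recent heartbeat: availability problem
    if not last.healthy:
        return "inaccessible"  # heartbeat arrived, but the health check failed
    return "ok"


if __name__ == "__main__":
    slo = Slo(max_latency_ms=30, max_elements=100)
    print(violates_slo(slo, latency_ms=45.0, element_count=10))
    beat = make_heartbeat("orders", health_check=lambda: True)
    print(classify(beat, now=time.time(), timeout_s=10.0))
```

In this sketch the service publishes a heartbeat only after running its own health check, so the monitor can tell a missing heartbeat (an availability problem) apart from a heartbeat that reports a failed check (an accessibility problem) without polling the service.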