Monitoring: Meters, Indicators & Alerts

The monitoring information for each followed service is available in the "Meters & alerts" view under the "Monitoring & usage" section. Default monitoring settings are configured automatically for each newly discovered service when it is added to the Spatineo service repository. A single monitoring configuration is called a meter. The owner organisation can add and change the meters for its own services through Spatineo Monitor to fine-tune the monitoring of those services. All meters and the monitoring information they collect are visible to all Spatineo Monitor users following those services.

The request time graph for each meter of each followed service is shown at the top of the "Meters & alerts" view. The selected service can be changed by clicking one of the services in the service list on the left. The monitoring information for an active meter is displayed either by clicking the name of the meter in the service list or by selecting the meter in the drop-down list at the top of the "Meters & alerts" view.

Each organisation typically has its own thresholds for the Quality of Service of the services it uses, whether those services are owned by the organisation or provided by external parties. The response time and error amount thresholds that a follower organisation sets for a service are specified by an indicator. An indicator takes the monitoring information produced by one meter and derives the current Quality of Service status of the service from the results of that meter.

When an indicator changes the Quality of Service status of a service from "OK" to "Warning" or "Error", it creates an alert and records this alert event in the monitoring database. When the status changes back, another event is created and stored. The indicator can be configured to notify one or more users of these events by email or SMS message.

How to measure a service: configuring meters

A meter is a recipe for measuring the response time of a service offering. An offering can be measured in different ways, so there may be one or more active meters for each offering. For Web Map Services this lets the owner of the service measure a layer using different map projections, image sizes and formats. The Spatineo Monitoring Agents automatically vary the bounding box of the monitoring requests to simulate realistic service use. The "Service Info" view under the "Monitoring & usage" section shows the offerings, or data sets, provided by the selected service. For WMS services the offerings are the layers made available by the GetCapabilities operation of the service. For WFS services the table shows the feature types provided. The clock icon in the "Meters" column indicates that there is currently at least one active meter bound to the offering.
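
As an illustration of what such a recipe could look like for a WMS layer, the following minimal sketch builds a GetMap request with a randomly placed bounding box. The `WmsMeter` class, its fields, the `random_bbox` helper and the example URL are hypothetical and are not part of Spatineo Monitor; only the GetMap parameter names come from the WMS 1.3.0 standard. How the Monitoring Agents actually choose their bounding boxes is not documented here.

```python
import random
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class WmsMeter:
    """Hypothetical meter recipe: one layer measured with fixed output settings."""
    base_url: str
    layer: str
    crs: str = "EPSG:4326"
    width: int = 256
    height: int = 256
    image_format: str = "image/png"


def random_bbox(extent, fraction, rng):
    """Pick a random box whose sides are `fraction` of the sides of `extent`.

    `extent` is (min1, min2, max1, max2) in the axis order the service expects.
    """
    min1, min2, max1, max2 = extent
    size1 = (max1 - min1) * fraction
    size2 = (max2 - min2) * fraction
    start1 = rng.uniform(min1, max1 - size1)
    start2 = rng.uniform(min2, max2 - size2)
    return (start1, start2, start1 + size1, start2 + size2)


def getmap_url(meter, bbox):
    """Build a WMS 1.3.0 GetMap URL for the meter and bounding box."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": meter.layer,
        "STYLES": "",
        "CRS": meter.crs,
        "BBOX": ",".join(f"{value:.6f}" for value in bbox),
        "WIDTH": meter.width,
        "HEIGHT": meter.height,
        "FORMAT": meter.image_format,
    }
    return f"{meter.base_url}?{urlencode(params)}"


# Example: a fresh bounding box for every monitoring request.
meter = WmsMeter("https://example.com/wms", "roads")
bbox = random_bbox((-90.0, -180.0, 90.0, 180.0), 0.1, random.Random())
print(getmap_url(meter, bbox))
```

Two meters for the same layer would then differ only in fields such as `crs`, `width` or `image_format`, which is what makes their results comparable over time.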

By selecting one of the offerings or a WMS service you can see a map preview of the offering's content. The preview is currently available only if the service provides the offering in either EPSG:4326 or the "Google" projection (EPSG:900913), and the service has no authentication configured. Selecting an offering also shows all active meters for the offering in the "Meters" list below the offering list.

Only the owner of the service can add and stop meters for a service. The reason for this is to protect the service providers from excessive monitoring traffic. You can claim the ownership of a service by clicking the link in the Owner field at the top of the "Service Info" tab. If one of your services is already claimed by another organisation, please contact us at support(at)spatineo.com.

Active meters can be stopped by clicking a meter name in the "Meters" list. Stopping a meter only makes it inactive; it does not remove the meter completely. This is because the old monitoring data collected with the meter would no longer be available if the meter were removed completely (see the chapter [Monitoring timeline] for browsing the old meter data). Old meters can later be reactivated to resume monitoring with that meter.

Existing meters cannot be changed, because this would make it impossible to compare existing monitoring results with the new ones. If you want to change how an offering is monitored, or change which offering of a service is monitored, you can add a new meter and stop the old one. New meters can be added by clicking the "Add new meter" button, entering the meter configuration and clicking the "Activate new meter" button. The first monitoring results for the new meter will be available in Spatineo Monitor within 10 minutes. All active meters for each service are displayed in the service list on the left side of the "Monitoring & usage" view.

Real-time monitoring of service response: the monitoring timeline

The response time results of all meters of the selected service are displayed in the "Meters & alerts" view under the "Monitoring & usage" section. The top part of the view contains a timeline graph of the measured response times for the selected meter. The displayed meter can be changed either by clicking the meter name in the service list or by selecting it from the drop-down menu directly above the graph on the left.

When a meter is stopped, it collects no further monitoring data. The data collected earlier by stopped meters is still available for analysis. When data is available for at least one previously active meter during the currently selected time period, an indicator icon with the text "old meters available" is shown beside the meter selection drop-down menu. Any of the inactive meters can be selected using the drop-down menu.

The time period for the "Meters & alerts" view can be chosen in several ways. The timeline can be zoomed in by dragging the mouse over the time span of interest, and zoomed out again by pressing the "Zoom out" button. The timeline can also be scrolled to the left or to the right by clicking the arrow icons at either end of the graph. An exact time period in days can be selected by clicking the date buttons after the "Period" label. There are also shortcuts for selecting the last day, last week and last month using the buttons below the graph on the right. The time selection affects all the components in the "Meters & alerts" view, and it is kept when another service is selected, which makes it easier to compare the results of two services.

By default the monitoring timeline shows average response times calculated at a time resolution appropriate for the current zoom level. Alternatively, the minimum or maximum response times can be selected using the drop-down menu below the timeline on the right, revealing the baseline and peak response times for the displayed monitoring data. Note that the response time axis of the graph has a logarithmic scale, so that both fast (100-300 ms) and long (up to 60 s) response times fit in the same graph.
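
The aggregation described above can be sketched as follows. This is only an illustration: the `aggregate` function, its sample format and the fixed bin width are assumptions, and the way Spatineo Monitor picks the resolution for a zoom level is not specified here.

```python
from statistics import mean


def aggregate(samples, resolution_s, how="avg"):
    """Group (timestamp_s, response_ms) samples into fixed-width time bins.

    Returns a list of (bin_start_s, value) pairs sorted by time, where value
    is the average, minimum or maximum response time within the bin.
    """
    reducers = {"avg": mean, "min": min, "max": max}
    reducer = reducers[how]
    bins = {}
    for timestamp_s, response_ms in samples:
        bin_start = int(timestamp_s // resolution_s) * resolution_s
        bins.setdefault(bin_start, []).append(response_ms)
    return [(start, reducer(values)) for start, values in sorted(bins.items())]


# Four requests, aggregated into one-minute bins.
samples = [(0, 120), (30, 180), (70, 150), (100, 9000)]
print(aggregate(samples, 60, "avg"))
print(aggregate(samples, 60, "min"))
print(aggregate(samples, 60, "max"))
```

In the second bin a single slow request dominates the average, while the minimum still shows the baseline. This is why switching between the three views can be useful when reading the timeline.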

Analysing the problems using the response time histogram

In the "Meters & alerts" view, below the monitoring timeline, there is a histogram of the response times of all the monitoring requests within the selected time period. The vertical axis shows the number of requests within each response time bin, and the horizontal axis shows the response time. The green bars indicate an "ok" response time, the red bars a slow one, and the yellow bars a response time between the two. The colouring follows the organisation-specific indicator settings if an indicator has been set for this meter (see [configuring indicators]).

The histogram view can be used to drill down into the monitoring requests by response time. Clicking the histogram at any point selects only the requests whose response time falls between the boundaries of the bin at that point. The "Response times" table on the right side of the histogram is automatically filtered to show only the selected requests. Clicking and dragging with the mouse over the histogram selects more than one bin. Click the selected region again to clear the request filtering.

The rightmost "error" bar in the histogram is special: it contains the monitoring requests that have failed for some reason. These requests are not normally shown in the "Response times" list. Select this bar to show only the errors. To show all the requests, including the errors, click and drag over the whole histogram to select all the bins. The requests in the "Response times" table can be sorted by any column by clicking the column header.
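
The binning, colouring and filtering behaviour described in this section can be sketched as follows. All names here are hypothetical, and the rule that a bin is coloured by its lower edge is an assumption made for the example, not a documented detail of Spatineo Monitor.

```python
from bisect import bisect_right


def build_histogram(requests, bin_edges_ms):
    """Count requests per response-time bin; failed requests are counted apart.

    `requests` is a list of (response_ms, ok) pairs. `bin_edges_ms` is an
    ascending list of edges; bin i covers [edges[i], edges[i + 1]).
    Returns (counts_per_bin, error_count).
    """
    counts = [0] * (len(bin_edges_ms) - 1)
    errors = 0
    for response_ms, ok in requests:
        if not ok:
            errors += 1  # goes to the separate "error" bar
            continue
        index = bisect_right(bin_edges_ms, response_ms) - 1
        if 0 <= index < len(counts):
            counts[index] += 1
    return counts, errors


def bin_colour(lower_edge_ms, warning_ms, error_ms):
    """Colour a bin by comparing its lower edge with the indicator thresholds."""
    if lower_edge_ms >= error_ms:
        return "red"
    if lower_edge_ms >= warning_ms:
        return "yellow"
    return "green"


def select_requests(requests, low_ms, high_ms):
    """Keep successful requests with low_ms <= response time < high_ms."""
    return [(ms, ok) for ms, ok in requests if ok and low_ms <= ms < high_ms]


requests = [(120, True), (250, True), (800, True), (5000, True), (0, False)]
edges = [0, 200, 500, 1000, 10000]
print(build_histogram(requests, edges))
print([bin_colour(edge, 500, 1000) for edge in edges[:-1]])
print(select_requests(requests, 200, 1000))
```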

Setting the criteria for the Quality of Service: configuring indicators & alerts

The meters for each service are controlled by the owner organisation. Any Spatineo Monitor user can, however, have their own requirements for the Quality of Service of the services they use. Those requirements are set by configuring an indicator for one or more meters of a service. An indicator for the selected service and meter can be configured by clicking the "Alert settings" link above the "Recent alerts" table in the bottom left part of the "Meters & alerts" view.

When setting up the indicator, it must be given a short name to be used in alert
messages generated by that indicator.

An indicator is always in one of the following four states:

  • OK: None of the indicator thresholds indicate a warning or error, and the
    monitoring for the used meter is receiving data normally.
  • Warning: At least one of the indicator thresholds indicates a warning, but
    none indicate an error.
  • Error: At least one of the indicator thresholds indicates an error.
  • Insufficient data: The monitoring has not yet gathered enough data, the
    meter has been stopped by the service owner, or the meter can no longer be
    used for monitoring.

Indicators can be configured to react to

  • the average response time of the monitoring requests during the last hour,
  • the percentage of error responses or timeouts during the last hour, or
  • both the average response time and the amount of recent errors.

Separate warning and error thresholds can be set for both parts of the indicator. If both are enabled ("use to trigger alerts" is checked) and the threshold for either is exceeded, the indicator moves into the "Warning" or "Error" state, depending on which threshold was exceeded. When the measured values fall below the threshold values again, the indicator moves back to either the "Warning" or the "OK" state. When there is not enough monitoring data available for calculating the Quality of Service, the indicator moves into the "Insufficient data" state. Each of these state changes generates an event. These events are visible in the "Recent alerts" table in the "Meters & alerts" view.
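
The state logic described above can be sketched as follows. This is a simplified illustration, not the Spatineo Monitor implementation: the `Thresholds` class, the `evaluate` function and the use of `None` to signal missing data are hypothetical, and "exceeded" is assumed to mean strictly greater than the threshold.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    OK = "OK"
    WARNING = "Warning"
    ERROR = "Error"
    INSUFFICIENT_DATA = "Insufficient data"


@dataclass(frozen=True)
class Thresholds:
    warning: float
    error: float
    enabled: bool = True  # mirrors the "use to trigger alerts" checkbox


def evaluate(avg_response_ms, error_pct, response_thr, error_thr):
    """Derive the indicator state from the last hour's metrics.

    Pass None for a metric when there is not enough monitoring data.
    """
    if avg_response_ms is None or error_pct is None:
        return State.INSUFFICIENT_DATA
    active = [
        (value, thr)
        for value, thr in ((avg_response_ms, response_thr), (error_pct, error_thr))
        if thr.enabled
    ]
    if any(value > thr.error for value, thr in active):
        return State.ERROR
    if any(value > thr.warning for value, thr in active):
        return State.WARNING
    return State.OK


def state_changes(states):
    """Return (old, new) pairs, one per state change, i.e. one per event."""
    return [(old, new) for old, new in zip(states, states[1:]) if old != new]


response_thr = Thresholds(warning=1000, error=5000)  # milliseconds
error_thr = Thresholds(warning=5, error=20)          # percent
history = [
    evaluate(300, 0, response_thr, error_thr),
    evaluate(1500, 0, response_thr, error_thr),
    evaluate(400, 0, response_thr, error_thr),
]
for old, new in state_changes(history):
    print(f"{old.value} -> {new.value}")
```

Checking the error thresholds before the warning thresholds means that one part in the "Error" range is enough to put the whole indicator into the "Error" state, matching the state definitions listed earlier.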

Automatically setting alert thresholds

The magic wand tool can be used to calculate the warning and error thresholds from the meter's monitoring data for the past week. The tool evaluates this monitoring data and sets the thresholds so that warnings are raised only when the service behaviour differs substantially from that period.
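
The exact calculation used by the magic wand tool is not described here. Purely as an illustration of the idea, the sketch below derives thresholds from a week of response times by placing them a number of standard deviations above the mean. The function name, the statistical method and the multipliers 3 and 6 are all assumptions made for this example.

```python
from statistics import mean, pstdev


def suggest_thresholds(response_times_ms, warning_sigmas=3.0, error_sigmas=6.0):
    """Suggest (warning_ms, error_ms) from past response times.

    Assumed method: thresholds sit a fixed number of standard deviations
    above the mean, so that ordinary variation does not trigger a warning.
    """
    if len(response_times_ms) < 2:
        raise ValueError("need at least two samples to estimate the spread")
    centre = mean(response_times_ms)
    spread = pstdev(response_times_ms)
    return centre + warning_sigmas * spread, centre + error_sigmas * spread


past_week_ms = [100, 110, 90, 105, 95]
warning_ms, error_ms = suggest_thresholds(past_week_ms)
print(f"warning above {warning_ms:.0f} ms, error above {error_ms:.0f} ms")
```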

In a typical case the indicators are configured to send the event messages to one or more SMS numbers or email addresses. Multiple receivers are separated with commas. This can be used for example to automatically notify technical maintenance personnel about problems in the services. Spatineo Monitor users can write short notes about each alert in the "Notes" area beside the "Recent alerts" table, letting other users know about the actions taken to fix a technical problem etc.
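
As a small illustration of the comma-separated receiver list, the following sketch splits such a string into email addresses and SMS numbers. The function name and the rule that anything containing "@" is treated as an email address are assumptions made for this example, and the addresses are placeholders.

```python
def parse_receivers(text):
    """Split a comma-separated receiver list into (emails, sms_numbers)."""
    emails, sms_numbers = [], []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        if "@" in item:
            emails.append(item)
        else:
            sms_numbers.append(item)
    return emails, sms_numbers


print(parse_receivers("ops@example.com, +358401234567, admin@example.org"))
```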

Please note that the indicators and alerts depend on the meters set up by the owners of the services. However, the owners of the services do not see the indicators set up by other Spatineo Monitor users, even if those indicators depend on the data provided by their meters. If the owner of a service stops a meter, or the meter can no longer be used for monitoring, the indicators depending on this meter move into the "Insufficient data" state. In these cases it may be necessary to configure an indicator for one of the other active meters of the service. There is always at least one active meter for each monitored service.