Policy Alerts Overview
Kentik's policy-based alerting system is covered in the following topics:
- For information on settings for (alert) policies, see Alert Policies.
- For information on active or historical alerts, see Alerting.
- For information on alert-related notifications, see Notifications.
- For information on mitigation in response to alerts, see Mitigations.
Kentik's powerful alerting system analyzes your network traffic and detects anomalous patterns that may indicate adverse conditions threatening availability or performance. The system is built around alert policies that define a set of conditions. When those conditions are met the policy enters "alarm state" and generates an alert. A policy can be configured to generate notifications in response to an alert, as well as to initiate automatic mitigations, either built-in (e.g. Flowspec or RTBH) or third-party (e.g. Cloudflare, Radware, or A10).
For documentation of the UI used to manage Alerting in the portal, please refer to the following:
Alerts and Policies
Kentik's alerting system is implemented via alert policies, which are covered in the following topics:
When a policy is created — either by a Kentik user or, in the case of Policy Templates, by Kentik itself — it is assigned one of the types listed in the table below.
|Listed on Policies page
|Listed on DDoS Defense page
|Add Policy page
|Add Policy page
|Up/Down or Threshold
|Add Policy page
The trigger type shown above refers to the mechanism that causes a policy to generate an alert:
- Threshold policies: Policies whose type is Custom, DDoS, or Query-based are always threshold policies (see About Threshold Policies).
- Up/Down policies: Policies whose type is NMS may be either threshold policies or Up/Down policies (see About Up/Down Policies).
- The Custom and DDoS policy types may also be assigned to a Kentik-created policy template.
- The filters on the Policies page and the Policy Templates page determine which policy types are shown in the lists on those pages. If the NMS filter is checked, both Up/Down policies and NMS threshold policies are included in the list.
About Up/Down Policies
Up/Down policies are metrics-based, built on Kentik NMS (see NMS Overview). These policies alert you when a monitored entity, such as a device, interface, or BGP neighbor, is in an unhealthy state (e.g. a device is down). Like threshold alert policies (see About Threshold Policies), Up/Down policies can be added, cloned, and edited:
- Add: Create a policy from scratch via the Add Policy button on the portal's Alerting page, which opens the Add Policy Dialog. Click the UP/DOWN card to select it, then click Continue to go to an Add Policy page that is specifically for Up/Down policies (see Up/Down Settings Page).
- Clone: Duplicate an existing policy by choosing Clone Policy from the Action menu at the right of each row in the Policies list (see Clone a Policy), then modify the policy's settings to make it different from the original.
- Edit: Modify the settings of an existing Up/Down policy by choosing Edit Policy from the Action menu at the right of each row in the Policies list, which takes you to the edit version of the Up/Down Settings Page.
Note: Because Up/Down policies are based on Kentik NMS, they are categorized as NMS policies for the purpose of filtering the Policies List.
About Threshold Policies
A threshold policy is essentially a set of comparative evaluations. When one or more comparisons result in a match (see About Matches), the policy can trigger an alert by entering ALARM state (see Alert Status), which results in an action such as a notification and/or DDoS mitigation.
Threshold policies may have a type of Custom, DDoS, or Query-based. This type doesn't affect how a threshold policy functions, but it does affect the locations and circumstances in which it is displayed in the portal (see Policy Types).
Each threshold policy defines the characteristics of your network traffic that will result in an individual alert and the response to be taken by the alerting system once an alert is triggered. The configuration of a policy covers the following areas:
- Evaluated traffic: What traffic flow data do you want to evaluate as it is ingested into Kentik?
The Data Sources, Policy Dimensions, Metrics, and Filters of the Policy’s Dataset tab, as well as general policy settings related to top-X depth and minimum volume, are used to define the scope of the traffic that will be evaluated. You can also set the time interval between evaluations.
- Comparison mode: What's the comparate to which the current traffic will be compared?
Current traffic can be compared to a static value or a historical baseline, and/or checked for whether the traffic exists at all.
- Thresholds: What sorts of differences between the current traffic and the comparate will trigger an alarm?
Each policy can include up to five thresholds, each with its own comparison mode and settings that determine the conditions that will trigger an alarm, the timing for entering and leaving an alarm state, and the actions to take in response.
- Actions: What actions will occur in response to an alarm?
Each threshold includes settings for its own independent set of actions, which boil down to various options for notification (see Notifications) and/or mitigation (see Mitigation Overview). As an alert enters an alarm state it will also be added to the Alerts list (see Alerts List), on the Alerting page.
Once a policy is defined and saved it will appear in the list on the Policies Page, which is where policies can be added, cloned, and edited.
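The four configuration areas described above can be pictured as a small data model. The following Python sketch is illustrative only and does not reflect Kentik's actual implementation or API: the class names, field names, the "greater than" comparison direction, and the percent-above-baseline rule are all assumptions made for the example. The only detail taken from this article is the limit of five thresholds.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

MAX_THRESHOLDS = 5  # limit of five thresholds, as stated in this article


class ComparisonMode(Enum):
    STATIC = "static"        # compare current traffic to a fixed value
    BASELINE = "baseline"    # compare current traffic to a historical baseline
    EXISTENCE = "existence"  # check whether the traffic exists at all


@dataclass
class Threshold:
    """Hypothetical threshold: one comparison mode plus its own actions."""
    mode: ComparisonMode
    static_value: Optional[float] = None      # used only by STATIC
    baseline_percent: Optional[float] = None  # used only by BASELINE (assumed rule)
    actions: List[str] = field(default_factory=list)

    def matches(self, current: Optional[float],
                baseline: Optional[float] = None) -> bool:
        """Return True when the current value satisfies this threshold."""
        if self.mode is ComparisonMode.STATIC:
            if current is None or self.static_value is None:
                return False
            return current > self.static_value
        if self.mode is ComparisonMode.BASELINE:
            if current is None or baseline is None or self.baseline_percent is None:
                return False
            return current > baseline * (1 + self.baseline_percent / 100)
        # EXISTENCE: assumed here to match when any traffic is present.
        return current is not None and current > 0


@dataclass
class ThresholdPolicy:
    """Hypothetical policy: a name plus one to five thresholds."""
    name: str
    thresholds: List[Threshold]

    def __post_init__(self) -> None:
        if not 1 <= len(self.thresholds) <= MAX_THRESHOLDS:
            raise ValueError(f"a policy needs 1 to {MAX_THRESHOLDS} thresholds")

    def matched_thresholds(self, current: Optional[float],
                           baseline: Optional[float] = None) -> List[Threshold]:
        """Return every threshold that the current traffic value matches."""
        return [t for t in self.thresholds if t.matches(current, baseline)]
```

In this sketch each threshold carries its own list of actions, mirroring the description above in which every threshold has an independent set of notification and mitigation options.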
Threshold Alerting Concepts
The concepts covered in the following topics are crucial to understanding how the alerting system operates for threshold policies:
If a threshold policy is enabled, the flow data sent to Kentik from your network devices (routers, hosts, etc.) is evaluated at the specified evaluation frequency for a match between the characteristics of the evaluated traffic and the characteristics defined in any of the policy's thresholds (see Threshold Conditions). If a specified number of matches are found within a given period of time (see Threshold Frequency), an alarm is triggered and the system responds with the actions specified in the threshold that has been matched.
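The "specified number of matches within a given period of time" rule can be sketched as a sliding window over evaluation results. This is a minimal illustration under stated assumptions, not Kentik's implementation: the `MatchWindow` class, its parameter names, and the use of timestamps in seconds are all hypothetical.

```python
from collections import deque


class MatchWindow:
    """Hypothetical tracker: alarm when enough matches fall inside a time window."""

    def __init__(self, required_matches: int, window_seconds: float) -> None:
        self.required_matches = required_matches
        self.window_seconds = window_seconds
        self._match_times = deque()  # timestamps of recent matches, oldest first

    def record(self, timestamp: float, matched: bool) -> bool:
        """Record one evaluation and report whether the alarm condition is met."""
        if matched:
            self._match_times.append(timestamp)
        # Discard matches that have aged out of the window.
        while (self._match_times
               and timestamp - self._match_times[0] >= self.window_seconds):
            self._match_times.popleft()
        return len(self._match_times) >= self.required_matches


# Example: require 3 matches within 300 seconds, evaluating every 60 seconds.
window = MatchWindow(required_matches=3, window_seconds=300)
results = [window.record(t, m) for t, m in
           [(0, True), (60, False), (120, True), (180, True)]]
```

In this example the third match arrives at the fourth evaluation, so only the last entry of `results` is `True`.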
Note: Policies enable exceptionally powerful control but can be challenging to configure. The Kentik support team encourages you to contact us at firstname.lastname@example.org for assistance with alert policy configuration.
At Kentik, a key is an identifier that represents a unique combination of values for a given set of dimensions. Suppose, for example, that the dimensions are Destination IP/CIDR, Destination Port Number, and Protocol. Each unique combination of values for those three dimensions will constitute an individual key.
In the case of threshold alerting, the dimensions that comprise the key definition are chosen on the Dataset tab (see Policy Dataset Settings). The top-X ranking of traffic is performed by evaluating the volume of the traffic — as measured in the primary metric across the selected devices, and filtered by the specified filters — that is represented by each individual key.
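To make the idea of keys and top-X ranking concrete, the sketch below groups flow records by the combination of values for the chosen dimensions and ranks the resulting keys by a primary metric. It is only an illustration: the record layout and the field names (`dst_ip`, `dst_port`, `protocol`, `bytes`) are hypothetical and are not Kentik's schema, and device selection and filters are assumed to have been applied already.

```python
from collections import defaultdict


def top_x_keys(flows, dimensions, metric, top_x):
    """Rank keys (unique dimension-value combinations) by total metric volume."""
    totals = defaultdict(float)
    for flow in flows:
        # A key is the tuple of this flow's values for the chosen dimensions.
        key = tuple(flow[d] for d in dimensions)
        totals[key] += flow[metric]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_x]


# Hypothetical flow records, already restricted to the selected devices and filters.
flows = [
    {"dst_ip": "10.0.0.1", "dst_port": 443, "protocol": "TCP", "bytes": 500},
    {"dst_ip": "10.0.0.1", "dst_port": 443, "protocol": "TCP", "bytes": 700},
    {"dst_ip": "10.0.0.2", "dst_port": 53, "protocol": "UDP", "bytes": 300},
    {"dst_ip": "10.0.0.1", "dst_port": 80, "protocol": "TCP", "bytes": 100},
]

top_two = top_x_keys(flows, ("dst_ip", "dst_port", "protocol"), "bytes", top_x=2)
```

Here the two flows to 10.0.0.1 on port 443 share a key, so their volumes are summed before ranking, while the flow to port 80 on the same address forms a separate key.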
Additional Alerting Concepts
In addition to the concepts covered above, various additional concepts that are important to threshold alerting are covered within the KB topics where the settings related to those concepts are made. For coverage of these additional concepts, refer to the following topics:
The pages used to configure and manage alerting and mitigation are:
- Alerting (main menu » Alerting): Provides information about current or previous alerts in your organization; see the remaining topics in this article.
- Policies (Alerting » Manage Policies): A list of alert policies (see Alerts and Policies), from which policies can be added, duplicated, and edited. This page (see Policies Page) enables access to the policy settings pages and dialogs (see Up/Down Policy Settings and Threshold Policy Settings), which allow you to specify the details of an alert policy.
- Policy Templates (Alerting » Configure Alert Policies » Policy Templates): A list of alert policy templates provided by Kentik to cover common situations of which customers might want to be notified; see Policy Templates. Templates can be duplicated and then edited to produce alerts tailored to the specifics of your situation.
- Mitigations (Protect » Mitigations): Provides information about your organization’s current and past mitigations; see Mitigations.
- Manage Mitigations (Settings » Mitigations): A page listing the available platforms on which to run a mitigation and the methods available for each platform (see Manage Mitigations). Platforms can be built-in, like Remotely Triggered Black-Hole routing (RTBH), or third-party, like Cloudflare Magic Transit, Radware DefensePro, or A10 Thunder TPS.
- Manual Mitigation (Protect » Mitigations): A dialog enabling you to apply a mitigation manually in real time without having a corresponding alert that is in alarm state; see Manual Mitigation.
- Silent Mode (Settings » Silent Mode): A list of “patterns” that each represent a set of conditions (dimension/value pairs) that, when matched, will prevent the triggering of alerts on the matching traffic; see About Silent Mode.
- Notifications (Settings » Notifications): A list of notification channels (see Notifications) that each represent a notification type (e.g. email) and notification targets (e.g. a set of email addresses).
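The Silent Mode entry in the list above describes patterns as sets of dimension/value pairs that suppress alerts on matching traffic. A rough way to picture that matching is sketched below. The function name, the dictionary representation, and the rule that every pair in a pattern must match are assumptions made for illustration and may differ from how Kentik evaluates patterns.

```python
def is_silenced(key_values, patterns):
    """Return True if any non-empty pattern matches all of its dimension/value pairs."""
    return any(
        bool(pattern)
        and all(key_values.get(dim) == value for dim, value in pattern.items())
        for pattern in patterns
    )


# Hypothetical pattern: silence alerts for UDP traffic to one destination.
patterns = [{"dst_ip": "10.0.0.1", "protocol": "UDP"}]

udp_key = {"dst_ip": "10.0.0.1", "dst_port": 53, "protocol": "UDP"}
tcp_key = {"dst_ip": "10.0.0.1", "dst_port": 443, "protocol": "TCP"}

udp_silenced = is_silenced(udp_key, patterns)
tcp_silenced = is_silenced(tcp_key, patterns)
```

Empty patterns are ignored in this sketch so that a blank entry cannot silence all traffic by accident.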