Policy Alert Settings

Kentik Detect's policy-based Alerting system is covered in the following topics:

Notes:
- For alert policy administration, including notification channels and mitigation, see Policy Alert Admin.
- In addition to the policy-based system covered in this article, Kentik Detect also includes SQL Alerts, in which alert conditions are specified in SQL rather than on the Alert Policy Settings page. SQL Alerts are configured via SQL Alerts Admin on the Alerts menu, and monitored via the SQL Alerts Incidents page. For more information, see SQL Alerts Overview and SQL Alert Settings.

Note: While SQL-based alerting remains available for customers with existing SQL-based alerts, users are encouraged to create all new alerts with Policy-based alerting.

 

 
Alerting Tabs Overview

Policy-based alerts are defined and managed on the following tabs of the Alerting page (Alerts » Alerting):

  • History: A filterable list of alerts that have triggered alarms but are no longer in alarm state; see Alarm History.
  • Alarms: A filterable list of alerts that are currently in alarm state; see Current Alarms.
  • Alert Policies: A list of alert policies (see Alert and Policy Overview), from which policies can be added, duplicated, and edited. This tab (see Alert Policies Tab) enables access to the Alert Policy Settings page, which is a UI for specifying the details of an alert policy.
  • Notification Channels: A list of channels that each represent a notification mode (e.g. email) and notification targets (e.g. a set of email addresses); see Alert Notifications.
  • Mitigation: Two lists, one for each of the main components of the mitigation system (see Alert Mitigation):
    - Mitigation Platforms: The platform on which a mitigation will run, which could be built-in, like Remotely Triggered Black-Hole routing (RTBH), or a third-party system like Radware DefensePro or A10 Thunder TPS.
    - Mitigation Methods: An individual mitigation configuration to be run on a mitigation platform.
  • Learning Mode: A list of alerts that are in learning mode, which means that no alarms will trigger because information for baseline values is still being collected; see Alert Learning Mode.
  • Kentik Alert Library: A list of preset alerts provided by Kentik to cover common situations about which customers might want to be notified; see Alert Presets. Presets can be duplicated and then edited to produce alerts that are tailored to the specifics of your situation.
  • Debug: A list showing values of keys (combinations of one or more dimensions) from the most recent evaluation of a chosen alert and the corresponding baseline (if any) for that alert; see Alert Debug.

 

 
Alert and Policy Overview

A policy-based alert is essentially a set of comparative evaluations that, when one or more comparisons result in a match (see About Matches), can trigger an alarm, which results in an action such as a notification and/or DDoS mitigation. Alerting is implemented via alert policies, which each define the characteristics of an individual alert in the following areas:

  • Evaluated traffic: What traffic flow data do you want to evaluate as it is ingested into Kentik Detect?
    The Devices, Query, and Filters panes of the Alert Policy sidebar, as well as general policy settings related to top-X depth and minimum volume, are used to define the scope of the traffic that will be evaluated. You can also set the time interval between evaluations.
  • Comparison mode: What's the comparate to which the current traffic will be compared?
    Current traffic can be compared to a static value, a historical baseline, and/or an existence condition (i.e. specified traffic does or does not exist).
  • Thresholds: What sorts of differences between the current traffic and the comparate will trigger an alarm?
    Each alert can include up to five thresholds, each with its own comparison mode and settings that determine the conditions that will trigger an alarm, the timing for entering and leaving alarm state, and the actions to take in response.
  • Actions: What actions will occur in response to an alarm?
    Each threshold includes settings for its own independent set of actions, which boil down to various options for notification and/or mitigation. As an alert enters alarm state it will also be added to the list on the Alarms Dashboard (see Current Alarms), which lives on the Alarms tab of the Alerting page. When an alert leaves alarm state it is shifted to the list on the History tab instead (see Alarm History).

Once an alert policy is defined and saved it will appear in the list on the Alert Policies Tab of the Alerting page, which is where policies can be added, duplicated, and edited.

 

About Matches

If an alert policy is enabled, the flow data sent to Kentik Detect from your network devices (routers, hosts, etc.) is evaluated at the specified evaluation frequency for a match between the characteristics of the evaluated traffic and the characteristics defined in any of the alert's thresholds (see Activate If Settings). If a specified number of matches are found within a given period of time (see Activate When Settings), an alarm is triggered and the system responds with the actions specified in the threshold that has been matched.

Note: Alert policies enable exceptionally powerful control but can be challenging to configure. The Kentik support team encourages you to contact us at support@kentik.com for assistance with alert policy configuration.

 

 
Alert Policies Tab

The Alert Policies tab of the Alerting page displays the Alert Policy List, which is a list of the alert policies (see Alert and Policy Overview) that are currently available to your organization. Policies can be added, duplicated, and edited from this list.

Note: For information on configuring an alert policy, see Alert Policy Settings.

 

 
Alert Policies Tab UI

The Alert Policies tab is made up of the following UI elements:

  • Filter: A field at upper right that is used to filter the Alert Policy List. The following columns are searched for a match on the string entered in this field: ID, Policy Name, # of Devices, Dimensions, Learning Mode, and Metric.
  • Create Alert Policy: A button at upper right that opens the Alert Policy Settings page, where you can configure and save a new alert policy.
  • Alert Policy List: See Alert Policy List.
  • SQL Alerts link: A link (on "Looking for SQL Alerts") to the portal's SQL Alert Settings UI.

 

 
Alert Policy List

The Alert Policy List is a table that lists all of the alert policies that are currently available to be used by your organization. Policies added to the list may be created in one of the following ways:

  • Created from scratch via the Create Alert Policy button.
  • Duplicated from an existing policy using the Copy button in the actions at the right of the existing policy's row in the Alert Policy List.
  • Duplicated from an alert policy preset on the Kentik Alert Library tab of the Alerting page; see Alert Presets.

The Alert Policy List provides the following information and actions for each alert policy:

  • ID: The system-generated unique ID assigned when the alert policy was created.
  • Policy: The user-specified name of the alert policy.
  • # of Devices: The number of devices covered by the query for the alert policy. Devices are selected in the Devices pane of the sidebar of the Alert Policy Settings page (see Alert Policy Sidebar).
  • Dimensions: The dimensions defined in the alert policy, which combine to make a key definition that will determine how traffic is subdivided for evaluation (see About Keys). Dimensions correspond to the fields of the KDE main table as described in Main Table Schema.
  • Learning Mode: Indicates whether the alert policy is currently in Learning Mode, which prevents the alert from entering alarm state until the specified date (see General Policy Settings). True indicates that Learning Mode is enabled; false indicates that it is disabled.
  • Metric: The units (e.g. bits/s, packets/s, flows/s, etc.) by which this alert measures incoming flow data (see Alert Query Pane). The primary metric is listed first, followed by secondary metrics (if any).
  • Actions: The following actions can be performed on an alert policy:
    - Edit: Opens the Alert Policy Settings page, where you can edit the settings of the alert's policy.
    Note: Clicking anywhere in a policy's row (other than the action buttons) opens that policy's settings page.
    - Delete: Opens a confirming dialog that allows you to delete the alert policy.
    - Copy: Duplicates the alert policy so that it can be modified without altering the original.
    - Activate: Toggles whether the alert policy is active. When the policy is active the Activate button is green and the circle (LED) next to the name in the Policy Name column is a solid green disc. When the policy is inactive the Activate button is black and the LED is a hollow gray circle.
    Note: If an alert has generated one or more alarms that are listed on the Alarms tab, switching the alert to inactive will remove those alarms from the Alarms tab list.

 

 
Alert Policy Settings

Alert policy settings are covered in the following topics:

Note: The Kentik support team encourages you to contact us at support@kentik.com for assistance with alert policy configuration.

 

 
Accessing Policy Settings

Alert policies are defined on the Alert Policy Settings page, which is accessed via the Alert Policies Tab of the Alerting page (Alerts » Alerting):

  • To add a new alert policy, click the Create Alert Policy button at upper right.
  • To edit an existing alert policy, click anywhere in that policy's row in the Alert Policy List (other than the action buttons).

 

 
Policy Title Pane

The policy title pane at the top of the Alert Policy Settings page includes the name of the page at left and the following three buttons at right:

  • View in Explorer: Opens a new browser window or tab and loads Data Explorer with the Devices, Query, and Filters panes set to match the corresponding panes in the sidebar of the alert policy.
  • Save: Save the current settings of the alert policy and return to the Alert Policies tab.
  • Cancel: Leave the Alert Policy Settings page without saving changes (if any) and return to the Alert Policies tab.

 

 
Alert Policy Sidebar

The Alert Policy sidebar is used to specify the metrics that a given alert will be looking at and to focus the alert on a specific subset of total traffic. This subset is the same for all thresholds in the alert.

The sidebar includes three panes: Devices, Query, and Filters.

 

Alert Query Pane

The Query pane in the alert policy sidebar determines which aspects of ingested flow data will be evaluated for a match with the conditions specified in the thresholds of the alert:

  • Group by Dimensions: A selection box that operates as described in Dimension Selectors but without custom dimensions. The selector is used to choose one or more dimensions (see Using Multiple Dimensions) that combine to make a "key definition" that determines how traffic is subdivided for evaluation (see About Keys). Dimensions correspond to the fields of the KDE main table as described in Main Table Schema.
    Example: If the primary metric is packets/second, and the group-by dimensions are set to Source:AS Number and Destination:AS Number, then the top-X evaluation will involve looking at all unique combinations of source ASN and destination ASN to determine which combinations have the highest traffic volume as measured in pps.
  • Primary Metric: The units (e.g. bits/s, packets/s, flows/s, etc.) by which ingested flow data will be evaluated to determine top-X. For a list of available metrics, see Metric Options.
    Example: If the primary metric is bits/second then the top-X evaluation will rank keys by bps.
  • Secondary Metrics: A selection box that operates like the Group by Dimensions selector and lets you specify additional static comparators (see Policy Threshold Settings) that are based on a metric other than the primary metric. Each added comparator represents an additional condition that must be met in order to trigger an alarm. Multiple secondary metrics can be selected at the same time.

 

About Keys

A key is an identifier for a set of traffic data that corresponds to the key definition that is set with the Dimension Selector (the Group by Dimensions field in the Query pane of the sidebar). For example, if the selected dimensions are Destination:Interface, Destination:BGP AS_Path, and Full:Device then the key definition will be "Dest Interface, Dest BGP AS_Path, Device" and each unique combination of values for those three dimensions (e.g. "209:64901:8367") will constitute an individual key. The top-X ranking of traffic is performed by evaluating the volume, as measured in the primary metric, of the traffic (across the selected devices and filtered by the specified filters) that is represented by each individual key.
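
The relationship between a key definition, individual keys, and the top-X ranking can be sketched in Python as follows. This is an illustrative model only, not Kentik's implementation: the record field names, the sample values, and the `top_x` helper are all hypothetical, and summing per-record rates is a simplification.

```python
from collections import defaultdict

# Hypothetical flow records; field names are illustrative, not Kentik's schema.
flows = [
    {"dst_interface": 209, "dst_as_path": "64901", "device": 8367, "pps": 1200},
    {"dst_interface": 209, "dst_as_path": "64901", "device": 8367, "pps": 300},
    {"dst_interface": 114, "dst_as_path": "64512", "device": 8367, "pps": 900},
    {"dst_interface": 114, "dst_as_path": "64999", "device": 9001, "pps": 50},
]

def top_x(records, dimensions, metric, depth, minimum=0):
    """Group records by the key definition, then rank keys by the metric."""
    totals = defaultdict(int)
    for rec in records:
        # One key per unique combination of dimension values.
        key = ":".join(str(rec[dim]) for dim in dimensions)
        totals[key] += rec[metric]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # Drop keys below the minimum volume, then keep the top `depth` keys.
    return [(key, value) for key, value in ranked if value >= minimum][:depth]

key_definition = ("dst_interface", "dst_as_path", "device")
ranking = top_x(flows, key_definition, "pps", depth=2)
print(ranking)
```

Here `depth` plays the role of the Track setting and `minimum` the role of "Only look at the traffic over", both described under General Policy Settings.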

 

 
General Policy Settings

The General Settings pane of the Alert Policy Settings page defines the overall properties of the alert policy.

This pane includes the following configuration options:

  • Policy Name: The name of the alert policy. Maximum of 50 characters; must include at least one letter.
  • Policy Description (optional): A description of the policy; used to summarize what the policy looks at and indicate what it is used for.
  • Track: The number of keys (unique combinations of dimension values; see About Keys) to evaluate for a match with the conditions specified in the alert's thresholds.
  • History Top: The number of keys for which to store baselines.
    Note: This setting has no effect if the comparison mode of all of the alert's thresholds is static.
  • Only look at the traffic over: The minimum value, as measured by the primary metric, that a key must have to be included in the top-X ranking and evaluated for a threshold match.
    - Auto: If checked (default), the "only look" value is auto-calculated using a formula based on the settings of the comparators in the policy's thresholds (see Activate If Settings).
    - Specified value: If Auto is not checked, specify the minimum value.
  • Learning Mode: Prevents the alert from entering alarm state (and triggering notifications and/or mitigations) until the specified date. Used for new policies to allow time to establish baselines.
    - Enable checkbox: Enables learning mode (default is checked).
    - Enabled until: If Enable is checked, sets the date on which the alert will exit learning mode. The default learning period is six days.
  • Evaluation Frequency: The interval at which newly ingested traffic data is grouped by dimensions and the traffic data represented by the resulting keys is evaluated. Options include 30 sec, 1 min, 2 min (default), and 5 min.
  • Policy Dashboard: The ID of a dashboard to associate with this alert. When an alarm is generated by this alert, clicking the speedometer icon in that alarm's row of the Alarms List (see Current Alarms) will open the specified dashboard with filters set to the values of the alarm's key.
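
The Policy Name constraints and the Evaluation Frequency options listed above can be expressed as a small check. This is a minimal sketch; the function and constant names are hypothetical, and the portal's own validation may differ in detail (for example in how it counts characters).

```python
# Evaluation Frequency options from the list above, in seconds.
EVALUATION_FREQUENCIES_SEC = (30, 60, 120, 300)
DEFAULT_EVALUATION_FREQUENCY_SEC = 120  # 2 min

def is_valid_policy_name(name: str) -> bool:
    """At most 50 characters, with at least one letter."""
    return len(name) <= 50 and any(ch.isalpha() for ch in name)

def is_valid_frequency(seconds: int) -> bool:
    """True if the interval is one of the documented options."""
    return seconds in EVALUATION_FREQUENCIES_SEC

print(is_valid_policy_name("DDoS UDP flood"))  # has letters, under 50 chars
print(is_valid_policy_name("12345"))           # no letter
print(is_valid_frequency(90))                  # not an offered option
```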

 

 
Historical Baseline Settings

Baselining enables the alerting system to trigger an alarm based on a comparison of current traffic against historical traffic patterns. The historical baseline settings configure the baselines that are used for all thresholds whose comparison mode is either Baseline or If-Exist. If all of the alert's thresholds are set to a comparison mode of Static, the historical baseline settings have no effect.

 

Look-back Settings

Look-back settings set the scope and granularity of the history data used for baselining:

  • Look Back: Sets how far back in time the history starts. As traffic data becomes older than this value it is dropped from the data used for baselining.
    - Number: The number of time-units; default = 21.
    - Unit: Days (default) or hours.
  • Start Back: Sets how recently history ends. Traffic data newer than this value is not included in the data used for baselining. Excluding recent traffic allows you to keep current spikes or anomalies from skewing baseline values.
    - Override: If checked (default) the start-back values are ignored.
    - Number: The number of time-units; default = 1.
    - Unit: Days (default) or hours.
  • Look Back Step: Sets which historical values will be factored into the baseline. If the underlying traffic is cyclical on a daily basis, the unit may be set to days so that the baseline is based on data from the same time of day each day. Otherwise the step is typically set to one hour, meaning that the baseline will be based on data from 1, 2, 3, 4, etc. hours ago.
    - Number: The number of time-units; default = 1.
    - Unit: Days (default) or hours.
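
One way to picture how Look Back, Start Back, and Look Back Step interact is to list the ages of the hourly data points that would feed a baseline. The sketch below is one plausible reading of the settings above, not Kentik's implementation; the `lookback_offsets` helper and its hour-based arguments are assumptions made for illustration.

```python
def lookback_offsets(look_back, step, start_back=0):
    """Ages, in hours before now, of the hourly points used for a baseline.

    All arguments are in hours. Points older than `look_back` or newer
    than `start_back` are excluded.
    """
    return [age for age in range(step, look_back + 1, step) if age >= start_back]

DAY = 24

# Defaults: look back 21 days with a 1-day step (same time of day each day).
daily = lookback_offsets(look_back=21 * DAY, step=DAY)

# Hourly step: data from 1, 2, 3, ... hours ago.
hourly = lookback_offsets(look_back=6, step=1)

# Start Back of 2 days: the most recent day is left out of the baseline.
skip_recent = lookback_offsets(look_back=21 * DAY, step=DAY, start_back=2 * DAY)

print(len(daily), hourly, skip_recent[0])
```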

 

Aggregation Settings

Aggregation settings define how the data points of the traffic history data are prepared for comparison with current traffic:

  • Minute -> Hourly Rollup Aggregation: The historical data set is made up of data points representing traffic totals for the time-slices whose duration (30 sec, 1 min, 2 min, or 5 min) is defined with the Evaluation Frequency setting under General Policy Settings. This rollup aggregation setting specifies the method for a first stage of aggregation in which those nominal "minute" data points are aggregated into one-hour baseline data points. Options include Max, Min, and Percentile: 98th (default), 95th, 75th, 50th, or 25th.
  • Final Aggregation: Specifies how the set of hourly data points resulting from the first stage of aggregation and further defined by the Look Back and Look Back Step settings are then aggregated into a single baseline value for each key (combination of dimensions) that can be compared to the current value for that key. Options include Max, Min, and Percentile: 99th, 98th, 95th (default), 90th, 80th, 50th, or 25th.
    Note: Policies watching for activity in excess of baseline typically use Max, 98th, or 95th percentile aggregation. Policies watching for activity below baseline typically use 25th percentile aggregation.
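
The two aggregation stages can be sketched as follows. This is an illustrative model under stated assumptions: the helper names are hypothetical, the sample values are invented, and the nearest-rank percentile used here is only one of several percentile definitions, so it may not match the method Kentik uses.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile; other definitions interpolate between points."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def aggregate(values, method):
    """`method` is "max", "min", or a percentile number such as 98."""
    if method == "max":
        return max(values)
    if method == "min":
        return min(values)
    return percentile(values, method)

def baseline_for_key(slices_by_hour, rollup=98, final=95):
    """Stage 1: roll each hour's time-slices up into one hourly point.
    Stage 2: aggregate the hourly points into a single baseline value."""
    hourly_points = [aggregate(slices, rollup) for slices in slices_by_hour]
    return aggregate(hourly_points, final)

# Invented per-slice values for one key over three look-back hours.
history = [[10, 12, 11, 50], [9, 10, 10, 11], [20, 21, 19, 22]]
print(baseline_for_key(history))            # defaults: 98th then 95th percentile
print(baseline_for_key(history, final=50))  # median of the hourly points
```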

 

Look Around Settings

Look around settings enable the hourly data points used for final aggregation to represent a time-slice that is broader than a single hour. This capability is useful when traffic has a daily cycle that's too loose to be captured in a single hour slice (the time at which daily peaks/troughs occur varies by more than one hour). Look-around aggregation (if any) is applied before final aggregation.

  • Look Around Aggregation: The aggregation to apply, either None (default), Max, Min, or Percentile: 90th, 75th, 50th, or 25th. The "none" option disables look-around aggregation.
  • Look Around +/-: The number of hours (default = 1) on either side of each hourly data point that will be aggregated into that data point.
    Example: If Look Around is set to 2, then the hourly data point representing 7AM on a given day in the final aggregation will actually be an aggregate of the data points from 5AM to 9AM on that day.
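
The look-around step can be pictured as a sliding window over the hourly data points. The following is a minimal sketch, assuming hourly points indexed by hour of day and an aggregation passed in as a plain function; the `look_around` helper and the sample values are hypothetical.

```python
def look_around(hourly, radius, agg=max):
    """Replace each hourly point with an aggregate of the points within
    `radius` hours on either side of it (hours with no data are skipped)."""
    widened = {}
    for hour in hourly:
        window = [hourly[h] for h in range(hour - radius, hour + radius + 1) if h in hourly]
        widened[hour] = agg(window)
    return widened

# Invented hourly points for one key, indexed by hour of day.
points = {5: 40, 6: 90, 7: 60, 8: 30, 9: 20, 10: 10}

widened = look_around(points, radius=2)
print(widened[7])  # aggregate of the 5AM to 9AM points
```

With a radius of 2 and Max aggregation, the 7AM point takes the highest value found between 5AM and 9AM, which is how a peak that drifts by an hour or two from day to day still contributes to the baseline.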

 

 
Policy Threshold Settings

A threshold is a collection of settings that define a set of conditions that must be matched in order for an alert to be activated, at which point the alert generates an alarm for each key for which conditions have been matched. Each alert policy includes at least one threshold by default, but may include up to five thresholds.

 

Add a Threshold

Thresholds are added to a policy with the Add Threshold menu, which allows you to select a criticality label (Critical, Major2, Major, Minor2, Minor) to apply to the new threshold. The labels are for convenience only; they have no effect on the operation of the alerting system (a user can, for example, create a Minor threshold with conditions that are more extreme than a Critical threshold in the same alert).

 

General Threshold Settings

The general settings for a threshold include its comparison mode (see Threshold Comparison Mode) as well as additional general settings that apply regardless of comparison mode, which include the following:

  • Threshold type: A drop-down menu allowing you to change the criticality label (see Add a Threshold).
  • Description: A user-specified text string describing the threshold.
  • Acknowledge Required: If checked, an alert that is no longer in alarm state will not be fully cleared until it is acknowledged manually in the Alarms tab (see Current Alarms).

 

Threshold Comparison Mode

The comparison mode determines what the primary traffic is compared to when evaluated by the alerting system to determine whether there is a match with the conditions defined in the threshold:

  • Static: Compare the primary value (in the current metric) for each key (see About Keys) to a literal (statically defined) value.
  • Baseline (default): Compare the primary value for each key to that key's comparison value.
  • If-Exist: Check whether a key in the primary top-X is also in the comparison top-X.

If a threshold's comparison mode is set to Baseline or If-Exist then the meaning of "primary" and "comparison" in the above descriptions depends on the threshold's Direction setting (see Comparison Options):

  • If the direction is Current to History, the current top-X set is primary and the comparison set is historical (derived based on the alert's Historical Baseline settings).
  • If the direction is History to Current, the historical top-X set is primary and the comparison set is the current top-X.

 

Comparison Options

These settings are available only when the comparison mode is Baseline or If-Exist (not Static). They determine how the alerting system compares current and baseline values.

Comparison options include the following settings:

  • Direction: Determines which set of top-X keys (current or historical) is the primary set and which is used for comparison.
    - Not Applicable: Not used.
    - Current to History (default): The set of current keys is primary; the set of history keys is for comparison. For each key in the current top-X, compare the current value to that key's baseline value.
    - History to Current: The set of history keys is primary; the set of current keys is for comparison. For each key in the history top-X, compare the baseline value to the current value. This direction enables the system to identify keys that were in the top-X historically but are no longer in the current top-X, e.g. a key that normally has high traffic volume but currently has no traffic.
  • Look At: Current Top (default = 25): If Auto is not checked, the number of keys to track in the primary top-X pool.
  • Look At: Store Top (default = 25): If Auto is not checked, the number of keys to compare with in the comparison top-X pool.
  • Auto: If checked:
    - the value of Look At: Current Top is set to the value of the Track field in General Policy Settings.
    - the value of Look At: Store Top is set to the value of the History Top field in General Policy Settings.
  • If no History: Sets what to do if a key in the primary top-X is not present in the comparison top-X.
    Note: If direction is Current to History, the current top-X set is primary and the history set is for comparison. If direction is History to Current, the history top-X set is primary and the current set is for comparison.
    - Use lowest top-x value (default): Compare the value of the key in the primary top-X set to the value of the last (lowest) key in the comparison set.
    - Set alert: Classify as a match on this key.
    - No alert: Classify as not a match on this key.
    - Use this value: Compare this key's current value to a static value instead of to a baseline value.
    - Value field: Shown only when If no History is set to Use this value. The static value to compare to.
  • Auto Calc checkbox: If checked, the value used when there is no history will be auto-calculated based on the settings of the threshold's comparators. Shown only when either of the following is true:
    - If no History is set to "Use this value."
    - If no History is set to "Use lowest top-X value," but there are no baseline entries (and therefore no historical lowest top-X).
Note: The threshold UI does not currently adapt to a direction setting of History to Current, which means that the function of the following Comparison Options fields will differ from the labels for those fields:
- The Look At: Current Top field will set the depth of the history top-X set for this threshold. If Auto is checked, the depth of the history set will be determined by the depth specified in General Policy Settings for the current set.
- The Look At: Store Top field will set the depth of the current top-X set for this threshold. If Auto is checked, the depth of the current set will be determined by the depth specified in General Policy Settings for the history set.
- The If no History field will actually set what happens when a key in the history top-X is not present in the current top-X.
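
For the Current to History direction, the If no History options described above can be modeled as a choice of reference value for a key that is missing from the baseline set. This is a sketch only: the option names, the `key_matches` helper, and the single "percent over baseline" condition are assumptions made for illustration, not Kentik's implementation.

```python
def key_matches(key, current_value, baseline, if_no_history, percent_over, fallback=None):
    """Decide whether a key in the current top-X is a match.

    `baseline` maps keys in the history top-X to their baseline values.
    `fallback` stands in for the static (or auto-calculated) value.
    """
    if key in baseline:
        reference = baseline[key]
    elif if_no_history == "set_alert":
        return True   # classify as a match on this key
    elif if_no_history == "no_alert":
        return False  # classify as not a match on this key
    elif if_no_history == "use_lowest" and baseline:
        reference = min(baseline.values())  # lowest key in the comparison set
    else:
        # "use_value", or "use_lowest" when there are no baseline entries.
        reference = fallback
    return current_value > reference * (1 + percent_over / 100)

baseline = {"a": 100, "b": 40}
print(key_matches("a", 160, baseline, "use_lowest", 50))               # key has history
print(key_matches("c", 70, baseline, "use_lowest", 50))                # compared to 40
print(key_matches("c", 70, baseline, "use_value", 50, fallback=100))   # compared to 100
```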

 

Activate If Settings

The Activate If section of the threshold determines the specific conditions that, when found to be true, will be considered by the alerting system to be a match. This section is structured as a set of ANDed comparators, each of which is a statement with set-able values.

The Activate If section contains the following settings:

  • Static comparator: True for any key (see About Keys) in the primary top-X set (which depends on the Direction setting in Comparison Options) that meets the conditions defined by the following fields:
    - Metric: A primary or secondary metric (see Alert Query Pane).
    - Operator: Greater than or less than the static value.
    - Value: A number representing units, which varies depending on the metric.
  • Baseline comparator: Available only if the threshold's comparison mode is Baseline. True for any key in the primary top-X set that meets the conditions defined by the following fields:
    - Metric: The alert's primary metric (not settable by user).
    - Operator: Greater than or less than the value of the metric for the same key in the comparison top-X set.
    - Value: A number.
    - Type: The type of value, either percent or the alert's primary metric (e.g. Kpps, Mbps, FPS).
    Note: When Direction (see Comparison Options) is History to Current the actual function of baseline comparators will differ from the comparator text. See note below.
  • Interface Capacity Comparator: Available only when one of the dimensions is Source:Interface or Destination:Interface and one of the alert's metrics is Bits/s. True for any key in the primary top-X set that meets the conditions defined by the following fields:
    - Metric: Bits per second (not settable by user).
    - Operator: Greater than or less than the interface capacity, as adjusted by the Value and Type fields below.
    - Value: A number.
    - Type: The type of value, either Percent or Negative Offset in Mbps. If set to Negative Offset, the comparator will be true when the traffic volume is greater than the interface capacity less the offset value in Mbps.
    Example: If interface capacity is 100 Mbps and you want to activate an alarm using "negative offset" when traffic exceeds 90 Mbps, set the operator to Greater Than and the value to 10.
  • Add Static Comparator: Adds a static comparator.
  • Add Baseline Comparator (available only if the threshold's comparison mode is Baseline): Adds a baseline comparator.
  • Add Interface Capacity Comparator: Adds an interface capacity comparator.
Note: The threshold UI does not currently adapt to a direction setting of History to Current, which means that the actual function of baseline comparators will differ from the comparator text. The actual function of a baseline comparator when the direction is History to Current can be understood as follows:
The key's value in the baseline top-X set has [metric] [operator] the key's value in the current top-X set by [value] [type].
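
Because the comparators in Activate If are ANDed, a key is a match only when every comparator is true. The sketch below models a static comparator and an interface capacity comparator, using the negative offset example above. The function names are hypothetical, and treating Percent as a percentage of interface capacity is an assumption rather than documented behavior.

```python
def compare(value, operator, limit):
    """`operator` is "greater" or "less", mirroring the Operator field."""
    return value > limit if operator == "greater" else value < limit

def static_comparator(metric_value, operator, static_value):
    return compare(metric_value, operator, static_value)

def capacity_comparator(traffic_mbps, capacity_mbps, operator, value, value_type):
    if value_type == "negative_offset":
        limit = capacity_mbps - value        # capacity less the offset, in Mbps
    else:
        limit = capacity_mbps * value / 100  # assumed: percent of capacity
    return compare(traffic_mbps, operator, limit)

def activate_if(comparator_results):
    """All comparators must be true for the key to count as a match."""
    return all(comparator_results)

# 100 Mbps interface, offset of 10: true once traffic exceeds 90 Mbps.
over_90 = capacity_comparator(95, 100, "greater", 10, "negative_offset")
under_90 = capacity_comparator(85, 100, "greater", 10, "negative_offset")

# Combined with a static comparator on a secondary metric (e.g. packets/s).
match = activate_if([static_comparator(12_000, "greater", 10_000), over_90])
print(over_90, under_90, match)
```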

 

Activate When Settings

This section determines when the matches that are found based on the conditions defined in the Activate If Settings actually trigger an alarm. An alarm is triggered when a match occurs [operator] [number] times in [duration value] [duration units]. When no match has occurred for [reset period] minutes, the count of matches is reset to 0.

The Activate When section contains the following settings:

  • Operator: Greater than or less than, which lets you trigger an alarm not only if some number of matches occur but, alternatively, if they don't.
  • Number: How many times a match must occur within the specified duration.
    Note: If number is 1, the time settings are irrelevant; an alarm will be generated immediately upon the first match.
  • Duration value: The number of time units.
  • Duration units: The time unit, either minutes or hours.
  • Reset period: The number of match-free minutes after which the count of matches is reset to 0.
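
The interaction of Number, Duration, and Reset period can be sketched as a small counter over a sliding time window. This is an illustrative model, not Kentik's implementation: the class name is hypothetical, time is simplified to whole minutes, and the Operator is treated as "at least Number matches" (consistent with the note that a Number of 1 triggers on the first match); the less-than operator is not modeled.

```python
from collections import deque

class ActivateWhen:
    def __init__(self, number, duration_min, reset_min):
        self.number = number
        self.duration_min = duration_min
        self.reset_min = reset_min
        self.matches = deque()  # minutes at which matches were seen

    def observe(self, minute, matched):
        """Record one evaluation; return True if an alarm should trigger."""
        # Reset the count after a match-free stretch of `reset_min` minutes.
        if self.matches and minute - self.matches[-1] >= self.reset_min:
            self.matches.clear()
        if matched:
            self.matches.append(minute)
        # Forget matches that have aged out of the duration window.
        while self.matches and minute - self.matches[0] >= self.duration_min:
            self.matches.popleft()
        return len(self.matches) >= self.number

# Three matches within 10 minutes trigger an alarm on the third match.
steady = ActivateWhen(number=3, duration_min=10, reset_min=5)
steady_results = [steady.observe(minute, True) for minute in (0, 2, 4)]

# A 5-minute gap without matches resets the count before the third match.
gappy = ActivateWhen(number=3, duration_min=10, reset_min=5)
gappy_results = [gappy.observe(m, hit) for m, hit in ((0, True), (2, True), (8, False), (9, True))]
print(steady_results, gappy_results)
```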

 

Threshold Notify Settings

When an alarm is triggered or an alert otherwise changes state, notifications may be sent in various forms to designated parties at various destinations. A collection of such destinations is represented as a "notification channel." Notification channels can be created on the Notification Channels tab of the Alerting page or directly in the settings for an alert policy.

The Notify section of a threshold includes the following settings:

  • Add Notification Channel: A drop-down menu of existing notification channels, from which you choose a channel to use for the alarms generated by this threshold.
  • Create Notification Channel: A button that opens the Create Notification Channel modal (see Add or Edit Notification) to create a new set of notification destinations.
  • Notification Channel and Destination: A list of the notification channels currently assigned to this threshold, showing the name and destinations of each channel.

 

Threshold Mitigation Settings

The Mitigation section allows you to assign a mitigation action to a threshold. A mitigation is a combination of a mitigation platform and a mitigation method. A mitigation platform could be built-in, like Remotely Triggered Black-Hole routing (RTBH), or a third-party system like Radware DefensePro or A10 Thunder TPS. You create mitigations (adding them to your Kentik Detect setup) on the Mitigation tab of the Alerting page (see Alert Mitigation). Depending on settings, mitigations are applied immediately upon the triggering of alarm state, after a delay, or after manual intervention.

A mitigation is added to a threshold with the drop-down Add Mitigation menu at the right of the threshold's mitigation settings section. Once a mitigation has been added, the section is populated with the threshold mitigation settings listed below:

  • Platform: Indicates the type of mitigation.
  • Method: Indicates the specific collection of mitigation settings that is to be applied to the mitigation for this threshold.
  • Apply Mitigation: Specifies when the mitigation will be applied:
    - Immediate: Initiate the mitigation immediately when the threshold activates an alarm (alert enters alarm state).
    - User Acknowledge: Initiate mitigation action only after a user clicks the Start Mitigation button in the actions at the right side of a given alarm's row in the Alarms List on the Alarms dashboard (Alarms tab of the Alerting page; see Current Alarms).
    - User Acknowledge unless timer expired: Wait for a user to acknowledge or cancel mitigation from the Alarms dashboard. If the specified time period expires with no user action then initiate mitigation automatically.
    - Timer: Set the timer for the User Acknowledge unless timer expired setting.
  • Clear Mitigation: Specifies when the mitigation will stop:
    - Immediate: Stop mitigation immediately when the alarm ends (alert exits alarm state).
    - User Acknowledge: Continue mitigation (even after the alarm ends) until it is canceled by a user.
    - User Acknowledge unless timer expired: Continue mitigation (even after the alarm ends) until it is manually canceled by a user or the specified time expires.
    - Timer: Set the duration of the timer for User Acknowledge unless timer expired.
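
The Apply Mitigation options can be read as a simple decision rule. The Python sketch below is illustrative only: the names ApplyMode and should_start_mitigation are hypothetical and are not part of Kentik Detect or its API, and cancellation by a user is not modeled.

```python
from enum import Enum


class ApplyMode(Enum):
    # Labels mirror the Apply Mitigation options listed above.
    IMMEDIATE = "Immediate"
    USER_ACK = "User Acknowledge"
    USER_ACK_UNLESS_TIMER = "User Acknowledge unless timer expired"


def should_start_mitigation(mode, user_acknowledged, seconds_waiting, timer_seconds=0):
    """Return True if a triggered mitigation should start now."""
    if mode is ApplyMode.IMMEDIATE:
        # Start as soon as the threshold activates an alarm.
        return True
    if mode is ApplyMode.USER_ACK:
        # Start only after a user acknowledges.
        return user_acknowledged
    # USER_ACK_UNLESS_TIMER: start on acknowledgement or once the timer runs out.
    return user_acknowledged or seconds_waiting >= timer_seconds
```

The Clear Mitigation options follow the same pattern, with stopping the mitigation in place of starting it.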

 

 
Alert Presets

Kentik provides a set of alert policy templates that are covered in the following topics:

  • About Alert Presets
  • Alert Presets UI
  • Alert Library List

 

 
About Alert Presets

The Kentik Alert Library tab of the Alerting page lists a set of Kentik-provided alert templates that cover common network traffic anomalies. An alert preset can be used as the starting point for configuring an alert that notifies you about an anomaly and enables you to respond with mitigation (manual or automated).

To use an alert preset, duplicate it on the Kentik Alert Library tab, at which point it will be added to the Alert Policy List on the Alert Policies tab. From there the alert's policy can be opened in the Alert Policy Settings page, where it can be tailored to work with your specific network.

 

 
Alert Presets UI

The Kentik Alert Library tab is made up of the following UI elements:

  • Filter: A field at upper right that is used to filter the Alert Library List. The following columns are searched for a match on the string entered in this field: Name, Description, Dimensions, and Metric.
  • Alert Library List: See Alert Library List.
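
The Filter field's behavior can be approximated as a substring search across the four listed columns. The following Python sketch is a rough model, not the portal's implementation: the dictionary keys, the function name, and the case-insensitive comparison are all assumptions.

```python
# Columns searched by the Filter field, per the description above.
SEARCHED_COLUMNS = ("name", "description", "dimensions", "metric")


def filter_presets(presets, text):
    """Keep presets in which any searched column contains the text."""
    needle = text.lower()  # assumption: matching ignores case
    return [
        preset
        for preset in presets
        if any(needle in str(preset.get(col, "")).lower() for col in SEARCHED_COLUMNS)
    ]
```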

 

 
Alert Library List

The Alert Library List is a table that lists all Kentik-provided alert policy presets. The table provides the following information and actions for each preset:

  • Name: The name of the alert policy preset as specified by Kentik when it was created.
  • Description: A general description of the alert policy preset.
  • Dimensions: The dimensions defined in the alert policy, which combine to make a key definition that will determine how traffic is subdivided for evaluation (see About Keys). Dimensions correspond to the fields of the KDE main table as described in Main Table Schema.
  • Metric: The units (e.g. bits/s, packets/s, flows/s) by which this alert measures incoming flow data (see Alert Query Pane). The primary metric is listed first, followed by secondary metrics (if any).
  • Actions: The following actions can be performed on alert policy presets:
    - Copy: Duplicates the alert policy preset and adds the copy to the Alert Policy List, where it can be edited and saved. The original preset remains unchanged.

 

 
Alarm Dashboards

Kentik Detect's Alerting page includes two tabs that are used to view alarms and mitigations that are generated by alerts, which are covered in the following topics:

  • Current Alarms
  • Alarm History
  • Alarm States

Note: Alarm dashboards are specific to the alerting system and are distinct from the dashboards in the Dashboards section of the Kentik Detect portal.

 

 
Current Alarms

The primary function of the Alarms tab of the Alerting page is to provide a list, covering the last seven days, of active alarms and mitigations as well as those that are waiting for acknowledgement. The list displays important information about each alarm or mitigation, including the alert policy that triggered it (see Alert and Policy Overview), its current state (see Alarm States), and the key whose traffic matched the conditions specified in one of the policy's thresholds. The content on this tab, which also contains additional indicators and information (see Alarms Tab UI), is refreshed every 30 seconds and displays up to 500 items.

 

Alarms Tab UI

The Alarms tab is made up of the following UI elements:

  • Alarms Summary: A display (at left) showing the total number of mitigations (red), alarms (orange), and required acknowledgements (blue) reported during the last seven days.
  • New/Total Alarms graph: A chart (top center) in which alarms are plotted along a timeline covering the last seven days in one-hour increments. At each hour:
    - The height of the red line shows the total number of alarms that are active during that hour.
    - The height of the blue bar, if any, represents the number of new alarms that occurred during that hour. Hovering the cursor over a blue bar opens a tool tip that shows the number of new and total alarms for that hour.
  • Top Keys table: A list (medium right) of up to eight keys that triggered the most alarms and mitigations during the last seven days. The number triggered during that period for each listed key is shown in the Count column. Two types of actions are available in this table:
    - Hovering on an item in the Key column opens a tool tip showing the dimension and value of the key.
    - Clicking an item in the Key column opens the History tab filtered to show the alarms and mitigations (up to 1000) triggered by the key during the last 30 days.
  • Top Policies table: A list (far right) of up to eight policies that triggered the most alarms and mitigations during the last seven days. The number triggered during that period for each listed policy is shown in the Count column. Two types of actions are available in this table:
    - Hovering on an item in the Policy column opens a tool tip showing the complete policy name.
    - Clicking an item in the Policy column opens the History tab filtered to show the alarms and mitigations (up to 1000) triggered under the policy during the last 30 days.
  • Alarms List table: A table listing currently active alarms (see Alarms List).

Note: When the browser window is sized to less than 1200 pixels wide, the UI elements listed above (except for the Alarms List) will occupy the full width of the window.
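
One way to read the New/Total Alarms graph is as two counts per one-hour bucket: alarms that started in that hour (blue bar) and alarms that were active at any point in that hour (red line). The following Python sketch shows how such counts could be derived from alarm start and end times. The function name and input format are assumptions made for illustration, not the portal's actual implementation.

```python
from datetime import datetime, timedelta


def hourly_alarm_counts(alarms, window_start, hours):
    """Return (bucket_start, new, total) for each one-hour bucket.

    alarms: list of (start, end) datetimes; end is None while still active.
    """
    rows = []
    for i in range(hours):
        bucket_start = window_start + timedelta(hours=i)
        bucket_end = bucket_start + timedelta(hours=1)
        # New alarms: those that started inside this bucket.
        new = sum(1 for start, _ in alarms if bucket_start <= start < bucket_end)
        # Total alarms: those whose active period overlaps this bucket.
        total = sum(
            1
            for start, end in alarms
            if start < bucket_end and (end is None or end > bucket_start)
        )
        rows.append((bucket_start, new, total))
    return rows
```

For the Alarms tab, the window would cover the last seven days, that is, 168 one-hour buckets.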

 

Alarms List

The Alarms List is a table of up to 500 rows in which each row is an alarm or a mitigation that has occurred in the last seven days.

The Alarms List provides the following information and actions for the rows in the list:

  • Select checkbox: A checkbox that includes the row in a set of rows that will be acted on by the Clear button. To select all rows at once, click the selection box in the column header.
  • Clear button: Appears at the top of the Select column when one or more select checkboxes is checked. The clear action applied to each selected row depends on the type of that row (alarm or mitigation):
    - Clear alarm: Takes the alert that generated the alarm out of alarm state.
    Note: If the conditions that caused the alarm to trigger are still occurring at the next refresh (the timing of which depends on the polling frequency of the alert policy), then a new alarm for the same threshold will appear on the Alarms List.
    - Clear mitigation: Stops the mitigation (equivalent to clicking the Stop icon in the actions at the right of the row).
  • Policy/State: Indicates the policy, type of row (alarm or mitigation), and its current state, the combination of which determines the row's color (see Alarm States).
  • Key/Dimension: The dimensions of the key definition, and their values for the keys that caused the alert to enter alarm state (see About Keys). The key can be placed in Learning Mode by clicking the Plus (+) icon in the row's column.
  • Value: For alarms, the sum total value returned by the key as defined by the alert policy's query. The top-X ranking of traffic is performed by evaluating the volume, as measured in the primary metric, of the traffic (across the selected devices and filtered by the specified filters) represented by the key.
    Note: This column is not applicable to mitigation rows (N/A).
  • Other Data: Shows the following:
    - Baseline: The baseline value from which the alarm threshold has deviated. The baseline can be either static or calculated as defined by the alert policy (see Historical Baseline Settings).
    - Severity: The threshold (critical, major2, major, minor, or minor2) reached as defined by the alert policy (see Policy Threshold Settings).
    Note: For mitigation rows, this column shows the mitigation platform and mitigation method that will be used to mitigate the alert.
  • Mit ID/Alarm ID: The system-generated unique ID assigned to the alarm or mitigation when it was triggered. The ID can be clicked to display the item on the History tab along with any related alarms and mitigations.
  • Start: The start time of the event that triggered the alarm or mitigation.
  • End: The end time of the event that triggered the alarm, if the event is waiting for an acknowledgement. Otherwise, indicates that the alarm or mitigation is "Currently Active."
  • Comment: Opens a dialog in which you can add comments to the selected alert or mitigation.
  • Actions: See Alarms List Actions.

 

Alarms List Actions

The actions that can be taken on an alarm or mitigation in the Alarms List are applied with the action icons shown at the right of each row. Available actions depend on whether the row represents an alarm or a mitigation.

The following actions are available for alarms:

  • Alarm History: Opens the History tab filtered to show alarms and changes of state, including any mitigations, for the key corresponding to the clicked alarm.
  • Debug Alarm: Displays the alarm in a modal Debug window (see Alert Debug).
  • Open in Dashboard: Opens, in a new browser window or tab, the dashboard associated with the policy of the alert that generated the alarm (see Policy Dashboard in General Policy Settings), with the dashboard set to correspond to the values of the alarm's key. For example, if the key's dimension is Destination:IP (IP_dst) and the value of the key in the alarm is 60.54.101.8 then there will be a filter in the dashboard for inet_dst_addr ILIKE 60.54.101.8.
  • Open in Explorer: Opens Data Explorer in a new browser window or tab, with the sidebar set to correspond to the values of the alarm's key. For example, if the key's dimension is Destination:IP (IP_dst) and the value of the key in the alarm is 60.54.101.8 then there will be a filter in the Data Explorer sidebar for inet_dst_addr ILIKE 60.54.101.8.
  • Clear Alarm State: Opens a modal that enables you to add a comment (enter text in the input field) and clear an alarm. The meaning of clear depends on the state of the alert that generated the alarm:
    - If the alert is currently in alarm state (orange row in Alarms List), click Submit to manually clear the alarm.
    - If the alert is no longer in alarm state (blue row in Alarms List), click Submit to acknowledge the alarm (see Acknowledge Required in General Threshold Settings).
    Note: Alerts must be selected before they can be cleared (see Selection Box in Alarms List).
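
The filter in the Open in Dashboard and Open in Explorer examples can be thought of as a lookup from the key's dimension to a column name. In this minimal Python sketch, the mapping table and the function name are hypothetical; only the Destination:IP entry is taken from the example above.

```python
# Assumed mapping from key dimension to filter column. Only the
# Destination:IP (IP_dst) entry comes from the example in the text.
DIMENSION_TO_COLUMN = {"IP_dst": "inet_dst_addr"}


def key_to_filter(dimension, value):
    """Build the filter text for a key, in the form used in the example."""
    column = DIMENSION_TO_COLUMN[dimension]
    return f"{column} ILIKE {value}"
```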

The following actions are available for mitigations:

  • Mitigation History: Opens the History tab and displays the events (state changes; see Alarm States) associated with the mitigation.
  • Stop Mitigation: Manually stops the mitigation.
  • Start Mitigation: Manually starts or restarts the mitigation.

 

 
Alarm History

The primary function of the History tab of the Alerting page is to display the History List, a filterable table listing alarms, mitigations, and matches (up to 1000) for a specified time range (default is last 24 hours). The list displays important information about each alarm, mitigation, and match, including the alert policy that triggered it (see Alert and Policy Overview), its current state (see Alarm States), and the key whose traffic matched the conditions specified in one of the policy's thresholds. The tab also contains additional indicators, information, and filter controls (see History Tab UI).

Note: Unlike the Alarms tab, the History tab is not refreshed automatically. Instead the tab's contents are updated in response to the following actions:
- Apply filtering in the Alarms History Filter section.
- Click the tab itself (on History at upper left, just under the main navbar); if any filters have been set they will be removed and time range will be reset to default (last 24 hours).

 

History Tab UI

The History tab includes the following UI elements:

  • Alarms History Filter: Filters the History List by time range, alarm properties, and row type (alarm, mitigation, match). See Alarms History Filter.
  • History Summary: A display (at left of graph) with blue cells showing the total number of mitigations, alarms, and required acknowledgements reported for the specified time range.
  • New/Total Alarms graph: A plot of alarms along a timeline covering the specified time range in either one-hour increments (for time ranges of one week or less) or one-day increments (for time ranges longer than one week). At each increment:
    - The height of the red line shows the total number of alarms that are active at that time.
    - The height of the blue bar, if any, represents the number of new alarms that occurred during that time increment. Hovering the cursor over a blue bar opens a tool tip that shows the number of new and total alarms for that time increment.
  • Top Keys table: A list (medium right) of up to eight keys that triggered the most alarms and mitigations during the specified time range. The number triggered during that period for each listed key is shown in the Count column. Two types of actions are available in this table:
    - Hovering over an item in the Key column opens a tool tip showing the dimension and value of the key.
    - Clicking an item in the Key column sets the Filter By field of the Alarms History Filter to Dimension:Key and the Filter Value to the clicked-on key and its corresponding dimension. It also sets the time range to the last 30 days. To restore the full Key list, remove the filter.
  • Top Policies table: A list (far right) of up to eight policies that triggered the most alarms and mitigations during the specified time range. The number triggered during that period for each listed policy is shown in the Count column. Two types of actions are available in this table:
    - Hovering on an item in the Policy column opens a tool tip showing the complete policy name.
    - Clicking an item in the Policy column sets the Filter By field of the Alarms History Filter to Alert ID and the Filter Value to the ID of the alert policy. It also sets the time range to the last 30 days. To restore the full Policy list, remove the filter.
  • History List: A table listing alarms that occurred during the specified time range (see History List).

Note: When the browser window is sized to less than 1200 pixels wide, the UI elements listed above (except for the History List) will occupy the full width of the window.
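
Two of the rules above lend themselves to a short illustration: the choice of graph increment and the ranking behind the Top Keys and Top Policies tables. The Python sketch below uses hypothetical function names and assumes that events are supplied as a plain list of key (or policy) names; it is not the portal's implementation.

```python
from collections import Counter
from datetime import timedelta


def bucket_size(time_range):
    """One-hour increments for ranges of one week or less, otherwise one day."""
    if time_range <= timedelta(weeks=1):
        return timedelta(hours=1)
    return timedelta(days=1)


def top_items(names, limit=8):
    """Return up to `limit` (name, count) pairs, most frequent first."""
    return Counter(names).most_common(limit)
```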

 

Alarms History Filter

The Alarms History Filter allows you to filter the History List based on time range, alarm properties, and row type (alarm, mitigation, or match). The current settings of the Alarms History Filter are applied using the Apply button at right.

The following controls filter the list by time range:

  • From: Two fields used to define the start of the time range:
    - Date field: Pops up a calendar.
    - Time field: Drops down a time list.
  • To: Two fields used to define the end of the time range (see From fields above).
  • Current time button: Click the circular arrow icon to set the end time to the current time.

The following controls filter the History tab by properties of the underlying alerts and alarms:

  • Filter By: The alert property (see options listed below) that will be filtered for the value in the Filter Value field.
  • Filter Value: The string that the Filter By property will be filtered for.

The following Filter By options are supported:

  • Alert Policy: Filters for rows having the alert policy name chosen from the drop-down Filter Value list.
    Note: You can also filter by alert policy with one of the following actions:
    - Click a policy in the Policy/State column of the Alarms list (Alarms tab) or History List (History tab).
    - Click a policy in the Top Policies table.
  • Key (Exact): Filters for rows whose key is identical to the entered string.
  • Key (Partial): Filters for rows whose key contains the entered string.
  • Dimension:Key: Filters for rows whose dimension:key is identical to the entered string.
    Note: You can also filter for a key with one of the following actions:
    - Click a key in the Key/Dimension column of the Alarms list (Alarms tab) or History List (History tab).
    - Click a key in the Top Keys table.
  • Alarm ID: Filters for rows whose alarm ID is identical to the entered string.
    Note: You can also filter for an alarm ID by clicking it in the Mit ID/Alarm ID column of the Alarms list (Alarms tab) or History List (History tab).
  • Mitigation ID: Filters for alarms and mitigations whose mitigation ID is identical to the entered string.
    Note: You can also filter for a mitigation ID by clicking it in the Mit ID/Alarm ID column of the Alarms list (Alarms tab) or History List (History tab).
  • Old State (Partial): Filters for rows whose old state contains the entered string.
    Note: You can also filter for an old state by clicking it in the Policy/State column (left) on the History tab.
  • New State (Partial): Filters for rows whose new state contains the entered string.
    Note: You can also filter for a new state by clicking it in the Policy/State column (right) on the History tab.
  • Any State (Partial): Filters for alarms and mitigations whose old or new state contains the entered string.
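
The difference between the exact and partial Filter By options can be sketched as a set of predicates. In the following Python example, the row field names (key, old_state, new_state) and the function name are assumptions chosen for illustration, and only a subset of the options is shown.

```python
def matches_filter(row, filter_by, value):
    """Return True if a History List row passes the chosen Filter By option."""
    if filter_by == "Key (Exact)":
        return row["key"] == value  # whole key must be identical
    if filter_by == "Key (Partial)":
        return value in row["key"]  # key only needs to contain the string
    if filter_by == "Old State (Partial)":
        return value in row["old_state"]
    if filter_by == "New State (Partial)":
        return value in row["new_state"]
    if filter_by == "Any State (Partial)":
        return value in row["old_state"] or value in row["new_state"]
    raise ValueError(f"unsupported Filter By option: {filter_by}")
```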

The following checkboxes filter the History tab by the type of event:

  • Mitigations: Determines whether the History tab will include mitigations that occurred in the specified time range.
  • Alarms: Determines whether the History tab will include alarms that occurred in the specified time range.
  • Matches: Determines whether the History tab will include matches (see Matches in History) that occurred in the specified time range. The following options are available:
    - Normal: Filters for all matches.
    - Learning: Filters for matches generated by alert policies that were in learning mode (see Alert Learning Mode) at the time of the match.
    - Debug: Internal use only.
    Note: A match indicates that traffic met conditions defined in an alert policy threshold but does not necessarily indicate that an alarm was triggered.

 

Matches in History

Unlike the Alarms tab, the History tab can be filtered to show matches (conditions that meet alert threshold criteria; see Activate If Settings) that didn't cause an alert to enter alarm state. This allows you to graph a history of matches for an alert during a specified time range, even if that alert does not enter alarm state because there were not enough matches during a defined time period (see Activate When Settings).
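
The distinction between a match and an alarm can be sketched as a count over a time window. The following Python example assumes a simple look-back window and uses hypothetical parameter names; the actual behavior is governed by the threshold's Activate When settings.

```python
def enters_alarm_state(match_times, now, required_matches, window_seconds):
    """Return True if enough matches fall inside the look-back window.

    match_times and now are timestamps in seconds.
    """
    recent = [t for t in match_times if 0 <= now - t <= window_seconds]
    return len(recent) >= required_matches
```

With three matches required, three matches inside the window put the alert into alarm state, while two matches in the same window appear only as match rows in the History List.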

 

History List

The History List is a filterable table of up to 1000 rows in which each row is an alarm, a mitigation, or a match that has occurred during the specified time range.

The History List provides the following information and actions for the rows in the list:

  • Policy/State: Indicates the policy, type of row (alarm, mitigation, or match), and state, the combination of which determines the row's color (see Alarm States).
  • Key/Dimension: The dimensions of the key definition, and their values for the keys that caused the alert to match (see About Keys). The key can be placed in Learning Mode by clicking the Plus (+) icon in the row's column.
  • Severity: The threshold (critical, major2, major, minor, or minor2) that triggered an alarm, as defined by the alert policy (see Policy Threshold Settings). Also indicated is the number of matches reached for the threshold.
    Note: For mitigation rows, this column shows the mitigation platform and mitigation method that was used.
  • Value: For alarms and matches, the sum total value returned by the key as defined by the alert policy's query. The top-X ranking of traffic is performed by evaluating the volume, as measured in the primary metric, of the traffic (across the selected devices and filtered by the specified filters) represented by the key.
  • Baseline used: Shows the following:
    - The upper number is the baseline value from which the alarm threshold has deviated. The baseline can be either static or calculated as defined by the alert policy.
    - The lower string is a code indicating whether the value above was static (user set), based on a historical baseline, or based on an alternatively derived fallback value (see Baseline Codes).
  • Mit ID/Alarm ID: The system-generated unique ID assigned to the alarm or mitigation when it was triggered. The IDs can be clicked to filter the History tab for that ID.
    Note: Match rows only include an alarm ID if the match was included in the count of matches that triggered an alarm.
  • Timestamp (UTC): The time of the event that triggered the alarm, mitigation, or match.
    Note: This column also reports the number of matches (for example, LM:0) reported for alert policies in Learning Mode (see Alert Learning Mode).
  • Comment: Displays any comments added to the alarm or mitigation (via the Comment button) on the Alarms tab.
  • Actions: See History List Actions.

 

Baseline Codes

The following table describes the explanation codes that appear for matches in the Baseline used column of the History List:

Code | Comparison direction | Comparator found? | Description
NO_USE_BASELINE | N.A. | N.A. | The match was triggered by a Static threshold (no baselining).
CALCULATED_USED_FOR_BASELINE | Current to History | Yes | The match was triggered when the key's traffic exceeded the baseline.
TRIGGER_USED_NO_BASELINE | Current to History | No | The match was triggered when no baseline was found.
DEFAULT_USED_FOR_BASELINE | Current to History | No | The match was triggered when no baseline was found and the key's traffic exceeded the specified value.
LOWEST_USED_FOR_BASELINE | Current to History | No | The match was triggered when no baseline was found and the key's traffic exceeded the lowest historical top-x value.
NOT_FOUND_EXISTS_NO_BASELINE | Current to History | N.A. | The match was triggered when the key was in the threshold's current top-x but not in the historical top-x.
ACT_CURRENT_MISSING_TRIGGER | History to Current | No | The match was triggered when the key was in the threshold's historical top-x but not in the current top-x.
ACT_CURRENT_USED_FOUND | History to Current | Yes | The match was triggered when the key's historical traffic exceeded its current traffic.
ACT_CURRENT_NOT_FOUND_EXISTS | History to Current | N.A. | The match was triggered when the key was in the threshold's historical top-x but not in the current top-x.

Note: The situations in which the above codes are used depend on the Threshold Comparison Mode and the Comparison Options settings of the threshold that triggered the alarm.

 

History List Actions

The actions that can be taken on an alarm or match in the History List are applied with the action icons shown at the right of each row. There are no actions available for mitigations in the History List.

The following actions are available for alarms and matches:

  • Debug Alarm: Displays the alarm in a modal Debug window (see Alert Debug).
  • Open in Dashboard: Opens, in a new browser window or tab, the dashboard associated with the policy of the alert that generated the alarm (see Policy Dashboard in General Policy Settings), with the dashboard set to correspond to the values of the alarm's key. For example, if the key's dimension is Destination:IP (IP_dst) and the value of the key in the alarm is 60.54.101.8, then there will be a filter in the dashboard for inet_dst_addr ILIKE 60.54.101.8.
  • Open in Explorer: Opens Data Explorer in a new browser window or tab, with the sidebar set to correspond to the values of the alarm's key. For example, if the key's dimension is Destination:IP (IP_dst) and the value of the key in the alarm is 60.54.101.8, then there will be a filter in the Data Explorer sidebar for inet_dst_addr ILIKE 60.54.101.8.

 

 
Alarm States

The state of alarms and mitigations is displayed in the Policy/State column in the Alarms List on the Alarms tab and the History List on the History tab. The following tables show the meanings of the state codes.

 

Alarm Rows

Alarm rows can appear in both the Alarms List and the History List. The following table lists the possible alarm states represented by alarm rows, as well as the corresponding background colors:

State | Color | Description
ALARM | orange | Active alarm: an alarm that is currently in alarm state.
ACK_REQ | blue | An alarm that is no longer active but still requires user acknowledgement before being cleared (see General Threshold Settings).
CLEAR | green | An alarm that has been cleared. Note: Cleared alarms are removed from Alarms List but appear in History List.

Notes:
- In the History List, which includes an entry for each time a given alarm or mitigation undergoes a change of state, the row color is determined by the new state.
- The History List can also display matches (see About Matches), which have no state. Match rows are labeled with the word "Match" in the Policy/State column and are displayed in light gray.

 

Mitigation Rows

Mitigation rows can appear in both the Alarms List and the History List. The following table lists the possible mitigation states represented by mitigation rows, as well as the corresponding background colors:

State | Alarms color | History color | Description
START_WAIT_CONF | dark red | dark blue | Mitigation start is pending: Mitigation has been triggered but user acknowledgement is required before starting (see User Acknowledge under Apply Mitigation in Threshold Mitigation Settings).
START_TIMED_CONF | dark red | dark blue | Mitigation start is pending: Mitigation has been triggered but requires one of the following before starting (see User Acknowledge unless timer expired under Apply Mitigation in Threshold Mitigation Settings): expiration of timer; user acknowledgement.
END_WAIT_CONF | dark red | dark blue | Mitigation stop is pending: The conditions that triggered the mitigation no longer exist but user acknowledgement is required before stopping (see User Acknowledge under Clear Mitigation in Threshold Mitigation Settings).
END_TIMED_CONF | dark red | dark blue | Mitigation stop is pending: The conditions that triggered the mitigation no longer exist but one of the following is required before stopping (see User Acknowledge unless timer expired under Clear Mitigation in Threshold Mitigation Settings): expiration of timer; user acknowledgement.
MITIGATING | dark red | dark red | The mitigation is active and was not restarted manually.
MITIGATING_FAIL | dark gray | dark gray | Mitigation was attempted but was unable to execute as configured.
END_GRACE | dark red | dark blue | The mitigation has ended but the grace period has not yet expired (see Grace period in Add or Edit Mitigation Method).
CLEAR | N.A. | dark green | The mitigation has been cleared (not manually by user). Note: Cleared mitigations do not appear in the Alarms List but do appear in the History List.
CLEAR_MANUAL | N.A. | dark green | The mitigation has been manually cleared. Note: Cleared mitigations do not appear in the Alarms List but do appear in the History List.

Notes:
- In the History List, which includes an entry for each time a given alarm or mitigation undergoes a change of state, the row color is determined by the new state.
- The History List can also display matches (see About Matches), which have no state. Match rows are labeled with the word "Match" in the Policy/State column and are displayed in light gray.
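
For scripts that post-process alarm data, the mitigation state table can be captured as a lookup. The Python sketch below copies the states and colors from the table above; the dictionary name, the function name, and the "alarms"/"history" tab labels are hypothetical.

```python
# state -> (Alarms List color, History List color); None means the state
# does not appear in that list (N.A. in the table).
MITIGATION_ROW_COLORS = {
    "START_WAIT_CONF": ("dark red", "dark blue"),
    "START_TIMED_CONF": ("dark red", "dark blue"),
    "END_WAIT_CONF": ("dark red", "dark blue"),
    "END_TIMED_CONF": ("dark red", "dark blue"),
    "MITIGATING": ("dark red", "dark red"),
    "MITIGATING_FAIL": ("dark gray", "dark gray"),
    "END_GRACE": ("dark red", "dark blue"),
    "CLEAR": (None, "dark green"),
    "CLEAR_MANUAL": (None, "dark green"),
}


def mitigation_row_color(state, tab):
    """Return the row color for a mitigation state on the given tab."""
    alarms_color, history_color = MITIGATION_ROW_COLORS[state]
    return alarms_color if tab == "alarms" else history_color
```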