Threshold Policy Settings
Note: Alert policy configuration can be complex. If you don't find what you need here, please contact Kentik (see Customer Care).
The settings of the threshold policies used by Kentik's alerting system are covered in the following topics:
- Threshold Settings Tabs
- General Policy Settings
- Policy Dataset Settings
- Policy Threshold Settings
- Policy Baseline Settings
- NMS Threshold Policies
Notes:
- For an overview of threshold policies, see About Threshold Policies.
- For general information about policy-based alerting, see Policy Alerts Overview.
- For information on policy templates, see Policy Templates.
- For information on active or historical alerts, see Alerting.
- For information on alert-related notifications, see Notifications.
- For information on mitigation for alerts, see Mitigations.
Threshold Settings Tabs
The following tabs are included on all types of settings pages (add, edit, or clone; see Policy Settings Pages) for threshold policies:
- General: Used to define the overall properties of the policy; see General Policy Settings.
- Dataset: Used to narrow the subset of traffic that is evaluated for all thresholds in this alert; see Policy Dataset Settings.
- Thresholds: Used to specify up to five collections of conditions that trigger the alert to enter alarm state (see Alert Status); see Policy Threshold Settings.
- Baseline: Used to configure the baselines against which current traffic is compared to determine if there is a deviation from the norm; see Policy Baseline Settings.
Policy Summary Pane
On each of the above tabs, a sidebar at the right contains the Summary pane, which features a set of expandable/collapsible cards, one for each settings tab. Each card includes the following elements:
- Summary indicator: An indication of the status of the settings on the corresponding tab, which may be one of the following:
- Complete (green circle with checkmark): All required settings on the tab are complete and valid.
- Incomplete (empty circle): One or more settings on the tab aren't specified.
- Error (red octagon with X): One or more settings on the tab need to be corrected.
Note: Policies whose settings are incomplete or contain errors can't be saved.
- Heading: The name of the tab whose settings are summarized in the card.
- Count (Thresholds only): The number of active thresholds in the policy.
- Expand/Collapse: An up/down arrow; click to toggle the card between expanded and collapsed.
- Summary: A read-only summary of the settings on the corresponding tab, with indicators for any errors that need to be corrected.
General Policy Settings
The General Settings tab of each policy settings page (add, edit, or clone policies) is used to define the overall properties of the policy. This tab includes the following configuration options:
- Name: The name of the policy. Maximum of 50 characters; must include at least one letter.
- Description (optional): A description of the policy; used to summarize what the policy looks at and indicate what it is used for.
- Policy Status: A switch that enables or disables the policy. The default is enabled.
- Labels: A selector from which you can apply one or more Labels to the policy:
- Click in the field to choose one of your organization’s existing labels from a drop-down list, which can be filtered by label name.
- Click the Add Label link to open the New Label Dialog, where you can add a new label.
- When labels are assigned, they are displayed as lozenges in the field.
- Policy Type (DDoS or Custom policies only): A drop-down that sets the type of the policy to DDoS or Custom. Query-based policies can only be created from Data Explorer (see Add a Query-based Policy).
- Policy Expires (Query-based policies only): Sets the length of time after which the policy will be disabled. Options include: Never, 15 days, 30 days, 60 days, or 90 days. The default is 90 days.
Notes:
- If you turn off the policy status switch, the expiration switches to Never.
- If the Policy Expires time is set to Never, the type of the policy changes from Query-based to Custom.
- Suppress Alerts: A switch that turns Alert Suppressions on or off for this policy. If on (typically used while establishing a baseline for new policies), the policy won't enter alarm state (and so won't trigger notifications or mitigations) until a specified date.
Notes:
- Alert suppressions are enabled by default for query-based policies.
- Alert suppressions can be enabled on a pattern basis on the Alert Suppressions Page.
- Suppress Alerts End Date: If Suppress Alerts is switched on, this field specifies the date on which the alert suppressions will end. The default is seven days.
- Policy Dashboard (not present for NMS policies): A selector from which you can choose a dashboard that will be the destination of the Open Dashboard button for this alert on the Alerting page (see Alert-specific Actions). Click in the field to choose one of your organization’s existing Dashboards from a drop-down list, which can be filtered by dashboard name.
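The Name constraints listed above (at most 50 characters, at least one letter) can be checked before a policy is submitted. The following is a minimal sketch of such a check; the function `validate_policy_name` and the constant `MAX_NAME_LENGTH` are hypothetical names used for illustration and are not part of any Kentik API.

```python
MAX_NAME_LENGTH = 50  # limit stated for the Name setting


def validate_policy_name(name):
    """Return a list of problems with a proposed policy name (empty if valid)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    # The name must include at least one letter.
    if not any(ch.isalpha() for ch in name):
        problems.append("name must include at least one letter")
    return problems
```

For example, `validate_policy_name("12345")` reports the missing letter, while `validate_policy_name("DDoS UDP flood")` returns an empty list.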
Policy Dataset Settings
The Dataset tab is used to narrow the subset of traffic that is evaluated for the thresholds in this alert. The tab is divided into two panes whose settings are described below:
Note: A policy's metrics can't be changed while the policy is in use by any of your tenants, which will be indicated in the following locations:
- Tenants will be listed by name in the Used by Tenant column of the Policies List.
- On the Edit Policy page, a banner at the top of the Dataset tab tells you how many tenants are using the policy and provides links to the Tenant Policy Settings for each such tenant.
Data Funneling
The settings in the Data Funneling pane define which network traffic will be evaluated by this policy:
- Data Sources: Displays the policy's current data sources and provides access to the dialog for selecting data sources:
- Data Sources list: A list of the devices and cloud resources that the traffic evaluated by this policy is going through, to, or from.
- Edit Data Sources: A button that opens the Data Sources Dialog so you can change the data sources covered by this policy.
- Policy Dimensions: Controls listing the dimensions used to evaluate the traffic and enabling access to the dialog for selecting dimensions (see Policy Dimensions).
- Metrics: Controls listing the metrics currently selected for the policy and enabling access to the Metrics dialog for managing primary and secondary metrics (see Policy Metrics).
- Filters: Controls for the filters that are set to screen the traffic that is evaluated for the alert:
- Filters list: A list of filters currently applied to this policy (see About Filters).
- Edit Filters: A button that opens the Filtering Options dialog (see Filtering Options Dialog).
Note: A mitigation can be assigned to a policy (see Threshold Mitigations) only if the dimensions assigned with the Policy Dimensions setting above include source or destination IP/CIDR.
Policy Dimensions
The Policy Dimensions section (required) of the Dataset tab shows the dimensions used to evaluate the traffic and enables access to the dialog for managing the policy’s dimensions:
- Dimension list: A list of the dimensions (see About Dimensions) that combine to define a key.
- v4 CIDR & v6 CIDR: These two fields allow users to set custom IPv4 and IPv6 prefixes when any CIDR metrics dimensions are applied to the policy. These fields appear for the following dimensions: Destination IP/CIDR, Source IP/CIDR, Source Next Hop IP/CIDR, and Destination Next Hop IP/CIDR.
- Edit Dimensions: A button that opens a Dimensions dialog (see Dimension Selectors) to edit the dimensions of the key. The key definition determines how traffic is subdivided for evaluation (see About Keys). You cannot have more than eight dimensions.
Example: If the primary metric is packets/second, and the group-by dimensions are set to Source:AS Number and Destination:AS Number, then the top-X evaluation will involve looking at all unique combinations of source ASN and destination ASN to determine which combinations have the highest traffic volume as measured in pps.
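The example above can be sketched in a few lines of Python. This is an illustration of the idea only (keys as unique combinations of dimension values, ranked by the primary metric), not Kentik's implementation; the function `top_x_keys`, the field names `src_as`, `dst_as`, and `pps`, and the sample flow records are all assumptions.

```python
from collections import defaultdict


def top_x_keys(flows, dimensions, metric, x):
    """Sum `metric` per key (unique combination of dimension values) and rank."""
    totals = defaultdict(float)
    for flow in flows:
        key = tuple(flow[d] for d in dimensions)
        totals[key] += flow[metric]
    # Highest volume first; keep only the top X keys.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:x]


# Hypothetical flow records with source ASN, destination ASN, and packets/s.
flows = [
    {"src_as": 64500, "dst_as": 64510, "pps": 1200.0},
    {"src_as": 64500, "dst_as": 64510, "pps": 300.0},
    {"src_as": 64501, "dst_as": 64510, "pps": 900.0},
    {"src_as": 64502, "dst_as": 64511, "pps": 50.0},
]
top = top_x_keys(flows, ["src_as", "dst_as"], "pps", x=2)
# top == [((64500, 64510), 1500.0), ((64501, 64510), 900.0)]
```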
Policy Metrics
The Metrics section of the Dataset tab shows the metrics currently selected for the policy, and enables access to the Metrics dialog for managing primary and secondary metrics:
- Primary: A lozenge indicating the metric selected as the primary unit (e.g. bits/s, packets/s, flows/s, etc.) by which ingested flow data will be evaluated to determine top-X. For a list of available metrics, see General Metrics and Host Traffic Metrics.
Example: If the primary metric is bits/second then the top-X evaluation will rank keys by bps.
- Secondary: A lozenge indicating the metric used to specify multiple additional static comparators (see Policy Threshold Settings) that are based on a metric other than the primary metric. Each added comparator represents an additional condition that must be met to trigger an alarm. The Secondary Metrics selector supports simultaneous selection of two secondary metrics.
- Edit Metrics: A button that opens the Metrics dialog (see About the Metrics Dialog) where you can edit the primary and secondary metrics for the policy.
Note: The maximum number of metrics per policy is three: one primary and two secondary.
Building Your Dataset
The settings in the Building Your Dataset pane specify how the network traffic defined in the Data Funneling pane will be evaluated. By default, the pane is hidden. To show the pane, click Advanced Settings at the bottom of the tab.
The pane always includes the following controls:
- Evaluation Frequency: The interval at which newly ingested traffic data is grouped by dimensions and the traffic data represented by the resulting keys is evaluated. Options include 60 seconds (default), 2 minutes, and 5 minutes.
- Maximum Keys Per Evaluation: The number of keys (unique combinations of dimension values; see About Keys) to evaluate for a match with the conditions specified in the alert's thresholds. Maximum valid value is 300.
- Minimum Traffic Threshold: The minimum value, as measured by the primary metric, that a key must have to be included in the top-X ranking and evaluated for a threshold match.
- Specified value: Specify the minimum value (measurement units are determined by the primary metric).
- Auto-calculate: If checked (default), the value is auto-calculated using a formula based on the settings of the comparators in the policy’s thresholds (see Threshold Conditions) and the Minimum Traffic Threshold field is greyed out.
If the Dimensions list in Data Funneling includes more than one dimension, the Building Your Dataset pane will also include the following Dimension Grouping controls (see About Dimension Grouping):
- Group Dimensions: A switch to enable grouping by some dimensions before the final top-X evaluation.
- Dimensions to Group By (shown only if Group Dimensions is enabled): The number of dimensions, starting at the top of the Dimensions list, that will be included in the key definition used for grouping.
- Maximum Keys per Group (shown only if Group Dimensions is enabled): The number of keys from each group that will be included in the overall top-X for the alert. The maximum valid value is the lesser of 300 or the value of the Maximum Keys per Evaluation setting.
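The limits described in this pane can be expressed as a simple pre-flight check. The sketch below is hedged: `check_dataset_settings` is a hypothetical helper, and the lower bound of 1 for the key counts is an assumption (the text above states only the maximums).

```python
ALLOWED_FREQUENCIES_S = {60, 120, 300}  # 60 seconds, 2 minutes, 5 minutes
MAX_KEYS_LIMIT = 300                    # stated maximum for keys per evaluation


def check_dataset_settings(frequency_s, max_keys, max_keys_per_group=None):
    """Return a list of error strings; an empty list means the values look valid."""
    errors = []
    if frequency_s not in ALLOWED_FREQUENCIES_S:
        errors.append("evaluation frequency must be 60, 120, or 300 seconds")
    # Lower bound of 1 is an assumption; only the maximum is documented.
    if not 1 <= max_keys <= MAX_KEYS_LIMIT:
        errors.append("maximum keys per evaluation must be between 1 and 300")
    if max_keys_per_group is not None:
        # The lesser of 300 or the Maximum Keys Per Evaluation setting.
        limit = min(MAX_KEYS_LIMIT, max_keys)
        if not 1 <= max_keys_per_group <= limit:
            errors.append(f"maximum keys per group must be between 1 and {limit}")
    return errors
```

Passing `max_keys_per_group=None` models a policy with Group Dimensions switched off.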
About Dimension Grouping
Dimension grouping introduces an additional layer of control into how the alerting system tracks the top-X keys for current traffic. Dimension grouping can help keep keys from a high-volume area of the infrastructure from dominating the top-X keys, so that the alerting system can pick up on significant changes in other areas as well.
With dimension grouping disabled, the key definition (set of dimensions) specified in the Dimensions field (see Data Funneling) is used as a single unit, resulting in keys that are each a unique combination of dimension values. These keys are then ranked by traffic volume to arrive at the top-X.
With dimension grouping on, traffic is instead evaluated in stages as follows:
- The dimensions specified in the Data Funneling section are split into two sets, one of which can be thought of as the “grouping set.”
- The first dimension in the grouping set is the first dimension in the Policy Dimensions field.
- The last dimension in the grouping set is the dimension whose position in the Dimensions field corresponds to the number specified with the Dimensions to Group By control.
- The remaining dimensions are in the non-grouping set.
- Example: Dimensions are Source ASN, Full Device, Destination Country, and Destination ASN. If Dimensions to Group By is set to 2, the grouping set is Source ASN and Full Device.
- Traffic is initially evaluated as if the key definition is only the grouping set. The resulting groups will each represent the traffic having a unique combination of the dimensions in the grouping set (e.g. Source ASN and Full Device).
- The traffic in each of these groups is then evaluated as if the key definition is only the non-grouping set. The resulting keys within each group will each represent the traffic having a unique combination of the dimensions in the non-grouping set (e.g. Destination Country and Destination ASN).
- The top N keys will be taken from each group and merged into a single pool, with N determined by the Maximum Keys per Group control.
- The keys in this pool are ranked by volume, resulting in the overall top-X keys for the alert, with X being defined by the Maximum Keys Per Evaluation control.
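The staged evaluation described above can be sketched as follows. This is an illustrative model only, not Kentik's implementation; the function `grouped_top_x`, its parameter names, and the sample records are assumptions. For brevity the sample uses two dimensions with the first one as the grouping set.

```python
from collections import defaultdict


def grouped_top_x(flows, dimensions, group_by, metric, keys_per_group, max_keys):
    """Top-X with dimension grouping: top N keys per group, then rank the pool."""
    grouping_set = dimensions[:group_by]  # first `group_by` dimensions
    # Sum the metric per key, bucketed by the grouping-set values. Within a
    # group the grouping values are constant, so the full key distinguishes
    # exactly the combinations of the non-grouping dimensions.
    groups = defaultdict(lambda: defaultdict(float))
    for flow in flows:
        group = tuple(flow[d] for d in grouping_set)
        key = tuple(flow[d] for d in dimensions)
        groups[group][key] += flow[metric]
    # Take the top N keys from each group and merge them into one pool.
    pool = []
    for keys in groups.values():
        ranked = sorted(keys.items(), key=lambda kv: kv[1], reverse=True)
        pool.extend(ranked[:keys_per_group])
    # Rank the pool by volume to get the overall top-X.
    return sorted(pool, key=lambda kv: kv[1], reverse=True)[:max_keys]


# Hypothetical traffic: edge1 carries far more volume than edge2.
flows = [
    {"device": "edge1", "dst_as": 64510, "bps": 900},
    {"device": "edge1", "dst_as": 64511, "bps": 800},
    {"device": "edge1", "dst_as": 64512, "bps": 700},
    {"device": "edge2", "dst_as": 64510, "bps": 100},
]
result = grouped_top_x(flows, ["device", "dst_as"], 1, "bps",
                       keys_per_group=1, max_keys=3)
```

With `keys_per_group=1`, `result` contains one key from each device, so the low-volume edge2 key is retained. Ranking the same data without grouping would fill all three slots with edge1 keys.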
Policy Threshold Settings
The settings of the Alert Thresholds tab are covered in the following topics:
About Alert Thresholds
A threshold is a collection of settings that define a set of conditions that must be matched for an alert policy to be activated, at which point the policy generates an alert for each key for which conditions have been matched. Each policy includes at least one threshold by default but may include up to five (Critical, Severe, Major, Warning, Minor).
General Threshold Settings
The Thresholds tab includes the following general settings:
- Diagram: A visualization of the query built out of the policy as it is currently configured. Only visible if the policy is enabled.
- Refresh: Click to refresh the visualization’s traffic data.
- View in Data Explorer: Click to open the query in Data Explorer (opens in a new tab).
- Severity selector: Choose which threshold you are configuring: Critical, Severe, Major, Warning, or Minor.
- Threshold Status: Determines whether the threshold is currently enabled (evaluating traffic data, generating alarms, etc.). A threshold that is not currently needed can be disabled and retained for future use.
- Threshold Description: A field in which to enter a description of the traffic conditions the threshold is intended to monitor.
Threshold Conditions
The settings shown in the Conditions section determine what constitutes a match (see About Matches) for the threshold. These settings are configured in the Edit Threshold Conditions dialog. The dialog is accessed via the Edit Conditions button in the Conditions section.
Notes:
- The conditions specified for a given threshold will be ANDed (all specified conditions must be met for a key to trigger an alert).
- The settings in the Edit Threshold Conditions dialog for a given threshold are independent of the same dialog's settings for other thresholds.
Conditions General UI
The Edit Threshold Conditions dialog includes the following general fields and controls:
- Cancel buttons: To close the dialog, click the X in the upper right corner or the Cancel button at lower right. All elements will be restored to their values at the time the dialog was opened.
- Conditions tiles: A set of tiles that each define a condition (see Conditions Tile Types).
- Apply button: Click to apply the conditions to the policy. This will close the dialog and return you to the policy settings page.
Conditions Tile Types
The conditions for a given threshold (Critical, Severe, etc.) in a policy are set with the controls in the tiles of the Edit Threshold Conditions dialog. Each tile is used to specify one condition of a specific type. Depending on current policy settings (e.g. the metrics set on the policy's Dataset tab), the tiles for some condition types may be disabled (not available to set). If the tile for a given condition type is inactive then the reason that it's not available will be stated in a brief notification at the right of the tile.
By default, the Edit Threshold Conditions dialog includes tiles for the following condition types:
- Static Condition: Tile(s) for setting conditions based on any of the primary and secondary metrics specified in the policy's Dataset tab (see Static Condition). Click Show Chart to display a visualization of the metric’s data for the last three days.
Note: The default primary metric for a newly created policy is Packets/s.
- Baseline Condition: A tile for setting a baseline condition (if enabled). See Baseline Condition.
- Top Keys Condition: A tile for setting a top keys condition (if enabled). See Top Keys Condition.
- Interface Capacity Condition: A tile for setting an interface capacity condition (if enabled). See Interface Capacity Condition.
- Ratio-Based Condition: A tile for setting a ratio-based condition (if enabled). See Ratio-Based Condition.
Note: The conditions enabled in the dialog will be displayed in the Conditions section of the tab (Critical, Severe, etc.) corresponding to the threshold for which the condition(s) were set.
Conditions Tile Common UI
Every conditions tile in the Edit Threshold Conditions dialog includes the following controls, which appear in full when the condition has been activated:
- Activate: A switch that activates/deactivates the condition. When activated, the tile is expanded, enabling you to specify the condition's settings.
- Condition statement: Indicates the condition being applied and a summary of the settings entered for that condition. Condition statements vary depending on the type of condition (see topics below).
- Show Chart/Hide Chart (present only for static and ratio-based conditions): A button that opens a chart showing the last three days of data (as defined by the policy’s data sources, filters, and dimensions) for the specified metric(s), with the static/baseline metric being displayed in orange. If the chart is shown, Hide Chart will hide it.
- View in Data Explorer (present only when a static chart is open): A link (located under the chart) that takes you to Data Explorer, where the Query pane will be populated to match the settings of this policy’s Dataset tab.
- Refresh (present only when a static chart is open): A button that refreshes the data in the visualization.
Static Condition
A static condition compares the current value of a primary or secondary traffic metric that has been specified in the Metrics pane of the Dataset tab (see Policy Metrics) to a fixed value (number) that you specify in the tile. The condition is specified with the following controls:
- Condition value: A field in which to enter a number.
- Condition metric: A drop-down from which you can select the units in which the condition value is expressed.
The resulting condition statement takes the form of a sentence based on your entered number and metric, e.g. "If traffic is at least 100 packets/s." The static condition is met when the current value of the metric meets or exceeds your specified condition value.
Notes:
- The default primary metric for a static condition is packets/s. Use the Metrics pane of the Dataset tab to change the primary metric and/or add secondary metrics. The tiles of the Edit Threshold Conditions dialog will reflect the Metrics pane settings at the time the dialog is opened.
- A static condition is required to set a Baseline Condition.
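As a rough sketch of how static conditions combine, the following models the "meets or exceeds" comparison and the ANDing of conditions within one threshold. The function names and the dictionary layout are hypothetical and chosen only for illustration.

```python
def static_condition_met(current_value, condition_value):
    """A static condition is met when the metric meets or exceeds the value."""
    return current_value >= condition_value


def threshold_matches(current, static_conditions):
    """Conditions in one threshold are ANDed: every one must be met."""
    return all(
        static_condition_met(current[metric], value)
        for metric, value in static_conditions.items()
    )


# Hypothetical threshold: at least 100 packets/s AND at least 1,000,000 bits/s.
conditions = {"packets/s": 100, "bits/s": 1_000_000}
```

A key measured at 150 packets/s and 2,000,000 bits/s matches these conditions; a key at 150 packets/s and 500,000 bits/s does not, because the secondary-metric condition fails.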
Baseline Condition
Baseline conditions compare the current value of a traffic metric with a "baseline" representing the historical value of the same metric from a specified period in the past. The Baseline condition tile is enabled only when both of the following are true:
- The dialog's Static Condition tile is activated for the primary metric.
- The dialog's Top Keys Condition tile is not activated.
The baseline condition is specified with the following controls:
- Comparison value: A field in which to enter a number.
- Unit of measure: A drop-down from which you can choose the unit of measure for the condition:
- %: The condition is met when the current metric value exceeds or falls below the baseline metric value by the percentage specified with the comparison value.
- Units over baseline: The condition is met when the current metric value exceeds or falls below the baseline metric value by at least the number of units (e.g. bits/s or packets/s) specified with the comparison value.
- Comparison direction: A drop-down from which you choose the direction of the condition:
- Above: The condition is met when the measured value exceeds the baseline.
- Below: The condition is met when the measured value falls below the baseline.
The resulting condition statement takes the form of a sentence based on the values entered with the above controls, e.g. "If traffic is at least 20 % above the baseline."
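The baseline comparison can be sketched as below. This is an assumption-laden illustration: `baseline_condition_met` and its string arguments are hypothetical, and the boundary is treated as inclusive based on the "at least" wording of the condition statement.

```python
def baseline_condition_met(current, baseline, comparison_value, unit, direction):
    """Compare a current metric value to its baseline.

    unit: "percent" (the % option) or "units" (Units over baseline).
    direction: "above" or "below".
    """
    if unit == "percent":
        margin = baseline * comparison_value / 100.0
    elif unit == "units":
        margin = comparison_value
    else:
        raise ValueError(f"unknown unit: {unit}")
    if direction == "above":
        return current >= baseline + margin
    if direction == "below":
        return current <= baseline - margin
    raise ValueError(f"unknown direction: {direction}")
```

With a baseline of 1000 packets/s, "at least 20 % above the baseline" is met at 1200 packets/s but not at 1100 packets/s.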
Top Keys Condition
The Top Keys Condition tile is enabled only when the Baseline Condition is disabled. This condition will look for a change in the composition of the top-X keys (a key joining or leaving). The following controls determine how the top keys are evaluated:
- Comparison type: A drop-down from which you can choose the type of comparison:
- Joins: The condition is met when the given key joins the top-X keys in current traffic.
- Leaves: The condition is met when the given key leaves the top-X keys in current traffic.
- Top Keys: The number of keys to evaluate.
Note: If the field is left blank, the number of keys to track is set to the value of the Maximum Keys Per Evaluation setting in Building Your Dataset (default = 25).
The resulting condition statement takes the form of a sentence based on the values entered with the above controls, e.g. "If a given key joins the top 25 keys in current traffic."
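A change in the composition of the top-X keys amounts to a set difference between two key lists. The sketch below assumes the comparison is between a previous (comparison) top-X set and the current one; the function `top_keys_changes` is hypothetical.

```python
def top_keys_changes(previous_top, current_top):
    """Return which keys joined and which left the top-X set."""
    previous, current = set(previous_top), set(current_top)
    return {
        "joins": current - previous,   # in the current top-X only
        "leaves": previous - current,  # in the previous top-X only
    }


changes = top_keys_changes(["a", "b", "c"], ["b", "c", "d"])
# changes == {"joins": {"d"}, "leaves": {"a"}}
```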
Interface Capacity Condition
The Interface Capacity tile of the Edit Threshold Conditions dialog enables you to alert based on the capacity of an interface relative to the current utilization of that interface, which must exist on a data source whose traffic is included in the policy (see Data Funneling). The tile is disabled unless all of the following are true:
- The policy's dimensions include Source Interface and/or Destination Interface (see Policy Dimensions).
- The policy's metrics include bits/s (see Policy Metrics).
- No baseline condition is set in the Baseline Condition tile.
The condition is specified with the following controls:
- Comparison value: A field in which to enter a number.
- Comparison method: A drop-down from which you can choose the method used to compare utilization with capacity:
- Mbits/s: The condition is met when the current utilization of the interface (traffic volume), expressed in bits/s, exceeds the interface's capacity by at least the comparison value.
- %: The condition is met when the current utilization of the interface, expressed as a percent of the interface's capacity, exceeds the specified comparison value.
The resulting condition statement takes the form of a sentence based on the values entered with the above controls, e.g. "If utilization is at least 100 Mbits/s over interface capacity."
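The two comparison methods can be sketched as follows, reading the descriptions above literally. The function `interface_capacity_condition_met` and the method names `"percent"` and `"mbits"` are hypothetical; utilization and capacity are both assumed to be given in bits/s, and 1 Mbit/s is taken as 1,000,000 bits/s.

```python
def interface_capacity_condition_met(utilization_bps, capacity_bps,
                                     comparison_value, method):
    """Compare interface utilization with capacity (both in bits/s)."""
    if method == "percent":
        # Utilization as a percent of capacity exceeds the comparison value.
        return utilization_bps / capacity_bps * 100.0 > comparison_value
    if method == "mbits":
        # Utilization exceeds capacity by at least the value in Mbits/s.
        return utilization_bps - capacity_bps >= comparison_value * 1_000_000
    raise ValueError(f"unknown method: {method}")
```

For a 10 Gbits/s interface carrying 8.5 Gbits/s, a `"percent"` condition with a comparison value of 80 is met, since utilization is 85% of capacity.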
Ratio-Based Condition
The Ratio-Based Condition tile of the Edit Threshold Conditions dialog enables you to set a policy condition based on the ratio of any two metrics specified on the Dataset tab (see Data Funneling). The tile is disabled unless the policy's dataset includes at least two metrics.
The condition is defined with the following controls, which designate one metric as "Left" and the other as "Right":
- Metrics: Two control sets, one for the left metric and the other for the right. Each set has two parts, one for each part of the ratio:
- Metric value: A field in which you enter a numeric value for the metric.
- Metric: A drop-down from which you choose a metric.
- Swap Metrics (left/right arrow icon): A button that moves the settings of the left metric controls to the right metric controls, and vice versa.
- Margin (shown only when Bidirectional is on): Active only when the metrics value fields are both set to 1, this field adds a fractional value to both parts of the ratio. For example, if the margin is set to 0.2 then the left:right ratio and the right:left ratio each become 1.2:1.
- Bidirectional: A switch that controls how the two parts of the ratio will be evaluated:
- When the switch is off, the condition will be met if the ratio of the left metric to the right metric exceeds the ratio defined by their respective metric values. For example, if the left value is 10 and the right value is 1, then the condition is met when the ratio of the left metric to the right metric exceeds 10:1.
- When the switch is on, the condition will be met when the ratio of either metric to the other metric exceeds the ratio defined by their respective metric values, e.g. if the ratio of left to right exceeds 10:1 or the ratio of right to left exceeds 10:1.
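The ratio logic, including the Bidirectional switch and the Margin field, can be sketched as below. This is an interpretation of the descriptions above rather than Kentik's implementation; `ratio_condition_met` and its parameters are hypothetical names. Cross-multiplication is used instead of division so that a metric value of zero does not raise an error.

```python
def ratio_condition_met(left, right, left_value, right_value,
                        bidirectional=False, margin=0.0):
    """Check whether left:right exceeds left_value:right_value.

    left, right: current values of the two metrics.
    left_value, right_value: the two parts of the configured ratio.
    """
    # Margin applies only when Bidirectional is on and both values are 1;
    # e.g. a margin of 0.2 turns the ratio into 1.2:1 in each direction.
    if bidirectional and left_value == 1 and right_value == 1:
        left_value = 1 + margin
    # left/right > left_value/right_value, rewritten without division.
    forward = left * right_value > right * left_value
    if not bidirectional:
        return forward
    # Same ratio applied in the opposite direction (right to left).
    reverse = right * right_value > left * left_value
    return forward or reverse
```

With a configured ratio of 10:1, current values of 1100 and 100 meet the condition. Values of 100 and 1100 meet it only when Bidirectional is on.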
Threshold Configuration
The settings of the Threshold Configuration pane (accessed via Advanced Options on the Thresholds tab) vary depending on the settings of the Edit Threshold Conditions dialog:
- If no baseline exists (present only when a Baseline Condition is set for the primary metric): A drop-down that sets what to do if a key in the primary top-X is not present in the comparison top-X (see No Existing Baseline).
- Default Value (present only when a Baseline Condition is set for the primary metric): A field in which to enter the value to use for comparison when If no baseline exists is set to “Use a Default Value” or if the Lowest or Highest Value can’t be determined from top keys (e.g. baselining hasn't yet started).
- Top Keys (always present): If specified, the following fields override (for this threshold only) the dataset's Maximum Keys Per Evaluation setting (see Building Your Dataset), which is stated in parentheses to the right of each field:
- Current keys: How many top keys to evaluate from current traffic.
- Baseline keys: How many top keys to store for the baseline.
No Existing Baseline
The If no baseline exists setting is shown only when a Baseline Condition is set for the primary metric (see About Historical Baselines). The setting is a drop-down that sets what to do if a key in the primary top-X is not present in the comparison top-X:
- Use Lowest Value from Top Keys (default): Compare the value of the key in the primary top-X set to the value of the last (lowest) key in the comparison set.
- Use a Default Value: Compare this key’s current value to the static value in the comparison value field.
- Do Not Alert: Classify as not a match on this key.
- Activate an Alert: Classify as a match on this key.
- Use Highest Value from Top Keys: Compare the value of the key in the primary top-X set to the value of the highest key in the comparison set.
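The options above can be modeled as a small fallback function. The sketch is illustrative only: `resolve_baseline`, the mode strings, and the `(decision, value)` return shape are assumptions, with `comparison_top` standing in for the comparison top-X as a mapping from key to baseline value.

```python
MODES = {"lowest", "highest", "default", "do_not_alert", "activate_alert"}


def resolve_baseline(key, comparison_top, mode, default_value=None):
    """Return (decision, value); decision is 'compare', 'match', or 'no_match'."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if key in comparison_top:
        # A baseline exists for this key, so the fallback is not needed.
        return ("compare", comparison_top[key])
    if mode == "do_not_alert":
        return ("no_match", None)
    if mode == "activate_alert":
        return ("match", None)
    # The default value is also used when the lowest or highest value
    # can't be determined (e.g. baselining hasn't yet started).
    if mode == "default" or not comparison_top:
        return ("compare", default_value)
    pick = min if mode == "lowest" else max
    return ("compare", pick(comparison_top.values()))
```

For example, with `comparison_top = {"a": 500, "b": 200}`, a key `"z"` that is missing from the comparison set is compared against 200 in `"lowest"` mode and against 500 in `"highest"` mode.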
Threshold Frequency
Kentik's alerting system defines a "match" as an individual instance of the network traffic evaluated by this alert policy matching the conditions defined for this threshold. The Threshold Frequency settings specify how many matches must occur within a given duration of time to trigger an alert:
- Times: The count of matches that must occur within the specified duration.
Note: If the number is set to 1, the time settings are irrelevant; an alert will be generated immediately upon the first match.
- Duration value: The number of time units in the duration.
- Duration units: The time unit of the duration, either minutes or hours.
- Reset period: The number of match-free minutes after which the count of matches is reset to 0.
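One way to picture how these settings interact is a counter that tracks match timestamps. The sketch below assumes a sliding window for the duration and measures the reset period from the most recent match. Both are assumptions about behavior that the text above does not spell out, and the class `MatchCounter` is hypothetical.

```python
class MatchCounter:
    """Count matches within a duration, resetting after a match-free period."""

    def __init__(self, times, duration_s, reset_s):
        self.times = times            # matches required to trigger an alert
        self.duration_s = duration_s  # window in which they must occur
        self.reset_s = reset_s        # match-free time that resets the count
        self.matches = []             # timestamps (seconds) of counted matches

    def record(self, timestamp, matched):
        """Record one evaluation; return True if an alert should trigger."""
        # Reset the count if the match-free reset period has elapsed.
        if self.matches and timestamp - self.matches[-1] >= self.reset_s:
            self.matches.clear()
        if not matched:
            return False
        self.matches.append(timestamp)
        if self.times == 1:
            return True  # time settings are irrelevant for a single match
        # Keep only matches inside the duration window (assumed sliding).
        self.matches = [t for t in self.matches
                        if timestamp - t < self.duration_s]
        return len(self.matches) >= self.times
```

With `MatchCounter(times=3, duration_s=600, reset_s=300)`, matches at 0, 60, and 120 seconds trigger on the third match. Matches at 0, 60, and 400 seconds do not, because the 340 match-free seconds before the third match reset the count.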
Threshold Actions
The settings in this section determine how the alerting system responds to an alert generated by this threshold. These settings include:
- Acknowledgment Required: If this switch is enabled, the alert must be manually acknowledged in the Alerts List or the Alert Details Drawer on the Alerting page, or on the Alert Details Page.
- Notifications: A drop-down from which you can choose notification channels that will be notified in the event of an alert for this threshold (see Threshold Notifications).
- Mitigations: A drop-down from which you can choose mitigations that will be triggered in the event of an alert for this threshold (see Threshold Mitigations).
Threshold Notifications
When an alert is triggered or otherwise changes state, notifications may be sent in various forms to designated parties at various destinations. The Notification Channels field is used to specify the notification(s) sent when an alarm is triggered. The notifications controls include:
- Notification Channels: A field that shows lozenges for each of the notification channels you’ve selected to add to this threshold:
- To add a channel, click in the field to drop down a filterable list of channels in your organization and click a notification channel. Repeat as needed.
- To remove a channel from this policy, click the X in the channel’s lozenge.
- Add New Channel: Opens the Add Notification Channel dialog (Add or Edit Notification), where you can configure a new notification channel in your organization (see Notification Settings) and automatically add it to the policy.
- Test Notification Channels: Send a test notification to the recipients in all the selected notification channels.
Note: Test Notification Channels is visible only when there are channels in the Notification Channels field.
Note: For more information about notification channels, see Notifications.
Threshold Mitigations
Note: The Mitigations section is only visible when the policy’s dimensions (see Data Funneling) include source or destination IP/CIDR.
The Mitigations section enables you to set one or more mitigations (see About Mitigation) that will be triggered in response to an alert on this threshold.
Add Mitigation to Threshold
In its initial state, the Mitigations section includes the following controls:
- Mitigation selector: A drop-down list from which you can select a mitigation.
- Add Mitigation: A button that assigns the currently selected mitigation to this threshold.
Once a mitigation is assigned to the threshold, a configuration tile for the mitigation is displayed above the initial controls (which may still be used to add another mitigation). The tile includes the controls covered in Mitigation Settings.
Note: The mitigations assigned to a policy (automated mitigations) will escalate and de-escalate automatically as changing conditions match different thresholds in that policy.
Mitigation Settings
The mitigations tile for an individual mitigation that has been assigned to the threshold includes the following controls:
- Remove: A red X at upper right. Click to remove the mitigation from the threshold.
- Apply Mitigation: A drop-down that specifies when the mitigation will be applied:
- Immediately, when alert starts: Initiate the mitigation immediately when the threshold activates an alert.
- After user confirmation: Initiate mitigation action only after a user clicks the Approve and start the mitigation option in the actions at the right side of a given alarm's row in the Alerts List on the Alerting page.
- After user confirmation or timeout expires: Wait for a user to acknowledge or cancel the mitigation from the Alerting page. If the specified time period expires with no user action, then initiate the mitigation automatically.
- Application Timer (shown only when Apply Mitigation is set to “After user confirmation or timeout expires”): The duration (in minutes) of the timer.
- Clear Mitigation: A drop-down that specifies when the mitigation will stop:
- Immediately, when alert ends: Stop mitigation immediately when the alert exits alarm state.
- After user confirmation: Continue mitigation (even after the alert ends) until it is canceled by a user.
- After user confirmation or timeout expires: Continue mitigation (even after the alert ends) until it is manually canceled by a user or the specified time expires.
- Application Timer (shown only when Clear Mitigation is set to “After user confirmation or timeout expires”): The duration (in minutes) of the timer.
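The three timing options for Apply Mitigation and Clear Mitigation can be sketched as a small decision function. This is an illustrative model only, not Kentik's implementation: the `Mode` enum, the `should_act` function, and its parameters are hypothetical names, and the sketch does not model a user explicitly canceling a pending mitigation.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    # Hypothetical labels mirroring the three drop-down options.
    IMMEDIATE = "immediately"
    USER_CONFIRMATION = "after user confirmation"
    CONFIRMATION_OR_TIMEOUT = "after user confirmation or timeout expires"


def should_act(mode: Mode, user_confirmed: bool, minutes_waited: float,
               timer_minutes: Optional[float] = None) -> bool:
    """Return True if the apply (or clear) action should proceed now."""
    if mode is Mode.IMMEDIATE:
        return True
    if mode is Mode.USER_CONFIRMATION:
        return user_confirmed
    # CONFIRMATION_OR_TIMEOUT: proceed on confirmation or once the timer expires.
    if timer_minutes is None:
        raise ValueError("timer_minutes is required for the timeout mode")
    return user_confirmed or minutes_waited >= timer_minutes
```

For example, with a 10-minute Application Timer and no user action, `should_act(Mode.CONFIRMATION_OR_TIMEOUT, False, 5, 10)` is false and becomes true once `minutes_waited` reaches 10.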
Multiple Mitigations
If desired, you can add one or more additional mitigations to the same threshold. Multiple mitigations are assigned as follows:
- Use the controls covered in Add Mitigation to Threshold to add another mitigation.
- Use the controls covered in Mitigation Settings to specify when the new mitigation will be applied and cleared.
Applying multiple mitigations to a threshold lets you simultaneously trigger every mitigation method or platform (e.g. appliances at multiple sites) that should respond to a given set of conditions. This approach scales much better than cloning a policy for each of your appliances.
Support for multiple mitigations per threshold also enables the response for a given alarm to include a mix of mitigation types (e.g. RTBH, A10, and Radware). The following scenario, for example, outlines a multi-location DDoS response involving multiple mitigation types:
- De-preference or stop-announcing a BGP route on Location #1 by injecting a route whose community has been predefined as a flag for these actions.
- Announce a broader routing table entry, less specific than /24 (thus forcing acceptance by Internet peers), for Location #2.
- Trigger a third-party mitigation method (e.g. A10 or Radware) on Location #2 to announce more specific prefixes for internal re-direction to a scrubbing center.
Flowspec Mitigation Details
Mitigations whose platform is Flowspec will include a Traffic Matching card at the right of the standard Mitigation Settings tile. The dimensions listed in the card have either been populated with specific values or are inferred from values elsewhere in the policy. To see which Flowspec components (see Flowspec Component Types) are involved for this mitigation, click the View Method Details button, which opens the Mitigation Method Details dialog.
Policy Baseline Settings
The settings of the Baseline tab are covered in the following topics:
About Historical Baselines
Baselining enables the alerting system to trigger an alert based on a comparison of current traffic, more specifically the current value of a key (see About Keys), against a baseline value. Derived from historical traffic patterns, the baseline represents what's "normal" for that key. If the current value varies from the norm in a way that matches a condition specified in one of an alert policy's defined thresholds, then that threshold will trigger an alert.
The baselining process begins with a historical data set that has the following characteristics:
- It covers traffic that is "funneled" the same way as the current traffic to which it will be compared, which is defined on the Dataset tab (see Policy Dataset Settings).
- It spans a specified time range of multiple hours or days (the "baseline window").
- It's drawn from time series data points that each represent the traffic volume of a top-X key for one time-slice, which is a duration (1, 2, or 5 min) determined by the Dataset tab's Evaluation Frequency setting.
We arrive at a baseline value for each key in two main stages:
- Building the baseline: We smooth out spikes or drops in normal traffic by normalizing/averaging the data points for each top-X key over the baseline window (based on the options in Building the Baseline). The result is a series of "buckets" that each cover a duration of one hour and include one value for each of the top-X keys.
- Using the baseline: We choose certain buckets (based on the options in Using the Baseline) and normalize/average the key values in those buckets into a final historical value for each key. This value is what's used for comparisons in the conditions of each threshold.
When a comparison of the current value with the final baseline value matches the baseline condition defined for a given threshold, that threshold may trigger an alert, depending on its Threshold Frequency settings.
Note: The settings for historical baselines in a given policy apply to all the thresholds in that policy that include one or more baseline conditions.
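The two stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kentik's actual algorithm: the input shape (tuples of `(hour, key, value)`), the function names, and the nearest-rank percentile method are all hypothetical choices made for the example. The 98th and 95th percentile defaults mirror the values the presets use.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile (an assumed method, chosen for simplicity)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def build_buckets(points, rollup_pct=98):
    """Stage 1: roll time-slice points up into one value per (hour, key)."""
    grouped = defaultdict(list)
    for hour, key, value in points:
        grouped[(hour, key)].append(value)
    return {hk: percentile(vals, rollup_pct) for hk, vals in grouped.items()}


def final_baseline(buckets, hours, final_pct=95):
    """Stage 2: aggregate the chosen hourly buckets into one value per key."""
    per_key = defaultdict(list)
    for (hour, key), value in buckets.items():
        if hour in hours:
            per_key[key].append(value)
    return {key: percentile(vals, final_pct) for key, vals in per_key.items()}
```

The value returned for each key by `final_baseline` plays the role of the final historical value that a threshold's baseline condition is compared against.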
Baseline Presets
Kentik offers three presets that serve as a starting point for configuring baselining for a given policy. Once you choose a preset you can tailor settings to your specific needs with the controls that are revealed by clicking Advanced Options (see Building the Baseline and Using the Baseline). When you change any of the settings the baseline will automatically be reclassified as Custom.
The following presets are available:
- Default: Produces a general-purpose baseline for most alerting applications.
- Precision: Produces the most accurate baseline for highly detailed applications.
- Express: Rapidly produces a baseline that's less accurate but nonetheless useful for general applications.
All the above presets share the following approach:
- In Building the Baseline:
- Rollup aggregation: Set to 98th percentile.
- In Using the Baseline:
- Bucket width: Each key's value for a given hour is based only on the key's value in that hour's bucket.
- Final aggregation: Set to 95th percentile.
- Use separate patterns for weekdays and weekends: Disabled by default.
The table below shows how the presets differ.
Setting | Default | Precision | Express |
Bucket depth: Number of top keys every hour | 25 | 300 | 25 |
Baseline window start | 1 day ago | 1 day ago | 1 hour ago |
Baseline window lookback | 4 weeks | 4 weeks | 1 week |
Completeness: Days of data accumulated before use | 4 | 14 | 2 |
Compare to: comparison data is derived from… | Current hour of the day, current day of the week | Current hour of the day, current day of the week | Every hour |
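For reference, the shared and preset-specific values listed above can be captured as plain data. The dictionary keys and the `preset_settings` helper below are hypothetical names used only for illustration (they are not Kentik API fields); the values are taken from the shared-settings list and the table.

```python
# Settings common to all three presets.
SHARED = {
    "rollup_aggregation": "98th percentile",
    "bucket_width": "just that hour",
    "final_aggregation": "95th percentile",
    "separate_weekday_weekend": False,
}

# Settings that differ per preset (values from the table).
PRESETS = {
    "Default": {
        "bucket_depth": 25,
        "window_start": "1 day ago",
        "lookback_weeks": 4,
        "completeness_days": 4,
        "compare_to": "current hour of the day, current day of the week",
    },
    "Precision": {
        "bucket_depth": 300,
        "window_start": "1 day ago",
        "lookback_weeks": 4,
        "completeness_days": 14,
        "compare_to": "current hour of the day, current day of the week",
    },
    "Express": {
        "bucket_depth": 25,
        "window_start": "1 hour ago",
        "lookback_weeks": 1,
        "completeness_days": 2,
        "compare_to": "every hour",
    },
}


def preset_settings(name):
    """Return the full settings for a preset: shared values plus its own."""
    return {**SHARED, **PRESETS[name]}
```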
Building the Baseline
The controls of the Building the Baseline pane (accessed via Advanced Options on the Baseline tab) enable you to specify the historical data that will be included in an initial (“rollup”) aggregation pass. The controls are structured as three settings that together answer the question "How should we build the buckets that will be available for your baseline?":
- Window: Go back [duration] from [end point]. These settings determine the time range of the baseline window (e.g. "go back one week from one day ago"):
- Duration: The length of the time range. Options include 1, 2, 3, or 4 (default) weeks.
- End point: A drop-down that sets the end point of the time range for evaluation of historical data. The window will go back in time from this end point for the specified duration. Traffic data newer than the end point is excluded to prevent current spikes or anomalies from skewing baseline values. Options include 1 hour, 1 day (default), and 1 week ago.
- Bucket depth: Use the top [##] of keys in every hour of the window. By default, this number is set to match the Maximum Keys per Evaluation setting on the Dataset tab (see Building Your Dataset).
- Rollup aggregation: For each one-hour bucket, derive each key's value based on the [minimum/maximum/percentile] of the time series datapoints for that hour. Options include Minimum, Maximum, or Percentile: 98th (default), 95th, 50th, 25th, or 5th.
Note: The baseline window described above is "rolling," meaning that older traffic data continually ages out and newer traffic data is continually added.
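As a rough illustration of how the Window settings translate into a time range, the sketch below computes the start and end of the baseline window relative to "now". The function name and parameters are hypothetical; the defaults (4 weeks, ending 1 day ago) follow the defaults stated above.

```python
from datetime import datetime, timedelta, timezone


def baseline_window(now, weeks=4, end_offset=timedelta(days=1)):
    """Return (start, end) of the baseline window relative to `now`."""
    if weeks not in (1, 2, 3, 4):
        raise ValueError("Duration must be 1, 2, 3, or 4 weeks")
    end = now - end_offset                 # data newer than this is excluded
    start = end - timedelta(weeks=weeks)   # go back the chosen duration
    return start, end


now = datetime(2024, 1, 29, 12, tzinfo=timezone.utc)
start, end = baseline_window(now)
# start is 2023-12-31 12:00 UTC, end is 2024-01-28 12:00 UTC
```

Because the window is rolling, calling the function again with a later `now` shifts both ends forward by the same amount.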
Using the Baseline
The controls of the Using the Baseline pane (accessed via Advanced Options on the Baseline tab) enable you to define how the buckets defined in Building the Baseline are used to derive final comparison values. The controls are structured as four settings that together answer the question "Which buckets should we take key values from, and how should we derive a comparison value for each key?":
- Completeness: Don't use this baseline until it has at least [##] [time unit] of data. This minimum duration helps ensure that the baseline reflects sufficient information to establish what the "normal" value is for each key. Time unit options are hours or days.
- Number field: Enter a number representing time units.
- Time unit: Click the drop-down to select days (default) or hours.
- Compare to: For each key, take baseline values from the buckets for [every/the current] hour of [every/the current] day of the week. Use these drop-downs to determine which buckets are included in the final aggregation:
- All buckets in the window: every hour of every day.
- Buckets from the current hour of every day (default).
- Buckets from the current hour of the current day.
Note: The availability of these options depends on the Window settings in Building the Baseline.
- Bucket width: For each key, derive the value representing each "compare to" hour from [just that hour/surrounding hours].
- If bucket width is "just that hour" then the value of each key for that hour is taken only from the bucket for that hour.
- If bucket width is "surrounding hours" then an additional line appears, enabling the key values for a given hour to be derived from a number of hours on either side (see Baseline Bucket Width).
- Final aggregation: Derive the final comparison value for each key from the [minimum/maximum/percentile] of the key's values for the "compare to" hours. This drop-down specifies how the values for a given key in the chosen buckets will become a single comparison value for that key. Options include maximum, minimum, and percentile: 99th, 98th, 95th (default), 90th, 80th, 50th, 25th, 10th, or 5th.
Note: Policies watching for activity in excess of baseline typically use Maximum, 98th, or 95th percentile aggregation. Policies watching for activity below baseline typically use 25th percentile aggregation.
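The three Compare to options amount to a filter over the hourly buckets in the window. The sketch below assumes each bucket is identified by the timezone-aware `datetime` at which its hour begins; the `select_buckets` function and its mode strings are hypothetical names for illustration.

```python
from datetime import datetime, timedelta, timezone


def select_buckets(bucket_hours, now, mode):
    """Pick the buckets that feed the final aggregation.

    mode is one of "all", "current_hour", or "current_hour_and_day".
    """
    if mode == "all":
        return list(bucket_hours)
    if mode == "current_hour":
        return [b for b in bucket_hours if b.hour == now.hour]
    if mode == "current_hour_and_day":
        return [b for b in bucket_hours
                if b.hour == now.hour and b.weekday() == now.weekday()]
    raise ValueError(f"unknown mode: {mode}")


# Example: two weeks of hourly buckets starting Monday 2024-01-01 00:00 UTC.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
buckets = [start + timedelta(hours=h) for h in range(14 * 24)]
now = datetime(2024, 1, 15, 9, tzinfo=timezone.utc)  # a Monday, 09:00 UTC
same_hour = select_buckets(buckets, now, "current_hour")
same_hour_and_day = select_buckets(buckets, now, "current_hour_and_day")
```

In this example the two-week window yields 14 buckets for the current hour of every day but only 2 for the current hour of the current day, which illustrates why the usable options depend on the Window settings.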
Baseline Bucket Width
The buckets resulting from Building the Baseline each cover one hour in the overall baseline window, with one value per key per bucket. The Bucket width setting in Using the Baseline enables you to replace those single-hour values for each key with values that are derived from up to four hours on either side of the bucket's nominal hour. This additional aggregation step is specified in the following line, which appears when Bucket width is specified as "surrounding hours": Aggregate [#] hours before and after using [minimum/maximum/percentile].
The number of hours that can be aggregated into a single bucket value ranges from 1 to 4. The method of deriving the bucket value from multiple hours may be minimum, maximum, or a percentile: 5th, 10th, 25th, 50th, 75th, or 90th.
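A minimal sketch of the "surrounding hours" step for one key follows. It assumes the key's hourly bucket values are held in a chronologically ordered list and that the neighborhood is simply truncated at the edges of the window; both are assumptions for illustration, as is the `widen_bucket` name. Any aggregation callable (such as `min`, `max`, or a percentile function) can be passed in.

```python
def widen_bucket(hourly_values, hour_index, hours_each_side, aggregate=max):
    """Derive one bucket's value from the hours on either side of it."""
    if not 1 <= hours_each_side <= 4:
        raise ValueError("hours_each_side must be between 1 and 4")
    lo = max(0, hour_index - hours_each_side)
    hi = min(len(hourly_values), hour_index + hours_each_side + 1)
    return aggregate(hourly_values[lo:hi])


values = [5, 9, 3, 7, 1, 8]               # one key, six consecutive hours
widened_max = widen_bucket(values, 2, 1)       # max of [9, 3, 7]
widened_min = widen_bucket(values, 2, 1, min)  # min of [9, 3, 7]
```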
Weekend Baselining
The final control in the Advanced Options section on the Baseline tab is the Use separate patterns for weekdays and weekends checkbox. When this box is checked, the baselines for weekends (UTC Saturday and Sunday) will be calculated independently from the baselines for weekdays (UTC Monday through Friday), thereby accounting for variations in traffic patterns caused by day of week.
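Because the weekday/weekend split is defined in UTC, a timestamp late on a Friday in a local time zone can already count as a weekend. The sketch below shows one way to classify a timezone-aware timestamp; the `day_class` function is a hypothetical helper, not part of any Kentik interface.

```python
from datetime import datetime, timedelta, timezone


def day_class(ts):
    """Classify a timezone-aware timestamp as 'weekday' or 'weekend' in UTC."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    return "weekend" if utc.weekday() >= 5 else "weekday"


# Friday 20:00 at UTC-5 is Saturday 01:00 UTC.
friday_evening = datetime(2024, 1, 5, 20, tzinfo=timezone(timedelta(hours=-5)))
label = day_class(friday_evening)
```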
NMS Threshold Policies
The following topics cover the NMS-specific aspects of creating an NMS threshold policy in Kentik's alerting system:
Note: Most of the settings for NMS threshold policies are the same as for other threshold policies, and are covered in Alert Policies.
General NMS Settings
The General tab of each policy settings page is used to define the overall properties of the policy. When creating an NMS threshold policy, you need to set the Policy Type to NMS on the General tab. The rest of the settings are explained in General Policy Settings.
Note: When the policy type is NMS, the General tab does not include the Policy Dashboard setting.
Dataset NMS Settings
The Dataset tab is used to narrow the subset of traffic that the policy evaluates for its thresholds. The Data Funneling section is the only section on this tab whose settings are unique to NMS. For information about the remaining sections, see Building Your Dataset.
The settings in the Data Funneling pane parallel the following settings panes on the Query sidebar of Metrics Explorer:
- Measurement: The Measurement Pane is used to define specifically what you'd like the query to look at. The pane's settings in an alert policy are the same as those in Metrics Explorer, except that in a policy you can select only one metric.
- Filtering: The Filters Pane includes controls that enable you to narrow the scope of your query or narrow the results returned from the query. The pane's settings in an alert policy are the same as those in Metrics Explorer.
Thresholds NMS Settings
A threshold is a collection of settings that define a set of conditions that must be matched for an alert policy to be activated, at which point the policy generates an alert for each key for which conditions have been matched. Each policy includes at least one threshold by default but may include up to five (Critical, Severe, Major, Warning, Minor).
Policy thresholds are configured on the policy's Thresholds tab. The tab is the same for NMS settings as for other threshold policies (see Policy Threshold Settings), except for the settings in the Conditions section, which determine what constitutes a match (see About Matches) for the threshold.
The Conditions section includes the following UI elements:
- Static condition: An indicator showing the metric selected in the Data Funneling section on the Dataset tab, followed by a lozenge indicating the value set for the condition's metric.
- Baseline condition: A lozenge indicating the current setting for the baseline condition.
- Edit Conditions: A button that opens the Edit Threshold Conditions dialog, where the static and baseline conditions are set.
Note: The Static and Baseline conditions above aren't shown in the Conditions section until they've been set in the dialog.
About NMS Conditions
For NMS threshold policies, all conditions are based on the metric selected in the Data Funneling section on the Dataset tab. The conditions are set in the Edit Threshold Conditions dialog, which enables you to set one of each of the following condition types:
- Static: The current metric value is compared with a fixed number specified by the user.
- Baseline (only if there's already a static condition): A comparison of the current metric value with a historical baseline for the same metric.
Edit Threshold Conditions
The Edit Threshold Conditions dialog is accessed via the Edit Conditions button in the Conditions section of a policy's Thresholds tab. The dialog includes the following general fields and controls:
- Conditions tiles: A set of tiles that each define a condition (see About NMS Conditions).
- Cancel: Buttons — a Cancel button at lower right and an X in the upper right corner — that close the dialog without adding or changing any conditions.
- Apply: Apply the conditions to the policy. This will close the dialog and return you to the policy settings page.
Note: The settings in the Edit Threshold Conditions dialog for a given threshold are independent of the conditions set for other thresholds.
NMS Conditions Tiles
Each condition exists in a tile that includes the following controls, which are used to specify the condition:
- Enable/disable: A switch that turns the condition on or off.
Note: The baseline condition can't be turned on unless the static condition is on.
- Condition type: A label that indicates whether the condition is static or baseline.
- Condition metric: A lozenge indicating the metric evaluated for the condition.
- Condition value: Controls that set the value that constitutes a match for the condition (see Metrics Condition Match).
- Show/Hide Chart (static condition only): A button that, if no chart is shown, expands the dialog to show a chart that plots data for the specified metric from the last three hours (default), last day, or last seven days. If the chart is shown, Hide Chart will hide it.
Metrics Condition Match
The controls that determine what constitutes a match depend on the type of the condition. A static condition is met when the current metric value exceeds the condition value specified with the following controls:
- Condition value: A field in which to enter a number.
- Condition units: The metric selected in the Data Funneling section of the Dataset tab (see Dataset NMS Settings).
A baseline condition is met when the current metric value is above or below the baseline metric value by the amount specified with the following controls:
- Comparison value: A field in which to enter a number.
- Comparison type: A drop-down from which you can choose the type of comparison:
- %: The condition is met when the current metric value exceeds or falls below the baseline metric value by the percentage specified with the comparison value.
- Units: The condition is met when the current metric value exceeds or falls below the baseline metric value by at least the number of units (e.g. bit/s or packets/s) specified with the comparison value.
- Above/Below: Choose whether you would like the condition to evaluate the chosen percentage or unit value above or below the baseline.
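The static and baseline comparisons described above can be sketched as two small predicates. This is an illustrative reading of the controls, not Kentik's evaluation code: the function names, parameter names, and the choice to treat "by the specified amount" as an inclusive (at least) comparison for both % and Units are assumptions.

```python
def static_match(current, condition_value):
    """Static condition: the current value exceeds a fixed number."""
    return current > condition_value


def baseline_match(current, baseline, comparison_value,
                   comparison_type="%", direction="above"):
    """Baseline condition: current deviates from baseline by enough."""
    if comparison_type == "%":
        delta = baseline * comparison_value / 100
    elif comparison_type == "units":
        delta = comparison_value
    else:
        raise ValueError(f"unknown comparison type: {comparison_type}")
    if direction == "above":
        return current >= baseline + delta
    if direction == "below":
        return current <= baseline - delta
    raise ValueError(f"unknown direction: {direction}")
```

For example, with a baseline of 100 and a comparison of 50% above, `baseline_match(150, 100, 50)` is true while `baseline_match(149, 100, 50)` is false.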