Alert Policies

Note: Need help with alert policy configuration? Contact Customer Support.

Management of the alert policies used by Kentik's alerting system is covered in the following topics:

Notes:
- For general information about policy-based alerting, see Policy Alerts Overview.
- For information on policy templates, see Policy Templates.
- For information on active or historical alerts, see Alerting.
- For information on alert-related notifications, see Notifications.
- For information on mitigation for alerts, see Mitigations.

Your organization's alert policies are managed from the Policies page.
 

Policies Page

Alert policies are managed from the Policies page, which is covered in the following topics:

Note: For information on configuring a policy, see Policy Settings.

 

About the Policies Page

The Policies page, reached via the Manage Alert Policies button on the Alerting page, is used to manage alert policies for Kentik's alerting system (see Alerts and Policies). This page displays the Policies List, which is a table listing the alert policies that are currently available to your organization. Policies can be added, duplicated, removed, and edited using this list.

 

Policies Page UI

The Policies page is made up of the following UI elements:

  • Policy Templates button: A button (upper right) that opens the Policy Templates page, where you can manage templates, including adding a policy from a template (see Add Policy from Template).
  • Policies statistics: Just below the page title, these fields indicate the number of policies created and enabled in your organization and give the maximum allowable number for each.
  • Add Policy: A button (upper right) with two parts:
    - Add Policy: Go to the Add Policy page (see Policy Settings), where you can configure and save a new policy.
    - Open menu (down arrow at right): Open a drop-down from which you can choose Add Policy from Template, which opens an Add Policy from Template dialog that enables you to create a policy from a policy template.
  • Filters (funnel icon): A button that toggles the Filters pane between expanded and collapsed.
  • Search field: A field that you can use to narrow the policies shown in the Policies List. If text is entered, the list will show only policies that match the text in the ID, Name, or Description column. The field will also display any filters already applied with the Filters pane.
  • Filters pane: A set of Policy Filters that you can use to narrow the alert policies shown in the Policies List.
  • Policies list: A table listing the policies in your organization and giving information about each (see Policies List).
  • Policy Details drawer: Click on any policy in the Policies list to open a drawer from the right of the page that contains a summary of that policy’s settings (see Policy Details Drawer).

Note: While Administrators can see and use all of the controls listed throughout this article, Members can only:

  • See and filter the policies displayed in the Policies list;
  • Open a policy's Policy Details Drawer;
  • Enable or disable policies.
 

Policies List

The Policies list is a table that shows all of the alert policies that are currently available in your organization. The table includes the following columns of information and actions for each policy:

  • Selected: A checkbox in the leftmost column that enables you to select one or more policies at once. Once policies are selected, you can remove, enable, or disable them.
    Notes:
    - To select all policies at once, select the checkbox in the header row of the Selected column.
    - Selecting Remove will bring up a Remove Policies dialog in which you can confirm (or cancel) removal of the selected policies from your organization's collection of alert policies.
  • Status: The policy will display as either Disabled (gray) or Enabled (green).
  • Type: The type of policy (DDoS, Query-based, or Custom; see Policy Types).
  • ID: The system-generated unique ID assigned when the policy was created.
  • Name: The user-specified name of the policy. The policy’s user-specified description, if entered, is presented below the name.
  • Data Sources: Either “All Data Sources” or the set of devices covered by the query for the policy. Devices are selected on the Dataset tab of the Policy Settings.
  • Metrics: The units (e.g. bits/s, packets/s, flows/s, etc.) by which this alert measures incoming flow data (see Data Funneling). The primary metric is listed first, followed by secondary metrics (if any).
  • Dimensions: The dimensions defined in the policy, which combine to make a key definition that will determine how traffic is subdivided for evaluation (see About Keys). Dimensions, which are based on fields in the KDE main table, are described in Dimensions Reference.
  • Actions: The following actions are available from the Action menu (vertical ellipsis) of each policy:
    - Enable/Disable Policy: Enable a disabled policy or disable an enabled policy.
    - Edit Policy: Go to the Edit Policy page where you can edit the policy's settings.
    - Clone Policy: Go to the Clone Policy page, where the settings of a new policy will be populated with the values of the policy that you cloned (see Clone a Policy).
    - Remove: Open a confirming dialog that allows you to delete the policy from your organization's collection of alert policies.
 

Policy Filters

The policies displayed in the Policies list can be filtered using the controls in the Filters pane on the left. For filter categories with a set of checkboxes (e.g. Type):

  • If any boxes are checked, only policies that match those boxes will be included.
  • If no boxes are checked, there will be no matching on that category.

The pane includes the following elements:

  • Close: Click the < in the upper right corner to close the Filters pane. Click the Filters button (funnel icon) above to show it again.
  • Clear all (appears only when you’ve specified one or more filters): Click to clear all current filters.
  • Type: The types of policies to include in the list (DDoS, Query-based, or Custom; see Policy Types).
  • Status: The policy statuses to include in the list (Enabled and/or Disabled).
  • Policy ID: A field with which you can filter the list to one specific policy by entering a full policy ID (no partial matching).
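
The checkbox semantics above can be sketched in a few lines of Python. This is only an illustration, not Kentik's implementation: the Policy class, its field names, and the filter_policies function are hypothetical, and the sketch assumes that filter categories combine with AND, which this article does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical fields loosely mirroring columns of the Policies list.
    policy_id: str
    name: str
    policy_type: str  # "DDoS", "Query-based", or "Custom"
    enabled: bool


def filter_policies(policies, types=(), statuses=(), policy_id=""):
    """Return the policies that pass every active filter category.

    Assumption: categories are combined with AND.
    """
    result = []
    for p in policies:
        # No boxes checked in a category means no matching on that category.
        if types and p.policy_type not in types:
            continue
        status = "Enabled" if p.enabled else "Disabled"
        if statuses and status not in statuses:
            continue
        # The Policy ID filter needs the full ID; partial IDs never match.
        if policy_id and p.policy_id != policy_id:
            continue
        result.append(p)
    return result


policies = [
    Policy("1001", "UDP flood", "DDoS", True),
    Policy("1002", "Top talkers", "Custom", False),
    Policy("1003", "Explorer query", "Query-based", True),
]

# Checking DDoS and Custom under Type, and Enabled under Status.
matches = filter_policies(policies, types=("DDoS", "Custom"), statuses=("Enabled",))
```

With the sample data, only policy 1001 passes both categories, while a partial ID such as "100" matches nothing.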
 

Policy Details Drawer

The Policy Details drawer is a read-only display of policy settings, including details that may not be displayed directly in the Policies list. Click anywhere on the row of a policy to open its details drawer.

The drawer contains the following elements:

  • ID number: The system-generated unique ID assigned when the policy was created.
  • Name: The user-specified name of the policy.
  • Actions: The following actions are available from the Action menu (vertical ellipsis) to the right of the policy name:
    - Edit Policy: Go to the policy's Edit Policy page to edit its current settings.
    - Enable/Disable Policy: Enable a disabled policy or disable an enabled policy.
    - Clone Policy: Go to the Clone Policy page, where the settings of a new policy will be populated with the values of the policy that you cloned (see Clone a Policy).
    - Remove: Open a confirming dialog that allows you to delete the policy.
  • Status: A lozenge indicating the policy’s status, either enabled (green) or disabled (gray).
  • Description: A description of the policy, if any was provided by the policy's creator.
  • Policy Dashboard: The dashboard that is set as the destination of the Open Dashboard button for any alert listed in the Alerting list. See Policy Dashboard under General Policy Settings. A lozenge labeled Preset will be present if the dashboard is a Kentik-provided preset.
  • Dataset: An expandable/collapsible pane containing a summary of the current settings of the Dataset tab of the policy (see Policy Dataset Settings).
  • Thresholds: An expandable/collapsible pane containing a summary of the current settings of the Thresholds tab of the policy (see Policy Threshold Settings). The number in brackets indicates the number of thresholds in use for the policy.
  • Baseline: An expandable/collapsible pane containing a summary of the current settings of the Baseline tab of the policy (see Policy Baseline Settings).
 

Adding a Policy

The addition of a policy to your organization's collection of alert policies is covered in the following topics:

Note: To add a policy, your user level must be Administrator or above.

 

Policy Creation Methods

You can add an alert policy with any of the following methods:

  • Create a policy from scratch via the Add Policy button, which takes you to a blank Add Policy page (see Policy Settings Pages).
  • Create a policy from Data Explorer using the Create Alert Policy action (see Add a Query-based Policy).
  • Duplicate an existing policy using the Clone Policy button in the Action menu at the right of each row in the Policies list (see Clone a Policy).
  • Add a policy based on a policy template (see Use a Policy Template).

The settings that you'll make when adding a policy are covered in Policy Settings. When you've finished specifying the settings, click Save. The new policy will be listed in the Policies list.

Note: The first time you access a policy settings page you’ll see a dialog offering a tour of the page.

 

Add a Query-based Policy

A query-based policy is a time-limited policy built from contextual Data Explorer queries. You can create a policy directly from Data Explorer using all of the criteria (dimensions, metrics, filters) you’re using to view your data.

To create a query-based policy:

  1. In Data Explorer, create a query with the necessary data sources, dimensions, metrics, and filters.
  2. On the SubNav, click the Actions button to display a drop-down menu.
  3. Select Create Alert Policy. The criteria used in the query (dimensions, metrics, etc.) automatically populate a new policy on the Add Query-Based Policy page.
  4. On the General tab, provide a name for the policy.
  5. On the Thresholds tab, click Edit Conditions and add at least one condition (see Threshold Conditions).
  6. Check the indicators in the Policy Summary Pane to see if you have any missing fields or errors in the policy.
  7. Click Save. You will be taken back to the Data Explorer page from which you created the policy. The policy will now appear in the Policies list on the Policies page.

Note: On the General tab (see General Policy Settings), if you set the Policy Expires time to Never or you disable Policy Status, the type of the policy changes from Query-based to Custom.

 

Clone a Policy

Cloning allows you to duplicate a policy so that it can be modified without altering the original. To clone a policy:

  1. In the Policies list on the Policies page, find the policy that you'd like to clone.
  2. From the Actions menu at the right of the policy's row, choose Clone Policy, which takes you to the Clone Policy page, where the settings of a new policy will be populated with the values of the policy that you cloned.
  3. Change the name of the new policy so that it's distinct from the original policy you cloned.
  4. Change any other settings on the tabs of the Clone Policy page to tailor them to your requirements for the new policy.

Note: Once a new policy has been created by cloning, it will be added to the Policies list.

 

Use a Policy Template

Templates are preconfigured policies provided by Kentik as the starting point for creating a policy for your organization. Kentik provides templates for many common alerting needs. Templates are not intended to be used as-is without being customized to your network and traffic situation.

There are two routes to adding a policy from a template:

  • Policies page: On the Policies page, click the drop-down portion of the Add Policy button and choose Add Policy from Template to open the Add Policy from Template dialog. Choose a template from the drop-down, then click Continue.
  • Policy Templates page: On the Policies page, click the Policy Templates button to go to the Policy Templates page, then click the Clone icon at the right of a row in the Templates list.

Both of the above methods lead you to an Add Policy page whose settings (see Policy Settings) are already filled in with the default settings of the template. Tailor the settings to your specific needs and save the new policy.

Add Policy from Template

The Add Policy from Template dialog includes the following settings and controls:

  • Cancel buttons: To cancel the action and close the dialog, click the X in the upper right corner or the Cancel button at lower right.
  • Drop-down menu: Display the menu to see a list of existing templates that you can use as the basis for your new policy.
  • Go to Policy Templates: A link that takes you to the Policy Templates page (see Policy Templates).
  • Template description: A description of the template and what it is designed to detect.
  • Continue: A button that takes you to the Add Policy page, whose tabs will be populated with the settings of the template you selected from the drop-down menu. You can tailor the new policy to the specific needs of your organization by adjusting these settings.
    Note: The Continue button will be greyed out until you select a policy template from the drop-down menu.
 

Policy Settings

The pages and dialogs used to specify policy settings are covered in the following topics:

 

Policy Settings Pages

Alert policy settings are accessed on the following policy settings pages; the specific page used depends on the situation (add, edit, clone, etc.):

  • Add Policy page: Used to specify the settings of a new custom policy. Accessed from the following locations:
    - From the Policies page via the Add Policy button at upper right.
    - From the Add Policy from Template dialog, accessed as described in Use a Policy Template.
  • Edit Policy page: Used to edit an existing policy. Accessed from the Policies page via Edit Policy on the policy's Action menu (at right of row).
  • Clone Policy page: Used to clone an existing policy (see Clone a Policy). Accessed from the Policies page via Clone Policy on the policy's Action menu (at right of row).
  • Add Query-Based Policy page: Used to create a policy from a query in Data Explorer. Accessed from Data Explorer via Create Alert Policy on the Action menu (in the SubNav at top of page). The settings (dimensions, metrics, etc.) in Data Explorer's Query sidebar will be used to populate the settings of the policy (see Add a Query-based Policy).
 

Policy Settings Page UI

The policy settings pages share the same layout and the following common UI elements:

  • Cancel button: Click Cancel in the upper right corner to return to the Policies page and cancel the add/edit/clone policy operation. All elements will be restored to their values at the time the page was opened.
  • Save button: Save changes to policy settings and return to the Policies page. This button will be active only when the settings on all tabs are complete and error-free (see the indicators in the Policy Summary Pane).
  • Settings tabs: The tabs where the policy settings are made (see Policy Settings Tabs).
  • Summary pane: The Policy Summary Pane contains a summary with an expandable/collapsible card for each tab. Each card includes a high-level overview of the tab's settings and an indicator of its status.

Policy Summary Pane

The Summary pane is situated on the right side of the policy settings page and features expandable cards that each correspond to a settings tab of the same name. Each card includes the following elements:

  • Summary indicator: An indication of the status of the settings on the corresponding tab, which may be one of the following:
    - Complete (green checkmark): All required settings on the tab are complete and valid.
    - Incomplete (empty circle): One or more settings on the tab aren't specified or have an invalid value (e.g. negative number).
  • Heading: The name of the tab whose settings are summarized in the card.
    Note: The number beside the Thresholds heading is a count of active thresholds in the policy.
  • Expand/Collapse: An up/down arrow; click to toggle the card between expanded and collapsed.
  • Summary: A read-only summary of the settings on the corresponding tab, with indicators for any errors that need to be corrected.
 

Policy Settings Tabs

On each of the policy settings pages (add, edit, or clone), policies are configured on the following tabs:

  • General: Used to define the overall properties of the policy; see General Policy Settings.
  • Dataset: Used to narrow the subset of traffic that is evaluated for all thresholds in this alert; see Policy Dataset Settings.
  • Thresholds: Used to specify up to five collections of conditions that trigger the alert to enter alarm state (see Alert Status); see Policy Threshold Settings.
  • Baseline: Used to configure the baselines against which current traffic is compared to determine if there is a deviation from the norm; see Policy Baseline Settings.
 

General Policy Settings

The General Settings tab of each policy settings page (add, edit, or clone policies) is used to define the overall properties of the policy. This tab includes the following configuration options:

  • Name: The name of the policy. Maximum of 50 characters; must include at least one letter.
  • Description (optional): A description of the policy; used to summarize what the policy looks at and indicate what it is used for.
  • Policy Status: A switch enabling you to enable or disable the policy. The default is enabled.
  • Policy Expires (query-based policies only): Sets the length of time after which the policy will be disabled. Options include: Never, 15 days, 30 days, 60 days, or 90 days. The default is 90 days.
    Notes:
    - If you switch the policy status to disabled, the expiration switches to Never.
    - If the Policy Expires time is set to Never, the type of the policy changes from Query-based to Custom.
  • Silent Mode: Prevents the alert from entering alarm state (and triggering notifications and/or mitigations) until the specified date. Use for new policies to allow time to establish baselines. Silent mode is enabled by default for query-based policies.
    Note: This switch enables silent mode for this specific policy. To enable silent mode on a pattern basis instead, see Silent Mode.
  • Silent Mode End Date: If silent mode is enabled, this field specifies the date on which the alert will exit silent mode. The default is seven days.
  • Policy Dashboard: A dashboard that will be the destination of the Open Dashboard button for any alert listed in the Alerting list. Click to open the drop-down menu displaying all dashboards available to use. Click a dashboard on the list to select it.
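
Two of the rules above, the Name constraints and the change of policy type tied to Policy Expires, can be expressed as small checks. The following Python sketch is illustrative only. The function names are hypothetical, and treating any Unicode letter as a "letter" is an assumption, since this article does not say which characters qualify.

```python
def validate_name(name: str) -> bool:
    """Name rule: at most 50 characters and at least one letter.

    Assumption: any character for which str.isalpha() is true counts as a letter.
    """
    return len(name) <= 50 and any(ch.isalpha() for ch in name)


def resulting_policy_type(current_type: str, expires: str, enabled: bool) -> str:
    """Return the policy type after the General tab settings are applied."""
    # Policy Expires applies to query-based policies only.
    if current_type != "Query-based":
        return current_type
    # Switching the policy status to disabled switches expiration to Never.
    if not enabled:
        expires = "Never"
    # An expiration of Never changes the type from Query-based to Custom.
    return "Custom" if expires == "Never" else "Query-based"
```

For example, a query-based policy that keeps the default 90-day expiration and stays enabled remains Query-based, while disabling it yields Custom.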
 

Policy Dataset Settings

The Dataset tab is used to narrow the subset of traffic that is evaluated for the thresholds in this alert. The tab is divided into two panes whose settings are described below:

 

Data Funneling

The settings in the Data Funneling pane define which network traffic will be evaluated by this policy:

  • Data Sources: Controls used to choose the devices the traffic is going through, to, or from:
    - Data Sources list: A list of the devices currently assigned to this dataset. To remove a device from the list, click the X at the right of the device's row.
    - Edit Data Sources: A button that opens the Data Sources Dialog, so you can choose the specific data sources to include.
  • Dimensions: Controls used to choose the dimensions used to evaluate the traffic:
    - Dimension list: A list of the dimensions (see About Dimensions) that combine to define a key.
    - v4 CIDR & v6 CIDR: These two fields allow users to set custom IPv4 and IPv6 prefixes when any CIDR metrics dimensions are applied to the policy. These fields appear for the following dimensions: Destination IP/CIDR, Source IP/CIDR, Source Next Hop IP/CIDR, and Destination Next Hop IP/CIDR.
    - Edit Dimensions: A button that opens a dimensions dialog (see Dimension Selectors) to edit the dimensions of the key. The key definition determines how traffic is subdivided for evaluation (see About Keys). You cannot have more than eight dimensions.
    Example: If the primary metric is packets/second, and the group-by dimensions are set to Source:AS Number and Destination:AS Number, then the top-X evaluation will involve looking at all unique combinations of source ASN and destination ASN to determine which combinations have the highest traffic volume as measured in pps.
  • Metrics: Controls used to select the metrics used to measure flows.
    - Primary: A lozenge indicating the metric selected as the primary unit (e.g. bits/s, packets/s, flows/s, etc.) by which ingested flow data will be evaluated to determine top-X. For a list of available metrics, see General Metrics and Host Traffic Metrics.
    Example: If the primary metric is bits/second then the top-X evaluation will rank keys by bps.
    - Secondary: A lozenge indicating the metric used to specify multiple additional static comparators (see Policy Threshold Settings) that are based on a metric other than the primary metric. Each added comparator represents an additional condition that must be met in order to trigger an alarm. The Secondary Metrics selector supports simultaneous selection of two secondary metrics.
    - Customize Metrics: A button that opens the Metrics dialog (see About the Metrics Dialog) where you can edit the primary and secondary metrics for the policy.
    Note: A policy supports at most three metrics: one primary and up to two secondary.
  • Filters: Controls for the filters that are set to screen the traffic that is evaluated for the alert:
    - Filters list: A list of filters currently applied to this policy (see About Filters).
    - Edit Filters: A button that opens the Filtering Options dialog (see Filtering Options Dialog).

Note: A mitigation can be assigned to a policy (see Threshold Mitigations) only if the dimensions assigned with the Dimensions setting described above include source or destination IP/CIDR.

 

Building Your Dataset

The settings in the Building Your Dataset pane specify how the network traffic defined in the Data Funneling pane will be evaluated. By default the pane is hidden. To show the pane, click Advanced Settings at the bottom of the tab.

The pane always includes the following controls:

  • Evaluation Frequency: The interval at which newly ingested traffic data is grouped by dimensions and the traffic data represented by the resulting keys is evaluated. Options include 60 seconds (default), 2 minutes, and 5 minutes.
  • Maximum Keys Per Evaluation: The number of keys (unique combinations of dimension values; see About Keys) to evaluate for a match with the conditions specified in the alert's thresholds. Maximum valid value is 300.
  • Minimum Traffic Per Key: The minimum value, as measured by the primary metric, that a key must have to be included in the top-X ranking and evaluated for a threshold match.
    - Auto: If checked (default), the value is auto-calculated using a formula based on the settings of the comparators in the policy’s thresholds (see Threshold Conditions).
    - Specified value: If Auto is not checked, specify the minimum value (measured in packets/s).
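
To make the interaction of these controls concrete, the following Python sketch ranks keys the way the text describes: traffic is grouped into keys by dimension values, keys below the minimum are dropped, and the top X are kept. It is a simplified illustration rather than Kentik's evaluation engine. The function name, record layout, and field names are hypothetical, and the Auto formula for Minimum Traffic Per Key is not modeled because this article does not give it.

```python
from collections import defaultdict


def top_x_keys(records, dimensions, metric, max_keys, min_traffic=0):
    """Rank keys by the primary metric and keep the top max_keys."""
    totals = defaultdict(float)
    for rec in records:
        # A key is one unique combination of dimension values.
        key = tuple(rec[d] for d in dimensions)
        totals[key] += rec[metric]
    # Minimum Traffic Per Key: drop keys below the floor before ranking.
    eligible = [(k, v) for k, v in totals.items() if v >= min_traffic]
    eligible.sort(key=lambda kv: kv[1], reverse=True)
    # Maximum Keys Per Evaluation: keep only the top X.
    return eligible[:max_keys]


# Hypothetical traffic for one evaluation interval, measured in packets/s.
records = [
    {"src_as": 100, "dst_as": 200, "pps": 500},
    {"src_as": 100, "dst_as": 200, "pps": 300},
    {"src_as": 100, "dst_as": 300, "pps": 50},
    {"src_as": 400, "dst_as": 200, "pps": 600},
]

top = top_x_keys(records, ("src_as", "dst_as"), "pps", max_keys=2, min_traffic=100)
```

Here the key (100, 300) carries only 50 pps, so it is excluded by the minimum before the remaining two keys are ranked.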

If the Dimensions list in Data Funneling includes more than one dimension then the Building Your Dataset pane will also include the following Dimension Grouping controls (see About Dimension Grouping):

  • Group Dimensions: A switch to enable grouping by some dimensions before the final top-X evaluation.
  • Dimensions to Group By (shown only if Group Dimensions is enabled): The number of dimensions, starting at the top of the Dimensions list, that will be included in the key definition used for grouping.
  • Maximum Keys per Group (shown only if Group Dimensions is enabled): The number of keys from each group that will be included in the overall top-X for the alert. The maximum valid value is the lesser of 300 or the value of the Maximum Keys per Evaluation setting.

About Dimension Grouping

Dimension grouping introduces an additional layer of control into how the alerting system tracks the top-X keys for current traffic. Dimension grouping can help keep keys from a high-volume area of the infrastructure from dominating the top-X keys, so that the alerting system can pick up on significant changes in other areas as well.

With dimension grouping disabled, the key definition (set of dimensions) specified in the Dimensions field (see Data Funneling) is used as a single unit, resulting in keys that are each a unique combination of dimension values. These keys are then ranked by traffic volume to arrive at Top-X.

With dimension grouping on, traffic is instead evaluated in stages as follows:

  • The dimensions specified in the Data Funneling section are split into two sets, one of which can be thought of as the “grouping set.”
    - The first dimension in the grouping set is the first dimension in the Policy Dimensions field.
    - The last dimension in the grouping set is the dimension whose position in the Dimensions field corresponds to the number specified with the Dimensions to Group By control.
    - The remaining dimensions are in the non-grouping set.
    - Example: Dimensions are Source ASN, Full Device, Destination Country, and Destination ASN. If Dimensions to Group By is set to 2, the grouping set is Source ASN and Full Device.
  • Traffic is initially evaluated as if the key definition is only the grouping set. The resulting groups will each represent the traffic having a unique combination of the dimensions in the grouping set (e.g. Source ASN and Full Device).
  • The traffic in each of these groups is then evaluated as if the key definition is only the non-grouping set. The resulting keys within each group will each represent the traffic having a unique combination of the dimensions in the non-grouping set (e.g. Destination Country and Destination ASN).
  • The top N keys will be taken from each group and merged into a single pool, with N determined by the Maximum Keys per Group control.
  • The keys in this pool are ranked by volume, resulting in the overall top-X keys for the alert, with X being defined by the Maximum Keys Per Evaluation control.
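
The staged evaluation above can be sketched as follows. This Python example is a simplified model, not Kentik's implementation: the function name, parameter names, and sample data are hypothetical, and the three parameters group_by, keys_per_group, and max_keys stand in for Dimensions to Group By, Maximum Keys per Group, and Maximum Keys Per Evaluation.

```python
from collections import defaultdict


def grouped_top_x(records, dimensions, metric, group_by, keys_per_group, max_keys):
    """Top-X evaluation with dimension grouping enabled."""
    # Split the dimensions into the grouping set and the non-grouping set.
    grouping, non_grouping = dimensions[:group_by], dimensions[group_by:]

    # Stage 1: bucket traffic by grouping-set values.
    # Stage 2: within each group, total traffic per non-grouping-set key.
    groups = defaultdict(lambda: defaultdict(float))
    for rec in records:
        group = tuple(rec[d] for d in grouping)
        rest = tuple(rec[d] for d in non_grouping)
        groups[group][rest] += rec[metric]

    # Stage 3: take the top N keys from each group into a single pool.
    pool = []
    for group, keys in groups.items():
        ranked = sorted(keys.items(), key=lambda kv: kv[1], reverse=True)
        pool.extend((group + rest, value) for rest, value in ranked[:keys_per_group])

    # Stage 4: rank the pool by volume to get the overall top-X.
    pool.sort(key=lambda kv: kv[1], reverse=True)
    return pool[:max_keys]


# Hypothetical traffic: the "core" device dwarfs the "edge" device.
records = [
    {"device": "core", "dst_as": 1, "bps": 1000},
    {"device": "core", "dst_as": 2, "bps": 900},
    {"device": "core", "dst_as": 3, "bps": 800},
    {"device": "edge", "dst_as": 9, "bps": 10},
]

result = grouped_top_x(records, ("device", "dst_as"), "bps",
                       group_by=1, keys_per_group=2, max_keys=3)
```

Without grouping, the top three keys would all come from the core device. With two keys allowed per group, the edge key ("edge", 9) makes it into the top three, which is the effect dimension grouping is meant to achieve.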
 

Policy Threshold Settings

The settings of the Alert Thresholds tab are covered in the following topics:

 

About Alert Thresholds

A threshold is a collection of settings that define a set of conditions that must be matched in order for an alert policy to be activated, at which point the policy generates an alert for each key for which conditions have been matched. Each policy includes at least one threshold by default but may include up to five (Critical, Severe, Major, Warning, Minor).

 

General Threshold Settings

The Thresholds tab includes the following general settings:

  • Diagram: A visualization of the query built out of the policy as it is currently configured. Only visible if the policy is enabled.
    - Refresh: Click to refresh the visualization’s traffic data.
    - View in Data Explorer: Click to open the query in Data Explorer (opens in a new tab).
  • Severity selector: Choose which threshold you are configuring: Critical, Severe, Major, Warning, or Minor.
  • Threshold Status: Determines whether or not the threshold is currently enabled (evaluating traffic data, generating alarms, etc.). A threshold that is not currently needed can be disabled and retained for future use.
  • Threshold Description: A field in which to enter a description of the traffic conditions the threshold is intended to monitor.
 

Threshold Conditions

The settings shown in the Conditions section determine what constitutes a match (see About Matches) for the threshold. These settings are configured in the Edit Threshold Conditions dialog. The dialog is accessed via the Edit Conditions button in the Conditions section.

Note: The settings in the Edit Threshold Conditions dialog for a given threshold are independent of the same dialog's settings for other thresholds.

Conditions General UI

The Edit Threshold Conditions dialog includes the following general fields and controls:

  • Close buttons: To close the dialog, click the X in the upper right corner or the Cancel button at lower right. All elements will be restored to their values at the time the dialog was opened.
  • Conditions tiles: A set of tiles that each define a condition (see Conditions Tile Types).
  • Apply button: Click to apply the conditions to the policy. This will close the dialog and return you to the policy settings page.

Conditions Tile Types

A conditions tile is a set of controls used to specify one condition for the threshold. By default, the Edit Threshold Conditions dialog includes the following standard types of conditions tiles:

  • Metrics Condition: Tile(s) that display any selected metrics condition(s). Can be static and/or baseline condition(s). Click Show Chart to display a visualization of the metric’s data for the last three days. See Metrics Condition.
    Note: The default primary metric for a newly created policy is Packets/s. You can change this on the Dataset tab (see Policy Dataset Settings).
  • Top Keys Condition: A tile that displays the top keys condition (if enabled). See Top Keys Condition.
  • Interface Capacity Condition: A tile that displays the interface capacity condition (if enabled). See Interface Capacity Condition.
  • Ratio-Based Condition: A tile that displays the ratio-based condition (if enabled). See Ratio-Based Condition.

Notes:
- In addition to the above tiles, if a secondary metric is specified in the policy's Dataset tab then the Edit Threshold Conditions dialog will also include a tile for the secondary metric.
- If no settings are made in a given tile in the dialog then no condition of that type will be displayed in the Conditions section on the tab for the corresponding threshold on the policy's settings page.

Conditions Tile Common UI

Each conditions tile in the Edit Threshold Conditions dialog includes the following controls, which appear in full when you click the tile's Add Condition button:

  • Add Condition: A button that expands the tile for a condition, enabling you to specify the condition's settings.
  • Condition statement: Indicates the condition being applied and a summary of the settings entered for that condition. Condition statements vary depending on the type of condition (see topics below).
  • Show Chart (present only for metrics and ratio-based conditions): A button that opens a chart showing the last three days of data (as defined by the policy’s data sources, filters, and dimensions) for the specified metric(s), with the static/baseline metric being displayed in orange.
  • View in Data Explorer (present only for metrics): A link (icon to the left of the Add Condition button) that takes you to Data Explorer, where the Query pane will be populated to match the settings of this policy's Dataset tab. When the Show Chart button has been clicked the same link will also appear as part of the UI of the chart.
  • Refresh (present only when a chart is open): A button that refreshes the data in the visualization.
  • Remove (trash icon): A button that removes the condition from the threshold.

Metrics Condition

By default, the tile for a metrics condition (primary or secondary) includes a static condition in which current traffic is compared with a fixed number specified by the user. A metrics condition may also include a further requirement based on a comparison of current traffic with a historical baseline. The overall metrics condition defined by the tile's settings is stated to the right of the tile's heading, and will be met when the evaluated traffic meets both the static condition and the baseline condition (if any).

The static condition is met when the current metric value exceeds the condition value specified with the following controls:

  • Condition value: A field in which to enter a number.
  • Condition units: A drop-down from which you can select the units in which the condition value is expressed.

The baseline condition is met when the current metric value exceeds the baseline metric value to the extent specified with the following controls:

  • Comparison value: A field in which to enter a number.
  • Comparison type: A drop-down from which you can choose the type of comparison:
    - % of baseline: The condition is met when the current metric value as a percent of the baseline metric value exceeds the specified comparison value.
    - Units over baseline: The condition is met when the current metric value exceeds the baseline metric value by at least the number of units (e.g. bits/s or packets/s) specified with the comparison value.
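
The way the static and baseline conditions combine can be sketched in a few lines of Python. This is only an illustration of the logic described above, not Kentik's implementation: the function name, parameter names, and the comparison-type strings are hypothetical, and the treatment of a zero baseline (any positive current value counts as exceeding it) is an assumption.

```python
from typing import Optional


def metrics_condition_met(
    current: float,
    static_value: float,
    baseline: Optional[float] = None,
    comparison_value: Optional[float] = None,
    comparison_type: str = "percent_of_baseline",  # hypothetical identifier
) -> bool:
    """Return True when the static condition and any baseline condition are both met."""
    # Static condition: the current value must exceed the fixed condition value.
    if current <= static_value:
        return False
    # No baseline condition configured: the static condition alone decides.
    if baseline is None or comparison_value is None:
        return True
    if comparison_type == "percent_of_baseline":
        # current / baseline * 100 > comparison_value, written without division
        # so that a zero baseline does not raise an error (assumption).
        return current * 100 > comparison_value * baseline
    if comparison_type == "units_over_baseline":
        # The current value must exceed the baseline by at least comparison_value units.
        return current - baseline >= comparison_value
    raise ValueError(f"unknown comparison type: {comparison_type}")
```

For example, with a static value of 1000 and a baseline of 500, a current value of 1200 meets a "200% of baseline" condition (1200 is 240% of 500), but it would not if the baseline were 800.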

Notes:
- The default metric is packets/s. If you choose a different metric on the Dataset tab (see Data Funneling), the UI of the metrics condition tile will reflect your selected metric.
- Baseline conditions are available only for a policy's primary metric. If a baseline condition is set for a given threshold then a top keys condition cannot be set for that threshold.

Top Keys Condition

If the policy includes a top keys condition then it will look for a change in the composition of the top-X keys (a key entering or leaving the top keys). The following controls determine how the top keys are evaluated:

  • Comparison Direction: Determines which set of top-X keys (current or historical) is the primary set and which is used for comparison (see Comparison Direction).
  • Top Keys: The number of keys to evaluate.
    Note: If the field is left blank, the number of keys to track is set to the value of the Maximum Number of Keys field in Building Your Dataset (default = 25).

Note: If a top keys condition is set for a given threshold then a baseline condition cannot be set for that threshold.
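
As a rough illustration of what "a key entering or leaving the top keys" means, the Python sketch below compares two ranked key lists. The function name and the list-based inputs are hypothetical, and the sketch does not model Comparison Direction.

```python
from typing import List, Sequence, Tuple


def top_keys_changes(
    current_ranked: Sequence[str],
    historical_ranked: Sequence[str],
    top_n: int = 25,
) -> Tuple[List[str], List[str]]:
    """Return (entered, left): keys that joined or dropped out of the top-N set."""
    current_top = set(current_ranked[:top_n])
    historical_top = set(historical_ranked[:top_n])
    entered = sorted(current_top - historical_top)  # in the top-N now, not before
    left = sorted(historical_top - current_top)     # in the top-N before, not now
    return entered, left
```

With a current ranking of a, b, d and a historical ranking of a, b, c (top 3), key d has entered the top keys and key c has left.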

Interface Capacity Condition

The Interface Capacity tile of the Edit Threshold Conditions dialog enables you to alert based on a comparison of the current utilization of an interface whose traffic is included in the policy's dataset with the capacity of that interface. The tile's Add Condition button is active only when all of the following are true:

  • The policy's dimensions include Source Interface and/or Destination Interface.
  • The policy's metrics include bits/s (see Data Funneling).
  • No baseline condition is set in the Metrics Condition tile.

The condition is defined with the two-part Utilization control:

  • Comparison value: A field in which to enter a number.
  • Comparison method: A drop-down from which you can choose the method used to compare utilization with capacity:
    - Mbits/s: The condition is met when the current utilization of the interface (traffic volume), expressed in bits/s, exceeds the interface's capacity by at least the comparison value.
    - %: The condition is met when the current utilization of the interface, expressed as a percent of the interface's capacity, exceeds the specified comparison value.

Ratio-Based Condition

The Ratio-Based Condition tile of the Edit Threshold Conditions dialog enables you to set a policy condition based on a part-to-part ratio between any two metrics specified on the Dataset tab (see Data Funneling).

The following controls, which designate one metric as "Left" and the other as "Right," are used to define the condition:

  • Bidirectional: A switch that controls how the two parts of the ratio will be evaluated:
    - When the switch is off, the condition will be met if the ratio of the left metric to the right metric exceeds the ratio defined by their respective metric values. For example if the left value is 10 and the right value is 1 then the condition is met when the ratio of the left metric to the right metric exceeds 10:1.
    - When the switch is on, the condition will be met when the ratio of either metric to the other metric exceeds the ratio defined by their respective metric values, e.g. if the ratio of left to right exceeds 10:1 or the ratio of right to left exceeds 10:1.
  • Metrics: Two control sets, one for the left metric and the other for the right. Each set has two parts:
    - Metric value: A field in which you enter the numeric value of one metric's part of the ratio.
    - Metric: A drop-down from which you choose the metric for one part of the ratio.
  • Swap Metrics (left/right arrow icon): A button that moves the settings of the left metric controls to the right metric controls, and vice versa.
  • Margin (shown only when Bidirectional is on): Active only when the metric value fields are both set to 1, this field adds a fractional value to both parts of the ratio. For example, if the margin is set to 0.2 then the left:right ratio and the right:left ratio each become 1.2:1.
  • Show Chart: See Show Chart in Conditions Tile Common UI.

Note: The tile's Add Condition button is active only when the policy's dataset includes at least two metrics.
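
The interaction of the Bidirectional switch, the metric values, and the Margin can be sketched as follows. This is a minimal Python illustration under stated assumptions: all names are hypothetical, and ratios are compared by cross-multiplication so that a zero-valued metric does not cause a division error (a choice made for this sketch, not documented behavior).

```python
def _exceeds(numerator: float, denominator: float, num_part: float, den_part: float) -> bool:
    """True when numerator:denominator exceeds num_part:den_part (no division)."""
    return numerator * den_part > denominator * num_part


def ratio_condition_met(
    left_metric: float,
    right_metric: float,
    left_value: float,
    right_value: float,
    bidirectional: bool = False,
    margin: float = 0.0,
) -> bool:
    """Evaluate a ratio-based condition between two current metric values."""
    # The margin applies only when Bidirectional is on and both values are 1,
    # turning the 1:1 ratio into (1 + margin):1 in each direction.
    if bidirectional and left_value == 1 and right_value == 1:
        widened = 1 + margin
        return (_exceeds(left_metric, right_metric, widened, 1)
                or _exceeds(right_metric, left_metric, widened, 1))
    if _exceeds(left_metric, right_metric, left_value, right_value):
        return True
    # With Bidirectional on, the same ratio is also tested right-to-left.
    return bidirectional and _exceeds(right_metric, left_metric, left_value, right_value)
```

With values of 10 and 1, a left metric of 1100 against a right metric of 100 meets the condition (11:1 exceeds 10:1), while the reverse traffic pattern meets it only when Bidirectional is on.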

 

Threshold Configuration

The settings displayed in this pane of the Thresholds tab vary depending on the settings of the Edit Threshold Conditions dialog:

  • If no baseline exists (shown only when a baseline condition is set for the primary metric): A drop-down that sets what to do if a key in the primary top-X is not present in the comparison top-X:
    - Use Lowest Value from Top Keys (default): Compare the value of the key in the primary top-X set to the value of the last (lowest) key in the comparison set.
    - Use a Default Value: Compare this key’s current value to the static value in the comparison value field.
    - Do Not Alert: Classify as not a match on this key.
    - Activate an Alert: Classify as a match on this key.
    - Use Highest Value from Top Keys: Compare the value of the key in the primary top-X set to the value of the highest key in the comparison set.
  • Default Value (shown only when a baseline condition is set for the primary metric): The value, displayed below the If no baseline exists drop-down, that is used for comparison when If no baseline exists is set to “Use a Default Value”. It is also used when the Lowest or Highest Value from Top Keys can't be determined (e.g. when there are no baseline values yet).
  • Comparison Direction (shown only when a baseline condition or top keys condition is set for the primary metric): Determines which set of top-X keys (current or historical) is the primary set and which is used for comparison (see Comparison Direction).
  • Top Keys: If specified, the following fields override (for this threshold only) the dataset's Maximum Keys Per Evaluation setting (see Building Your Dataset), which is stated in parentheses to the right of each field:
    - Top current keys: The number of keys to evaluate from current traffic.
    - Top baseline keys: The number of keys to evaluate from baseline traffic.
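
The handling of a key that has no baseline can be summarized with a small Python sketch. The option strings and function name are hypothetical; the function returns either a forced outcome (match or no match) or the value against which the key should be compared, falling back to the Default Value when no lowest or highest value can be determined.

```python
from typing import Optional, Sequence, Tuple


def resolve_missing_baseline(
    option: str,
    comparison_values: Sequence[float],
    default_value: float,
) -> Tuple[Optional[bool], Optional[float]]:
    """Return (forced_match, comparison_value) for a key absent from the comparison set.

    forced_match is True/False when the option decides the outcome directly,
    otherwise None and comparison_value holds the value to compare against.
    """
    if option == "do_not_alert":
        return False, None
    if option == "activate_alert":
        return True, None
    if option == "use_default":
        return None, default_value
    if option in ("use_lowest", "use_highest"):
        if not comparison_values:
            # No baseline values yet: fall back to the Default Value.
            return None, default_value
        pick = min if option == "use_lowest" else max
        return None, pick(comparison_values)
    raise ValueError(f"unknown option: {option}")
```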

Comparison Direction

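The Thresholds tab includes a Comparison Direction setting in two places:

  • In the Top Keys Condition tile (see Top Keys Condition).
  • In the Threshold Configuration pane (see Threshold Configuration).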

The following table explains the effect of this setting:

Direction | Primary set | Comparison set | Description
Current to Historical (default) | Current keys | Baseline keys | For each key in the current top-X, compare the current value to that key's baseline value.
Historical to Current | Baseline keys | Current keys | For each key in the historical top-X, compare the baseline value to the current value. This direction enables the system to identify keys that were in the top-X historically but are no longer in the current top-X, e.g. a key that normally has high traffic volume but currently has no traffic.
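
The effect of the direction on which keys get evaluated can be illustrated with a short Python sketch. The function name, the dictionary inputs, and the direction strings are hypothetical; a value of None marks a key that is missing from the comparison set.

```python
from typing import Dict, Optional, Tuple


def pair_values(
    current: Dict[str, float],
    baseline: Dict[str, float],
    direction: str = "current_to_historical",
) -> Dict[str, Tuple[float, Optional[float]]]:
    """Map each key of the primary set to (primary_value, comparison_value or None)."""
    if direction == "current_to_historical":
        primary, comparison = current, baseline
    elif direction == "historical_to_current":
        primary, comparison = baseline, current
    else:
        raise ValueError(f"unknown direction: {direction}")
    # Only keys in the primary set are iterated, so the direction decides
    # which keys are evaluated at all.
    return {key: (value, comparison.get(key)) for key, value in primary.items()}
```

With current keys a and b and baseline keys a and c, Current to Historical evaluates a and b, while Historical to Current evaluates a and c, surfacing key c, which has dropped out of current traffic.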

 

Threshold Frequency

Kentik's alerting system defines a "match" as an individual instance of the network traffic evaluated by this alert policy matching the conditions defined for this threshold. The Threshold Frequency settings specify how many matches must occur within a given duration of time in order to trigger an alert:

  • Times: The count of matches that must occur within the specified duration.
    Note: If the number is set to 1, the time settings are irrelevant; an alert will be generated immediately upon the first match.
  • Duration value: The number of time units in the duration.
  • Duration units: The time unit of the duration, either minutes or hours.
  • Reset period: The number of match-free minutes after which the count of matches is reset to 0.
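
The following Python sketch shows one way to read these settings together. It is an illustration only: the class name and method are hypothetical, time is expressed in minutes, and the exact boundary handling (a match exactly one duration old falls outside the window; a gap equal to the reset period triggers a reset) is an assumption.

```python
from collections import deque


class MatchCounter:
    """Count matches within a sliding duration, with a match-free reset period."""

    def __init__(self, times: int, duration_minutes: float, reset_minutes: float):
        if duration_minutes <= 0:
            raise ValueError("duration_minutes must be positive")
        self.times = times
        self.duration_minutes = duration_minutes
        self.reset_minutes = reset_minutes
        self._matches = deque()  # timestamps (in minutes) of counted matches

    def record_match(self, minute: float) -> bool:
        """Record a match at the given time; return True if an alert should trigger."""
        # Reset the count when enough match-free time has passed.
        if self._matches and minute - self._matches[-1] >= self.reset_minutes:
            self._matches.clear()
        self._matches.append(minute)
        # Drop matches that have aged out of the duration window.
        while minute - self._matches[0] >= self.duration_minutes:
            self._matches.popleft()
        return len(self._matches) >= self.times
```

With Times set to 3 and a duration of 10 minutes, matches at minutes 0, 2, and 4 trigger on the third match. With Times set to 1, the first match always triggers, consistent with the note above.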
 

Threshold Actions

The settings in this section determine how the alerting system responds to an alert generated by this threshold. These settings include:

  • Acknowledgement Required: If this switch is enabled, an alert that is no longer in an alarm state will not be fully cleared until it is acknowledged manually from the Alerts List on the Alerting page (see Alerts List).
  • Notifications: A drop-down from which you can choose notification channels that will be notified in the event of an alert for this threshold (see Threshold Notifications).
  • Mitigations: A drop-down from which you can choose mitigations that will be triggered in the event of an alert for this threshold (see Threshold Mitigations).
 

Threshold Notifications

When an alert is triggered or otherwise changes state, notifications may be sent in various forms to designated parties at various destinations. The Notifications field is used to specify the notification(s) sent when an alarm is triggered.

To select a notification for the policy, click in the Notifications field. A drop-down will display a list of your organization's existing notification channels. You can select one or more notification channels. A lozenge for each will be displayed in the field.

Note: For more information about notification channels, see Notifications.

 

Threshold Mitigations

Note: The Mitigations section is only visible when the policy’s dimensions (see Data Funneling) include source or destination IP/CIDR.

The Mitigations section enables you to set one or more mitigations (see About Mitigation) that will be triggered in response to an alert on this threshold.

Add Mitigation to Threshold

In its initial state, the Mitigations section includes the following controls:

  • Mitigation selector: A drop-down list from which you can select a mitigation.
  • Add Mitigation: A button that assigns the currently selected mitigation to this threshold.

Once a mitigation is assigned to the threshold, a configuration tile for the mitigation is displayed above the initial controls (which may still be used to add another mitigation). The tile includes the controls covered in Mitigation Settings.

Note: The mitigations assigned to a policy (automated mitigations) will escalate and de-escalate automatically as changing conditions match different thresholds in that policy.

Mitigation Settings

The mitigations tile for an individual mitigation that has been assigned to the threshold includes the following controls:

  • Remove: A red X at upper right. Click to remove the mitigation from the threshold.
  • Apply Mitigation: A drop-down that specifies when the mitigation will be applied:
    - Immediately, when alert starts: Initiate the mitigation immediately when the threshold activates an alert.
    - After user confirmation: Initiate mitigation action only after a user clicks the Approve and start the mitigation option in the actions at the right side of a given alarm's row in the Alerts List on the Alerting page.
    - After user confirmation or timeout expires: Wait for a user to acknowledge or cancel the mitigation from the Alerting page. If the specified time period expires with no user action, then initiate the mitigation automatically.
    - Application Timer (shown only when Apply Mitigation is set to “After user confirmation or timeout expires”): The duration (in minutes) of the timer.
  • Clear Mitigation: A drop-down that specifies when the mitigation will stop:
    - Immediately, when alert ends: Stop mitigation immediately when the alert exits alarm state.
    - After user confirmation: Continue mitigation (even after the alert ends) until it is canceled by a user.
    - After user confirmation or timeout expires: Continue mitigation (even after the alert ends) until it is manually canceled by a user or the specified time expires.
    - Application Timer (shown only when Clear Mitigation is set to “After user confirmation or timeout expires”): The duration (in minutes) of the timer.

Multiple Mitigations

If desired, you can add one or more additional mitigations to the same threshold. Multiple mitigations are assigned as follows:

  1. Use the controls covered in Add Mitigation to Threshold to add another mitigation.
  2. Use the controls covered in Mitigation Settings to specify when the new mitigation will be applied and cleared.

The ability to apply multiple mitigations to a threshold enables you to simultaneously trigger all of the mitigation methods/platforms (e.g. appliances at multiple sites) with which you’d like to respond to a given set of conditions, and to do so in a way that is much more scalable than by cloning a given policy for each of your appliances.

Support for multiple mitigations per threshold also enables the response for a given alarm to include a mix of mitigation types, e.g. RTBH, A10, and Radware. The following scenario, for example, outlines a multi-location DDoS response involving multiple mitigation types:

  1. De-preference or stop-announcing a BGP route on Location #1 by injecting a route whose community has been predefined as a flag for these actions.
  2. Announce a broader routing table entry, less specific than /24 (thus forcing acceptance by Internet peers), for Location #2.
  3. Trigger a third-party mitigation method (e.g. A10 or Radware) on Location #2 to announce more specific prefixes for internal re-direction to a scrubbing center.

Flowspec Mitigation Details

Mitigations whose platform is Flowspec will include a Traffic Matching card at the right of the standard Mitigation Settings tile. The dimensions listed in the card have either been populated with specific values or are inferred from values elsewhere in the policy. To see which Flowspec components (see Flowspec Component Types) are involved for this mitigation, click the View Method Details button, which opens the Mitigation Method Details dialog.

 

Policy Baseline Settings

The settings of the Historical Baseline tab are covered in the following topics:

 

About Historical Baselines

Baselining enables the alerting system to trigger an alert based on a comparison of current traffic, more specifically the current value of a key (see About Keys), against a baseline value. Derived from historical traffic patterns, the baseline represents what's "normal" for that key. If the current value varies from the norm in a way that matches a condition specified in one of an alert policy's defined thresholds, then that threshold will trigger an alert.

The baselining process begins with a historical data set that has the following characteristics:

  • It covers traffic that is "funneled" the same way as the current traffic to which it will be compared, which is defined on the Dataset tab (see Policy Dataset Settings).
  • It covers a specified time range covering multiple hours or days (the "baseline window").
  • It's drawn from time series data points that each represent the traffic volume of a top-X key for one time-slice, which is a duration (1, 2, or 5 min) determined by the Dataset tab's Evaluation Frequency setting.

We arrive at a baseline value for each key in two main stages:

  • Building the baseline: We smooth out spikes or drops in normal traffic by normalizing/averaging the data points for each top-X key over the baseline window (based on the options in Building the Baseline). The result is a series of "buckets" that each cover a duration of one hour and include one value for each of the top-X keys.
  • Using the baseline: We choose certain buckets (based on the options in Using the Baseline) and normalize/average the key values in those buckets into a final historical value for each key. This value is what's used for comparisons in the conditions of each threshold.

When a comparison of the current value with the final baseline value results in a match with the baseline condition defined for a given threshold then that threshold may — depending on its Threshold Frequency settings — trigger an alert.

Note: The settings for historical baselines in a given policy apply to all of the thresholds in that policy that include one or more baseline conditions.

For each one-hour bucket, the value of a given key (red bar) is derived from the time-slice values for that key.
 

Baseline Presets

Kentik offers three presets that serve as a starting point for configuring baselining for a given policy. Once you choose a preset you can tailor settings to your specific needs with the controls that are revealed by clicking Advanced Options (see Building the Baseline and Using the Baseline). When you change any of the settings the baseline will automatically be reclassified as Custom.

The following presets are available:

  • Default: Produces a general-purpose baseline for most alerting applications.
  • Precision: Produces the most accurate baseline for highly detailed applications.
  • Express: Rapidly produces a baseline that's less accurate but nonetheless useful for general applications.

All of the above presets share the following approach:

  • In Building the Baseline:
    - Rollup aggregation: Set to 98th percentile.
  • In Using the Baseline:
    - Bucket width: Each key's value for a given hour is based only on the key's value in that hour's bucket.
    - Final aggregation: Set to 95th percentile.

The table below shows how the presets differ.

Setting | Default | Precision | Express
Bucket depth: Number of top keys every hour | 100 | 300 | 25
Baseline window start | 1 day ago | 1 day ago | 1 hour ago
Baseline window lookback | 3 weeks | 4 weeks | 1 week
Completeness: Days of data accumulated before use | 4 | 21 | 2
Compare to: comparison data is derived from… | Current hour of the day, current day of the week | Current hour of the day, current day of the week | Every hour
Separate patterns for weekdays and weekends | No | Yes | No

 

Building the Baseline

The controls of the Building the Baseline pane (accessed via Advanced Options on the Baseline tab) enable you to specify the historical data that will be included in an initial (“rollup”) aggregation pass. The controls are structured as three settings that together answer the question "How should we build the buckets that will be available for your baseline?":

  • Window: Go back [duration] from [end point]. These settings determine the time range of the baseline window (e.g. "go back one week from one day ago"):
    - Duration: The length of the time range. Options include 1 hour, 1 day, or 1, 2, 3 (default), or 4 weeks.
    - End point: A drop-down that sets the end point of the time range for evaluation of historical data. The window will go back in time from this end point for the specified duration. Traffic data newer than the end point is excluded to prevent current spikes or anomalies from skewing baseline values. Options include 1 hour, 1 day (default), and 1 week ago.
  • Bucket depth: Use the top [##] of keys in every hour of the window. By default, this number is set to match the Maximum Keys per Evaluation setting on the Dataset tab (see Building Your Dataset).
  • Rollup aggregation: For each one-hour bucket, derive each key's value based on the [minimum/maximum/percentile] of the time series datapoints for that hour. Options include Minimum, Maximum, or Percentile: 98th (default), 95th, 50th, 25th, or 5th.

Note: The baseline window described above is "rolling," meaning that older traffic data continually ages out and newer traffic data is continually added.
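
To make the rollup pass concrete, the Python sketch below groups time-slice data points into one-hour buckets, aggregates each key with a percentile, and keeps only the top keys per bucket. It is a simplified illustration: the function names and the (epoch seconds, key, value) input shape are hypothetical, and the nearest-rank percentile is an assumed method, since the exact percentile calculation is not specified here.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def nearest_rank_percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed method for this sketch)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def build_buckets(
    points: Iterable[Tuple[int, str, float]],
    pct: float = 98,
    depth: int = 25,
) -> Dict[int, Dict[str, float]]:
    """Roll (epoch_seconds, key, value) points up into {hour_start: {key: value}}."""
    grouped = defaultdict(list)
    for ts, key, value in points:
        hour_start = ts - ts % 3600  # start of the one-hour bucket
        grouped[(hour_start, key)].append(value)

    per_hour: Dict[int, Dict[str, float]] = defaultdict(dict)
    for (hour_start, key), values in grouped.items():
        per_hour[hour_start][key] = nearest_rank_percentile(values, pct)

    # Bucket depth: keep only the top `depth` keys in every hour.
    return {
        hour: dict(sorted(keys.items(), key=lambda kv: kv[1], reverse=True)[:depth])
        for hour, keys in per_hour.items()
    }
```

Restricting the input points to the baseline window (duration and end point) would happen before this step and is left out of the sketch.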

 

Using the Baseline

The controls of the Using the Baseline pane (accessed via Advanced Options on the Baseline tab) enable you to define how the buckets defined in Building the Baseline are used to derive final comparison values. The controls are structured as four settings that together answer the question "Which buckets should we take key values from, and how should we derive a comparison value for each key?":

  • Completeness: Don't use this baseline until it has at least [##] [time unit] of data. This minimum duration helps ensure that the baseline reflects sufficient information to establish what the "normal" value is for each key. Time unit options are hours or days.
    - Number field: Enter a number representing time units.
    - Time unit: Click the drop-down to select days (default) or hours.
  • Compare to: For each key, take baseline values from the buckets for [every/the current] hour of [every/the current] day of the week. Use these drop-downs to determine which buckets are included in the final aggregation:
    - All buckets in the window: every hour of every day.
    - Buckets from the current hour of every day.
    - Buckets from the current hour of the current day.
    Note: The availability of these options depends on the Window settings in Building the Baseline.
  • Bucket width: For each key, derive the value representing each "compare to" hour from [just that hour/surrounding hours].
    - If bucket width is "just that hour" then the value of each key for that hour is taken only from the bucket for that hour.
    - If bucket width is "surrounding hours" then an additional line appears, enabling the key values for a given hour to be derived from a number of hours on either side (see Baseline Bucket Width).
  • Final aggregation: Derive the final comparison value for each key from the [minimum/maximum/percentile] of the key's values for the "compare to" hours. This drop-down specifies how the values for a given key in the chosen buckets will become a single comparison value for that key. Options include maximum, minimum, and percentile: 99th, 98th, 95th (default), 90th, 80th, 50th, 25th, 10th, or 5th.
    Note: Policies watching for activity in excess of baseline typically use Maximum, 98th, or 95th percentile aggregation. Policies watching for activity below baseline typically use 25th percentile aggregation.
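
The Compare to and Final aggregation steps can be sketched in Python as shown below for a single key. The function names and option strings are hypothetical, the nearest-rank percentile is an assumed method, and Completeness and Bucket width are deliberately left out to keep the example short.

```python
import math
from datetime import datetime
from typing import Dict, List, Optional


def nearest_rank_percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed method for this sketch)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def final_baseline(
    buckets: Dict[datetime, float],
    now: datetime,
    compare_to: str = "current_hour_current_day",
    pct: float = 95,
) -> Optional[float]:
    """Pick one key's buckets per the Compare to option and aggregate them.

    buckets maps the UTC start of each one-hour bucket to the key's value.
    """
    selected = []
    for hour_start, value in buckets.items():
        same_hour = hour_start.hour == now.hour
        same_day = hour_start.weekday() == now.weekday()
        if compare_to == "every_hour_every_day":
            selected.append(value)
        elif compare_to == "current_hour_every_day" and same_hour:
            selected.append(value)
        elif compare_to == "current_hour_current_day" and same_hour and same_day:
            selected.append(value)
    if not selected:
        return None  # no usable baseline for this key
    return nearest_rank_percentile(selected, pct)
```

Narrower Compare to options select fewer buckets, so the resulting value tracks the traffic pattern for that hour and day more closely but rests on fewer data points.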

Baseline Bucket Width

The buckets resulting from Building the Baseline each cover one hour in the overall baseline window, with one value per key per bucket. The Bucket width setting in Using the Baseline enables you to replace those single-hour values for each key with values that are derived from up to four hours on either side of the bucket's nominal hour. This additional aggregation step is specified in the following line, which appears when Bucket width is specified as "surrounding hours": Aggregate [#] hours before and after using [minimum/maximum/percentile].

The number of hours on either side that can be aggregated into a single bucket value ranges from 1 to 4. The method of deriving the bucket value from multiple hours may be minimum, maximum, or a percentile: 5th, 10th, 25th, 50th, 75th, or 90th.
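
A minimal Python sketch of the "surrounding hours" step follows. The function name is hypothetical, the input is assumed to be one key's values for consecutive hours, and only minimum or maximum aggregation is shown (any callable that reduces a list to one value could be passed instead). Hours at the edges of the window simply have fewer neighbors, which is an assumption of this sketch.

```python
from typing import Callable, List, Sequence


def widen_buckets(
    hourly_values: Sequence[float],
    hours: int = 1,
    aggregate: Callable[[Sequence[float]], float] = max,
) -> List[float]:
    """Replace each hour's value with an aggregate over `hours` before and after."""
    if not 1 <= hours <= 4:
        raise ValueError("hours must be between 1 and 4")
    widened = []
    for i in range(len(hourly_values)):
        lo = max(0, i - hours)                       # clamp at the window start
        hi = min(len(hourly_values), i + hours + 1)  # clamp at the window end
        widened.append(aggregate(hourly_values[lo:hi]))
    return widened
```

For hourly values of 1, 5, 2, 8, 3, aggregating one hour before and after with maximum yields 5, 5, 8, 8, 8.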

 

Weekend Baselining

The final control in the Advanced Options section on the Baseline tab is the Use separate patterns for weekdays and weekends checkbox. When this box is checked the baselines for weekends (UTC Saturday and Sunday) will be calculated independently from the baselines for weekdays (UTC Monday through Friday), thereby taking into account variations in traffic patterns caused by day of week.
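
Because the weekday/weekend split is based on UTC, a timestamp that falls on a Friday evening in a local time zone may belong to the weekend pattern. The Python sketch below illustrates this classification; the function name and return strings are hypothetical.

```python
from datetime import datetime, timezone


def day_pattern(ts: datetime) -> str:
    """Classify a timezone-aware timestamp as 'weekday' or 'weekend' in UTC."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc_ts = ts.astimezone(timezone.utc)
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    return "weekend" if utc_ts.weekday() >= 5 else "weekday"
```

For example, 20:00 on a Friday at UTC-5 is 01:00 on Saturday in UTC, so it would be grouped with the weekend pattern.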

 

Manage Alert Policies

The following procedures cover some basic operations related to managing alert policies in the portal:

Note: See also Add a Query-based Policy, Clone a Policy, and Use a Policy Template.

 

Add a Policy

As discussed in Policy Creation Methods, Kentik provides several different approaches to creating an alert policy. The following is a basic outline for adding a policy using the Add Policy page:

  1. Choose Alerting from the featured column at the left of the popup menu on the main navbar.
  2. Click the Configure Alert Policies button at the upper right of the main display area.
  3. On the resulting Policies page, click the Add Policy button at the upper right of the main display area.
  4. On the General tab, specify the settings covered in General Policy Settings.
  5. On the Dataset tab, specify the settings covered in Policy Dataset Settings.
  6. On the Thresholds tab, specify the settings covered in Policy Threshold Settings.
  7. On the Baseline tab, specify the settings covered in Policy Baseline Settings.
  8. Click the Save button at the upper right of the main display area to save the policy and return to the Policies page, where the new policy will now be included in the Policies List.
 

Edit a Policy

To edit an existing policy:

  1. Choose Alerting from the featured column at the left of the popup menu on the main navbar.
  2. Click the Configure Alert Policies button at the upper right of the main display area.
  3. In the Policies List on the Policies page, find the row of the policy whose settings you'd like to change.
  4. Click the Actions icon at the right of the row and choose Edit Policy from the popup.
  5. On the resulting Edit Policy page, click on the tab containing the setting that you'd like to change (see Policy Settings Tabs), then find the setting and change it.
  6. Click the Save button at the upper right of the main display area to save the policy and return to the Policies page.
 

Disable or Enable a Policy

To disable a policy or enable a policy that was previously disabled:

  1. Choose Alerting from the featured column at the left of the popup menu on the main navbar.
  2. Click the Configure Alert Policies button at the upper right of the main display area.
  3. In the Policies List on the Policies page, find the row of the policy you'd like to disable or enable.
  4. Click the Actions icon at the right of the row:
    - To disable an enabled policy, choose Disable from the popup. The policy will no longer monitor its dataset, generate alerts, or trigger mitigations.
    - To enable a disabled policy, choose Enable from the popup. The policy will resume normal policy-related functions.
 

Remove a Policy

To remove a policy from your organization's collection of policies:

  1. Choose Alerting from the featured column at the left of the popup menu on the main navbar.
  2. Click the Configure Alert Policies button at the upper right of the main display area.
  3. In the Policies List on the Policies page, find the row of the policy you'd like to remove.
  4. Click the Actions icon at the right of the row and choose Remove from the popup.
  5. In the resulting confirmation dialog, click Remove. The policy will be permanently removed from your organization's collection of policies.