Alert Policies
Note: The Kentik support team encourages you to contact us at support@kentik.com for assistance with alert policy configuration.
The policy-based Alerting system in Kentik Detect is covered in the following topics:
- Alert Policies Page
- Alert Policy Dialogs
- General Policy Settings
- Policy Dataset Settings
- Policy Threshold Settings
- Policy Baseline Settings
- Alert Library
Notes:
- For general information about policy-based alerting, see Policy Alert Overview.
- For information on active or historical alerts, see Alert Dashboards.
- For information on alert-related notifications, see Alert Notifications.
- For information on mitigation for alerts, see Mitigation.
Alert Policies Page
The Policies page displays the Alert Policy List, which is a list of the alert policies (see About Alerts and Policies) that are currently available to your organization. Policies can be added, duplicated, and edited from this list.
Note: For information on configuring an alert policy, see Alert Policy Dialogs.
Alert Policies Page UI
The work area of the Policies page is made up of the following UI elements:
- Filter: A field at upper right that is used to filter the Policy list. The following columns are searched for a match on the string entered in this field: ID, Name, Metrics, Dimensions.
- Add Policy: A button at upper right that opens the Add Alert Policy dialog (see Alert Policy Dialogs), where you can configure and save a new alert policy.
- Policy List: See Alert Policy List.
Alert Policy List
The Policy List is a table that lists all of the alert policies that are currently available to be used by your organization. Policies added to the list may be created in one of the following ways:
- Created from scratch via the Add Policy button.
- Duplicated from an existing policy using the Copy button at the right of each row in the Policy List.
- Duplicated from an alert policy preset on the Library page; see Alert Library.
The Policy List provides the following information and actions for each alert policy:
- Status: A switch used to enable or disable the policy.
Note: If an alert has generated one or more alarms that are listed on the Active page, switching the alert to Disabled will remove those alarms from the Active list.
- ID: The system-generated unique ID assigned when the alert policy was created.
- Name: The user-specified name of the alert policy. The policy’s user-specified description, if any, is presented below the name.
- Devices: Either “All Devices” or the number of devices covered by the query for the alert policy. Devices are selected on the Dataset tab of the Alert Policy Dialogs (add or edit).
- Metrics: The units (e.g. bits/s, packets/s, flows/s, etc.) by which this alert measures incoming flow data (see Data Funneling). The primary metric is listed first, followed by secondary metrics (if any).
- Dimensions: The dimensions defined in the alert policy, which combine to make a key definition that will determine how traffic is subdivided for evaluation (see About Keys). Dimensions, which are based on fields in the KDE main table, are described in Dimensions Reference.
- Actions: The following actions can be performed on an alert policy:
- Copy: Duplicates the alert policy so that it can be modified without altering the original.
- Remove: Opens a confirming dialog that allows you to delete the alert policy.
Note: Clicking anywhere in a policy’s row (other than on the action buttons) opens that policy’s settings dialog.
Policy Error State
Alert policies may occasionally end up in an error state due to a user misconfiguration, a bug, or a known issue in the back end. A policy that is in error is indicated by the Status switch, which turns orange and reads ERROR. If a policy is in an error state, click the Status switch once to disable the policy, then again to re-enable it. If the policy is still shown as being in error, contact Customer Support.
Note: Policies that are in error state are not currently indicated as such in the Active Alerts List.
Alert Policy Dialogs
The dialogs used to specify alert policy settings are covered in the following topics:
Accessing Policy Settings
Alert policies are defined in the alert policy dialogs (add or edit):
- Add Alert Policy: To add a new alert policy, open the dialog with the Add Policy button at upper right.
- Edit Alert Policy: To edit an existing alert policy, open the dialog by clicking anywhere in that policy’s row in the Policy list.
Policy Dialog UI
The Add Alert Policy and Edit Alert Policy dialogs share the same layout and the following common UI elements:
- Close button: Click the X in the upper right corner to close the dialog. All elements will be restored to their values at the time the dialog was opened.
- Tab selector: Choose the tab to display (see Policy Dialog Tabs).
- Remove button (Edit Alert Policy dialog only): Remove the policy from your organization’s collection of policies.
- Cancel button: Cancel the add policy or edit policy operation and exit the dialog. All elements will be restored to their values at the time the dialog was opened.
- Add Alert Policy button (Add Alert Policy dialog only): Save settings for the new policy and exit the dialog.
- Save button (Edit Alert Policy dialog only): Save changes to policy settings and exit the dialog.
- Policy dialog tabs: The tabs where the policy settings are made; see Policy Dialog Tabs.
- Show Summary button: Opens a popup containing a summary view of the current settings on all of the tabs. To close the summary, click anywhere outside of the popup.
- Show Traffic button: Opens the Top Traffic to Current Keys Dialog, which shows a graph of the traffic that, based on current settings in the alert policy dialog, will be evaluated for the alert.
- Use the Change View Type drop-down at upper right to change the type of graph.
- Click the View in Explorer button to open Data Explorer with a view of the same traffic (the settings in the alert policy dialog will become the settings in the panes of the sidebar).
Note: The Save button (when editing a policy) or Add Alert Policy button (when creating a policy) will be grayed out until the Tab Progress Indicators for all tabs of the dialog are checkmarks.
Top Traffic to Current Keys Dialog
The Top Traffic to Current Keys dialog, opened via the Show Traffic button, displays a chart of the top-X traffic over the last 72 hours that met the conditions defined by the settings in the alert policy dialog.
The dialog includes the following UI elements:
- Close buttons: To close the dialog, click the X in the upper right corner or the Close button at lower right.
- View Type: A drop-down menu used to set the type of visualization used for the graph (defaults to Line Chart); for descriptions of the options see Chart View Types.
- Chart: The visualization of traffic (using the current view type).
- View in Explorer button: Opens Data Explorer for further exploration of the device’s traffic. The sidebar will be set so that query results will show the same traffic that is shown in the dialog.
Policy Dialog Tabs
Once the Add Alert Policy or Edit Alert Policy dialog is open, policies are configured on the following tabs:
- General Settings: Used to define the overall properties of the alert policy; see General Policy Settings.
- Dataset: Used to narrow the subset of traffic that is evaluated for all thresholds in this alert; see Policy Dataset Settings.
- Alert Thresholds: Used to specify up to five collections of conditions that would trigger the alert to enter alarm state (see Alert States); see Policy Threshold Settings.
- Historical Baseline: Used to configure the baselines against which current traffic is compared to determine if there is a deviation from the norm; see Policy Baseline Settings.
Tab Progress Indicators
The progress of each tab toward completion (all required settings specified) is tracked in the tab selector at the top of the dialog and also in the Policy Summary. Progress is indicated as follows:
- Stop icon (red): A setting on the tab has been specified with an invalid value (e.g. a negative number).
- Caution icon (orange): One or more required settings on the tab are not yet specified.
- Checkmark (green): All required settings on the tab are completed and valid.
General Policy Settings
The General Settings tab of the alert policy dialogs is used to define the overall properties of the alert policy.
This tab includes the following configuration options:
- Name: The name of the alert policy. Maximum of 50 characters; must include at least one letter.
- Description (optional): A description of the policy; used to summarize what the policy looks at and indicate what it is used for.
- Policy Status: A switch used to enable or disable the policy.
Note: If an alert has generated one or more alarms that are listed on the Active page, switching the alert to Disabled will remove those alarms from the Active list.
- Silent Mode: Prevents the alert from entering alarm state (and triggering notifications and/or mitigations) until the specified date. Use for new policies to allow time to establish baselines. Silent mode is enabled by default.
Note: This switch enables silent mode for this specific policy. To enable silent mode on a pattern basis instead, see Alert Silent Mode.
- Silent Mode End Date: If silent mode is enabled, specifies the date on which the alert will exit silent mode, ending its learning period. The default learning period is six days.
- Policy Dashboard: A dashboard that will be the destination of the Open In Dashboard button for any alarm or match from this alert that is listed in the Alert or History list.
Policy Dataset Settings
The Dataset tab is used to narrow the subset of traffic that is evaluated for the thresholds in this alert. The tab is divided into two panes whose settings are described below:
Data Funneling
Specify the criteria used to include traffic in the data being evaluated:
- Devices: Controls used to choose the devices the traffic is going through, to, or from:
- Devices list: A list of the devices currently assigned to this dataset. To remove a device from the list, click the X at the right of the device’s row.
- Edit Devices: Click to open the Selected Devices Dialog, so you can choose the specific devices to include.
- Dimensions: A list of the dimensions (see About Dimensions) that combine to define a key. Click in the list to open a dimension dialog (see Query Dimension Dialogs) to edit the dimensions of the key. The key definition determines how traffic is subdivided for evaluation (see About Keys).
Example: If the primary metric is packets/second, and the group-by dimensions are set to Source:AS Number and Destination:AS Number, then the top-X evaluation will involve looking at all unique combinations of source ASN and destination ASN to determine which combinations have the highest traffic volume as measured in pps.
- Primary Metric: A drop-down menu to choose the units (e.g. bits/s, packets/s, flows/s, etc.) by which ingested flow data will be evaluated to determine top-X. For a list of available metrics, see General Metrics and Host Traffic Metrics.
Example: If the primary metric is bits/second then the top-X evaluation will rank keys by bps.
- Secondary Metrics: A drop-down menu used to specify additional static comparators (see Policy Threshold Settings) that are based on a metric other than the primary metric. Each added comparator represents an additional condition that must be met in order to trigger an alarm. The Secondary Metrics selector supports simultaneous selection of multiple secondary metrics.
- Filters: Filters with which you wish to screen the traffic that is evaluated for the alert. Use the Edit Filters button to open the Filtering Options Dialog, from which you can apply ad hoc filters or saved filters.
Note: A mitigation can be assigned to a policy (see Threshold Mitigations) only if the list in the policy’s Dimension setting includes source or destination IP/CIDR.
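The top-X examples in Data Funneling can be sketched in a few lines of Python. This is an illustration only, not Kentik's implementation: the flow records, the field names (`src_as`, `dst_as`, `pps`), and the `top_x` helper are all hypothetical.

```python
from collections import defaultdict

def top_x(flows, dimensions, metric, x):
    """Rank keys (unique combinations of dimension values) by a summed metric."""
    totals = defaultdict(float)
    for flow in flows:
        key = tuple(flow[d] for d in dimensions)
        totals[key] += flow[metric]
    # Highest volume first; ties are broken by key so the order is stable.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:x]

# Hypothetical flow records: dimensions are source and destination ASN,
# and the primary metric is packets/second.
flows = [
    {"src_as": 64500, "dst_as": 64510, "pps": 1200},
    {"src_as": 64500, "dst_as": 64510, "pps": 300},
    {"src_as": 64501, "dst_as": 64510, "pps": 900},
    {"src_as": 64502, "dst_as": 64511, "pps": 100},
]
top_two = top_x(flows, ["src_as", "dst_as"], "pps", 2)
```

Here the two records sharing the same source and destination ASN collapse into a single key, which then ranks first by pps.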
Building Your Dataset
Specify how the traffic is evaluated:
- Evaluation Frequency: The interval at which newly ingested traffic data is grouped by dimensions and the traffic data represented by the resulting keys is evaluated. Options include 1 min, 2 min (default), and 5 min.
- Maximum Number of Keys: The number of keys (unique combinations of dimension values; see About Keys) to evaluate for a match with the conditions specified in the alert’s thresholds. Maximum valid value is 300.
- Dimension Grouping: A switch to enable grouping by some dimensions before the final top-X evaluation; see About Dimension Grouping.
- Dimensions Responsible for Grouping (shown only if Dimension Grouping is on): The number of dimensions from the start of the Dimensions field that will be included in the key definition used for grouping.
- Maximum Number of Keys in Each Group (shown only if Dimension Grouping is on): The number of keys from each group that will be included in the overall top-X for the alert.
- Minimum Traffic Threshold: The minimum value, as measured by the primary metric, that a key must have to be included in the top-X ranking and evaluated for a threshold match.
- Auto: If checked (default), the minimum value is auto-calculated using a formula based on the settings of the comparators in the policy’s thresholds (see Threshold Conditions).
- Specified value: If Auto is not checked, specify the minimum value.
About Dimension Grouping
Dimension grouping introduces an additional layer of control into how the alerting system tracks the top-X keys for current traffic. Dimension grouping can help keep keys from a high-volume area of the infrastructure from dominating the top-X keys, so that the alerting system can pick up on significant changes in other areas as well.
With dimension grouping off, the key definition (set of dimensions) specified in the Dimensions field (see Data Funneling) is used as a single unit, resulting in keys that are each a unique combination of dimension values. These keys are then ranked by traffic volume to arrive at Top-X.
With dimension grouping on, traffic is instead evaluated in stages as follows:
- The dimensions specified in Data Funneling are split into two sets, one of which can be thought of as the “grouping set.”
- The first dimension in the grouping set is the first dimension in the Dimensions field.
- The last dimension in the grouping set is the dimension whose position in the Dimensions field corresponds to the number specified with the Dimensions Responsible for Grouping control.
- The remaining dimensions are in the non-grouping set.
- Example: Dimensions are Source ASN, Full Device, Destination Country, and Destination ASN. If Dimensions Responsible for Grouping is set to 2, the grouping set is Source ASN and Full Device.
- Traffic is initially evaluated as if the key definition is only the grouping set. The resulting groups will each represent the traffic having a unique combination of the dimensions in the grouping set (e.g. Source ASN and Full Device).
- The traffic in each of these groups is then evaluated as if the key definition is only the non-grouping set. The resulting keys within each group will each represent the traffic having a unique combination of the dimensions in the non-grouping set (e.g. Destination Country and Destination ASN).
- The top N keys will be taken from each group and merged into a single pool, with N determined by the Maximum Number of Keys in Each Group control.
- The keys in this pool are ranked by volume, resulting in the overall top-X keys for the alert, with X being defined by the Maximum Number of Keys control (see Building Your Dataset).
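The staged evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the alerting system's actual code: the function name, its parameters, and the flow record fields are hypothetical, and the parameters simply mirror the Dimensions Responsible for Grouping, Maximum Number of Keys in Each Group, and Maximum Number of Keys controls.

```python
from collections import defaultdict

def grouped_top_x(flows, dimensions, metric, n_grouping, per_group, max_keys):
    """Staged top-X: group by the first n_grouping dimensions, keep the top
    per_group keys of each group, then rank the merged pool by volume."""
    group_dims = dimensions[:n_grouping]   # the grouping set
    rest_dims = dimensions[n_grouping:]    # the non-grouping set
    groups = defaultdict(lambda: defaultdict(float))
    for flow in flows:
        group = tuple(flow[d] for d in group_dims)
        rest = tuple(flow[d] for d in rest_dims)
        groups[group][rest] += flow[metric]

    pool = []
    for group, keys in groups.items():
        ranked = sorted(keys.items(), key=lambda kv: -kv[1])[:per_group]
        pool.extend((group + rest, value) for rest, value in ranked)
    pool.sort(key=lambda kv: -kv[1])
    return pool[:max_keys]

# Hypothetical traffic: one high-volume group and one low-volume group.
dims = ["src_as", "device", "dst_country", "dst_as"]
flows = [
    {"src_as": 1, "device": "a", "dst_country": "US", "dst_as": 10, "bps": 900},
    {"src_as": 1, "device": "a", "dst_country": "US", "dst_as": 11, "bps": 800},
    {"src_as": 1, "device": "a", "dst_country": "DE", "dst_as": 12, "bps": 700},
    {"src_as": 2, "device": "b", "dst_country": "FR", "dst_as": 13, "bps": 50},
]
result = grouped_top_x(flows, dims, "bps", n_grouping=2, per_group=2, max_keys=3)
```

With grouping on and two keys allowed per group, the low-volume key from the second group makes it into the top three. A plain top-3 ranking of the same data would contain only keys from the first group.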
Policy Threshold Settings
The settings of the Alert Thresholds tab are covered in the following topics:
About Alert Thresholds
A threshold is a collection of settings that define a set of conditions that must be matched in order for an alert to be activated, at which point the alert generates an alarm for each key for which conditions have been matched. Each alert policy includes at least one threshold by default, but may include up to five.
General Threshold Settings
The Alert Thresholds tab includes the following general settings:
- Level selector: Choose which threshold you are configuring: Critical, Major 2, Major, Minor 2, or Minor.
- Enabled: Determines whether or not the threshold is currently active (evaluating traffic data, generating alarms, etc.). A threshold that is not currently needed can be disabled and retained for use later if needed.
- Copy settings from (present only when two or more thresholds exist in the policy): A drop-down that enables you to choose another threshold in the same alert from which to import settings.
- Description: A place for the creator of the threshold to describe for others what traffic conditions the threshold is intended to catch.
Threshold Configuration
The settings in this section depend on the threshold type, which is chosen from the This Threshold Will drop-down. The options, which set the method used to evaluate traffic for the alert, are:
- Use static values only: The threshold will evaluate traffic against one or more static conditions specified in the Conditions pane below.
- Use static and baseline values: The threshold will evaluate traffic against at least one static condition as well as baseline conditions (also specified in the Conditions pane), which compare against norms established by baselining.
- Compare sets of keys: The threshold will compare the top-X keys of two sets of traffic, current and baseline, to see whether the keys in both sets are the same.
Threshold Configuration Settings
Depending on the chosen threshold type, the rest of the Threshold Configuration pane may contain some or all of the following settings:
- And Compares: Determines which set of top-X keys (current or historical) is the primary set and which is used for comparison (see Comparison Direction).
- N/A: Not used.
- Current to Historical (default): The set of current keys is primary; the set of history keys is for comparison.
- Historical to Current: The set of history keys is primary; the set of current keys is for comparison.
- Using Top Keys: If the From Policy Settings switch is off, the number of keys to track is specified directly in the From Current Traffic and From Historical Baseline input fields. If the switch is on:
- The value of the From Current Traffic field is set to the value of the Maximum Number of Keys field in Building Your Dataset (default = 25).
- The value of From Historical Baseline is set to the number of baseline keys specified in Building Your Baseline (default = 25).
- From Current Traffic: If the From Policy Settings switch is off, the number of keys that this threshold will track in current traffic.
- From Historical Baseline: If the From Policy Settings switch is off, the number of keys that this threshold will track in historical traffic.
- If No Baseline Exists: A drop-down that sets what to do if a key in the primary top-X is not present in the comparison top-X (see Comparison Direction).
- Do not alert: Classify as not a match on this key.
- Activate an alert: Classify as a match on this key.
- Use the lowest value (default): Compare the value of the key in the primary top-X set to the value of the last (lowest) key in the comparison set.
- Use a default value: Compare this key’s current value to the static value in the comparison value field.
Note: If And Compares is “Historical to Current” then the value of this setting will be set to “Activate an alert.”
- Use a default value of: The value, displayed below the If No Baseline drop-down, used for comparison when If No Baseline is set to “Use a default value”:
- If the Auto Calculate switch is on, the comparison value will be auto-calculated.
- If the switch is off, enter a value in the field.
- Auto Calculate Default Value: A switch that determines whether the value used when there is no history will be auto-calculated based on the settings of the threshold’s comparators. Shown only when:
- If No Baseline is set to “Use a default value.”
- If No Baseline is set to “Use the lowest value,” but there are no baseline entries (and therefore no historical lowest top-X).
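One way to read the If No Baseline Exists options is as a fallback rule for keys that are missing from the comparison set. The sketch below is a hedged interpretation, not the product's logic: the function name, the option identifiers (`do_not_alert`, `activate_alert`, `lowest_value`, `default_value`), and the return convention are all hypothetical.

```python
def comparison_value(key, comparison_set, if_no_baseline, default_value=None):
    """Return ("value", v) when there is a value to compare against,
    or ("match", bool) when the setting decides the outcome directly."""
    if key in comparison_set:
        return ("value", comparison_set[key])
    if if_no_baseline == "do_not_alert":
        return ("match", False)
    if if_no_baseline == "activate_alert":
        return ("match", True)
    if if_no_baseline == "lowest_value" and comparison_set:
        # Compare against the last (lowest) key in the comparison set.
        return ("value", min(comparison_set.values()))
    # "default_value", or "lowest_value" when the comparison set is empty.
    return ("value", default_value)

baseline = {"k1": 100, "k2": 40}  # hypothetical baseline top-X values
```

For example, `comparison_value("k3", baseline, "lowest_value")` falls back to 40, the lowest baseline value, while the same call against an empty baseline falls back to the default value.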
Threshold Settings Availability
Unless the threshold uses static values only, the availability in the Threshold Configuration pane of the settings listed above depends on both the threshold type (This Threshold Will setting) and the Comparison Direction (And Compares setting), as shown in the following table:
Setting | Compare sets of keys (either direction) | Static and baseline (Current to Historical) | Static and baseline (Historical to Current) |
Using Top Keys | Yes | Yes | Yes |
From Current Traffic | Yes | Yes | Yes |
From Historical Baseline | Yes | Yes | Yes |
If No Baseline Exists | No | Yes | Yes |
Use a default value of | No | Yes | No |
Auto Calculate Default Value | No | Yes | No |
Comparison Direction
The following table shows the comparison direction corresponding to the And Compares setting in Threshold Configuration:
And Compares | Primary set | Comparison set | Description |
Current to Historical | Current keys | Baseline keys | For each key in the current top-X, compare the current value to that key’s baseline value. |
Historical to Current | Baseline keys | Current keys | For each key in the history top-X, compare the baseline value to the current value. This direction enables the system to identify keys that were but no longer are in the current top-X, e.g. a key that normally has high traffic volume that currently has no traffic. |
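The effect of the comparison direction can be illustrated with two small sets of keys. This is a sketch only; the `compare_sets` helper and the key names and values are hypothetical.

```python
def compare_sets(primary, comparison):
    """Pair each key in the primary top-X with its value in the comparison
    set, using None when the key is absent from the comparison set."""
    return {key: (value, comparison.get(key)) for key, value in primary.items()}

current = {"as64500": 800, "as64501": 300}    # current top-X (hypothetical)
baseline = {"as64500": 750, "as64502": 600}   # baseline top-X (hypothetical)

current_to_historical = compare_sets(current, baseline)
historical_to_current = compare_sets(baseline, current)
```

Only the Historical to Current direction surfaces `as64502`, a key with a substantial baseline value that has dropped out of the current top-X.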
Threshold Conditions
The settings in the Conditions pane determine what constitutes a match (see About Matches) for the threshold.
Condition Type
Depending on the This Threshold Will setting (see Threshold Configuration), three condition types (each of which may include multiple individual conditions) may be used in an alert threshold:
- Static condition: Compares current traffic with a fixed number, either specified by the user or auto-calculated. At least one static condition is required.
- Baseline condition: Compares current traffic with a historical baseline.
- Interface Capacity: Compares current traffic with the capacity of the interfaces in the alert dataset (see Data Funneling). Appears only when the Dimensions list includes “Interface” (source or destination) and the Primary Metric is “Bits/second.”
Condition Settings
Each condition type that is available for the current threshold type (see Condition Availability) is represented in the Conditions pane with a tile containing settings for one or more conditions of that type. These tiles each contain the following UI elements:
- Condition statement: Indicates the following:
- Number of keys: How many top keys are evaluated for the condition.
- Key set: The set of keys being evaluated, either Current Traffic or Historical Baseline (see Comparison Direction).
- Condition list: A set of specific operator | value | metric conditions (e.g. Greater Than | 500 | Mbps) that will be evaluated for a match. The conditions in the list are ANDed (all must be true to qualify as a match).
Note: For the minimum number of each condition type for each threshold type, see Condition Availability.
- Add condition button: Click the blue + to add a condition to the list.
- Remove condition button: Click the red X to remove a condition from the list.
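Because the conditions in a list are ANDed, a key matches only when every operator | value | metric condition holds. A minimal sketch follows, assuming hypothetical names throughout; only Greater Than appears in the example above, so the other operators in the table are assumptions.

```python
import operator

# Hypothetical operator names; only "greater_than" is taken from the example.
OPERATORS = {
    "greater_than": operator.gt,
    "less_than": operator.lt,
    "greater_or_equal": operator.ge,
    "less_or_equal": operator.le,
}

def is_match(key_metrics, conditions):
    """True only if every (operator, value, metric) condition holds."""
    return all(
        OPERATORS[op](key_metrics[metric], value)
        for op, value, metric in conditions
    )

conditions = [("greater_than", 500, "mbps"), ("greater_than", 1000, "pps")]
```

A key at 650 Mbps and 2000 pps matches both conditions, while a key at 650 Mbps and 900 pps fails the second one and therefore is not a match.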
Condition Availability
The following table shows how the condition types available in the Conditions pane, as well as the minimum number of conditions of each type, vary depending on the threshold type as set with the This Threshold Will setting (see Threshold Configuration).
Condition type | Static Only threshold | Static and Baseline threshold | Compare sets of keys threshold |
Static | Minimum = 1 | Minimum = 1 | Minimum = 1 |
Baseline | N.A. | Minimum = 0 | N.A. |
Interface capacity | Minimum = 0 | Minimum = 0 | Minimum = 0 |
Note: At least one static condition is required in order to screen out the noise of fluctuations in low-level traffic that don’t signify any meaningful change in traffic conditions.
Threshold Activation
The Notifications pane of the dialog is used to specify:
- The situations in which one or more matches (see About Matches) will cause a threshold to trigger an alarm (enter ALARM state).
- The notifications sent when the alarm is triggered.
Activate When Settings
The following settings specify what triggers an alarm:
- Number: How many times a match must occur within the specified duration.
Note: If number is 1, the time settings are irrelevant; an alarm will be generated immediately upon the first match.
- Duration value: The number of time units.
- Duration units: The time unit, either minutes or hours.
- Reset period: The number of match-free minutes after which the count of matches is reset to 0.
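The interaction of the Activate When settings can be sketched as a counter over the minutes at which matches occur. The exact semantics (for instance, whether window boundaries are inclusive) are not specified here, so the code below makes its own assumptions: a match counts if it falls within the last `duration_minutes`, and the count resets when the gap since the previous match is at least `reset_minutes`. The function name and parameters are hypothetical.

```python
def first_alarm_minute(match_minutes, number, duration_minutes, reset_minutes):
    """Return the minute at which an alarm would trigger, or None."""
    window = []
    last_match = None
    for t in sorted(match_minutes):
        if last_match is not None and t - last_match >= reset_minutes:
            window = []  # long enough without a match: reset the count
        last_match = t
        # Keep only earlier matches still inside the duration window, then add t.
        window = [m for m in window if t - m < duration_minutes] + [t]
        if len(window) >= number:
            return t
    return None
```

With three matches required in 10 minutes, matches at minutes 0, 2, and 4 trigger at minute 4. With a number of 1, the first match triggers immediately regardless of the time settings.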
Acknowledgement Setting
The Notifications pane also includes the Acknowledgement Required switch. If on, an alert that is no longer in ALARM state will not be fully cleared until it is acknowledged manually in the Active page (see Active Alerts).
Threshold Notify Settings
When an alarm is triggered or an alert otherwise changes state, notifications may be sent in various forms to designated parties at various destinations. A collection of such destinations is represented as a “notification channel” (see About Notification Channels). Notification channels can be created directly in the Notifications pane or on the Channels page.
The notification settings in the Notifications pane include the following:
- Select Notification Channel: Click in the field to choose from a list of the existing notification channels in your organization.
- Add Notification Channel: A button that opens the Add Notification Channel dialog (see Add or Edit Channel) to create a new channel.
Threshold Mitigations
Note: The Mitigations pane is only visible if the policy’s dimension list (Dimensions setting in Data Funneling) includes source or destination IP/CIDR.
The Mitigations pane enables you to set one or more mitigations (each a combination of platform and method; see About Mitigation) that can be applied in response to an alarm.
Add Mitigation
In its initial state the pane includes the following controls:
- Mitigation selector: A drop-down list of available mitigations.
- Add Mitigation button: Assigns the current mitigation in the mitigation selector as a mitigation for this threshold.
Note: The mitigations assigned to an alert policy (automated mitigations) will escalate and de-escalate automatically as changing conditions match different thresholds in that policy.
Mitigation Settings
Once a mitigation has been assigned to the threshold the mitigations pane displays a tile for the mitigation, which includes the following controls:
- Remove button: Click the red X to remove the mitigation from the threshold.
- Apply Mitigation: Specifies when the mitigation will be applied:
- Immediately: Initiate the mitigation immediately when the threshold activates an alarm (alert enters alarm state).
- User Acknowledgement: Initiate mitigation action only after a user clicks the Start Mitigation button in the actions at the right side of a given alarm’s row in the Active Alerts List on the Active page.
- User Acknowledgement Unless Timer Expired: Wait for a user to acknowledge or cancel mitigation from the Alarms dashboard. If the specified time period expires with no user action then initiate mitigation automatically.
- Application Timer (only if Apply Mitigation is set to “User Acknowledgement Unless Timer Expired”): The duration (in minutes) of the timer.
- Clear Mitigation: Specifies when the mitigation will stop:
- Immediately: Stop mitigation immediately when the alarm ends (alert exits alarm state).
- User Acknowledgement: Continue mitigation (even after the alarm ends) until it is canceled by a user.
- User Acknowledgement Unless Timer Expired: Continue mitigation (even after the alarm ends) until it is manually canceled by a user or the specified time expires.
- Clear Timer (only if Clear Mitigation is set to “User Acknowledgement Unless Timer Expired”): The duration (in minutes) of the timer.
Multiple Mitigations
If desired, you can add one or more additional mitigations to the same threshold. Multiple mitigations are assigned as follows:
- Use the Add Mitigation UI described above to choose and add another mitigation.
- Use the Mitigation Settings UI described above to specify when the new mitigation will be applied and cleared.
The ability to apply multiple mitigations to a threshold enables you to simultaneously trigger all of the mitigation methods/platforms (e.g. appliances at multiple sites) with which you’d like to respond to a given set of conditions, and to do so in a way that is much more scalable than by cloning a given policy for each of your appliances.
Support for multiple mitigations per threshold also enables the response for a given alarm to include a mix of mitigation types, e.g. RTBH, A10, and Radware. The following scenario, for example, outlines a multi-location DDoS response involving multiple mitigation types:
- De-preference or stop-announcing a BGP route on Location #1 by injecting a route whose community has been predefined as a flag for these actions.
- Announce a broader routing table entry, less specific than /24 (thus forcing acceptance by Internet peers), for Location #2.
- Trigger a 3rd-party mitigation method — e.g. A10 or Radware — on Location #2 to announce more specific prefixes for internal re-direction to a scrubbing center.
Policy Baseline Settings
The settings of the Historical Baseline tab are covered in the following topics:
About Historical Baselines
Baselining enables the alerting system to trigger an alarm based on a comparison of current traffic against historical traffic patterns. The historical data set is made up of data points representing traffic totals for the time-slices whose duration (1 min, 2 min, or 5 min) is defined with the Evaluation Frequency setting under Building Your Dataset. If differences between current and historical data meet the parameters you’ve defined in a threshold’s conditions, that threshold can trigger an alarm. In a given policy the settings for historical baselines apply to all of the thresholds in that policy that include one or more baseline conditions (see Condition Type).
Building Your Baseline
The controls of the Building Your Baseline pane specify the historical data that will be included in an initial (“rollup”) aggregation pass. The controls are structured as two sentences in which you fill in the blanks with the settings described further below:
- Record the [Rollup aggregation] of bits/second for the top [Number of keys] keys every hour within a rolling baseline window.
- The baseline window starts [Start back] and goes back [Look back] from now.
The following descriptions apply to the settings in the two sentences above:
- Rollup aggregation: A drop-down used to specify the method for a first stage of aggregation in which the time-slice data points of the baseline dataset are aggregated into one-hour baseline data points. Options include Min, Max, and Percentile: 98th (default), 95th, 50th, 25th, or 5th.
- Number of keys: A field in which to specify the number of keys in the slice-to-hourly aggregation. By default this number is set to match the corresponding number in Building Your Dataset.
- Start Back: Sets how recently history ends (one hour, one day, or one week ago). Traffic data newer than this value is not included in the data used for baselining. Excluding recent traffic allows you to keep current spikes or anomalies from skewing baseline values.
- Options include 1 hour, 1 day, or 1 week ago.
- Default is 1 day.
- Look Back: Sets how far back in time the history starts. As traffic data becomes older than this value it is dropped from the data used for baselining.
- Options include 1 hour, 1 day, or 1, 2, 3, or 4 weeks ago.
- Default is 3 weeks.
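The two sentences above can be read as a window calculation followed by an hourly rollup. The following Python sketch illustrates that reading only; it is not Kentik's implementation. The function names, the (timestamp, bits/second) input shape, and the nearest-rank percentile method are all assumptions made for illustration, and the selection of the top keys is left out.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def baseline_window(now, start_back, look_back):
    """Return (oldest, newest) bounds of the rolling baseline window.

    Both settings are measured back from now: data newer than start_back
    or older than look_back falls outside the window.
    """
    return now - look_back, now - start_back


def percentile(values, pct):
    """Nearest-rank percentile (an assumption; the exact method is not documented here)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def hourly_rollup(slices, window, pct=98):
    """Aggregate (timestamp, bits_per_sec) time-slices into one value per hour."""
    oldest, newest = window
    buckets = defaultdict(list)
    for ts, value in slices:
        if oldest <= ts < newest:
            # Truncate the timestamp to the start of its hour.
            hour = ts.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append(value)
    return {hour: percentile(vals, pct) for hour, vals in sorted(buckets.items())}
```

With the defaults described above, the window would be built as baseline_window(now, timedelta(days=1), timedelta(weeks=3)) and the rollup would use the 98th percentile.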
Leveraging Your Baseline
Aggregation controls define how the data points of the traffic history data are prepared for comparison with current traffic. The controls are structured as four sentences in which you fill in the blanks with the settings described further below:
- Do not use my baseline until it has at least [Minimum duration] of data.
- Once ready, get comparison data from [Interval].
- Each value in the comparison data set is derived from [Value source].
- Then, use the [Final aggregation] of all comparison data to evaluate alert threshold conditions.
The following descriptions apply to the settings in the four sentences above:
- Minimum duration: The minimum duration for which there must be historical data in order for the baseline to qualify as valid.
  - Number field: Enter a number representing time units.
  - Time units: Choose hours or days (default).
- Interval: The time interval at which data will be drawn from the history to create the baseline. If the underlying traffic is cyclical on a daily basis, the interval can be set so that the baseline is based on data from the same time of day each day. Otherwise the interval is typically set to every hour, meaning that the baseline is based on data from 1, 2, 3, 4, etc. hours ago. Options include:
  - Every hour
  - The same hour every day (default)
  - The same hour of the day and same day of week.
- Value source: The starting data from which each baseline data point is derived. Options include:
  - The baseline value at that time (default)
  - Minimum or maximum
  - Percentile: 90th, 75th, 50th, 25th, 10th, or 5th.
- Final aggregation: Specifies how the set of hourly data points resulting from settings in Building Your Baseline are aggregated into a single baseline value for each key (combination of dimensions) that can be compared to the current value for that key.
  - Options include Max, Min, and Percentile: 99th, 98th, 95th, 90th, 80th, 50th, 25th, 10th, or 5th.
  - Default is 95th.
  - Note: Policies watching for activity in excess of baseline typically use Max, 98th, or 95th percentile aggregation. Policies watching for activity below baseline typically use 25th percentile aggregation.
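As a rough illustration of how the four sentences above fit together, the following Python sketch checks the minimum duration, selects hourly points according to the Interval setting, and reduces them to a single baseline value. It is a sketch under stated assumptions, not Kentik's implementation: the function names, the interval labels, the one-point-equals-one-hour readiness rule, and the nearest-rank percentile are all hypothetical, and the Value source setting is assumed to be left at its default (the baseline value at that time).

```python
import math
from datetime import datetime, timedelta, timezone


def percentile(values, pct):
    """Nearest-rank percentile; the method Kentik uses is not stated in this topic."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def baseline_ready(hourly, minimum_duration):
    """Assumption: each hourly point counts as one hour of baseline data."""
    return timedelta(hours=len(hourly)) >= minimum_duration


def comparison_values(hourly, now, interval):
    """Select hourly baseline values according to the Interval setting."""
    selected = []
    for hour, value in sorted(hourly.items()):
        same_hour = hour.hour == now.hour
        same_weekday = hour.weekday() == now.weekday()
        if interval == "every_hour":
            keep = True
        elif interval == "same_hour_every_day":
            keep = same_hour
        elif interval == "same_hour_and_weekday":
            keep = same_hour and same_weekday
        else:
            raise ValueError(f"unknown interval: {interval}")
        if keep:
            selected.append(value)
    return selected


def final_aggregation(values, method):
    """method is "max", "min", or a percentile number such as 95."""
    if method == "max":
        return max(values)
    if method == "min":
        return min(values)
    return percentile(values, method)
```

For one key, the baseline value under the default settings would then be final_aggregation(comparison_values(hourly, now, "same_hour_every_day"), 95), where hourly is a hypothetical mapping of hour-start timestamps to rolled-up values.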
Weekend Baselining
Because traffic on weekends often varies considerably from traffic during the week, the Historical Baseline tab includes the Use Separate Patterns for Weekdays and Weekends setting. When the checkbox is checked, the alerting system determines norms for weekends (Saturday and Sunday, UTC) separately from the norms for weekdays (Monday through Friday).
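Because the weekend boundary is defined in UTC, a timestamp that is still Friday evening in a local time zone may already count as weekend traffic. The following Python sketch shows one way to classify hourly baseline points on that basis. The function names and the input shape are hypothetical, and the sketch only illustrates the UTC rule described above, not how Kentik stores or evaluates the two patterns.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def day_class(ts):
    """Classify a timezone-aware timestamp by its UTC day of week."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # weekday(): Monday is 0, so Saturday and Sunday are 5 and 6.
    return "weekend" if ts.astimezone(timezone.utc).weekday() >= 5 else "weekday"


def split_by_day_class(hourly):
    """Split {hour_start: value} into separate weekday and weekend histories."""
    groups = defaultdict(dict)
    for hour, value in hourly.items():
        groups[day_class(hour)][hour] = value
    return dict(groups)
```

For example, 23:00 on a Friday at UTC-5 corresponds to 04:00 Saturday UTC, so day_class would place it in the weekend group.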
Alert Library
Kentik provides a set of alert policy templates that are covered in the following topics:
About Alert Library
The Library page lists a set of Kentik-provided alert templates that cover common network traffic anomalies. An alert preset can be used as the starting point for configuring an alert that notifies you about an anomaly and enables you to respond with mitigation (manual or automated).
To use an alert preset, duplicate it on the Library page, which adds the copy to the Alert Policy List on the Policies page. From there the copy can be opened for editing (see Alert Policy Dialogs) and tailored to work with your specific network.
Alert Library List
The Alert Library List is a table that lists all Kentik-provided alert policy presets. The table provides the following information and actions for each preset:
- ID: The system-generated unique ID assigned when the alert policy was created.
- Name: The name of the alert policy preset as specified by Kentik when it was created. The name is accompanied by a user-specified description.
- Devices: Either “All Devices” or the number of devices covered by the query for the alert policy. Devices are selected on the Dataset tab of the Alert Policy Dialogs (add or edit).
- Metrics: The units (e.g. bits/s, packets/s, flows/s) by which this alert measures incoming flow data (see Data Funneling). The primary metric is listed first, followed by secondary metrics (if any).
- Dimensions: The dimensions defined in the alert policy, which combine to make a key definition that will determine how traffic is subdivided for evaluation (see About Keys). Dimensions, which are based on fields in the KDE main table, are described in Dimensions Reference.
- Actions: The following actions can be performed on alert policy presets:
- Copy: Duplicates the alert policy preset and adds the copy to the Alert Policy List, where it can be edited and saved. The original preset remains unchanged.