Calculating Kentik Detect Requirements from Flow Volume
In networking terms, a “flow” is a unidirectional set of packets sharing common attributes (e.g. source and destination IP, source and destination ports, IP protocol, etc.). Using a “flow protocol” (e.g. sFlow, IPFIX, NetFlow v9 or v5), network devices (routers, switches, hosts) are equipped to gather information about these attributes for a user-configured sample of the traffic they handle and to send that information to the ingest layer of Kentik Detect.
The rate — flows per second, or FPS — at which this flow metadata is sent to Kentik from a given customer directly correlates with the resources required to provide that customer with the Kentik Detect service. Thus to accurately estimate and price a customer’s service requirements it is crucial to first estimate the FPS that customer will send to Kentik Detect. This document explains the relationship between FPS and pricing and shows how to estimate FPS accurately.
Every device that sends flow data to Kentik Detect is assigned to a “plan.” Plans, which are developed by Kentik sales engineers to closely match customer needs, help Kentik quantify the resources that must be available to effectively service the covered devices. These resources — which directly affect the cost of scaling the Kentik Data Engine (KDE; Kentik’s distributed HA back-end) — fall into the following main categories:
- Compute: The compute power needed by the platform to ingest flow data and process it in as close as possible to real time, which is mainly determined by:
- the peak number of concurrent flows hitting ingest at any given moment;
- the peak number of devices talking to the platform, not only for flow but also for BGP and SNMP.
- Storage: The storage needed for both flow and enrichment data, which is mainly determined by data retention (how long the customer wants Kentik to retain their unsummarized traffic data).
To help determine the required compute and storage resources, Kentik Detect plans are structured to take into account the following attributes:
- Device quantity: Each new device (host or router) that exports flow data to Kentik Detect uses up one of the devices available in the plan.
- Device types: Plans may allow routers/switches, hosts (which are monitored by the nProbe agent), or both. When a plan includes nProbe hosts it comes with licenses to use the nProbe software on the plan’s devices.
- BGP: Devices may publish their BGP routing table to KDE to enable peering for analytics.
- Data retention: Kentik creates two fully independent dataseries at ingest, one at full resolution and another optimized for faster execution of queries that cover long timespans. Retention (number of days) can be the same for both dataseries or different for each.
- Flows per second: The maximum FPS allowed per-device, also referred to as the flow rate. We refer to “flows” generically in this document, but since sFlow works differently than NetFlow/IPFIX, the term actually represents different things depending on the flow protocol:
- NetFlow/IPFIX: Metadata about a sampled subset of flows passing through a device (e.g. router) is aggregated into sets of flow records that are then exported from the device to Kentik Detect.
- sFlow: Packets from a sampled subset of flows passing through a device are truncated (leaving the metadata without the content data) and sent to Kentik Detect as “datagrams.”
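To make the plan attributes above concrete, here is a minimal sketch of a plan as a data structure. The `Plan` class, its field names, and the `max_total_fps` bound are hypothetical illustrations, not a Kentik API; the sketch only shows how each added device consumes a slot in the plan and how a per-device FPS limit caps the aggregate flow rate the plan must be provisioned for.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical model of a Kentik Detect plan; field names are illustrative only."""
    max_devices: int              # device quantity covered by the plan
    max_fps_per_device: int       # per-device flow rate (FPS) limit
    full_res_retention_days: int  # retention for the full-resolution dataseries
    fast_retention_days: int      # retention for the long-timespan dataseries
    devices: list[str] = field(default_factory=list)

    @property
    def devices_available(self) -> int:
        # Each exporting device consumes one of the plan's device slots.
        return self.max_devices - len(self.devices)

    def add_device(self, name: str) -> None:
        if self.devices_available <= 0:
            raise ValueError(f"plan is full; cannot add {name}")
        self.devices.append(name)

    def max_total_fps(self) -> int:
        # Upper bound on aggregate FPS if every slot ran at the per-device limit.
        return self.max_devices * self.max_fps_per_device


plan = Plan(max_devices=2, max_fps_per_device=4_000,
            full_res_retention_days=30, fast_retention_days=90)
plan.add_device("edge-router-1")
print(plan.devices_available)  # 1
print(plan.max_total_fps())    # 8000
```

The retention fields are carried only to show that the two dataseries can have different retention periods; they do not feed into the FPS bound.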
In the following sections we’ll look at how to arrive at a realistic estimate of overall flow rate so that your Kentik Detect plans can be tailored to your needs.
Before starting the actual calculation you’ll need to do the following:
- Make a list of the devices that will export flow to Kentik Detect.
- On each exporting device, enable Flow Data Export (see Router Flow Configs).
- For each exporting device, note the approximate Gb/s for inbound traffic.
- For each exporting device, note the flow sampling rate used:
- NetFlow/IPFIX: One out of every how many flows is exported as a flow record?
- sFlow: One out of every how many packets is truncated into a datagram?
Note: To optimize the flow sampling rates of your devices, refer to Flow Sampling.
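The checklist above can be captured as a small inventory record per device. This is a hypothetical sketch: `ExportingDevice`, `total_inbound_gbps`, and the example device names and figures are illustrative. Traffic is assumed to be recorded at ingress only, so that no traffic is counted twice.

```python
from dataclasses import dataclass


@dataclass
class ExportingDevice:
    """One row of the pre-calculation checklist (hypothetical structure)."""
    name: str
    protocol: str        # "netflow", "ipfix", or "sflow"
    inbound_gbps: float  # approximate traffic, counted at ingress only
    sampling_rate: int   # N in "1 out of every N"

    def sampled_unit(self) -> str:
        # Per the checklist: NetFlow/IPFIX sample flows, sFlow samples packets.
        return "packets" if self.protocol == "sflow" else "flows"


def total_inbound_gbps(devices: list[ExportingDevice]) -> float:
    """Sum ingress traffic across all exporting devices."""
    return sum(d.inbound_gbps for d in devices)


inventory = [
    ExportingDevice("core-1", "netflow", inbound_gbps=40.0, sampling_rate=1000),
    ExportingDevice("edge-1", "sflow", inbound_gbps=10.0, sampling_rate=16384),
]
print(total_inbound_gbps(inventory))  # 50.0
print(inventory[1].sampled_unit())    # packets
```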
Next we’ll see how to determine the flows per second (FPS) for a single router. By repeating this process for each customer router exporting flow, we’ll be able to accurately predict the Kentik Detect resources required to handle flow for that customer. Alternatively, to expediently calculate the worst-case resources needed for a system, we can extrapolate from a calculation based on the single device with the most connected interfaces.
Note: Device flow records associate an ingress port with an egress port for each flow. When measuring device traffic, count Gb/s at either ingress or egress, not both, so that you don’t double-count the device’s traffic.
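The two approaches described above can be sketched as follows, assuming per-device FPS values have already been measured with the steps below. `total_fps` and `worst_case_fps` are hypothetical helpers, and the worst-case version assumes that extrapolating from the device with the most connected interfaces means applying that device’s FPS to every device.

```python
def total_fps(per_device_fps: dict[str, float]) -> float:
    """Accurate estimate: the sum of FPS measured on every exporting device."""
    return sum(per_device_fps.values())


def worst_case_fps(busiest_device_fps: float, device_count: int) -> float:
    """Quick upper bound (assumption): every device behaves like the busiest one."""
    return busiest_device_fps * device_count


# Hypothetical measurements; core-1 is assumed to have the most connected interfaces.
measured = {"core-1": 1_100.0, "edge-1": 250.0, "edge-2": 400.0}
print(total_fps(measured))                            # 1750.0
print(worst_case_fps(measured["core-1"], len(measured)))  # 3300.0
```

The per-device sum is more accurate; the worst-case figure is faster to obtain but deliberately overestimates.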
Use the following steps to determine flows per second (FPS) for a router.
- Export the router’s instant flow count:
- Log in to the router’s CLI and run the command for your router (see table below) to export the instant flow count.
- Take two consecutive readings of the flow count, noting the precise time (T0 and T1) and flow count (F0 and F1) of each reading.
- Take the readings at peak time on the peak day of the week so that the values reflect peak load.
- Locate the instant flow count in the router’s flow report (see Sample Code for Common Devices).
Device | Command to export instant flow count
Cisco IOS | show ip flow export
Cisco IOS XR | show flow exporter fem1 location 0/0/CPU0
Cisco Nexus (NX-OS) | show flow export
Juniper EX, MX, T Series | show services accounting flow
- Calculate the router’s flows per second (FPS) by inserting the times and flow counts from step 1 into the following formula: FPS = (F1 - F0) ÷ (T1 - T0)
In the following example, the calculation is based on exported-flow counts of 4,028 and 15,028 taken 10 seconds apart:
T0 / F0 | T1 / F1
0 s / 4,028 | 10 s / 15,028
FPS = (15,028 - 4,028) ÷ (10 - 0) = 11,000 ÷ 10 = 1,100
- If you can’t measure flow at peak time on the peak day of the week, measure it at another time and scale the result by linear extrapolation, based on the sum of inbound traffic on all of the device’s connected interfaces that have flow enabled.
Metric | Non-Peak Time Measurement | Peak Time (Extrapolated)
Flows per second (FPS) |  | 
- To accommodate mid-term growth, add a 6-month traffic forecast: apply the same linear extrapolation to the peak-time FPS from the previous step, using the traffic projected six months out.
Metric | Current Peak-Time FPS | 6-Month Forecast (Extrapolated)
Flows per second (FPS) |  | 
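The arithmetic in these steps can be sketched in Python. `fps` implements the formula FPS = (F1 - F0) ÷ (T1 - T0); `extrapolate` assumes FPS scales linearly with inbound Gb/s, which is the premise of both the peak-time and the 6-month extrapolation. The function names and the traffic figures (2, 5, and 6 Gb/s) are hypothetical; only the 4,028 and 15,028 readings come from the worked example.

```python
def fps(f0: int, f1: int, t0: float, t1: float) -> float:
    """FPS = (F1 - F0) / (T1 - T0) from two readings of a cumulative counter."""
    if t1 <= t0:
        raise ValueError("T1 must be later than T0")
    if f1 < f0:
        raise ValueError("counter went backwards (cleared or wrapped); re-measure")
    return (f1 - f0) / (t1 - t0)


def extrapolate(fps_value: float, measured_gbps: float, target_gbps: float) -> float:
    """Linear extrapolation: assumes FPS scales in proportion to inbound Gb/s."""
    return fps_value * target_gbps / measured_gbps


# Readings from the worked example: 4,028 and 15,028 flows, 10 seconds apart.
current = fps(4_028, 15_028, t0=0.0, t1=10.0)
print(current)  # 1100.0

# Hypothetical traffic figures: 2 Gb/s when measured, 5 Gb/s at peak,
# and 6 Gb/s projected six months out.
peak = extrapolate(current, measured_gbps=2.0, target_gbps=5.0)
forecast = extrapolate(peak, measured_gbps=5.0, target_gbps=6.0)
print(peak, forecast)  # 2750.0 3300.0
```

The check for a decreasing counter matters in practice: if the counter is cleared between readings, the difference would otherwise produce a meaningless negative rate.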
The examples in the following sections illustrate how to export the flow count from common devices. In each output, the value to track is the counter of exported flows (or, for sFlow devices, of collected samples).
Router# show ip flow export
Flow export v5 is enabled for main cache
Exporting flows to 10.51.12.4 (9991) 10.1.97.50 (9111)
Exporting using source IP address 10.1.97.17
Version 5 flow records
11 flows exported in 8 udp datagrams
0 flows failed due to lack of export packet
0 export packets were sent up to process level
0 export packets were dropped due to no fib
0 export packets were dropped due to adjacency issues
0 export packets were dropped due to fragmentation failures
0 export packets were dropped due to encapsulation fixup failures
0 export packets were dropped enqueuing for the RP
0 export packets were dropped due to IPC rate limiting
0 export packets were dropped due to output drops
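When taking repeated readings, it can help to parse the counter rather than read it by hand. A minimal sketch, assuming the IOS output format shown above: the hypothetical helper `ios_flows_exported` pulls N from the “N flows exported in M udp datagrams” line, and two such readings plug into the FPS formula. Other IOS releases may format this line differently.

```python
import re

# Abbreviated copy of the `show ip flow export` output above.
SAMPLE = """\
Flow export v5 is enabled for main cache
Version 5 flow records
11 flows exported in 8 udp datagrams
0 flows failed due to lack of export packet
"""


def ios_flows_exported(output: str) -> int:
    """Return N from the 'N flows exported in M udp datagrams' line."""
    match = re.search(r"^\s*(\d+) flows exported in \d+ udp datagrams",
                      output, re.MULTILINE)
    if match is None:
        raise ValueError("exported-flow counter not found")
    return int(match.group(1))


print(ios_flows_exported(SAMPLE))  # 11
```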
The flow export output for IOS XR shows both the cumulative count of exported flows (which requires computing the difference between two timestamps) and the per-second export total (which requires no computation, making Step 2 in FPS Calculations unnecessary). Note that in this output the interval totals are reported in packets (pkts).
RP/0/RP0/CPU0:router# show flow exporter fem1 location 0/0/CPU0
Flow Exporter: NFC
Used by flow monitors: fmm4
Destination 220.127.116.11 (50001)
Source 18.104.22.168 (5956)
Flows exported: 0 (0 bytes)
Flows dropped: 0 (0 bytes)
Templates exported: 1 (88 bytes)
Templates dropped: 0 (0 bytes)
Option data exported: 0 (0 bytes)
Option data dropped: 0 (0 bytes)
Option templates exported: 2 (56 bytes)
Option templates dropped: 0 (0 bytes)
Packets exported: 3 (144 bytes)
Packets dropped: 0 (0 bytes)
Total export over last interval of:
1 hour: 0 pkts
1 minute: 3 pkts
1 second: 0 pkts
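A similar sketch for the IOS XR output, assuming the format shown above. The hypothetical helper `xr_export_stats` returns the cumulative “Flows exported” counter together with the totals for the last hour, minute, and second. As in the sample, those interval totals are counted in packets, not flows.

```python
import re

# Abbreviated copy of the `show flow exporter` output above.
SAMPLE = """\
Flows exported: 0 (0 bytes)
Flows dropped: 0 (0 bytes)
Total export over last interval of:
1 hour: 0 pkts
1 minute: 3 pkts
1 second: 0 pkts
"""


def xr_export_stats(output: str) -> tuple[int, dict[str, int]]:
    """Return (cumulative flows exported, {interval: packets exported})."""
    flows = re.search(r"^\s*Flows exported:\s*(\d+)", output, re.MULTILINE)
    if flows is None:
        raise ValueError("'Flows exported' counter not found")
    intervals = {
        name: int(count)
        for name, count in re.findall(
            r"^\s*(1 hour|1 minute|1 second):\s*(\d+) pkts", output, re.MULTILINE)
    }
    return int(flows.group(1)), intervals


print(xr_export_stats(SAMPLE))
# (0, {'1 hour': 0, '1 minute': 3, '1 second': 0})
```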
N7K1# show flow export
Flow exporter KENTIK:
Description: ships flows to Kentik cloud
VRF: default (1)
Destination UDP Port 2055
Source Interface Vlan10 (10.10.10.5)
Export Version 9
Number of Flow Records Exported 726
Number of Templates Exported 1
Number of Export Packets Sent 37
Number of Export Bytes Sent 38712
Number of Destination Unreachable Events 0
Number of No Buffer Events 0
Number of Packets Dropped (No Route to Host) 0
Number of Packets Dropped (other) 0
Number of Packets Dropped (LC to RP Error) 0
Number of Packets Dropped (Output Drops) 0
Time statistics were last cleared: Tue Jul 8 21:12:06 2014
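For the Nexus output above, the counter to track is “Number of Flow Records Exported.” This hypothetical helper assumes the format shown and returns the first exporter’s counter only; with several exporters configured, you would need to select the one that ships flows to Kentik (the exporter named KENTIK here).

```python
import re

# Abbreviated copy of the `show flow export` output above.
SAMPLE = """\
Flow exporter KENTIK:
    Export Version 9
    Number of Flow Records Exported 726
    Number of Templates Exported 1
"""


def nxos_flow_records_exported(output: str) -> int:
    """Return the first 'Number of Flow Records Exported' value in the output."""
    match = re.search(r"Number of Flow Records Exported\s+(\d+)", output)
    if match is None:
        raise ValueError("flow record counter not found")
    return int(match.group(1))


print(nxos_flow_records_exported(SAMPLE))  # 726
```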
user@host> show services accounting flow
Service Accounting interface: sp-2/0/0, Local interface index: 215
Flow packets: 9867, Flow bytes: 631488
Flow packets 10-second rate: 0, Flow bytes 10-second rate: 628
Active flows: 2, Total flows: 10
Flows exported: 4028, Flows packets exported: 6150
Flows inactive timed out: 8, Flows active timed out: 4026
Service Accounting interface: sp-2/1/0, Local interface index: 223
Flow packets: 0, Flow bytes: 0
Flow packets 10-second rate: 0, Flow bytes 10-second rate: 0
Active flows: 0, Total flows: 0
Flows exported: 0, Flows packets exported: 1
Flows inactive timed out: 0, Flows active timed out: 0
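The Junos output lists one block per service interface. This sketch (the helper name is hypothetical) maps each interface to its “Flows exported” counter. Summing those counters into a router-wide figure is an assumption that holds only if every listed interface exports to Kentik Detect.

```python
import re

# Abbreviated copy of the `show services accounting flow` output above.
SAMPLE = """\
Service Accounting interface: sp-2/0/0, Local interface index: 215
  Flows exported: 4028, Flows packets exported: 6150
Service Accounting interface: sp-2/1/0, Local interface index: 223
  Flows exported: 0, Flows packets exported: 1
"""


def junos_flows_exported(output: str) -> dict[str, int]:
    """Map each service interface to its 'Flows exported' counter."""
    counts: dict[str, int] = {}
    interface = None
    for line in output.splitlines():
        header = re.match(r"\s*Service Accounting interface: ([^,]+),", line)
        if header:
            interface = header.group(1)
            continue
        exported = re.match(r"\s*Flows exported: (\d+),", line)
        if exported and interface is not None:
            counts[interface] = int(exported.group(1))
    return counts


per_interface = junos_flows_exported(SAMPLE)
print(per_interface)                # {'sp-2/0/0': 4028, 'sp-2/1/0': 0}
print(sum(per_interface.values()))  # 4028
```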
Arista routers use sFlow, not NetFlow, so in the report we’re looking for samples per second rather than flow exports per second. Compute the change in the “Number of Samples” value between T0 and T1 instead.
172.16.0.1:6343 ( VRF: default) <-- should be pointing at the external sFlow collector
172.16.0.41 ( VRF: default)
:: ( default) ( VRF: default)
Sample Rate: 16384
Polling Interval (sec): 30.0
Rewrite DSCP value: No
Polling On: Yes
Sampling On: Yes ( default)
Send Datagrams: Yes ( VRF: default)
Total Packets: 487924
Number of Samples: 847
Sample Pool: 0
Hardware Trigger: 0
Number of Datagrams: 27
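For sFlow devices the same difference-over-time calculation applies, but to samples rather than flows. A minimal sketch, assuming the Arista output format shown above; `arista_samples` and `samples_per_second` are hypothetical helpers, and the second reading (1,847 samples 10 seconds later) is made up for illustration.

```python
import re

# Abbreviated copy of the Arista sFlow output above.
SAMPLE = """\
Sample Rate: 16384
Total Packets: 487924
Number of Samples: 847
Number of Datagrams: 27
"""


def arista_samples(output: str) -> int:
    """Return the 'Number of Samples' counter."""
    match = re.search(r"^\s*Number of Samples:\s*(\d+)", output, re.MULTILINE)
    if match is None:
        raise ValueError("'Number of Samples' not found")
    return int(match.group(1))


def samples_per_second(s0: int, s1: int, t0: float, t1: float) -> float:
    """(S1 - S0) / (T1 - T0): the FPS formula applied to sFlow samples."""
    return (s1 - s0) / (t1 - t0)


s0 = arista_samples(SAMPLE)
print(s0)                                           # 847
print(samples_per_second(s0, 1_847, 0.0, 10.0))     # 100.0
```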
Brocade/Foundry routers use sFlow, not NetFlow, so we’re looking for samples per second rather than flow exports per second. Compute the change in the “sFlow samples collected” value between T0 and T1 instead.
sFlow services are enabled.
sFlow agent IP address: 10.123.123.1
sFlow source IP address: 22.214.171.124
sFlow source IPv6 address: 4545::2
4 collector destinations configured:
Collector IP 192.168.4.204, UDP 6343
Configured UDP source port: 33333
Polling interval is 0 seconds.
Configured default sampling rate: 1 per 512 packets
Actual default sampling rate: 1 per 512 packets
Sample mode: Non-dropped packets
The maximum sFlow sample size:512
exporting cpu-traffic is enabled
exporting cpu-traffic sample rate:16
exporting system-info is enabled
exporting system-info polling interval:20 seconds
10552 UDP packets exported
24127 sFlow samples collected.
sFlow ports: ethe 1/2 to 1/12 ethe 1/15 ethe 1/25 to 1/26 ethe 4/1 ethe 5/10 to
5/20 ethe 8/1 ethe 8/4
Module Sampling Rates
Slot 1 configured rate=512, actual rate=512
Slot 3 configured rate=0, actual rate=0
Slot 4 configured rate=10000, actual rate=32768
Slot 5 configured rate=512, actual rate=512
Slot 7 configured rate=0, actual rate=0
Slot 8 configured rate=512, actual rate=512
Port Sampling Rates
Port 8/4, configured rate=512, actual rate=512, Subsampling factor=1
Port 8/1, configured rate=512, actual rate=512, Subsampling factor=1
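The Brocade/Foundry counter can be parsed the same way. This hypothetical helper assumes the “N sFlow samples collected.” line shown above; two readings taken at T0 and T1 then give samples per second using the same formula as FPS.

```python
import re

# Abbreviated copy of the Brocade/Foundry sFlow output above.
SAMPLE = """\
10552 UDP packets exported
24127 sFlow samples collected.
"""


def brocade_samples_collected(output: str) -> int:
    """Return N from the 'N sFlow samples collected.' line."""
    match = re.search(r"^\s*(\d+) sFlow samples collected\.", output, re.MULTILINE)
    if match is None:
        raise ValueError("'sFlow samples collected' counter not found")
    return int(match.group(1))


print(brocade_samples_collected(SAMPLE))  # 24127
```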