KDE Tables
This article covers the following topics related to the tables of the Kentik Data Engine (KDE):
Notes:
- For general information about KDE, see KDE Overview.
- For information about querying Kentik Data Engine see Querying KDE.
About KDE Tables
KDE keeps a discrete set of databases for the flow records of each customer. These databases are made up of “main tables” for the flow records and associated data received for each device (router or host) as well as for supplemental data that is derived from the flow data. The data for all devices of a given customer are also merged into that customer's All Devices Table.
The columns in these tables form the basis for the dimensions (see Dimensions Reference) that are used for filtering and group-by in Kentik queries. Most of these dimensions are exposed in the portal UI's dimension selectors for filtering (see Dimension Selector Dialog) and group-by (see Dimension Selectors). A small number of additional dimensions, covered in the topics below, are available for use only when querying via the Query SQL Method of the Query API.
Universal Data Records
Universal Data Records is Kentik's technique for populating the fields of KDE tables. As described in About KDE Tables, many of the columns of the main tables in which flow data is stored correspond to fields of the flow records ingested from individual devices. Rather than reserving columns for fields that are not used by a given device type (router, switch, firewall, etc.), Universal Data Records varies the mapping of fields to columns depending on the device type. This flexible schema enables Kentik to rapidly extend KDE's support for storing and querying data from new and disparate sources, which allows visibility into a much wider range of data points about customer networks and infrastructure. Among the many advantages of this approach are the ability to store vendor-specific flow fields (see Device-specific Dimensions), increased capacity for Custom Dimensions, and the ability to store data from flow records with non-standard fields.
Main Table Columns
The topics below cover the columns of the main table that are not exposed as filter and/or group-by dimensions in the portal UI:
Note:
- The columns in the topics below are used only when querying with SQL via the Query SQL Method of the Query API.
- For information on columns that are exposed in the portal as filter and/or group-by dimension, see Dimensions Reference.
- Some columns fit into more than one category. If you don’t find a column where you think it would be, check other categories.
Query Columns
Used to specify dataseries (Full or Fast; see Resolution Overview), and other query-related data.
Column | Type | Description |
i_fast_dataset | boolean | Input only. Determines the dataseries on which a query is run: - if not specified, dataseries selection is automatic; - if false, use Full dataseries; - if true, use Fast dataseries. |
i_sub_limit | integer | Passes a limit value that applies to all subqueries used in the current KDE query. For example, if querying for top IPs with i_sub_limit = 2, then for every main table queried by a subquery only the top two IPs are contributed to the main query results. |
Time Columns
Used to specify time ranges.
Column | Type | Description |
ctimestamp | bigint | Unix Epoch when flow was received by Kentik, 1 second resolution. |
i_start_time | timestamp with time zone | The time that flow was received, rounded to the nearest minute for display. Searching allows granularity to the nearest second (see Time Management in Queries). |
Metrics Columns
Metrics are used to specify the units in which traffic is measured, e.g. for ranking in a top-X list. The metrics listed here are not exposed as dimensions in the portal, but may be used in API calls made with the Query SQL Method.
Column | Type | Description |
both_bytes | bigint | Sum of in_bytes and out_bytes. |
both_pkts | bigint | Sum of in_pkts and out_pkts. |
ooorder_in_pkts | bigint | Out of order packets sent through the ingress interface (host devices only). |
ooorder_out_pkts | bigint | Out of order packets sent through the egress interface (host devices only). |
retransmitted_in_pkts | bigint | Retransmitted packets sent through the ingress interface (host devices only). |
tcp_retransmit | integer | Number of packets re-transmitted (host only). |
Note: For metrics that are exposed as dimensions in the portal, see Dimensions Reference.
Additional Columns
While the dimensions needed for almost any flow-based query are exposed in the portal UI (see Dimensions Reference), the columns listed below are available as additional dimensions that can be used in queries made with the Query SQL Method of the Query API.
Tags Columns
Matches to user-specified tags (see About Flow Tags).
Column | Type | Description |
dst_flow_tags | text | Tags that match on destination flow. |
src_flow_tags | text | Tags that match on source flow. |
NPM Columns
Network performance monitoring data, including retransmits, out-of-order packets, and latency (client, server, and application).
Column | Type | Description |
fragments | bigint | Packets that have been split into smaller packets for delivery across the network. |
ooorder_in_pkts | bigint | Packets that arrived out of sequence. |
ooorder_out_pkts | bigint | Packets that were sent out of sequence. |
retransmitted_in_pkts | bigint | Packets re-sent from source to destination. Note: Valid only with a reliable transport protocol such as TCP. |
DNS/WWW Columns
Data related to DNS lookup and HTTP, including domain name, referrer, status, etc.).
Column | Type | Description |
i_kflow_dns_query_type_name | text | The name of the resource record type requested by a DNS query. |
Ultimate Exit Columns
Information about the Ultimate Exit of the traffic, which is where it leaves the network (device, interface, site, etc.; see Using Ultimate Exit).
Column | Type | Description |
i_inet_ult_exit | text | The IP address, either IPv4 or IPv6, of the interface through which the flow leaves. |
i_ult_exit_site_id | bigint | The ID of the site (see About Sites) through which the flow leaves. |
ult_exit_device_id | bigint | ID of device through which the flow leaves. |
Interface Classification Columns
Information about the interface though which a flow enters and leaves a device; see Interface Classification).
Column | Type | Description |
dst_connect_type | bigint | A numeric code representing a connectivity type value (destination interface). |
src_connect_type | bigint | A numeric code representing a connectivity type value (source interface). |
ult_exit_connect_type | bigint | A numeric code representing a connectivity type value (ultimate exit interface). |
dst_network_bndry | bigint | A numeric code representing a network boundary value (destination interface). |
src_network_bndry | bigint | A numeric code representing a network boundary value (source interface). |
ult_exit_network_bndry | bigint | A numeric code representing a network boundary value (ultimate exit interface). |
CDN Attribution Columns
Information about content delivery networks associated with source and destination traffic (see About CDN Attribution):
Column | Type | Description |
dst_cdn_int | bigint | Destination CDN (content delivery network). |
i_dst_cdn | text | The commercial name of the CDN derived from the destination IP (inet_dst_addr) of an ingested flow. |
i_src_cdn | text | The commercial name of the CDN derived from the source IP (inet_src_addr) of an ingested flow. |
src_cdn_int | bigint | Source CDN (content delivery network). |
Threat Feed Columns
Information about source and destination hosts and IPs that have been identified as a security threat by Spamhaus (updated daily):
Column | Type | Description |
dst_threat_type | text | Destination IP type of threat (as defined by Spamhaus). |
src_threat_type | text | Source IP type of threat (as defined by Spamhaus). |
Using Main Tables
The following topics cover special considerations affecting the use of KDE Main Table columns:
Aggregation Duration
In queries covering a timespan of more than a few minutes, KDE performs aggregation to standardize both the interval between reported data points and the start and end times of the covered data. The duration used for this aggregation is determined internally by KDE depending on the query's specified timespan. An aggregation duration of, for example, 60 seconds results in a data point interval of one minute with the start and end of the timespan adjusted to the closest minute.
To represent the aggregation duration in the query itself, KDE supports a variable named i_duration. The i_duration value is expressed in seconds. It will be the same for all rows of all tables queried, and can be used to normalize bytes/packets to bps or pps. When a query includes a GROUP clause covering other fields (e.g. src_as), i_duration is used in an aggregate function in the SELECT clause, e.g. SELECT max(i_duration).
Note: i_duration cannot be used in a WHERE clause.
Required Select Columns
The inclusion in a KDE query of any of the main table columns described above is for the most part optional. But unless the SELECT statement in a given query includes one or more of the following columns, that query's results will be empty (no rows returned):
- in_bytes
- in_pkts
- out_bytes
- out_pkts
- both_bytes
- both_pkts
- tcp_retransmit
MAC Address Columns
The following information applies to the Main Table columns src_eth_mac and dst_eth_mac, which store MAC address data collected from flow data of type sFlow, IPFIX (depending on configuration), and pcap:
- MAC address entry in the portal, psql client, or API may be any of the following formats:
- 11.11.11.11.11.11
- 11:11:11:11:11:11
Note: Addresses may be entered with or without leading zeros (e.g. 1.1.1.1.1.1 or 01.01.01.01.01.01). - MAC address display in the portal and in query results is always period-separated (e.g. 11.11.11.11.11.11).
- Only the following operators are supported in queries for use with these columns:
=, <>, !=.
All Devices Table
In addition to the individual table for each device, each customer’s database includes a special table named all_devices, which is a merge into one table of the data from all of that customer’s devices. The All Devices table includes all the fields of an individual device table, but also includes the following additional fields:
Column | Type | Description |
i_device_id | text | Kentik-assigned unique numerical ID of the device. Used for all functions. |
i_device_name | text | User-defined name for the device. |
i_device_site_name | text | Name of the site to which the device has been assigned (see About Sites). If the device hasn't been assigned to a site, returns an empty string. Notes: - Supported operators for WHERE clause: case-insensitive equality, LIKE, IN, and regex matching. - Site assignments in the table may lag Admin settings by up to 10 minutes. |
i_device_type | text | Type of device, e.g. router or host (see Supported Device Types). Note: Used only for selection (filtering with WHERE clause), not for display or GROUP_BY. |
The additional fields of the All Devices table can be used to filter returned data by individual routers, or to further group by device using i_device_id.
Note: The All Devices fields above are exposed as filtering and group-by dimensions in the portal (see Dimensions Reference).