Using Kentik Firehose

The use of Kentik Firehose is covered here.

About Kentik Firehose

This section provides general information about Kentik Firehose and how it works.

Firehose Overview

Kentik is built around the Kentik Data Engine (KDE), which gathers flow records (e.g., NetFlow or sFlow) sent by your data sources (network devices, hosts, clouds) and enriches them with additional data (SNMP, GeoIP, BGP, threat feeds) before storing them in a time-series database. Kentik Firehose provides the ability to simultaneously send the same enriched flow records that are ingested by KDE to a location of your choice, which enables you to integrate the data into other (non-Kentik) analytics systems, either directly or through a data lake.

Note: Firehose capability is not currently implemented in the Kentik portal for cloud data sources. To use ktranslate for flow log data from cloud resources, contact Kentik (see Customer Care).

ktranslate Overview

To receive enriched flow records that have been "T-ed" by KDE, you deploy a Kentik-provided agent (client binary) called ktranslate in a public cloud resource or on a server in your data center:

  • Data is sent to ktranslate in Kentik's normalized kflow format, and can be converted by ktranslate into a desired output format (see Formats in ktranslate Operations).

    Note: ktranslate does not currently accept HTTPS input. Use a proxy to terminate the encryption and pass plain HTTP to ktranslate (HTTPS » [proxy] » HTTP » [ktranslate]); see the sketch following this list.

  • Each Kentik-registered data source (e.g. router or host) can send data to one or more endpoints (URLs), which each represent a single instance of ktranslate. To specify ktranslate endpoints for a given data source, use the Integrations tab of the Device dialog (see Firehose Data Sources), which is accessed in the portal via the Devices page (Settings » Devices).

  • Each ktranslate instance can receive the flow data for multiple data sources.

  • You can deploy and run as many instances of ktranslate as you wish.

  • Kentik recommends one instance per 100k flows per second ingested by Kentik from the data sources sending data to ktranslate.
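Regarding the HTTPS note above: one possible approach is to place a generic TLS-terminating proxy in front of ktranslate. The following is a minimal sketch only (not a Kentik-provided configuration); it assumes socat is available, that a combined certificate/key PEM file exists at the placeholder path /etc/ssl/ktranslate.pem, that ktranslate is listening locally on port 8082, and that the command runs with sufficient privileges to bind port 443. Any HTTPS-terminating reverse proxy (such as nginx or HAProxy) can serve the same role.

    # Terminate TLS on port 443 and forward the decrypted HTTP stream to ktranslate on port 8082
    socat openssl-listen:443,reuseaddr,fork,cert=/etc/ssl/ktranslate.pem,verify=0 tcp:127.0.0.1:8082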

ktranslate Scenarios

As suggested by the name, ktranslate is designed to be deployed between two network entities to enable data from one to be utilized by the other. The ktranslate agent listens for HTTP traffic from the source entity, performs various operations on the data as specified with the arguments of the ktranslate CLI (see ktranslate Command Line), and passes the data on to one or more other entities.

The base deployment scenario involves sending kflow from KDE to a third-party system that can use the enriched flow data for a purpose such as business intelligence. In addition to this use case, the list below shows some of the other purposes for which ktranslate may be useful:

  • From KDE to a 3rd-party analytics system or data lake: Format data and direct the resulting stream to a sink for a 3rd-party system.

  • From kproxy to KDE and another system: Split kproxy-encrypted kflow into streams for KDE and a 3rd-party system.

    Note: Requires setting the -servers argument in kproxy (see kproxy CLI Reference).

  • From kprobe to KDE and another system: Split host flow records from kprobe into streams for KDE and a 3rd-party system.

    Note: Sending kflow directly from kprobe to ktranslate (bypassing KDE) requires setting the --flow-url argument in kprobe (see Debug-only Parameters) to the ktranslate endpoint.

ktranslate Operations

As described in ktranslate Scenarios, ktranslate can perform various operations on the data in an incoming stream before passing that data along via an output stream or file. These operations fall into the following categories:

  • Formats: Specify the format to which the kflow input is converted for output. Output options include JSON (default), NetFlow, Apache Avro, InfluxDB line protocol, Splunk, and Prometheus.

    Note: Rollups are not supported when format is NetFlow.

  • Compression: Apply a compression algorithm to the output data. Compression options include none (default), gzip, snappy, null, and deflate.

  • Sinks: Set the destinations to which the output data streams should be written. Sink options include stdout (default), Kafka, New Relic, Kentik, net, HTTP, Splunk, Prometheus, and file.

    Note: You can specify multiple sinks for the output of a single ktranslate instance.

  • Rollups: Group data into dimension and metric sets. Rollups are specified as a comma-separated string stating method, name, metric, and dimensions 1 through n. In the following example, sum is the rollup method, kentik.bytes is the name, in_bytes is the metric, and dst_addr is dimension 1 (of 1):
    -rollups sum,kentik.bytes,in_bytes,dst_addr

    Notes:

    • Supported rollup methods are currently sum, min, max, mean, median, entropy, percentilerank, percentile, unique.

    • Output may contain multiple rollups.

    • The width of each time-slice in the aggregation defaults to 15 seconds but may be set with the argument -rollup_interval.

    • Additional variations to rollups can be specified with the ktranslate arguments that begin with -rollup_ (see ktranslate CLI Reference).

    • Metrics and dimensions supported for rollups are listed in ktranslate Metrics and Dimensions.

  • Filters: Focus the data in the ktranslate output by including only data that matches specified filters. As shown in the following example, a filter is specified as a comma-separated string with four attributes: type (string, int, or addr), dimension, operator, and value:
    -filters string,src_addr,==,12.0.1.2

Notes:

  • The operations and options above are for ktranslate version 2.

  • For examples of how the above operations are actually specified with the ktranslate CLI, see ktranslate CLI Examples.
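For instance, the following minimal sketch (assembled from the examples above, not a Kentik-provided command) combines all four operation categories in a single docker run command: it converts kflow to JSON, applies no compression, writes to stdout, rolls up bytes by destination address, and passes only flow whose source address is 12.0.1.2:

    docker run -p 8082:8082 kentik/ktranslate:v2 -format json -compression none -sinks stdout -rollups sum,kentik.bytes,in_bytes,dst_addr -filters string,src_addr,==,12.0.1.2 -listen=0.0.0.0:8082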

Firehose Setup Overview

Setting up Firehose is a two-stage process:

  • Deploy ktranslate in a Docker container on a server in your data center or in a public cloud resource, as detailed in ktranslate Download and Install:

    • The server must have a public IP address and meet the requirements given in ktranslate Requirements.

    • The ktranslate CLI arguments (see ktranslate CLI Reference) in the Docker run command will determine which operations this instance of ktranslate performs and the destinations to which it outputs streams. Some typical deployment options are covered in ktranslate CLI Examples.

  • In the Kentik portal, set the deployed ktranslate instance as the destination of a T-ed stream from KDE that will contain the kflow for a given Kentik-registered data source (see Firehose Data Sources). Repeat as needed to cover all of the data sources for which you wish to receive kflow on this ktranslate instance.

Note: By repeating the process above you may send kflow to multiple instances of ktranslate. You may deploy as many instances as needed to route the data for your specific purposes.

ktranslate Deployment

Deployment of ktranslate is covered here.

ktranslate Requirements

The following resources must be available (at minimum) to support the use of ktranslate running in a container:

ktranslate Download and Install

Installation from a downloaded Docker image provides a convenient and easy deployment mechanism for systems that already use Docker-based containerized applications.

To install ktranslate via Docker:

  1. Pull down the latest ktranslate image from Kentik's Docker hub repository (https://hub.docker.com/r/kentik/ktranslate):
    # docker pull kentik/ktranslate:v2

  2. Run the Docker image as shown in this example:
    # docker run -p 8082:8082 kentik/ktranslate:v2

  3. In the portal, provide the instance's endpoint to each data source from which you wish to send kflow to this ktranslate instance (see Firehose Data Sources).

Notes:

  • To install and run the latest version of ktranslate, omit :v2 from the end of the above examples.

  • In the above docker run command (step 2) the -p argument maps an external port to a port in the container.

  • For a reference to the ktranslate arguments that may be used in the above run command, see ktranslate CLI Reference.

  • Example command lines for common use cases are provided in ktranslate CLI Examples.
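After step 2, you can optionally confirm that the container is running and watch ktranslate's output using standard Docker commands (a minimal sketch; the container ID or name is whatever Docker assigned when you ran the image):

    # Confirm that a container based on the ktranslate image is running and shows the 8082 port mapping
    docker ps --filter ancestor=kentik/ktranslate:v2
    # Follow the container's log output to watch startup messages and incoming flow
    docker logs -f <container_id_or_name>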

ktranslate Command Line

The command line for ktranslate is covered here.

ktranslate CLI Examples

The topics below provide examples of the ktranslate arguments for common docker run commands.

ktranslate Operation Examples

The following examples illustrate commands that are used for some common ktranslate operations:

  • Filtered output: Translate kflow to JSON and output one stdout stream that is filtered to pass only traffic whose source port is 80.
    docker run -p 8082:8082 kentik/ktranslate:v2 -format json -sinks stdout -filters int,l4_src_port,==,80 -listen=0.0.0.0:8082

    Note: Multiple filters may be used. All filters are ANDed.

  • Output with rollups: Translate kflow to JSON and output two rollups to an stdout sink. One rollup aggregates the top 10 unique destination and source IPs every 60 seconds. The second represents the top source ports by bytes, aggregated at the default 15 second interval. The optional -rollup_and_alpha flag will result in the specified sinks receiving, in addition to the rollups, a clone of the enriched kflow stream that is received by ktranslate.
    docker run -p 8082:8082 kentik/ktranslate:v2 -format json -sinks stdout -rollups unique,top_src_addr_by_count_dst_addr,dst_addr,src_addr -rollup_top_k 10 -rollup_interval 60 -rollups sum,in_bytes+out_bytes,l4_src_port -rollup_and_alpha -listen=0.0.0.0:8082

    Note: Rollups are not supported when -format is NetFlow.

ktranslate Output Examples

The following examples illustrate commands that are used for common output destinations (the filter and rollup operations illustrated above may be used with any of the outputs below):

  • Output to Elastic: Translate kflow to JSON and send to a local instance of Elastic via HTTP:
    docker run -p 8082:8082 kentik/ktranslate:v2 -format json -sinks http -http_url=http://127.0.0.1:9200/indexname/typename/optionalUniqueId -listen=0.0.0.0:8082

  • Output to file: Translate kflow to InfluxDB line protocol for output to a file:
    docker run kentik/ktranslate -format influx -sinks file -file_on -file_out destination_directory -listen=0.0.0.0:8082

    Note: Output to file requires that the -file_on flag be present.

  • Output to InfluxDB: Translate kflow to InfluxDB line protocol for output via HTTP to a local instance named "kentik":
    docker run -p 8082:8082 kentik/ktranslate:v2 -format influx -sinks http -http_url=http://localhost:8086/write?db=kentik -listen=0.0.0.0:8082

  • Output to Kafka: Translate kflow to Apache Avro and output to a Kafka topic named "kentik_netflow":
    docker run -p 8082:8082 kentik/ktranslate:v2 -format avro -sinks kafka --kafka_topic=kentik_netflow -listen=0.0.0.0:8082
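
    Note: The example above does not show how the Kafka broker addresses are supplied. Assuming the brokers are given with the -bootstrap.servers argument listed in ktranslate CLI Reference (standard Kafka host:port notation; the broker address below is a placeholder), the command might look like this:
    docker run -p 8082:8082 kentik/ktranslate:v2 -format avro -sinks kafka --kafka_topic=kentik_netflow -bootstrap.servers=kafka-broker.example.com:9092 -listen=0.0.0.0:8082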

  • Output as NetFlow: Translate kflow to NetFlow and send, with a limit of 10 flows per message, to a local collector running on 127.0.0.1:9913:
    docker run -p 8082:8082 kentik/ktranslate:v2 -format netflow -sinks net -net_server 127.0.0.1:9913 -max_flows_per_message 10 -listen=0.0.0.0:8082

    Notes:

    • NetFlow output defaults to IPFIX (version 10). For NetFlow version 9 add this argument: -netflow_version netflow9.

    • Rollups are not supported when -format is NetFlow.

  • Output to New Relic: Translate kflow to JSON and send to a New Relic account with the specified ID (requires a New Relic API key):
    docker run -e NEW_RELIC_API_KEY=$API_KEY -p 8082:8082 kentik/ktranslate:v2 -format new_relic -sinks new_relic --nr_account_id=ID -listen=0.0.0.0:8082

  • Output to Prometheus: Translate kflow to Prometheus and output to a Prometheus server listening on port 8084:
    docker run -p 8082:8082 kentik/ktranslate:v2 -format prometheus -sinks prometheus -prom_listen=:8084 -listen=0.0.0.0:8082

  • Output to Splunk: Send kflow to a local instance of Splunk:
    docker run -e KENTIK_API_TOKEN=kentik_api_token -p 8082:8082 kentik/ktranslate:v2 -format splunk -sinks http -http_url=https://your-collector.splunkcloud.com:8088/services/collector/event -http_header 'Authorization: Splunk placeholder-for-your-splunk-token' [email protected] -compression gzip -listen=0.0.0.0:8082

Note: In the Splunk example above, the authorization value that you provide in the -http_header argument is a Splunk HEC token (see https://docs.splunk.com/Documentation/Splunk/7.3.0/Data/UsetheHTTPEventCollector).

ktranslate CLI Reference

The list below shows the arguments of the ktranslate CLI version 2. Use -h to return this list in whichever version you are using.

  • -asn4 (string): ASN IPv4 mapping file

  • -asn6 (string): ASN IPv6 mapping file

  • -bootstrap.servers (string): bootstrap.servers

  • -city (string): City mapping file

  • -compression (string): Compression algo to use (none|gzip|snappy|deflate|null) (default "none")

  • -dns (string): Resolve IPs at this ip:port

  • -file_out (string): Write flows seen to log to this directory if set (default "./")

  • -filters (value): Any filters to use. Format: type dimension operator value

  • -format (string): Format to convert kflow to: (json|avro|netflow|influx|prometheus) (default "json")

  • -geo (string): Geo mapping file

  • -healthcheck (string): Bind to this interface to allow healthchecks

  • -http_header (value): Any custom http headers to set on outbound requests

  • -http_url (string): URL to post to (default "http://localhost:8086/write?db=kentik")

  • -info_collector: Also send stats about this collector

  • -interfaces (string): Interface mapping file

  • -kafka.debug (string): Debug contexts to enable for kafka

  • -kafka_topic (string): Kafka topic to produce on

  • -kentik_email (string): Email to use for sending flow on to Kentik

  • -kentik_url (string): URL to use for sending flow on to Kentik (default "https://flow.kentik.com/chf")

  • -listen (string): The interface and port on which ktranslate should listen for flow data from Kentik. Default is off (no data received from Kentik).

  • -log_level (string): Logging Level (default "debug")

  • -mapping (string): Mapping file to use for enums (default "config.json")

  • -max_flows_per_message (int): Max number of flows to put in each emitted message (default 10000)

  • -max_sql_conns (int): Max concurrent SQL connections (default 16)

  • -measurement (string): Measurement to use for rollups (default "kflow")

  • -metalisten (string): HTTP port to bind on (default "localhost:0")

  • -metrics (string): Metrics Configuration. none|syslog|stderr|graphite:127.0.0.1:2003 (default "syslog")

  • -net_protocol (string): Use this protocol for writing data (udp|tcp|unix) (default "udp")

  • -net_server (string): Write flows seen to this address (host and port)

  • -netflow_version (string): Version of netflow to produce: (netflow9|ipfix) (default "ipfix")

  • -nr_account_id (string): If set, sends flow to New Relic

  • -nr_url (string): URL to use to send into NR (default "https://insights-collector.newrelic.com/v1/accounts/%s/events")

  • -olly_dataset (string): Olly dataset name

  • -olly_write_key (string): Olly write key

  • -prom_listen (string): Bind to listen for prometheus requests on (default ":8082")

  • -region (string): Region mapping file

  • -rollup_and_alpha: Send both rollups and alpha inputs to sinks. The alpha output is a clone of the enriched kflow stream received by ktranslate.

  • -rollup_interval (int): Export timer for rollups in seconds

  • -rollup_key_join (string): Token to use to join dimension keys together (default "'")

  • -rollup_top_k (int): Export only these top values (default 10)

  • -rollups (value): Any rollups to use. Structure: method, name, metric, dimensions (1 through n). Available methods are sum, min, max, mean, median, entropy, percentilerank, percentile, unique.

    Note: For available metrics see ktranslate Metrics and Dimensions.

  • -s3_bucket (string): AWS S3 Bucket to write flows to

  • -s3_flush_sec (int): Create a new output file every this many seconds (default 60)

  • -s3_prefix (string): AWS S3 Object prefix (default "/kentik")

  • -sample_rate (int): Sampling rate to use. 1 -> 1:1 sampling, 2 -> 1:2 sampling and so on. (default 1)

  • -sasl.kerberos.keytab (string): sasl.kerberos.keytab

  • -sasl.kerberos.kinit.cmd (string): sasl.kerberos.kinit.cmd (default "kinit -R -t \"%{sasl.kerberos.keytab}\" -k %{sasl.kerberos.principal} || kinit -t \"%{sasl.kerberos.keytab}\" -k %{sasl.kerberos.principal}")

  • -sasl.kerberos.principal (string): sasl.kerberos.principal

  • -sasl.kerberos.service.name (string): sasl.kerberos.service.name (default "kafka")

  • -sasl.mechanism (string): sasl.mechanism

  • -security.protocol (string): security.protocol

  • -service_name (string): Service identifier (default "ktranslate")

  • -sinks (string): List of sinks to send data to. Options: (kafka|stdout|new_relic|kentik|net|http|splunk|prometheus|file) (default "stdout")

  • -ssl.ca.location (string): ssl.ca.location

  • -stdout: Log to stdout (default true)

  • -tag_lookup (string): Tag service port to run lookups on

  • -threads (int): Number of threads to run for processing

  • -v: Show version and build information

ktranslate Metrics and Dimensions

The list below shows the values that may be specified for metrics and dimensions in a ktranslate rollup or filter. Usage depends on the type of the value:

  • String:

    • In a rollup, if the method is unique, a value of type string can be used as a metric.

    • In all other cases, a value of type string may be used only as a dimension (rollup dimension or filter dimension).

    • In a filter of type addr, the dimension must be one of the below string values that end in "_addr" (e.g. dst_addr or src_addr).

  • Int: Any value of type int may be used as either a metric or a dimension.

  • company_id (int64)

  • custom_bigint (string)

  • custom_int (string)

  • custom_str (string)

  • device_id (int64)

  • device_name (string)

  • dst_addr (string)

  • dst_as (int64)

  • dst_bgp_as_path (string)

  • dst_bgp_comm (string)

  • dst_eth_mac (string)

  • dst_flow_tags (string)

  • dst_geo (string)

  • dst_geo_city (string)

  • dst_geo_region (string)

  • dst_nexthop (string)

  • dst_nexthop_as (int64)

  • dst_route_prefix (int64)

  • dst_second_asn (int64)

  • dst_third_asn (int64)

  • header_len (int64)

  • in_bytes (int64)

  • in_pkts (int64)

  • input_int_alias (string)

  • input_int_desc (string)

  • input_port (int64)

  • l4_dst_port (int64)

  • l4_src_port (int64)

  • out_bytes (int64)

  • out_pkts (int64)

  • output_int_alias (string)

  • output_int_desc (string)

  • output_port (int64)

  • protocol (int64)

  • sample_rate (int64)

  • sampled_packet_size (int64)

  • src_addr (string)

  • src_as (int64)

  • src_bgp_as_path (string)

  • src_bgp_comm (string)

  • src_eth_mac (string)

  • src_flow_tags (string)

  • src_geo (string)

  • src_geo_city (string)

  • src_geo_region (string)

  • src_nexthop (string)

  • src_nexthop_as (int64)

  • src_route_prefix (int64)

  • src_second_asn (int64)

  • src_third_asn (int64)

  • tcp_flags (int64)

  • tcp_rx (int64)

  • timestamp (int64)

  • tos (int64)

  • vlan_in (int64)

  • vlan_out (int64)
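
As an illustration of how these values are used (a minimal sketch, not a Kentik-provided command), the following rollup sums in_bytes (an int metric) by dst_geo (a string dimension), while the filter passes only flows whose protocol (an int value) equals 6 (TCP); the rollup name bytes_by_dst_geo is an arbitrary label chosen for this example:

    docker run -p 8082:8082 kentik/ktranslate:v2 -format json -sinks stdout -rollups sum,bytes_by_dst_geo,in_bytes,dst_geo -filters int,protocol,==,6 -listen=0.0.0.0:8082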

Firehose Data Sources

Once a ktranslate instance is deployed in your data center or cloud resource, you can send kflow to it from KDE or another source (see ktranslate Scenarios). Because multiple instances of ktranslate may be deployed, you need to indicate to Kentik, via the portal, the instance to which the kflow for a given data source should be sent.

Kentik Firehose URL Endpoint configuration for data integration and analytics systems.

To send kflow for a data source from KDE to ktranslate:

  1. Determine the public IP address and port number of the ktranslate instance to which you want to send kflow for a device.

  2. From the main menu in the Kentik portal, click Settings in the column at right, which takes you to the Settings page.

  3. In the card at upper left, click the Network Devices link to go to the Devices page.

  4. In the Devices list, find the device (router or host) for which you want to send enriched flow data to a ktranslate instance. Open the Device dialog for that device by clicking the Edit icon at the right of the device's row in the list.

  5. In the Firehose Endpoint field in the dialog's Integrations tab, enter the public IP address and port number of the ktranslate instance, then click the Save button to save the setting and close the dialog.

Note: To send kflow for this device to multiple ktranslate instances, enter a comma-separated list of the corresponding URLs.

