
Tips for Deployments


Introduction

These are notes on a Gatekeeper deployment consisting of one Gatekeeper server and two Grantor servers. They assume Ubuntu 20.04 servers with Gatekeeper installed via packages.

This small deployment is meant to help new users to get started with Gatekeeper, so they can evaluate Gatekeeper, write their policy, and incrementally grow their deployment from this first step.

The network topology is shown below, where the Gatekeeper server has its front port connected to a data center uplink and its back port connected to a router. The router works as a gateway for a number of servers which provide services to the Internet via the external network, while the internal network is used for administrative purposes. The Gatekeeper server uses a routing daemon such as Bird to establish a full-routing BGP session with the uplink provider and an iBGP session with the router. In this setup, the Gatekeeper server has a single uplink connection, so it will be configured with a default route to the uplink provider's router, therefore reducing its memory usage. For more complex configurations, a patched Bird version can be used to feed learned prefixes into Gatekeeper. The Grantor servers have their front port connected to the external network. They do not have a back port configuration in Gatekeeper, and the internal network link is used solely for administrator access.

                                external network
                    +-------------------+-----------+------------+
                    |                   |           | front      | front
              +-----+------+       +----+---+  +----+----+  +----+----+
              |            |       |        |  |         |  |         |
uplink -------+ gatekeeper +-------+ router |  | grantor |  | grantor |
        front |            | back  |        |  |         |  |         |
              +-----+------+       +----+---+  +----+----+  +----+----+
                    |                   |           |            |
                    +-------------------+-----------+------------+
                                internal network

Gatekeeper front IPv4: 10.1.0.1/30
Gatekeeper front IPv6: 2001:db8:1::1/126

Uplink router IPv4: 10.1.0.2/30
Uplink router IPv6: 2001:db8:1::2/126

Gatekeeper back IPv4: 10.2.0.1/30
Gatekeeper back IPv6: fd00:2::1/126

Router IPv4 on Gatekeeper link: 10.2.0.2/30
Router IPv6 on Gatekeeper link: fd00:2::2/126

External network IPv4 CIDR: 1.2.3.0/24
External network IPv6 CIDR: 2001:db8:123::/48

Grantor front IPv4: 1.2.3.4 and 1.2.3.5
Grantor front IPv6: 2001:db8:123::4 and 2001:db8:123::5

Note that in this simple example, there is a single Gatekeeper server. In a deployment that requires handling high bandwidth attacks, multiple Gatekeeper servers can be used, and the router in front of them must be configured with ECMP, using a hash of the source and destination IP addresses to achieve load balancing.

Basic configuration

These steps can be performed for both Gatekeeper and Grantor servers, with the caveat that Grantors only have a front port, so any references to the back port can be ignored.

  1. Enable IOMMU and huge pages

The Gatekeeper server in this deployment has 256 GB of RAM. We reserve 16 GB for the kernel and allocate the remaining 240 GB in 1 GB huge pages. To pass the appropriate command line arguments to the kernel, edit /etc/default/grub and set GRUB_CMDLINE_LINUX_DEFAULT, running update-grub afterwards. Here we also enable IOMMU support via the intel_iommu argument.

GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=240"
  2. Rename front and back ports

It's useful to have friendly interface names in machines with many NICs. We're going to call the Gatekeeper front and back ports, appropriately, "front" and "back". This will be done with systemd link files. In the link file, it's important to specify a Match section option that doesn't cause the kernel to rename the interface back once Gatekeeper has taken control of it.

For this deployment, we match on the PCI addresses of the interfaces, which can be obtained via udevadm:

# udevadm info /sys/class/net/<front port name> | grep ID_PATH=
E: ID_PATH=pci-0000:01:00.0

# udevadm info /sys/class/net/<back port name> | grep ID_PATH=
E: ID_PATH=pci-0000:02:00.0

Create systemd link files for the front and back interfaces (the latter only in the Gatekeeper server) and run update-initramfs -u afterwards. An example using the output from the above udevadm commands is given below:

# /etc/systemd/network/10-front.link
[Match]
Property=ID_PATH=pci-0000:01:00.0
[Link]
Name=front

# /etc/systemd/network/10-back.link
[Match]
Property=ID_PATH=pci-0000:02:00.0
[Link]
Name=back

Once these two changes are in place, reboot the machine for them to take effect. It's also important to remember that DPDK won't take over an interface that is in the UP state, so it's advised to remove the front and back interfaces from the operating system's network configuration (e.g. /etc/network/interfaces in Ubuntu).

Gatekeeper server configuration

Environment variables

The first step is to edit the /etc/gatekeeper/envvars file and set the GATEKEEPER_INTERFACES variable with the PCI addresses of the front and back interfaces:

GATEKEEPER_INTERFACES="01:00.0 02:00.0"

Main configuration

For the Gatekeeper server, set gatekeeper_server to true in /etc/gatekeeper/main_config.lua:

local gatekeeper_server = true

Gatekeeper is composed of multiple functional blocks, each one with its own Lua configuration script located in /etc/gatekeeper.

GK block: /etc/gatekeeper/gk.lua

In this file, the following variables have been set as below:

local log_level = staticlib.c.RTE_LOG_NOTICE
local flow_ht_size = 250000000
local max_num_ipv4_rules = 1024
local num_ipv4_tbl8s = 128
local max_num_ipv6_rules = 1024
local num_ipv6_tbl8s = 256

The flow_ht_size variable is set close to the largest value that still allows Gatekeeper to boot. The larger the flow table, the better Gatekeeper can deal with complex attacks, since it can keep state for more flows. To estimate how much memory a given value will consume, multiply flow_ht_size by the number of NUMA nodes, by the number of GK block instances per NUMA node (two by default), and by 256 bytes per entry. The Gatekeeper server in this deployment has two Intel Xeon processors, that is, two NUMA nodes, so our setting consumes 250000000 * 2 * 2 * 256 bytes ≈ 238 GiB. Note that this value is an upper bound, so the actual memory consumption is lower. Finally, it is worth pointing out that this setup can track 250000000 * 2 * 2 = 1 billion flows.
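
As a quick sanity check, the arithmetic above can be reproduced with a few lines of Lua; the values below mirror this deployment and are otherwise only illustrative:

-- Upper bound on flow table memory usage for this deployment.
local flow_ht_size = 250000000
local numa_nodes = 2     -- two Intel Xeon sockets, i.e. two NUMA nodes
local gk_instances = 2   -- default number of GK block instances per NUMA node
local entry_bytes = 256  -- upper bound per flow entry
local max_flows = flow_ht_size * numa_nodes * gk_instances
local bytes = max_flows * entry_bytes
print(string.format("flows: %d, memory upper bound: %.0f GiB",
  max_flows, bytes / 2^30))
-- prints: flows: 1000000000, memory upper bound: 238 GiB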

The values for the max_num_ipv[46]_rules and num_ipv[46]_tbl8s variables have been set to small values, as we are configuring Gatekeeper with default routes.

If you are injecting BGP routes into Gatekeeper, the values for these variables depend on the size of the routing table. To calculate them, first generate IPv4 and IPv6 routing table dumps from full-routing BGP sessions, creating, respectively, the ipv4-ranges and ipv6-ranges text files, each containing one CIDR per line. Then set the max_num_ipv[46]_rules and num_ipv[46]_tbl8s variables to round numbers above the values reported by the gtctl estimate command, as described in the gtctl project's README file:

$ gtctl estimate -4 ipv4-ranges
ipv4: rules=1811522, tbl8s=1554

$ gtctl estimate -6 ipv6-ranges
ipv6: rules=313120, tbl8s=76228

In that case, we would set the following values:

local max_num_ipv4_rules = 2000000
local num_ipv4_tbl8s = 2000
local max_num_ipv6_rules = 400000
local num_ipv6_tbl8s = 100000

Solicitor block: /etc/gatekeeper/sol.lua

Up to version 1.1: by default, Gatekeeper limits the request bandwidth to 5% of the link capacity. In our deployment, we are using 10 Gbps interfaces for the Gatekeeper server and router, but the external network runs on 1 Gbps Ethernet. With this configuration, 5% of the link capacity would amount to 50% of the external network bandwidth, so we reduce the request bandwidth rate to 0.5% of the Gatekeeper link capacity:

local req_bw_rate = 0.005

Starting at version 1.2: Gatekeeper needs to know the bandwidth of the destination (or protected) network to calculate the bandwidth of the request channel. In our deployment, we are using 10 Gbps interfaces for the Gatekeeper server and router, but the external network runs on 1 Gbps Ethernet. Thus, the bandwidth of the destination network is 1 Gbps:

local destination_bw_gbps = 1

Network block configuration: /etc/gatekeeper/net.lua

In this file, set the variables below according to your network setup. Examples are given below for a front port named front and a back port named back. In this deployment, the front port belongs to a VLAN and uses LACP, so we set the appropriate VLAN tags for IPv4 and IPv6, and the bonding mode to staticlib.c.BONDING_MODE_8023AD. In our environment, the back port is not in a VLAN, nor does it use link aggregation. The back_mtu variable is set to a high value to account for IP-IP encapsulation in packets sent to the Grantor servers. Note that the MTU of the network interfaces in the path from the Gatekeeper server's back port to the Grantor servers' front ports should be set to this value (other interfaces in the network do not need to be reconfigured).

local user = "gatekeeper"

local front_ports = {"front"}
local front_ips = {"10.1.0.1/30", "2001:db8:1::1/126"}
local front_bonding_mode = staticlib.c.BONDING_MODE_8023AD
local front_ipv4_vlan_tag = 1234
local front_ipv6_vlan_tag = 1234
local front_vlan_insert = true
local front_mtu = 1500

local back_ports = {"back"}
local back_ips = {"10.2.0.1/30", "fd00:2::1/126"}
local back_bonding_mode = staticlib.c.BONDING_MODE_ROUND_ROBIN
local back_ipv4_vlan_tag = 0
local back_ipv6_vlan_tag = 0
local back_vlan_insert = false
local back_mtu = 2048

Other functional blocks

In the remaining Lua configuration files, we simply set the log_level variable. For production use, we specify the WARNING level:

local log_level = staticlib.c.RTE_LOG_WARNING

Configuring routes and Grantors in Gatekeeper

Gatekeeper configuration with default routes

As mentioned in the introduction, the Gatekeeper server will be configured using the uplink provider's router as a default gateway. We recommend using a default gateway for Gatekeeper whenever possible, because this configuration reduces memory usage (memory that can instead be used for flow storage), improves CPU cache hit rates (allowing for a higher packet processing rate), and simplifies the deployment, since a stock BGP daemon such as Bird can be used. Default gateways are set using Gatekeeper's dynamic configuration mechanism, which consists of writing a Lua script that performs the desired configuration and passing it to the gkctl tool.

The same mechanism is used to specify the Grantor servers used by Gatekeeper. As illustrated in the network topology description, the uplink provider's router has IPv4 address 10.1.0.2 and IPv6 address 2001:db8:1::2 and the two Grantor servers have external IPv4 addresses 1.2.3.4 and 1.2.3.5 and external IPv6 addresses 2001:db8:123::4 and 2001:db8:123::5. The router's addresses in the interface connected to the Gatekeeper server's back port are 10.2.0.2 and fd00:2::2, and the external network IPv4 and IPv6 CIDR blocks are, respectively, 1.2.3.0/24 and 2001:db8:123::/48.

Create the /etc/gatekeeper/init.lua file with the contents below:

require "gatekeeper/staticlib"

local dyc = staticlib.c.get_dy_conf()

-- IPv4 default gateway:
dylib.c.add_fib_entry('0.0.0.0/0', nil, '10.1.0.2', dylib.c.GK_FWD_GATEWAY_FRONT_NET, dyc.gk)

-- IPv6 default gateway:
dylib.c.add_fib_entry('::/0', nil, '2001:db8:1::2', dylib.c.GK_FWD_GATEWAY_FRONT_NET, dyc.gk)

-- IPv4 grantor configuration:
local addrs = {
  { gt_ip = '1.2.3.4', gw_ip = '10.2.0.2' },
  { gt_ip = '1.2.3.5', gw_ip = '10.2.0.2' },
}
dylib.add_grantor_entry_lb('1.2.3.0/24', addrs, dyc.gk)

-- IPv6 grantor configuration:
local addrs = {
  { gt_ip = '2001:db8:123::4', gw_ip = 'fd00:2::2' },
  { gt_ip = '2001:db8:123::5', gw_ip = 'fd00:2::2' },
}
dylib.add_grantor_entry_lb('2001:db8:123::/48', addrs, dyc.gk)

This script must be sent to Gatekeeper via the gkctl tool after startup. The best way to do this is to configure a systemd override with an ExecStartPost command that runs gkctl, with a long enough timeout to account for the Gatekeeper startup delay. Run systemctl edit gatekeeper and insert the following content:

[Service]
ExecStartPost=/usr/sbin/gkctl -t 300 /etc/gatekeeper/init.lua
TimeoutStartSec=300

Injecting BGP routes into Gatekeeper

If for some reason you cannot use a default gateway setup (e.g. if you have multiple uplinks connected to the Gatekeeper server), you need to deploy a patched Bird version which allows for communication between the two daemons. In that case, remove the calls to dylib.c.add_fib_entry() that set the IPv4 and IPv6 default routes from the init.lua script above, and add the following block in your Bird configuration file:

protocol device {
  ...
  port 0x6A7E;
  ...
}

protocol kernel kernel4 {
  ipv4 {
    ...
    port 0x6A7E;
    ...
  }
}

protocol kernel kernel6 {
  ipv6 {
    ...
    port 0x6A7E;
    ...
  }
}

The device and kernel settings allow Bird to interact with a userspace process listening on a socket identified by the given port ID, which must match the cps_conf.nl_pid setting in /etc/gatekeeper/cps.lua (0x6A7E is the default).
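
For reference, the matching setting in cps.lua is a single assignment; a minimal excerpt is shown below, assuming the stock configuration file assigns the value as its name suggests:

-- /etc/gatekeeper/cps.lua (excerpt): the netlink port ID that the
-- "port 0x6A7E" entries in the Bird configuration above must match.
-- 0x6A7E is the default value.
cps_conf.nl_pid = 0x6A7E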

Start Gatekeeper

Simply start and enable Gatekeeper via systemd:

# systemctl start gatekeeper
# systemctl enable gatekeeper

Grantor server configuration

Main configuration

For the Grantor server, set gatekeeper_server to false in /etc/gatekeeper/main_config.lua:

local gatekeeper_server = false

GT block: /etc/gatekeeper/gt.lua

In this file, the following variables have been set as below:

local n_lcores = 2
local lua_policy_file = "policy.lua"
local lua_base_directory = "/etc/gatekeeper"

Network block configuration: /etc/gatekeeper/net.lua

For Grantor servers, the network configuration is analogous to the one for the Gatekeeper servers, with the exception that there's no back port when running Gatekeeper in Grantor mode.

Here we assume no link aggregation and no VLAN configuration. Notice the MTU configuration matching the Gatekeeper server's back_mtu value.

local user = "gatekeeper"

local front_ports = {"front"}
local front_ips = {"1.2.3.4/24", "2001:db8:123::4/48"}
local front_bonding_mode = staticlib.c.BONDING_MODE_ROUND_ROBIN
local front_ipv4_vlan_tag = 0
local front_ipv6_vlan_tag = 0
local front_vlan_insert = false
local front_mtu = 2048

Other functional blocks

In the remaining Lua configuration files, we simply set the log_level variable. For production use, we specify the WARNING level:

local log_level = staticlib.c.RTE_LOG_WARNING

The policy script

The Grantor configuration in gt.lua points to a Lua policy script, a fundamental element of the Gatekeeper architecture. When a packet from a new flow arrives at the Gatekeeper server, it is forwarded to the Grantor server for a policy decision. In the simplest case, this decision is a binary choice of granting or declining packets belonging to this flow, along with the maximum bandwidth for the granted flows and the duration of each decision. However, the policy response is in fact a reference to a BPF program installed in the Gatekeeper server, which can not only accept or deny packets, but also control the bandwidth budget available to the flow and adapt its response according to changing traffic patterns. Once a BPF program has been assigned to the flow, further packets will be handled directly by the Gatekeeper server, according to the rules encoded in the program, and no new requests will be sent to the Grantor server until the flow expires.

The entry point of the policy script is a function called lookup_policy, which receives two arguments: a packet information object, which allows policy decisions to be made based on layer 2, 3 and 4 header fields, and a policy object, which can be used to set bandwidth and duration limits for the policy decision. This function must return a boolean value indicating whether the policy decision is to grant or decline the flow. In practice, we can use the decision_granted and decision_declined functions and their variations from the policylib Lua package to set the policy parameters (i.e. the BPF program index, the bandwidth budget and the duration of the decision) and return the appropriate boolean value. These functions set the BPF program index field of the policy decision, respectively, to the granted and declined programs, which are bundled with a standard Gatekeeper installation.

In the example below, we will in fact use the decision_grantedv2 function, a simple wrapper for decision_grantedv2_will_full_params. It sets the BPF program index to the more flexible grantedv2 program, also included with Gatekeeper, which supports negative and secondary bandwidth settings, allows direct delivery to be selected, and can be reused in custom BPF programs. We will also use the decision_web function, another wrapper for decision_grantedv2_will_full_params, which selects the web BPF program that also comes with Gatekeeper. This example BPF program allows ICMP packets and incoming TCP segments destined to the HTTP, HTTPS, SSH and FTP-related ports; it also allows incoming TCP segments with source ports HTTP and HTTPS, serving as an example of how to allow replies to connections initiated from the server. Finally, we will illustrate the use of the decision_tcpsrv function, which accepts lists of listening and remote ports and selects the tcp-services BPF program. This is a generic BPF program that allows incoming TCP segments with a destination port matching the given listening ports or with a source port matching the given remote ports. Apart from some idiosyncratic services like FTP, allowing inbound and outbound traffic to certain ports is all that needs to be done for most TCP-based protocols, so tcp-services greatly reduces the need to write custom BPF programs.

These functions have the following signatures:

function policylib.{decision_granted,decision_grantedv2,decision_web}(
  policy,          -- the policy object
  tx_rate_kib_sec, -- maximum bandwidth in KiB/s
  cap_expire_sec,  -- policy decision (capability) duration, in seconds
  next_renewal_ms, -- how long until sending a renewal request for this flow, in milliseconds
  renewal_step_ms  -- when sending renewal requests, don't send more than one per this duration, in milliseconds.
)

function policylib.decision_grantedv2_will_full_params(
  program_index,     -- corresponds to the index of the bpf_programs table in gk.lua in the Gatekeeper server.
  policy,            -- the policy object
  tx1_rate_kib_sec,  -- maximum primary bandwidth in KiB/s
  tx2_rate_kib_sec,  -- maximum secondary bandwidth in KiB/s
  cap_expire_sec,    -- policy decision (capability) duration, in seconds
  next_renewal_ms,   -- how long until sending a renewal request for this flow, in milliseconds
  renewal_step_ms,   -- when sending renewal requests, don't send more than one per this duration, in milliseconds.
  direct_if_possible -- whether to enable direct delivery
)

function policylib.decision_declined(
  policy,    -- the policy object
  expire_sec -- policy decision (capability) duration, in seconds
)

function policylib.decision_tcpsrv(
  policy,          -- the policy object
  tx_rate_kib_sec, -- maximum primary bandwidth in KiB/s
  cap_expire_sec,  -- policy decision (capability) duration, in seconds
  next_renewal_ms, -- how long until sending a renewal request for this flow, in milliseconds
  renewal_step_ms, -- when sending renewal requests, don't send more than one per this duration, in milliseconds
  ports            -- the ports object, obtained by calling policylib.tcpsrv_ports(listening_ports, remote_ports)
)
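
For instance, a minimal grant function built on decision_granted could look like the sketch below; the numeric values are only illustrative, not recommendations:

-- Grant the flow 1 MiB/s for 60 seconds, asking Gatekeeper to send a renewal
-- request 50 seconds into the capability, at most once every 2 seconds.
local function grant_small(policy)
  return policylib.decision_granted(policy, 1024, 60, 50000, 2000)
end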

As a practical example, we show below a policy script that is able to perform the following decisions:

  • Grant or decline flows based on their source IPv4 addresses, using labeled prefixes loaded from an external file;
  • Grant or decline flows based on their destination IPv4 addresses, allowing traffic to a subrange containing web servers;
  • Decline malformed packets;
  • Grant packets not matching the rules above, with limited bandwidth.

We start by requiring the libraries policylib from Gatekeeper and ffi from LuaJIT. Requiring policylib also gives us access to the lpmlib package, which contains functions to manipulate LPM (Longest Prefix Match) tables.

local policylib = require("gatekeeper/policylib")
local ffi = require("ffi")

Next, we define helper functions that represent our policy decisions. These functions take a policy argument, which has type struct ggu_policy, but which can be considered as an opaque object for our purposes, as it's simply forwarded to the functions policylib.decision_grantedv2 or policylib.decision_declined, described above.

-- Decline flows with malformed packets.
local function decline_malformed_packet(policy)
  return policylib.decision_declined(policy, 10)
end

-- Decline flows by policy decision.
local function decline(policy)
  return policylib.decision_declined(policy, 60)
end

-- Grant flow by policy decision.
local function grant(policy)
  return policylib.decision_grantedv2(
    policy,
    3072,   -- tx_rate_kib_sec = 3 MiB/s
    300,    -- cap_expire_sec = 5 minutes
    240000, -- next_renewal_ms = 4 minutes
    3000    -- renewal_step_ms = 3 seconds
  )
end

-- Grant flows destined to web servers by policy decision.
local function grant_web(policy)
  return policylib.decision_web(
    policy,
    3072,   -- tx_rate_kib_sec = 3 MiB/s
    300,    -- cap_expire_sec = 5 minutes
    240000, -- next_renewal_ms = 4 minutes
    3000    -- renewal_step_ms = 3 seconds
  )
end

-- Returns a policy function that grants flows destined to listening_ports or coming from remote_ports.
local function grant_tcpsrv(listening_ports, remote_ports)
  local ports = policylib.tcpsrv_ports(listening_ports, remote_ports)
  return function(policy)
    return policylib.decision_tcpsrv(
      policy,
      3072,   -- tx_rate_kib_sec = 3 MiB/s
      300,    -- cap_expire_sec = 5 minutes
      240000, -- next_renewal_ms = 4 minutes
      3000,   -- renewal_step_ms = 3 seconds
      ports
    )
  end
end

-- Grant flow not matching any policy, with reduced bandwidth.
local function grant_unmatched(policy)
  return policylib.decision_grantedv2(
    policy,
    1024,   -- tx_rate_kib_sec = 1 MiB/s
    300,    -- cap_expire_sec = 5 minutes
    240000, -- next_renewal_ms = 4 minutes
    3000    -- renewal_step_ms = 3 seconds
  )
end

We then define a Lua table that maps its indices to policy decisions. The indices in this table correspond to the label that is associated with a network prefix when it is inserted in the LPM (Longest Prefix Match) tables created below. Therefore, when inspecting a packet, we can perform a lookup for its source and/or destination IP address in an LPM table and use the returned label to obtain the function that will grant or decline the flow.

In the table below, flows labeled 1 in the LPM table will be declined, while those labeled 2 and 3 will be granted, respectively, by the grantedv2 and web BPF programs. Label 4 grants incoming connections to ports 25 (SMTP), 587 (Submission) and 465 (SMTPS), as well as replies from remote port 25 (SMTP), while label 5 grants incoming connections to port 3306 (MySQL), with no remote ports allowed. The grant_unmatched function is called directly as the default policy and therefore is not referenced in the table.

local policy_decision_by_label = {
  [1] = decline,
  [2] = grant,
  [3] = grant_web,
  [4] = grant_tcpsrv({25, 587, 465}, {25}),
  [5] = grant_tcpsrv({3306}, {}),
}

The policy script continues with the definition of the aforementioned LPM tables, with the use of the helper function new_lpm_from_file. The fact that the src_lpm_ipv4 and dst_lpm_ipv4 variables are global (i.e. their declarations do not use the local keyword) is relevant, because it allows them to be accessed by other scripts. This is useful, for example, to update an LPM table, or to print it for inspection.

The new_lpm_from_file function, given below, assumes the input file is in a two-column format, where the first column is a network prefix in CIDR notation and the second column is its label. It uses functions in the lpmlib package to create and populate the LPM table. Given the policy_decision_by_label table above, the input file containing source address ranges should use label 1 for those we want to decline and label 2 for those we want to grant access to. Similarly, the input file containing destination address ranges should attach label 3 to its prefixes. (Note that in the actual policy script, the definition of new_lpm_from_file must appear before the two calls shown below.)
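
For illustration, an input file in this format could look like the lines below; these prefixes and labels are hypothetical and only show the expected layout:

198.51.100.0/24 1
100.90.80.0/24 2
1.2.3.0/26 3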

src_lpm_ipv4 = new_lpm_from_file("/path/to/lpm/source/addresses/file")
dst_lpm_ipv4 = new_lpm_from_file("/path/to/lpm/destination/addresses/file")

function new_lpm_from_file(path)
  -- Find minimum values for num_rules and num_tbl8s.
  local num_rules = 0
  local num_tbl8s = 0

  local prefixes = {}
  for line in io.lines(path) do
    local prefix, label = string.match(line, "^(%S+)%s+(%d+)$")
    if not prefix or not label then
      error(path .. ": invalid line: " .. line)
    end
    -- Convert string in CIDR notation to IP address and prefix length.
    local ip_addr, prefix_len = lpmlib.str_to_prefix(prefix)
    num_rules = num_rules + 1
    num_tbl8s = num_tbl8s + lpmlib.lpm_add_tbl8s(ip_addr, prefix_len, prefixes)
  end

  -- Adjust parameters.
  local scaling_factor_rules = 2
  local scaling_factor_tbl8s = 2
  num_rules = math.max(1, scaling_factor_rules * num_rules)
  num_tbl8s = math.max(1, scaling_factor_tbl8s * num_tbl8s)

  -- Create and populate LPM table.
  local lpm = lpmlib.new_lpm(num_rules, num_tbl8s)
  for line in io.lines(path) do
    local prefix, label = string.match(line, "^(%S+)%s+(%d+)$")
    if not prefix or not label then
      error(path .. ": invalid line: " .. line)
    end
    -- Convert string in CIDR notation to IP address and prefix length.
    local ip_addr, prefix_len = lpmlib.str_to_prefix(prefix)
    lpmlib.lpm_add(lpm, ip_addr, prefix_len, tonumber(label))
  end

  return lpm
end

Finally, we implement the lookup_policy function. As described above, this is the entry point of the policy script, i.e., the function called by the Grantor server to obtain a policy decision for a given packet.

The function receives two arguments. The first is pkt_info, which is a gt_packet_headers struct, accessible from the policy script via the ffi module. These are the headers of the IP-in-IP encapsulated packet sent from Gatekeeper to Grantor. The second argument is policy, which we will simply pass along to the policy decision functions.

The lookup_policy function starts by checking if the inner packet is an IPv4 packet. In production we have IPv6-specific LPM tables and other policies, but for simplicity, in this example we will just apply the default policy for non-IPv4 traffic. The function then proceeds with an LPM table lookup for the source address of the incoming packet, which, if successful, will return a policy decision function that is then applied. Otherwise, the script attempts to obtain a policy by performing a lookup in the destination addresses LPM table. These two steps are performed by the helper functions lookup_src_lpm_ipv4_policy and lookup_dst_lpm_ipv4_policy, respectively, which are given below. Finally, if no policy is found, we apply the default policy decision function, grant_unmatched.

function lookup_policy(pkt_info, policy)
  if pkt_info.inner_ip_ver ~= policylib.c.IPV4 then
    return grant_unmatched(policy)
  end

  local fn = lookup_src_lpm_ipv4_policy(pkt_info)
  if fn then
    return fn(policy)
  end

  local fn = lookup_dst_lpm_ipv4_policy(pkt_info)
  if fn then
    return fn(policy)
  end

  return grant_unmatched(policy)
end

The lookup_src_lpm_ipv4_policy and lookup_dst_lpm_ipv4_policy functions perform lookups, respectively, on the src_lpm_ipv4 and dst_lpm_ipv4 tables, which were populated with network prefixes loaded from input files, as described above. We use the ffi.cast function to obtain an IPv4 header, so that we can access the packet's source IP address and look it up in the LPM table, with lpmlib.lpm_lookup. This function returns the matching label for the network prefix to which the flow's source address belongs, which will be used to obtain its associated policy decision function via the mapping in the policy_decision_by_label Lua table. Note that lpmlib.lpm_lookup returns a negative number if no match is found, and since the policy_decision_by_label table has no negative indices, the table lookup will return nil, and the lookup_policy function will proceed without performing the code in the then branch of the if statements.

function lookup_src_lpm_ipv4_policy(pkt_info)
  local ipv4_header = ffi.cast("struct rte_ipv4_hdr *", pkt_info.inner_l3_hdr)
  local label = lpmlib.lpm_lookup(src_lpm_ipv4, ipv4_header.src_addr)
  return policy_decision_by_label[label]
end

function lookup_dst_lpm_ipv4_policy(pkt_info)
  local ipv4_header = ffi.cast("struct rte_ipv4_hdr *", pkt_info.inner_l3_hdr)
  local label = lpmlib.lpm_lookup(dst_lpm_ipv4, ipv4_header.dst_addr)
  return policy_decision_by_label[label]
end

Finally, we add four helper functions to the policy script. These functions are not used by the policy itself, but by the dynamic configuration scripts that keep the LPM tables up to date. The add_src_v4_prefix and add_dst_v4_prefix functions take a prefix string in CIDR format and an integer label, and insert the prefix into the appropriate LPM table with that label. The del_src_v4_prefix and del_dst_v4_prefix functions take a prefix string in CIDR format and remove it from the appropriate LPM table.

More details about dynamically updating the LPM table are given below.

function add_src_v4_prefix(prefix, label)
  local ip_addr, prefix_len = lpmlib.str_to_prefix(prefix)
  lpmlib.lpm_add(src_lpm_ipv4, ip_addr, prefix_len, label)
end

function add_dst_v4_prefix(prefix, label)
  local ip_addr, prefix_len = lpmlib.str_to_prefix(prefix)
  lpmlib.lpm_add(dst_lpm_ipv4, ip_addr, prefix_len, label)
end

function del_src_v4_prefix(prefix)
  local ip_addr, prefix_len = lpmlib.str_to_prefix(prefix)
  lpmlib.lpm_del(src_lpm_ipv4, ip_addr, prefix_len)
end

function del_dst_v4_prefix(prefix)
  local ip_addr, prefix_len = lpmlib.str_to_prefix(prefix)
  lpmlib.lpm_del(dst_lpm_ipv4, ip_addr, prefix_len)
end
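
Before moving on to the automated workflow described in the next section, note that these helpers can already be exercised by hand: a small dynamic configuration script such as the sketch below can be sent to the Grantor with gkctl. The prefixes are only illustrative, and label 1 corresponds to the decline policy in this example.

require "gatekeeper/staticlib"

local function update_lpm_tables()
  -- Label 1 maps to the decline policy in policy_decision_by_label.
  add_src_v4_prefix("198.51.100.0/24", 1)
  -- Stop declining a prefix that is no longer listed.
  del_src_v4_prefix("203.0.113.0/24")
end

local dyc = staticlib.c.get_dy_conf()
dylib.update_gt_lua_states_incrementally(dyc.gt, update_lpm_tables, false)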

Updating LPM tables with Drib and gtctl

Fetching IP prefixes

The example policy script given above loads network prefixes and labels from a file. In practice, these prefixes are usually assembled from multiple online sources of unwanted source networks, such as Spamhaus' EDROP or Team Cymru's Bogon prefixes, so that flows whose source address belongs to them can be declined.

These online unwanted prefix lists are continuously updated and may contain intersecting network blocks, so it makes sense to use a tool designed to fetch, merge and label them automatically, generating a file that can be consumed by the policy script. The Drib tool was developed for this purpose.

This tool aggregates IP prefixes from configurable online and offline sources and allows each source to be labeled with its own "class", which is just an arbitrary string. Once the prefixes are aggregated, Drib can render a template, feeding it with the prefixes and their respective class. We use the source class configuration in Drib as the label to be associated with a prefix when inserted in the policy's LPM table.

Going back to the policy script, recall the definition of the policy_decision_by_label variable:

local policy_decision_by_label = {
  [1] = decline,
  [2] = grant,
  [3] = grant_web,
  [4] = grant_tcpsrv({25, 587, 465}, {25}),
  [5] = grant_tcpsrv({3306}, {}),
}

This means prefixes labeled 1 will be declined, and those labeled 2-5 will be granted according to the respective BPF programs. Below we show a Drib configuration file, /etc/drib/drib.yaml, that labels network blocks fetched from the EDROP and Bogons lists with a class value of 1. To make the example more complete, we also add a static network block labeled with a class value of 2 as an "office" network from which we always want to accept traffic. Finally, we add static network blocks for web servers with a class value of 3, to which we accept web-related traffic according to the rules in the web BPF program, SMTP servers with a class value of 4, to which we accept SMTP and mail submission traffic, and database servers to which we accept MySQL traffic. Traffic related to the latter two network blocks will be governed by the same BPF program, tcp-services.

Note that Drib supports specifying a group-scoped kind setting, which is a tag shared by all prefixes in a given group. We define the decline and grant groups with kind src for the source address prefixes and the servers group with kind dst for the destination address prefixes, and use the entry.kind field in templates that will generate Lua scripts that manipulate the src_lpm_ipv4 and dst_lpm_ipv4 LPM tables.

log_level: "warn"

bootstrap: {
  input: "/etc/drib/bootstrap.tpl",
  output: "/var/lib/drib/bootstrap_{proto}_{kind}",
}

ipv4: {
  decline: {
    priority: 30,
    kind: "src",

    edrop: {
      remote: {
        url: "https://www.spamhaus.org/drop/edrop.txt",
        check_interval: "12h",
        parser: {ranges: {one_per_line: {comment: ";"}}},
      },
      class: "1",
    },

    fullbogons: {
      remote: {
        url: "https://www.team-cymru.org/Services/Bogons/fullbogons-ipv4.txt",
        check_interval: "1d",
        parser: {ranges: {one_per_line: {comment: "#"}}},
      },
      class: "1",
    },
  },

  grant: {
    priority: 30,
    kind: "src",

    office: {
      range: "100.90.80.0/24",
      class: "2",
    },
  },

  servers: {
    priority: 20,
    kind: "dst",

    web: {
      range: "1.2.3.0/26",
      class: "3",
    },
    smtp: {
      range: "1.2.3.64/27",
      class: "4",
    },
    mysql: {
      range: "1.2.3.96/27",
      class: "5",
    },
  },
}

Given this configuration, the following bootstrap template file, /etc/drib/bootstrap.tpl, is used to generate input files in the format expected by the policy script, that is, a two-column file with a network prefix in CIDR format in the first column, and an integer label in the second one:

{% for entry in ranges -%}
{{entry.range}} {{entry.class}}
{% endfor -%}

A cron job is set up to run the drib aggregate command, which will download the EDROP and Bogon prefixes, merge them, exclude the office network range from the resulting set, and save a serialization of the result in what is called an aggregate file.

We tie everything together by calling the drib bootstrap --no-download command in a systemd override ExecStartPre command. This makes Drib read an existing aggregate file (generated by the aforementioned cron job) and render the above template. When Gatekeeper runs in Grantor mode, it runs the policy script, which then reads the freshly rendered files containing the set of prefixes obtained from Drib.

The systemd override can be created with the systemctl edit gatekeeper command in the Grantor servers. Add the following content to the override file:

[Service]
ExecStartPre=/usr/sbin/drib bootstrap --no-download

This ensures the policy script will load up-to-date data when Gatekeeper starts in Grantor mode.

Updating LPM tables incrementally

The setup described above works well for the generation of an initial (bootstrap) list of prefixes on Gatekeeper startup. However, the EDROP and Bogons lists, as well as similar online unwanted prefix lists, are continually updated, and Gatekeeper's in-memory LPM tables should be kept up to date.

To do this, we use the gtctl tool. gtctl parses Drib's aggregate files (generated by the cron job mentioned in the previous section), compares them to an aggregate file saved from a previous run, and generates the sets of newly inserted and removed prefixes. These sets are used as inputs to render policy update scripts, which gtctl then feeds into Gatekeeper via its dynamic configuration mechanism.

The policy update template, /etc/gtctl/policy_update.lua.tpl, simply generates calls to the add_src_v4_prefix, add_dst_v4_prefix, del_src_v4_prefix and del_dst_v4_prefix functions defined in the policy script. Note the use of the entry.kind field in the template, so that the appropriate function is called.

local function update_lpm_tables()
{%- for entry in ipv4.remove %}
  del_{{entry.kind}}_v4_prefix("{{entry.range}}")
{%- endfor %}

{%- for entry in ipv4.insert %}
  add_{{entry.kind}}_v4_prefix("{{entry.range}}", {{entry.class}})
{%- endfor %}
end

local dyc = staticlib.c.get_dy_conf()
dylib.update_gt_lua_states_incrementally(dyc.gt, update_lpm_tables, false)

Depending on the number of updates, it might be necessary to create a new LPM table that is able to accommodate the new set of prefixes. For this case, gtctl uses a policy replacement template, /etc/gtctl/policy_replace.lua.tpl, to generate the script:

{{lpm_table}} = nil
collectgarbage()

{{lpm_table}} = {{lpm_table_constructor}}({{params.num_rules}}, {{params.num_tbl8s}})

local function update_lpm_tables()
{%- for entry in ipv4.insert %}
  add_{{entry.kind}}_v4_prefix("{{entry.range}}", {{entry.class}})
{%- endfor %}
end

local dyc = staticlib.c.get_dy_conf()
dylib.update_gt_lua_states_incrementally(dyc.gt, update_lpm_tables, false)

The template above mentions the params variable. This variable is created by gtctl after running a parameters estimation script, which is rendered from the template /etc/gtctl/lpm_params.lua.tpl:

require "gatekeeper/staticlib"
require "gatekeeper/policylib"

local dyc = staticlib.c.get_dy_conf()

if dyc.gt == nil then
  return "Gatekeeper: failed to run as Grantor server\n"
end

local function get_lpm_params()
  local lcore = policylib.c.gt_lcore_id()
  local num_rules, num_tbl8s = {{lpm_params_function}}({{lpm_table}})
  return lcore .. ":" .. num_rules .. "," .. num_tbl8s .. "\n"
end

dylib.update_gt_lua_states_incrementally(dyc.gt, get_lpm_params, false)

Given these templates, the gtctl configuration file, /etc/gtctl/gtctl.yaml, which references them, is shown below.

log_level: "warn"
remove_rendered_scripts: true
socket: "/var/run/gatekeeper/dyn_cfg.socket"
state_dir: "/var/lib/gtctl"

replace: {
  input: "/etc/gtctl/policy_replace.lua.tpl",
  output: "/var/lib/gtctl/policy_replace_{proto}_{kind}.{2i}.lua",
  max_ranges_per_file: 1500,
}

update: {
  input: "/etc/gtctl/policy_update.lua.tpl",
  output: "/var/lib/gtctl/policy_update_{proto}_{kind}.{2i}.lua",
  max_ranges_per_file: 1500,
}

lpm: {
  table_format: "{kind}_lpm_{proto}", # for this example's drib.yaml, yields "src_lpm_ipv4" and "dst_lpm_ipv4"

  parameters_script: {
    input: "/etc/gtctl/lpm_params.lua.tpl",
    output: "/var/lib/gtctl/lpm_params_{proto}_{kind}.lua",
  },

  ipv4: {
    lpm_table_constructor: "lpmlib.new_lpm",
    lpm_get_params_function: "lpmlib.lpm_get_paras",
  },

  ipv6: {
    lpm_table_constructor: "lpmlib.new_lpm6",
    lpm_get_params_function: "lpmlib.lpm6_get_paras",
  },
}

The only missing piece is a way to run gtctl once a new aggregate file has been generated by Drib. Our current solution is to rely on our configuration management tool, Puppet, to detect this and trigger the gtctl execution:

file { '/var/lib/gtctl/aggregate.new':
  ensure => 'present',
  source => 'puppet:///drib/aggregate',
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  notify => Exec['gtctl'],
}

exec { 'gtctl':
  command     => 'gtctl dyncfg -a /var/lib/gtctl/aggregate.new',
  onlyif      => 'systemctl is-active gatekeeper',
  refreshonly => true,
}

Using custom BPF programs

Let's extend the example above with a new range of IPv4 addresses for recursive DNS servers. These are assumed to be for internal use only (i.e. they are used only by other servers, and not open to the Internet), and therefore should accept no external connections. However, in order to be able to perform recursive DNS queries, replies to packets sent to TCP and UDP port 53 must be allowed to reach the server. In other words, the BPF program must accept incoming packets with TCP and UDP source port 53.

Create a new BPF program

We create a dns-recursive.c file with the following changes compared to the web.c file from the Gatekeeper repository.

  1. Include UDP headers: add the udp.h header file to the list of the program's includes near the top of the file:
#include <netinet/udp.h>
  2. Handle UDP traffic: this code grants access to UDP datagrams with source port 53, meaning they are sent by other DNS servers as replies to the queries made by our server. In the switch statement on the ctx->l4_proto field, we add the following case:
case IPPROTO_UDP: {
    struct udphdr *udp_hdr;

    if (ctx->fragmented)
        goto secondary_budget;
    if (unlikely(pkt->l4_len < sizeof(*udp_hdr))) {
        /* Malformed UDP header. */
        return GK_BPF_PKT_RET_DECLINE;
    }
    udp_hdr = rte_pktmbuf_mtod_offset(pkt, struct udphdr *,
           pkt->l2_len + pkt->l3_len);

    /* Authorized external services. */
    switch (ntohs(udp_hdr->uh_sport)) {
    case 53:    /* DNS */
        break;
    default:
        return GK_BPF_PKT_RET_DECLINE;
    }

    goto forward;
}
  3. In the TCP section (below the comment "Only TCP packets from here on") we remove the whole switch statement on the TCP destination port (tcp_hdr->th_dport), replacing it with a switch statement on the TCP source port, analogously to the UDP source port switch described in the previous step.
/* Authorized external services. */
switch (ntohs(tcp_hdr->th_sport)) {
case 53:	/* DNS */
    if (tcp_hdr->syn && !tcp_hdr->ack) {
        /* No listening ports. */
        return GK_BPF_PKT_RET_DECLINE;
    }
    break;

default:
    return GK_BPF_PKT_RET_DECLINE;
}

To compile the program, it is necessary to build Gatekeeper by following the instructions in the Build from Source section of the README. Once Gatekeeper is compiled, run the following command:

$ GATEKEEPER_ROOT=/path/to/gatekeeper/repository
$ clang -O2 -target bpf \
    -I${GATEKEEPER_ROOT}/include -I${GATEKEEPER_ROOT}/bpf -Wno-int-to-void-pointer-cast \
    -o dns-recursive.bpf -c dns-recursive.c

Install a new BPF program

The resulting dns-recursive.bpf file must be uploaded to the Gatekeeper server and installed alongside the other BPF programs, by default in /etc/gatekeeper/bpf. Next, it must be added to the bpf_programs table in the gk.lua file. We add it with an index of 100, as indices below that number are considered to be reserved. In /etc/gatekeeper/gk.lua, the bpf_programs variable will look like this:

local bpf_programs = {
  [0] = "granted.bpf",
  [1] = "declined.bpf",
  [2] = "grantedv2.bpf",
  [3] = "web.bpf",
  [4] = "tcp-services.bpf",
  -- Add the line below:
  [100] = "dns-recursive.bpf",
}

The new BPF program will be loaded when Gatekeeper is restarted, but it is possible to load it dynamically using gkctl. Create the following Lua script in a file named insert-bpf-program.lua:

require "gatekeeper/staticlib"

local dyc = staticlib.c.get_dy_conf()

local path = "/etc/gatekeeper/bpf/dns-recursive.bpf"
local index = 100
local ret = dylib.c.gk_load_bpf_flow_handler(dyc.gk, index, path, true)
if ret < 0 then
  return "gk: failed to load BPF program " .. path .. " (" .. index .. ") in runtime"
end

return "gk: done"

Then load it into a running Gatekeeper instance with gkctl:

# gkctl insert-bpf-program.lua

Update the policy script

Now that we have a new BPF program installed in the Gatekeeper server, we can adapt our policy to use it. First, add the grant_dns function:

local function grant_dns(policy)
  return policylib.decision_grantedv2_will_full_params(
    100,    -- dns-recursive.bpf index in bpf_programs in gk.lua
    policy,
    10240,  -- primary bandwidth limit = 10 MiB/s
    512,    -- secondary bandwidth limit (5% of primary bandwidth)
    300,    -- cap_expire_sec = 5 minutes
    240000, -- next_renewal_ms = 4 minutes
    3000,   -- renewal_step_ms = 3 seconds
    true    -- direct_if_possible
  )
end

Next, add this function to the policy_decision_by_label table so that it looks like this:

local policy_decision_by_label = {
  [1] = decline,
  [2] = grant,
  [3] = grant_web,
  [4] = grant_tcpsrv({25, 587, 465}, {25}),
  [5] = grant_tcpsrv({3306}, {}),
  -- Add the line below:
  [6] = grant_dns,
}

Install the new policy

Copy the new policy.lua file to the Grantor servers, replacing the previous one in /etc/gatekeeper/policy.lua. The new policy will be read when the gatekeeper service is restarted on the Grantor server, but we can also use gkctl to reload it on a running server. If Gatekeeper was installed using the provided Debian packages, the script /usr/share/gatekeeper/reload_policy.lua should be available in the Grantor server. Otherwise, it can be found in the gkctl/scripts directory in the Gatekeeper repository. Simply run the command below.

# gkctl /usr/share/gatekeeper/reload_policy.lua

Feed the new IPv4 range to the Grantor server

If Drib is being used to manage IP address ranges, add the recursive DNS IPv4 range to the servers block in /etc/drib/drib.yaml. Note the use of class 6 to match the index added to the policy_decision_by_label variable in the policy script.

servers: {
  # ...
  dns: {
    range: "1.2.3.64/29",
    class: "6",
  },
},

Complete custom BPF program

For completeness' sake, the full source code for the dns-recursive.c program can be found below.

#include <net/ethernet.h>
#include <netinet/tcp.h>
#include <netinet/udp.h>

#include "grantedv2.h"
#include "libicmp.h"

SEC("init") uint64_t
dns_init(struct gk_bpf_init_ctx *ctx)
{
	return grantedv2_init_inline(ctx);
}

SEC("pkt") uint64_t
dns_pkt(struct gk_bpf_pkt_ctx *ctx)
{
	struct grantedv2_state *state =
		(struct grantedv2_state *)pkt_ctx_to_cookie(ctx);
	struct rte_mbuf *pkt = pkt_ctx_to_pkt(ctx);
	uint32_t pkt_len = pkt->pkt_len;
	struct tcphdr *tcp_hdr;
	uint64_t ret = grantedv2_pkt_begin(ctx, state, pkt_len);

	if (ret != GK_BPF_PKT_RET_FORWARD) {
		/* Primary budget exceeded. */
		return ret;
	}

	/* Allowed L4 protocols. */
	switch (ctx->l4_proto) {
	case IPPROTO_ICMP:
		ret = check_icmp(ctx, pkt);
		if (ret != GK_BPF_PKT_RET_FORWARD)
			return ret;
		goto secondary_budget;

	case IPPROTO_ICMPV6:
		ret = check_icmp6(ctx, pkt);
		if (ret != GK_BPF_PKT_RET_FORWARD)
			return ret;
		goto secondary_budget;

	case IPPROTO_UDP: {
		struct udphdr *udp_hdr;

		if (ctx->fragmented)
			goto secondary_budget;
		if (unlikely(pkt->l4_len < sizeof(*udp_hdr))) {
			/* Malformed UDP header. */
			return GK_BPF_PKT_RET_DECLINE;
		}
		udp_hdr = rte_pktmbuf_mtod_offset(pkt, struct udphdr *,
		       pkt->l2_len + pkt->l3_len);

		/* Authorized external services. */
		switch (ntohs(udp_hdr->uh_sport)) {
		case 53:	/* DNS */
			break;
		default:
			return GK_BPF_PKT_RET_DECLINE;
		}

		goto forward;
	}

	case IPPROTO_TCP:
		break;

	default:
		return GK_BPF_PKT_RET_DECLINE;
	}

	/*
	 * Only TCP packets from here on.
	 */

	if (ctx->fragmented)
		goto secondary_budget;
	if (unlikely(pkt->l4_len < sizeof(*tcp_hdr))) {
		/* Malformed TCP header. */
		return GK_BPF_PKT_RET_DECLINE;
	}
	tcp_hdr = rte_pktmbuf_mtod_offset(pkt, struct tcphdr *,
	       pkt->l2_len + pkt->l3_len);

	/* Authorized external services. */
	switch (ntohs(tcp_hdr->th_sport)) {
	case 53:	/* DNS */
		if (tcp_hdr->syn && !tcp_hdr->ack) {
			/* No listening ports. */
			return GK_BPF_PKT_RET_DECLINE;
		}
		break;

	default:
		return GK_BPF_PKT_RET_DECLINE;
	}

	goto forward;

secondary_budget:
	ret = grantedv2_pkt_test_2nd_limit(state, pkt_len);
	if (ret != GK_BPF_PKT_RET_FORWARD)
		return ret;
forward:
	return grantedv2_pkt_end(ctx, state);
}

Exporting logs to InfluxDB with gkle

Gatekeeper has a log exporter that aggregates Gatekeeper's logs and exports them to an InfluxDB instance, allowing traffic information to be visualized, for example via a Chronograf dashboard.

The log exporter can be installed via a Debian package available on the project's releases page. Edit the exporter configuration file with your InfluxDB credentials and then start the gkle service with systemctl.

An example Chronograf dashboard can be found in the log exporter repository. Once imported into Chronograf, it provides a few graphs of the aggregated data collected from Gatekeeper.

The following log statistics are available:

  • tot_pkts_num: total number of packets;
  • tot_pkts_size: total size of packets;
  • pkts_num_granted: number of granted packets;
  • pkts_size_granted: size of granted packets;
  • pkts_num_request: number of request packets;
  • pkts_size_request: size of request packets;
  • pkts_num_declined: number of declined packets;
  • pkts_size_declined: size of declined packets;
  • tot_pkts_num_dropped: total number of dropped packets;
  • tot_pkts_size_dropped: total size of dropped packets;
  • tot_pkts_num_distributed: total number of distributed packets;
  • tot_pkts_size_distributed: total size of distributed packets;
  • flow_table_occupancy: percentage of the flow table in use.

Example Bird configuration

This section contains a sample Bird configuration file for the Gatekeeper server. It sets up BGP sessions with the uplink provider and with the router as described in the Introduction.

In this section we use ASNs 64502 for the uplink provider and 64501 for the AS where the Gatekeeper server is running.

log syslog { info, warning, error, auth, fatal, bug };

router id 10.1.0.1;

# The order in which the files are loaded is important.

protocol kernel kernel4 {
  ipv4 {
    export all;
  };
}

protocol kernel kernel6 {
  ipv6 {
    export all;
  };
}

protocol device {
  scan time 10;
}

ipv4 table t_bgp4;
ipv6 table t_bgp6;

# Send routes learnt via BGP to the kernel through the master table.

protocol pipe bgp_into_master4 {
  table master4;
  peer table t_bgp4;
  import all;  # from table t_bgp4 into table master4
  export none; # from table master4 into table t_bgp4
}

protocol pipe bgp_into_master6 {
  table master6;
  peer table t_bgp6;
  import all;  # from table t_bgp6 into table master6
  export none; # from table master6 into table t_bgp6
}

#
# Functions used by filters, below.
#

function is_ipv4_martian()
prefix set martians;
{
  martians = [
    0.0.0.0/8+,
    10.0.0.0/8+,
    100.64.0.0/10+,
    127.0.0.0/8+,
    169.254.0.0/16+,
    172.16.0.0/12+,
    192.0.0.0/24+,
    192.0.2.0/24+,
    192.168.0.0/16+,
    198.18.0.0/15+,
    198.51.100.0/24+,
    203.0.113.0/24+,
    224.0.0.0/3+,
    255.255.255.255/32+
  ];
  return net ~ martians;
}

function ipv4_prefix_can_be_imported(prefix set my_prefixes; int peer_asn)
{
  if net = 0.0.0.0/0 then {
    print "rejecting ipv4 default route via BGP";
    return false;
  }
  if is_ipv4_martian() then {
    printn "ipv4 martian prefix: ";
    print net;
    return false;
  }
  if net ~ my_prefixes then {
    printn "ipv4 prefix is owned by me: ";
    print net;
    return false;
  }
  if peer_asn > 0 && bgp_path.first != peer_asn then {
    printn "next hop is not the BGP neighbor (";
    printn bgp_path.first;
    printn " is not ";
    printn peer_asn;
    printn "): ";
    print net;
    return false;
  }
  return true;
}

function ipv4_prefix_can_be_exported(prefix set my_prefixes)
{
  return net ~ my_prefixes;
}

function is_ipv6_martian()
prefix set martians;
{
  martians = [
    ::/128,
    ::1/128,
    ::ffff:0:0/96+,
    100::/64+,
    2001::/23,
    2001:2::/48+,
    2001:10::/28+,
    2001:db8::/32+,
    2002::/17+,
    3ffe::/16+,
    5f00::/8+,
    fc00::/7,
    fe80::/10,
    ff00::/8+
  ];
  return net ~ martians;
}

function ipv6_prefix_can_be_imported(prefix set my_prefixes; int peer_asn)
{
  if net = ::/0 then {
    print "rejecting ipv6 default route via BGP";
    return false;
  }
  if is_ipv6_martian() then {
    printn "ipv6 martian prefix: ";
    print net;
    return false;
  }
  if net ~ my_prefixes then {
    printn "ipv4 prefix is owned by me: ";
    print net;
    return false;
  }
  if peer_asn > 0 && bgp_path.first != peer_asn then {
    printn "next hop is not the BGP neighbor (";
    printn bgp_path.first;
    printn " is not ";
    printn peer_asn;
    printn "): ";
    print net;
    return false;
  }
  return true;
}

function ipv6_prefix_can_be_exported(prefix set my_prefixes)
{
  return net ~ my_prefixes;
}

#
# Uplink provider IPv4 BGP session
#
ipv4 table t_uplink_ipv4;

# Prefixes we want to export to uplink_ipv4.
protocol static export_to_uplink_ipv4 {
  ipv4 {
    table t_uplink_ipv4;
    import all;
  };
  # send routes to table t_uplink_ipv4.
  route 1.2.3.0/24 reject;
}

# Import filters.
filter uplink_ipv4_in
int peer_asn;
prefix set my_prefixes;
{
  peer_asn = 64502;
  my_prefixes = [
    1.2.3.0/24+
  ];
  if ipv4_prefix_can_be_imported(my_prefixes, peer_asn) then
    accept;
  reject "prefix cannot be imported";
}

# Export filters.
filter uplink_ipv4_out
prefix set my_prefixes;
{
  my_prefixes = [
    1.2.3.0/24
  ];
  if ipv4_prefix_can_be_exported(my_prefixes) then {
    accept;
  }
  printn "configuration error: tried to export prefix ";
  print net;
  reject;
}

# The BGP session.
protocol bgp uplink_ipv4 {
  description "uplink_ipv4";
  local as 64501;
  neighbor 10.1.0.2 as 64502;
  source address 10.1.0.1;
  ipv4 {
    table t_uplink_ipv4;
    igp table master4;
    import filter uplink_ipv4_in;
    export filter uplink_ipv4_out;
  };
}

# Send all routes learnt in the BGP session above to the central bgp table.
protocol pipe uplink_ipv4_into_bgp {
  table t_bgp4;
  peer table t_uplink_ipv4;
  import where proto = "uplink_ipv4";
  export none;
}

#
# Uplink provider IPv6 BGP session
#
ipv6 table t_uplink_ipv6;

# Prefixes we want to export to uplink_ipv6.
protocol static export_to_uplink_ipv6 {
  ipv6 {
    table t_uplink_ipv6;
    import all;
  };
  # send routes to table t_uplink_ipv6.
  route 2001:db8:123::/48 reject;
}

# Import filters.
filter uplink_ipv6_in
int peer_asn;
prefix set my_prefixes;
{
  peer_asn = 64502;
  my_prefixes = [
    2001:db8:123::/48+
  ];
  if ipv6_prefix_can_be_imported(my_prefixes, peer_asn) then
    accept;
  reject "prefix cannot be imported";
}

# Export filters.
filter uplink_ipv6_out
prefix set my_prefixes;
{
  my_prefixes = [
    2001:db8:123::/48
  ];
  if ipv6_prefix_can_be_exported(my_prefixes) then {
    accept;
  }
  printn "configuration error: tried to export prefix ";
  print net;
  reject;
}

# The BGP session.
protocol bgp uplink_ipv6 {
  description "uplink_ipv6";
  local as 64501;
  neighbor 2001:db8:1::2 as 64502;
  source address 2001:db8:1::1;
  ipv6 {
    table t_uplink_ipv6;
    igp table master6;
    import filter uplink_ipv6_in;
    export filter uplink_ipv6_out;
  };
}

# Send all routes learnt in the BGP session above to the central bgp table.
protocol pipe uplink_ipv6_into_bgp {
  table t_bgp6;
  peer table t_uplink_ipv6;
  import where proto = "uplink_ipv6";
  export none;
}

#
# iBGP IPv4 session with our router
#
protocol bgp router_ipv4 {
  description "router_ipv4";
  direct;
  local as 64501;
  neighbor 10.2.0.2 as 64501;
  source address 10.2.0.1;
  ipv4 {
    table t_bgp4;
    igp table master4;
    next hop self;
    import none;
    export all;
  };
}

#
# iBGP IPv6 session with our router
#
protocol bgp router_ipv6 {
  description "router_ipv6";
  direct;
  local as 64501;
  neighbor fd00:2::2 as 64501;
  source address fd00:2::1;
  ipv6 {
    table t_bgp6;
    igp table master6;
    next hop self;
    import none;
    export all;
  };
}