### statistical packet distribution with iptables

So, Linux iptables has a `statistic` module with a couple of modes which allow you to distribute traffic across multiple hosts. But I can’t find any good documentation which *correctly* explains how to use it. I figured it out, so I’m going to share. :)

Say you have four systems, and you want to distribute traffic evenly among them in a round-robin fashion. New connection 1 goes to server 1, new connection 2 goes to server 2, and so on. You can use the “nth” mode of the statistic module for that. I want to do this with CFEngine, so I’ll use incoming port 5308, and will create a new chain in the nat table to keep my rules somewhat clean:

    # Create the chain (and empty it, in case it already exists)
    sudo iptables -t nat -N CFENGINE
    sudo iptables -t nat -F CFENGINE

    sudo iptables -t nat -A CFENGINE -p tcp \
        -m conntrack --ctstate NEW \
        -m statistic --mode nth --every 4 \
        -j DNAT --to-destination server1:5308

    sudo iptables -t nat -A CFENGINE -p tcp \
        -m conntrack --ctstate NEW \
        -m statistic --mode nth --every 3 \
        -j DNAT --to-destination server2:5308

    sudo iptables -t nat -A CFENGINE -p tcp \
        -m conntrack --ctstate NEW \
        -m statistic --mode nth --every 2 \
        -j DNAT --to-destination server3:5308

    sudo iptables -t nat -A CFENGINE -p tcp \
        -m conntrack --ctstate NEW \
        -m statistic --mode nth --every 1 \
        -j DNAT --to-destination server4:5308

    # The jump has to be in the nat table too, since that is where CFENGINE lives
    sudo iptables -t nat -A PREROUTING -p tcp \
        -m tcp --dport 5308 -j CFENGINE

How does that work? Connections coming in on port 5308 all go to the CFENGINE chain. Because the rules live in the nat table and only match NEW state, each “packet” here is really the first packet of a new connection; once a connection has been DNATed, conntrack sends the rest of its packets to the same server. Every fourth one (25%) gets redirected to server 1. The other three pass through that rule, so 3/4 of the traffic hits the next rule. Of those, every third goes to server 2, and the other two pass on through. Of those two, one out of every two goes to server 3, and the other passes through. Then, finally, one out of every one (or “all of them”) goes to server 4. That last rule doesn’t need the “nth” match, but I like to leave it in there for consistency.

You’ll see some documentation online showing `--every 4` on all four lines, but that doesn’t work; each rule only processes the packets which pass through the rules above it, so you would end up with 25% of the traffic going to the first server, then 25% of the remaining 75% (about 19%) going to the second server, etc., with some traffic matching no rule at all. In current Linux kernels, the rules don’t share a counter, and they haven’t for years.
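If you want to convince yourself without touching a real firewall, here is a rough sketch that simulates the cascade. It assumes a simplified model (each rule has its own counter and matches the first of every N connections it sees), which is not the kernel's actual code, and `simulate_nth` is just a name I made up:

```shell
# simulate_nth EVERY_LIST CONNECTIONS
# Simplified model, not the kernel implementation: every rule keeps its own
# counter and matches the first of every N connections that reach it.
# Connections that a rule does not match fall through to the next rule.
simulate_nth() {
    awk -v list="$1" -v total="$2" 'BEGIN {
        n = split(list, every, ",")
        for (c = 0; c < total; c++) {
            matched = 0
            for (r = 1; r <= n; r++) {
                seen = count[r] + 0
                count[r] = (seen + 1) % (every[r] + 0)
                if (seen == 0) { hits[r]++; matched = 1; break }
            }
            if (!matched) unmatched++
        }
        for (r = 1; r <= n; r++) printf "rule%d=%d ", r, hits[r]
        printf "unmatched=%d\n", unmatched
    }'
}

simulate_nth 4,3,2,1 1200   # rule1=300 rule2=300 rule3=300 rule4=300 unmatched=0
simulate_nth 4,4,4,4 1024   # rule1=256 rule2=192 rule3=144 rule4=108 unmatched=324
```

The second run is the "every 4 on all four lines" setup: the second rule gets 192 of 1024 connections (18.75%), and almost a third of them match nothing.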

Those rules work in round-robin fashion. If you want to do something more complex with an uneven distribution, you can either find a common denominator and alternate hosts, or use the statistic module’s “random” mode with a probability. With that one, the rule draws a random number for each packet and matches some percentage of the time. So, say you have the same four servers. You want servers 1, 2, and 3 to get about 30% of the incoming new traffic each, and server 4 should only get about 10%:

    # Create the chain (and empty it, in case it already exists)
    sudo iptables -t nat -N CFENGINE
    sudo iptables -t nat -F CFENGINE

    sudo iptables -t nat -A CFENGINE -p tcp \
        -m conntrack --ctstate NEW \
        -m statistic --mode random --probability 0.3000 \
        -j DNAT --to-destination server1:5308

    sudo iptables -t nat -A CFENGINE -p tcp \
        -m conntrack --ctstate NEW \
        -m statistic --mode random --probability 0.4286 \
        -j DNAT --to-destination server2:5308

    sudo iptables -t nat -A CFENGINE -p tcp \
        -m conntrack --ctstate NEW \
        -m statistic --mode random --probability 0.7500 \
        -j DNAT --to-destination server3:5308

    # No statistic match here: whatever gets this far always goes to server4
    sudo iptables -t nat -A CFENGINE -p tcp \
        -m conntrack --ctstate NEW \
        -j DNAT --to-destination server4:5308

    # The jump has to be in the nat table too, since that is where CFENGINE lives
    sudo iptables -t nat -A PREROUTING -p tcp \
        -m tcp --dport 5308 -j CFENGINE

So, what’s the deal with those numbers? Well, it’s the same thing as the other one. Server 1 gets 30% of the traffic, so the probability should be 0.3, which is 30/100. Then 70% passes through, so server 2 only sees 70% of the traffic. To end up with 30% of the total, it needs to match 30 out of that remaining 70, which is 30/70, or 0.428571. It’s matching about 43% of the remainder, which is 30% of the total. For server 3, 60% of the traffic is gone now, so it gets 30/40, which is 0.75; in other words, 75% of the remaining 40% is 30% of the total. After that, we should have about 10% left, and we want to match 100% of the remainder. Even though I like consistency, I don’t want to allow any traffic to pass this chain (which would mean processing would go back to the PREROUTING chain, surprising everyone). So, I didn’t use a probability match on the last rule; it just always matches if traffic gets that far.

Basically, the value used is “percentage of whole / (100 – sum of previously matched percentages)”, which should be fairly easy to implement in any kind of automation script.
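As a minimal sketch of that formula, something like the following would do. The function name `cascade_probabilities` is made up for illustration, and it assumes the shares are given as percentages of the whole in rule order:

```shell
# cascade_probabilities PCT...
# For each target share (a percentage of ALL traffic, in rule order), print
# share / (100 - sum of the shares already handed out), to four decimals.
cascade_probabilities() {
    echo "$@" | awk '{
        remaining = 100
        for (i = 1; i <= NF; i++) {
            if ($i <= 0 || $i > remaining) {
                print "bad share: " $i > "/dev/stderr"
                exit 1
            }
            printf "%.4f\n", $i / remaining
            remaining -= $i
        }
    }'
}

cascade_probabilities 30 30 30 10
# 0.3000
# 0.4286
# 0.7500
# 1.0000
```

The final 1.0000 is the "always matches" case, so for the last server you would leave the statistic match off the rule entirely rather than pass that value along.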

Even distribution would be easily attained by counting the target servers, and either doing “nth” rules while decrementing the number of servers as rules are added, or by dividing 100 by the total number of servers and using that as the target percentage.
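For the “nth” variant, a sketch of the decrementing approach could look like this. It only prints the commands so you can review them before running anything; `print_nth_rules` and its argument order are my own invention:

```shell
# print_nth_rules CHAIN PORT SERVER...
# Print (not run) one round-robin DNAT rule per server. The --every value
# starts at the number of servers and drops by one with each rule.
print_nth_rules() {
    chain=$1
    port=$2
    shift 2
    remaining=$#
    for server in "$@"; do
        printf 'sudo iptables -t nat -A %s -p tcp -m conntrack --ctstate NEW -m statistic --mode nth --every %d -j DNAT --to-destination %s:%s\n' \
            "$chain" "$remaining" "$server" "$port"
        remaining=$((remaining - 1))
    done
}

print_nth_rules CFENGINE 5308 server1 server2 server3 server4
```

With four servers that prints the same `--every 4`, `3`, `2`, `1` sequence as the hand-written rules earlier on.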

Oh, you’ll also want to enable IP forwarding with sysctl (`net.ipv4.ip_forward=1`), and you’ll likely need to use something like this

`sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE`

in order to make sure that the servers send responses back through you rather than directly connecting to the client.
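For the forwarding part, a quick pre-flight check might look like this. It's only a sketch: `forwarding_enabled` is a helper name I made up, and it assumes the usual Linux `/proc/sys` layout:

```shell
# forwarding_enabled [FILE]
# Succeeds if FILE (by default the IPv4 forwarding switch under /proc) holds 1.
forwarding_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}" 2>/dev/null)" = "1" ]
}

if forwarding_enabled; then
    echo "IPv4 forwarding is already on"
else
    echo "IPv4 forwarding is off; enable it with: sudo sysctl -w net.ipv4.ip_forward=1"
fi
```

`sysctl -w` only lasts until the next reboot. To make it stick, put `net.ipv4.ip_forward = 1` in `/etc/sysctl.conf` or in a file under `/etc/sysctl.d/`.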