Monday, May 27, 2013

Traffic control in Linux: classifying and prioritizing traffic 1/2

In Linux we can use the tool tc (traffic control) to manage the traffic and provide some QoS. In this example, we are going to classify traffic according to the following:

  • 10.65.18.0/24 will have priority 100 and classification as 1:10
  • 10.65.20.0/24 will have priority 50 and classification as 1:20
  • SSH traffic will have priority 10 and classification as 1:2
To classify traffic by network we can use the route classifier in tc. To classify traffic based on packet fields such as protocol or port, we can use the u32 classifier.

First, we prepare the interface we want to use. The three main classful qdiscs are HTB, CBQ and PRIO; in our example we will use HTB on eth0:

$ sudo tc qdisc add dev eth0 root handle 1:0 htb
We add a queuing discipline (qdisc) to eth0 at the root, with handle 1:0, using the classful qdisc HTB.
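
The class IDs used below (1:2, 1:10 and 1:20) assume matching HTB classes exist under the root qdisc; a minimal sketch, with placeholder rates (bandwidth limits are the subject of the next post):

$ sudo tc class add dev eth0 parent 1: classid 1:2 htb rate 10mbit
$ sudo tc class add dev eth0 parent 1: classid 1:10 htb rate 5mbit
$ sudo tc class add dev eth0 parent 1: classid 1:20 htb rate 5mbit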

Now, we are going to specify that traffic to 10.65.18.0/24 will be classified as 1:10 with priority 100:

$ sudo tc filter add dev eth0 parent 1:0 protocol ip prio 100 \
        route to 10 classid 1:10
$ sudo ip route add 10.65.18.0/24 via 10.65.17.1 dev eth0 realm 10
We are adding a filter to eth0 for the IP protocol that sends traffic routed to realm 10 into class 1:10. After that, we assign realm 10 to the network with ip route.

Now, we will do the same for the next network with priority 50:

$ sudo tc filter add dev eth0 parent 1:0 protocol ip prio 50 \
        route to 20 classid 1:20
$ sudo ip route add 10.65.20.0/24 via 10.65.17.1 dev eth0 realm 20
Now we will classify and prioritize the SSH traffic. With u32 we can match on almost any packet attribute, as the documentation states, but we will keep it simple and match on destination port and protocol number (TCP is protocol 0x6):

$ sudo tc filter add dev eth0 parent 1:0 protocol ip prio 10 u32 \
        match tcp dst 22 0xffff \
        match ip protocol 0x6 0xff \
        flowid 1:2
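To verify the three filters are in place we can list them:

$ sudo tc filter show dev eth0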
The next post will explain how to limit traffic bandwidth for each class.

For more information you can visit The Linux Documentation Project and Linux Advanced Routing & Traffic Control.


Sunday, May 19, 2013

Bonding interfaces in Debian

Bonding allows us to team multiple network interfaces into a single one. The Linux bonding driver offers several modes (for some of them we might need to configure the switch ports to use LACP):

  • Balance-rr (mode 0)
  • Active-backup (mode 1)
  • Balance-xor (mode 2)
  • Broadcast (mode 3)
  • 802.3ad (mode 4)
  • Balance-tlb (mode 5)
  • Balance-alb (mode 6)
A full description of each mode is available on the kernel.org website.

For this example we will use a basic mode 1 (active-backup) setup, teaming eth0 and eth1 into bond0. First, we create the module configuration we need in /etc/modprobe.d:

# vim bonding.conf
alias netdev-bond0 bonding
options bond0 miimon=100 mode=1
Note that the miimon parameter is the link monitoring frequency in milliseconds (how often each link is inspected to check whether it has failed).

When we create a bond, we need to put the teamed interfaces in slave mode. To make this easy we install the package ifenslave:

$ sudo apt-get install ifenslave
This package installs scripts in the /etc/network/if-up.d and /etc/network/if-pre-up.d directories that configure the slaves to serve the master (bonding) interface.

Now, we configure the file /etc/network/interfaces. We comment out the entries for our existing interfaces and specify the configuration for the bond0 interface:
$ sudo vim /etc/network/interfaces
auto lo
iface lo inet loopback

# The primary network interface
#allow-hotplug eth0
#iface eth0 inet dhcp
auto bond0
iface bond0 inet dhcp
        slaves eth0 eth1
        bond-mode 1
        bond-miimon 100

In this case we have duplicated some of the information already passed to the module. If we need multiple bondX interfaces, their configuration has to be declared in this file.

Now we restart the networking service and we get our interface working:

$ sudo service networking stop
$ sudo service networking start

Let's check:

$ /sbin/ifconfig bond0
bond0     Link encap:Ethernet  HWaddr 08:00:27:e1:89:77 
          inet addr:10.65.17.158  Bcast:10.65.17.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fee1:8977/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:27926 errors:0 dropped:0 overruns:0 frame:0
          TX packets:599 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2230068 (2.1 MiB)  TX bytes:40408 (39.4 KiB)
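
The bonding driver also exposes its state (mode, MII status and the list of slaves) under /proc, which is handy to see which slave is currently active:

$ cat /proc/net/bonding/bond0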

If we have issues bringing up the bond0 interface, it could be that the slaves are not well defined or not set as slaves. We can check with mii-tool:

# mii-tool bond0
bond0: 10 Mbit, half duplex, link ok

If instead of a message similar to the one above we get something like "No MII transceivers found", most likely there's an issue with the slaves.

For a full list of options available have a look at the kernel documentation for bonding.

Tuesday, May 14, 2013

Network load sharing using multiple interfaces in Linux

One way to share load between two or more interfaces is through the traffic control settings. This is not to be confused with high availability: if one of the cards goes down it might take a while to adjust itself. I'd rather use bonding for that, to be explained in a future post.

Suppose we have a host Client1, with eth0 10.65.17.158 and eth1 10.65.17.118. We want these two cards to work as one.

First, we need to load the module sch_teql using the command:

# modprobe sch_teql

Now we will add both interfaces to the TEQL device:
# tc qdisc add dev eth0 root teql0
# tc qdisc add dev eth1 root teql0
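We can confirm that teql0 is now the root qdisc on both interfaces:

# tc qdisc show dev eth0
# tc qdisc show dev eth1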
Now we bring up the teql0 device and give it a valid IP:

# ip link set dev teql0 up
# ip addr add dev teql0 10.65.17.154/24
Now we can use the new IP 10.65.17.154 as a load-sharing address across eth0 and eth1. Packets will arrive at the interfaces with a destination IP other than their own, so they would be discarded. To avoid this we disable rp_filter on each device:

# echo 0 > /proc/sys/net/ipv4/conf/eth0/rp_filter
# echo 0 > /proc/sys/net/ipv4/conf/eth1/rp_filter 
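These echo commands do not survive a reboot; to make the change persistent we can add the equivalent keys to /etc/sysctl.conf and reload it (a minimal sketch):

# echo "net.ipv4.conf.eth0.rp_filter = 0" >> /etc/sysctl.conf
# echo "net.ipv4.conf.eth1.rp_filter = 0" >> /etc/sysctl.conf
# sysctl -p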
That's all. For more info you can visit the official website for Linux Advanced Routing & Traffic Control.

Monday, May 13, 2013

Removing spaces from file names using the $IFS Bash shell variable

At some point all Linux users need to deal with spaces in file names, especially with movies or pictures. Using the shell variable IFS we can easily remove them. In this example, assuming I have all my pictures in the current folder, I will remove the spaces from their names in one go:

$ ls -l *jpg
-rw-r--r-- 1 amartin amartin   22175 May 10 09:08 foto 1.jpg
-rw-r--r-- 1 amartin amartin   22175 May 10 09:08 foto 2.jpg
-rw-r--r-- 1 amartin amartin   22175 May 10 09:08 foto 3.jpg

 $ IFS=$(echo -en "\n\t");for file in `ls *jpg`; do newname=`echo $file | sed -e 's/ //g'`; cp "$file" $newname; done

$ ls -ltr *jpg
-rw-r--r-- 1 amartin amartin   22175 May 10 09:08 foto 1.jpg
-rw-r--r-- 1 amartin amartin   22175 May 10 09:08 foto 2.jpg
-rw-r--r-- 1 amartin amartin   22175 May 10 09:08 foto 3.jpg
-rw-r--r-- 1 amartin amartin   22175  May 10 10:07 foto1.jpg
-rw-r--r-- 1 amartin amartin   22175 May 10 10:07 foto2.jpg
-rw-r--r-- 1 amartin amartin   22175 May 10 10:07 foto3.jpg


If we would like a dash '-' or any other character instead, we just need to change the sed expression to sed -e 's/ /-/g'.
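
As a side note, the same result can be achieved without touching IFS or parsing the output of ls, by looping over a glob and using Bash parameter expansion (a minimal alternative sketch):

$ for file in *.jpg; do cp "$file" "${file// /}"; done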

The IFS variable defaults to space, tab and new line in Debian:

$ echo "$IFS" | cat -TE
 ^I$
$

That default can be set explicitly with the equivalent:

$ IFS=$(echo -en " \n\t")

In the one-liner above we simply removed the space between the opening quote and \n (so IFS becomes just newline and tab), which stops word splitting on spaces and makes the example work.

Thursday, May 9, 2013

Routing and traffic control in Linux: using MARKed packets

On our Linux gateway box we can establish different routes for different hosts or networks using ip route. If we want to route depending, for example, on the machine originating the traffic, a neat way to do it is with netfilter, marking the packets we want to route differently.

First, to use this we need these kernel options (enabled by default on Debian):
  • IP: advanced router (CONFIG_IP_ADVANCED_ROUTER)
  • IP: policy routing (CONFIG_IP_MULTIPLE_TABLES)
  • IP: use netfilter MARK value as routing key (CONFIG_IP_ROUTE_FWMARK)
Now, let's say our gateway is 10.65.17.153. We have two internet providers, one very fast and one slow. We have a user (10.65.17.8) whose traffic to a specific website (106.10.165.51) we want to send via the slow link.

First, we mark the packets from our user with mark '1', using the mangle table:

 # iptables -A PREROUTING -t mangle -s 10.65.17.8 -d 106.10.165.51 -j MARK --set-mark 1
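
We can check that packets are hitting the rule by listing the mangle table with counters:

# iptables -t mangle -L PREROUTING -n -v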

Now we need to add an action for the marked packets. Let's have a look at the default rt_tables:

# cat /etc/iproute2/rt_tables
#
# reserved values
#
255     local
254     main
253     default
0       unspec
#
# local
#
#1      inr.ruhep

We are going to add a table named starhub.link with table number 20:

# echo 20 starhub.link >> /etc/iproute2/rt_tables

Now we associate table 20 with mark 1:

# ip rule add fwmark 1 table starhub.link
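
The new rule should now appear in the routing policy database:

# ip rule show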

Last, we specify what table 20 (a.k.a. starhub.link) does:

# ip route add default via 10.65.17.253 table starhub.link

That's it. Now our client will use the StarHub link instead of the default one. Note that the new route does not show up in the main routing table:

# ip route ls
default via 10.65.17.1 dev eth0  proto static
10.65.17.0/24 dev eth0  proto kernel  scope link  src 10.65.17.153
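
The route we added lives in its own table, so to see it we list that table explicitly:

# ip route ls table starhub.link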

Done. Thanks to the flexibility of iptables, this technique has many uses.


Tuesday, May 7, 2013

Linux performance tips: disk I/O scheduler

The disk I/O scheduler is the method Linux uses to decide how requests are submitted to the storage devices. Schedulers apply to whole disk devices, not partitions.

On Linux, the main algorithms are these three:

  • CFQ (Completely Fair Queuing)
  • Noop
  • Deadline
There was another popular scheduler called Anticipatory, but as of kernel version 2.6.33 it has been removed.

CFQ places synchronous requests on per-process queues and allocates time slices for each queue to access the disk. The length of each time slice depends on the process I/O priority, and the scheduler lets a process idle briefly at the end of an I/O call in anticipation of a close-by read request (another read near the same sector). You can use ionice to set these priorities.
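
For instance, to run a long backup at the lowest (idle) I/O class so it does not compete with interactive processes (the tar command and paths here are just an example):

$ ionice -c3 tar czf /tmp/backup-home.tar.gz /home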

Tuning parameters can be set in /sys/block/<device>/queue/iosched/slice_idle, /sys/block/<device>/queue/iosched/quantum and /sys/block/<device>/queue/iosched/low_latency.

Noop operates as a simple FIFO queue, first in first out.

Deadline imposes a deadline on every request to prevent processes from "hanging" while waiting for the disk. In addition to the sorted read and write queues, it maintains two deadline queues (one for reads, one for writes) and checks whether the requests in them have expired. Read requests have higher priority.

Tuning parameters can be set in /sys/block/<device>/queue/iosched/writes_starved, /sys/block/<device>/queue/iosched/read_expire and /sys/block/<device>/queue/iosched/write_expire.

To check what scheduler we are using we can query the block device (sda in my case):

$ cat /sys/block/sda/queue/scheduler
noop deadline [cfq]

I'm using CFQ. To change it to noop or deadline, we write the desired scheduler into the same file we used to query it:

(as root)
# echo deadline > /sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq

For any other device the path would be /sys/block/<device name>/queue/scheduler. You can change it at any time without disrupting the system.
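
The change above does not survive a reboot. On kernels of this era the default scheduler can be made persistent with the elevator= boot parameter; a sketch for Debian, assuming GRUB 2 and /etc/default/grub:

$ sudo vim /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=deadline"
$ sudo update-grub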

Which one is better? It depends on your environment. If you have a proper process priority scheme on your server, CFQ could be the best. For backup servers with low-performance disks, deadline has worked well for me in the past.

To put some numbers on it, on my desktop the timings for writing 1 GB with dd under each scheduler are as follows:

$ dd if=/dev/zero of=tmp1 bs=512 count=2000000
  • CFQ 10.8294 s, 94.6 MB/s
  • Deadline 9.90455 s, 103 MB/s
  • Noop 10.0025 s, 102 MB/s
Reading + writing:

$ dd if=tmp1 of=tmp2
  • CFQ 26.3413 s, 38.9 MB/s
  • Deadline 30.449 s, 33.6 MB/s
  • Noop 28.9345 s, 35.4 MB/s
When reading, deadline comes out ahead thanks to its prioritized read queue, although not far from the plain FIFO of noop. In the read + write test, CFQ becomes the most advantageous, not because of the algorithm itself (I did not give the dd command any higher priority) but simply as a result of the performance degradation of deadline and noop.


Thursday, May 2, 2013

Adding a static arp entry in Windows 7

Following up on a previous article where I explained how to add a static ARP entry on Linux: trying to do the same on Windows 7 results in an "Access denied" error message. This is how we do it on Windows 7 instead:

If, for example, we are using a wireless connection, first we execute ipconfig in a cmd window to list the interface name:


We see that our wireless interface is named Wireless Network Connection, and that the gateway has the IP address 10.43.1.1. From the cmd prompt we list the current ARP table:
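
The ARP table is listed with:

arp -a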


We have identified the gateway MAC address as 00-30-48-99-de-97. Now, from an elevated cmd prompt, we execute the following netsh command to add the static entry:

netsh.exe interface ipv4 add neighbors "Wireless Network Connection" 10.43.1.1 00-30-48-99-de-97
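
To confirm the static entry was added we can list the neighbor table for that interface:

netsh.exe interface ipv4 show neighbors "Wireless Network Connection"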
That's all.