Tuesday, June 25, 2013

Modify CPU affinity in Linux

Linux has been able to use multi-core CPUs for a long time. Binding a process to some or all of the available cores is called CPU affinity.

By default (unless there's a compatibility issue), Linux lets a process run on any of the available processors. To change that policy we can use a tool called taskset. On Debian and Ubuntu it ships in the util-linux package, which is normally installed already; if it's missing:

$ sudo apt-get install util-linux

Now let's see how many CPUs we have (you probably know already :)):

$ cat /proc/cpuinfo

Alternatively, run top and press 1 to get a per-CPU breakdown:

top - 17:52:13 up  9:00, 21 users,  load average: 1.75, 1.77, 1.71
Tasks: 210 total,   3 running, 207 sleeping,   0 stopped,   0 zombie
%Cpu0  : 33.1 us,  7.8 sy,  0.0 ni, 58.1 id,  0.7 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu1  : 13.1 us, 16.4 sy,  0.0 ni, 68.1 id,  2.3 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 15.7 us,  6.8 sy,  0.0 ni, 77.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  9.4 us, 26.2 sy,  0.0 ni, 62.1 id,  2.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem:   8188992 total,  7601696 used,   587296 free,    64868 buffers
KiB Swap: 15624188 total,     1320 used, 15622868 free,  2751376 cached
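If you only need the number, a couple of one-liners save reading the whole file. This is a small sketch; nproc comes from GNU coreutils and is assumed to be installed. Plain nproc reports the CPUs the current process is allowed to use (so it already respects affinity), while nproc --all reports every installed CPU:

```shell
# Count the "processor" entries in /proc/cpuinfo (one per logical CPU).
cpus=$(grep -c '^processor' /proc/cpuinfo)
echo "logical CPUs listed in /proc/cpuinfo: $cpus"

# GNU coreutils: CPUs usable by this process vs. all installed CPUs.
echo "usable by this process: $(nproc)"
echo "installed: $(nproc --all)"
```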

In our example we have 4 CPUs. Let's run the kaffeine media player on the first two:

$ taskset 03 kaffeine

In the hex mask, bit n stands for CPU n, counting from 0 like top does, as the man page details:
0x00000001 (01) for CPU 0
0x00000002 (02) for CPU 1
0x00000003 (03) for CPUs 0 and 1
0x00000004 (04) for CPU 2

and so on. On our 4-CPU machine, a mask of f (binary 1111) selects all four processors, which is what Linux uses by default.
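Building the mask by hand gets error-prone as the number of CPUs grows. The helper below is a hypothetical sketch (cpus_to_mask is not part of taskset): it sets bit n for every zero-based CPU number n and prints the result in hex:

```shell
# cpus_to_mask: hypothetical helper, not part of taskset.
# Sets bit N for every zero-based CPU number N given as an argument.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
}

cpus_to_mask 0 1      # prints 0x3 (CPUs 0 and 1)
cpus_to_mask 2        # prints 0x4 (CPU 2 only)
cpus_to_mask 0 1 2 3  # prints 0xf (all four CPUs)
```

taskset accepts the mask with the 0x prefix, so something like taskset "$(cpus_to_mask 0 1)" kaffeine should be equivalent to the command above.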

We can also pass a list of CPU numbers instead of the mask with -c (again counting from 0, so 3 is the fourth CPU):

$ taskset -c 3 <command to execute>

Let's check the affinity of an existing process (PID 700 here):

$ taskset -p 700
pid 700's current affinity mask: 3
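The kernel also exposes the same information in /proc/<pid>/status, which is handy on a box without taskset. A minimal sketch, using the current shell ($$) as a stand-in for PID 700:

```shell
# Cpus_allowed holds the hex mask, Cpus_allowed_list the same set as CPU numbers.
pid=$$
grep '^Cpus_allowed' "/proc/$pid/status"
```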

We can change it by giving a new mask before the PID:

$ taskset -p 03 700

To assign a range of processors, add -c and give a CPU list instead of a mask:

$ taskset -pc 2-3 700
pid 700's current affinity list: 0,1
pid 700's new affinity list: 2,3
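To see the whole round trip, the sketch below starts a throwaway process (sleep stands in for a real program), changes its affinity with taskset -cp and reads the result back from /proc. It picks the first CPU the shell itself is allowed to use, so it also works inside containers limited to a subset of CPUs; both choices are assumptions of this example, not something taskset requires:

```shell
# First CPU this shell may run on, e.g. "0" from a list like "0-3" or "0,2".
cpu=$(awk '/^Cpus_allowed_list:/ { split($2, a, /[,-]/); print a[1] }' /proc/$$/status)

# Throwaway process to pin.
sleep 30 &
pid=$!

# Pin it to that single CPU using the list form (-c), like "taskset -pc 2-3 700".
taskset -cp "$cpu" "$pid" > /dev/null

# Read the new affinity back from procfs, then clean up.
allowed=$(awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/$pid/status")
echo "pid $pid is now allowed on CPU(s): $allowed"
kill "$pid"
```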

Is the system actually using all the CPUs? An easy way to check is to look at the /proc/interrupts file:

$ cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  0:         77         34         19         22   IO-APIC-edge      timer
  1:          8          7         11          9   IO-APIC-edge      i8042
  8:          0          0          0          1   IO-APIC-edge      rtc0
  9:      10052       9947      10066       9977   IO-APIC-fasteoi   acpi
 10:      64476      64276      64381      64603   IO-APIC-edge      ite-cir
 12:        348        303        347        327   IO-APIC-edge      i8042
 16:         33         37         36         37   IO-APIC-fasteoi   mmc0, ehci_hcd:usb1
 17:          0          0          0          0   IO-APIC-fasteoi   brcmsmac
 18:          0          0          0          0   IO-APIC-fasteoi   ips
 19:        117        152        120        130   IO-APIC-fasteoi   firewire_ohci
 23:    8770094    8769570    8768178    8770493   IO-APIC-fasteoi   ehci_hcd:usb2
 41:     300133     300414     300139     300056   PCI-MSI-edge      ahci
 42:         56         58         57         59   PCI-MSI-edge      snd_hda_intel
 43:         25         26         24         25   PCI-MSI-edge      snd_hda_intel
 44:    1093397    1093774    1094078    1092726   PCI-MSI-edge      eth0
 45:     463403     463582     464719     463726   PCI-MSI-edge      fglrx[0]@PCI:2:0:0
NMI:         34         22         16         14   Non-maskable interrupts
LOC:   48777039   50875209   50309525   47323147   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:         34         22         16         14   Performance monitoring interrupts
IWI:          0          0          0          0   IRQ work interrupts
RES:   15846948   16275344    8478731    8848893   Rescheduling interrupts
CAL:      59602      55581      89923      81541   Function call interrupts
TLB:     180524     182431     106818     105437   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:        112        112        112        112   Machine check polls
ERR:          0

If one of the CPU columns were mostly zeros, that CPU would not be handling interrupts, which would be worth investigating. Here every column shows plenty of activity, so all our CPUs have plenty of work.
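Eyeballing the table is fine for four CPUs; with more cores it helps to add up each column. A rough awk sketch: it takes the CPU names from the header row, sums every row that carries one count per CPU (single-value rows such as ERR are skipped) and prints one total per CPU. A CPU whose total stays near zero is the one to look at:

```shell
# Sum the per-CPU interrupt counts in /proc/interrupts.
# Row 1 is the header (CPU0 CPU1 ...), so its NF is the number of CPUs.
totals=$(awk '
    NR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
    NF < ncpu + 1 { next }   # rows such as "ERR: 0" carry a single total
    {
        for (i = 2; i <= ncpu + 1; i++)
            if ($i ~ /^[0-9]+$/) sum[i - 1] += $i
    }
    END { for (c = 1; c <= ncpu; c++) printf "%s %.0f\n", name[c], sum[c] }
' /proc/interrupts)
echo "$totals"
```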

Wednesday, June 12, 2013

Traffic control in Linux: classifying and prioritizing traffic 2/2

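According to the manual, tc uses the following rules for bandwidth definitions:

mbps = 1024 kbps = 1024 * 1024 bps => byte/s
mbit = 1024 kbit => kilo bit/s.
mb = 1024 kb = 1024 * 1024 b => byte
mbit = 1024 kbit => kilo bit.

Internally, the number is stored in bps and b.

When tc prints a rate, it uses the following:

1Mbit = 1024 Kbit = 1024 * 1024 bps => byte/s

Watch out: in tc, bps means bytes per second, not bits per second, so "10mbps" is about eight times more than "10mbit". Also, this table comes from older documentation; the current tc(8) man page treats kbit and mbit as SI units (multiples of 1000) and uses IEC prefixes such as kibit and mibit for the 1024-based ones, so check the man page of your iproute2 version.
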
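The kernel honors the TOS (Type Of Service) field of the packets. The classic default queueing discipline, pfifo_fast, maps each TOS value to a Linux priority and then to one of its three bands, and band 0 is always served first:
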
TOS     Bits  Means                    Linux Priority    Band
------------------------------------------------------------
0x0     0     Normal Service           0 Best Effort     1
0x2     1     Minimize Monetary Cost   1 Filler          2
0x4     2     Maximize Reliability     0 Best Effort     1
0x6     3     mmc+mr                   0 Best Effort     1
0x8     4     Maximize Throughput      2 Bulk            2
0xa     5     mmc+mt                   2 Bulk            2
0xc     6     mr+mt                    2 Bulk            2
0xe     7     mmc+mr+mt                2 Bulk            2
0x10    8     Minimize Delay           6 Interactive     0
0x12    9     mmc+md                   6 Interactive     0
0x14    10    mr+md                    6 Interactive     0
0x16    11    mmc+mr+md                6 Interactive     0
0x18    12    mt+md                    4 Int. Bulk       1
0x1a    13    mmc+mt+md                4 Int. Bulk       1
0x1c    14    mr+mt+md                 4 Int. Bulk       1
0x1e    15    mmc+mr+mt+md             4 Int. Bulk       1

As an example, RFC 1349 recommends these values (the 4-bit TOS field written in binary, i.e. the Bits column of the table):

TELNET -> 1000 (8 in decimal) => minimize delay
FTP Control -> 1000 (8 in decimal) => minimize delay
FTP Data -> 0100 (4 in decimal) => maximize throughput
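The RFC values and the hex column of the table are the same thing seen from two sides: the four TOS bits sit one position above the least significant bit of the TOS byte, so the byte value is the 4-bit value shifted left by one. A small sketch (tos_byte is a hypothetical helper):

```shell
# tos_byte: hypothetical helper. Shift the 4-bit RFC 1349 TOS value left by
# one to get the value of the TOS byte (precedence bits left at 0).
tos_byte() {
    printf '0x%02x\n' $(( $1 << 1 ))
}

tos_byte 8   # TELNET / FTP control, binary 1000: prints 0x10 (Minimize Delay)
tos_byte 4   # FTP data, binary 0100: prints 0x08 (Maximize Throughput)
```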

By modifying the TOS field we can give our online games priority over the rest of the network traffic (ok, perhaps there are some other uses too :)):


# iptables -t mangle -N games
# iptables -t mangle -A games -p tcp -s <my playstation ip> -j RETURN
# iptables -t mangle -A games -j TOS --set-tos Maximize-Throughput
# iptables -t mangle -A games -j RETURN
# iptables -t mangle -A POSTROUTING -p tcp -m tos --tos Minimize-Delay -j games
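
Only packets already marked Minimize-Delay are sent to the games chain: those coming from the console return untouched, and all the others are downgraded to Maximize-Throughput.

Now, let's say our Internet line is connected to eth0 and has a bandwidth of 10 Mbit/s. We want to reserve 30% of it for web browsing, 30% for our FTP server and the remaining 40% for an online game that uses port 20000. First we create the root HTB qdisc: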

# tc qdisc add dev eth0 root handle 1: htb default 90
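# tc class add dev eth0 parent 1: classid 1:1 htb rate 10000kbit ceil 10000kbit
(this sets the root class to 10 Mbit/s. Note that default 90 points to a class 1:90 that we never create, so traffic not matched by any of the filters below bypasses the HTB shaping; create a class 1:90 or point default at an existing class if you want it shaped too)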

The child classes get 30%, 30% and 40% of the bandwidth as their guaranteed rate, and ceil lets each of them borrow up to the full 10 Mbit/s when the others are idle:
# tc class add dev eth0 parent 1:1 classid 1:10 htb rate 3000kbit ceil 10000kbit
# tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3000kbit ceil 10000kbit
# tc class add dev eth0 parent 1:1 classid 1:30 htb rate 4000kbit ceil 10000kbit
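If the line speed changes, recomputing the percentages by hand is easy to get wrong. A small sketch that derives the class rates from the link rate and prints (or, optionally, runs) the same three commands; DEV, LINK_KBIT, DRY_RUN and the run helper are assumptions of this example, not tc features:

```shell
# Sketch: derive each HTB class rate from a percentage of the link rate.
# With DRY_RUN=1 (the default here) the tc commands are only printed;
# run with DRY_RUN=0 as root to apply them.
DEV=eth0
LINK_KBIT=10000
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Each entry is "<class minor id> <percentage of the link>".
for entry in "10 30" "20 30" "30 40"; do
    set -- $entry
    rate=$(( LINK_KBIT * $2 / 100 ))
    # Guaranteed share via rate; ceil lets the class borrow up to the full link.
    run tc class add dev "$DEV" parent 1:1 classid "1:$1" \
        htb rate "${rate}kbit" ceil "${LINK_KBIT}kbit"
done
```

Keeping the percentages in one list also makes it easy to check that they add up to 100.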

Now we attach a Stochastic Fairness Queueing (SFQ) qdisc to each class, so that the connections sharing a class get a fair share of it. perturb 10 makes SFQ change its hashing every 10 seconds, the recommended value:

# tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
# tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
# tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10

To finish, we use tc filters to put the traffic into each class (explained in the previous post). Web browsing goes to destination port 80, our FTP server sends from source port 20, and the game uses destination port 20000:

# tc filter add dev eth0 parent 1:0 protocol ip u32 match ip dport 80 0xffff classid 1:10
# tc filter add dev eth0 parent 1:0 protocol ip u32 match ip sport 20 0xffff classid 1:20
# tc filter add dev eth0 parent 1:0 protocol ip u32 match ip dport 20000 0xffff classid 1:30
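The last argument of each match (0xffff) is a mask: u32 only compares the bits selected by the mask, so 0xffff means "exactly this port". The function below is only an illustration of that rule, not tc code (u32_match is a hypothetical helper); it shows how a smaller mask would match a block of ports, which can be handy if a game uses a port range:

```shell
# u32_match: hypothetical helper mimicking how a u32 "match ... VALUE MASK"
# selector treats a 16-bit port: it matches when (port & MASK) == (VALUE & MASK).
u32_match() {
    port=$1 value=$2 mask=$3
    [ $(( port & mask )) -eq $(( value & mask )) ]
}

u32_match 20000 20000 0xffff && echo "20000 matches with mask 0xffff"
u32_match 20001 20000 0xffff || echo "20001 does not match with mask 0xffff"
u32_match 20015 20000 0xfff0 && echo "20015 matches with mask 0xfff0 (ports 20000-20015)"
```

In tc itself the last case would be written as match ip dport 20000 0xfff0.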


That's it. It's worth having a look at the manual for other options, such as bursting (burst and cburst).