Saturday, 13 October 2012

PPPoE and obscure packet dropping (some websites not working)...

I have been having a mare with packets just disappearing, or being marked as invalid and dropped, which in turn resulted in established and related packets being dropped in response to mangled requests.  I spent ages trying to debug this, logging anything and everything with iptables LOG rules.  I couldn't determine what on earth was going on, until I started reading about the MSS exceeding the MTU of 1500 set by the ISP.  This is only apparent with large amounts of data, and even more apparent on devices connecting from behind the firewall, with mangling going on.  Essentially, it boils down to packets being mangled, increasing the segment size, which eventually leads to a packet that exceeds the MTU when the ISP reroutes it.  At the firewall level, the packet is fine and fits just within the 1500 limit.  Proposed solutions seem to be centred around adjusting the PPPoE interface MTU to something that would not exceed the limit of 1500 when mangled.  This is a good idea to have set.  For me, I modified my PPP configuration and changed the MTU:

$ sed -i.bak 's/^mtu [0-9]\+$/mtu 1454/' /etc/ppp/peers/dsl-provider
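
A quick way to sanity-check a candidate MTU such as the 1454 above is to ping with the don't-fragment bit set and a payload sized to fill exactly one packet.  The sketch below only does the arithmetic and prints the command rather than running it.  It assumes the iputils version of ping (for the -M do option); the function name is made up for illustration and example.org is just a placeholder target.

```shell
#!/bin/sh
# ICMP echo payload that yields an IPv4 packet of exactly <mtu> bytes:
# 20-byte IPv4 header + 8-byte ICMP header = 28 bytes of overhead.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

mtu=1454
payload=$(icmp_payload_for_mtu "$mtu")

# -M do sets the don't-fragment bit, -s sets the payload size.
# example.org is a placeholder; substitute a host that answers pings.
echo "ping -M do -c 3 -s $payload example.org"
```

If the ping complains that the message is too long, or the replies simply never arrive, the path probably cannot carry packets of that size.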

Lowering the MTU is not always guaranteed to work on its own though, as I found out.  This is because you don't know what additional mangling is going on at the ISP's end.  In the end, I started reading the iptables man page for the 3.4 kernel I am running.  I found something very interesting...

       This target allows to alter the MSS value of TCP SYN packets, to control the maximum size for that connection (usually limiting it to your outgoing interface's MTU minus 40 for IPv4 or 60 for IPv6, respectively).  Of course, it can only be used in conjunction with -p tcp.

       This target is used to overcome criminally braindead ISPs or servers which block "ICMP Fragmentation Needed" or "ICMPv6 Packet Too Big" packets.  The symptoms of this problem are that everything works fine from your Linux firewall/router, but machines behind it can never exchange large packets:
        1) Web browsers connect, then hang with no data received.
        2) Small mail works fine, but large emails hang.
        3) ssh works fine, but scp hangs after initial handshaking.
       Workaround: activate this option and add a rule to your firewall configuration like:

               iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN
                           -j TCPMSS --clamp-mss-to-pmtu

       --set-mss value
              Explicitly sets MSS option to specified value. If the MSS of the packet is already lower than value, it will not be increased (from Linux 2.6.25 onwards) to avoid more problems with hosts relying on a proper MSS.

       --clamp-mss-to-pmtu
              Automatically clamp MSS value to (path_MTU - 40 for IPv4; -60 for IPv6).  This may not function as desired where asymmetric routes with differing path MTU exist -- the kernel uses the path MTU which it would use to send packets from itself to the source and destination IP addresses.  Prior to Linux 2.6.25, only the path MTU to the destination IP address was considered by this option; subsequent kernels also consider the path MTU to the source IP address.

       These options are mutually exclusive.

So I wasn't going mad after all and it does appear to be a widely known issue.  So I gave it a try...

-A FORWARD -p tcp --tcp-flags SYN,RST SYN -j LOG --log-prefix "CLAMP-MSS-TO-PMTU" --log-tcp-options --log-ip-options --log-level 7
-A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

I am only specifying the LOG action here to confirm that the rule is being hit.  Indeed it is.  After enabling it, I instantly started seeing entries appear in the logs and all the websites that weren't previously accessible suddenly started working.  I don't need to explain why this works, since it's explained well enough in the above manual excerpt.  But if you share my plight, then hopefully this helps you too!
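
For reference, the arithmetic behind the clamp (MTU minus 40 for IPv4, minus 60 for IPv6, as per the man page) can be sketched in a few lines of shell, which is handy if you would rather pin a fixed value with --set-mss.  The function name is made up for illustration and 1454 is simply the MTU from the PPP configuration earlier; the rule is printed, not applied.

```shell
#!/bin/sh
# Largest TCP MSS that fits in one packet of the given MTU.
# Usage: mss_for_mtu <mtu> <4|6>
mss_for_mtu() {
    case "$2" in
        4) echo $(( $1 - 40 )) ;;   # 20-byte IPv4 header + 20-byte TCP header
        6) echo $(( $1 - 60 )) ;;   # 40-byte IPv6 header + 20-byte TCP header
        *) echo "unknown IP version: $2" >&2; return 1 ;;
    esac
}

mss=$(mss_for_mtu 1454 4)

# Print the equivalent fixed-value rule instead of applying it.
echo "iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss $mss"
```

Bear in mind that --set-mss and --clamp-mss-to-pmtu are mutually exclusive, so a rule uses one or the other.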

Friday, 12 October 2012

Broadcast ping for host discovery

I saw a question posted on an exchange website today that took me back a few years.  Often, people assume that broadcast ping can be used for host discovery on a network.  But these days, broadcast ICMP requests are silently dropped to avoid enabling springboard-style DoS attacks.  This is essentially where you ping as many hosts as possible, across a multitude of subnets, but have all the responses returned to the machine you want to attack.  This overloads the machine's network interface with ICMP traffic, obstructing real network traffic.  It also poses a threat to ADSL connections, when you consider that most people have caps on their broadband.  It would be quite easy to render someone's internet connection useless through this attack mechanism.  For this reason, the default behaviour for most machines is to not respond to broadcast ICMP requests.  Most routers also filter these out, to prevent ICMP broadcast packets spreading across different subnets.

Anyway, host discovery can actually be achieved through a much simpler mechanism, given that you can ping specific hosts for a reply; again, providing that those hosts reply to ICMP requests.  The solution is to ping every possible address on a given subnet, looking for a response from any of them.  This sounds like a really heavyweight process, but it's actually very easy and very fast.  The following command demonstrates this, and can be incorporated into a script if you wish (the brackets are important, since each inner subshell runs one ping in the background).

$ time ( s=192.168.0 ; for i in $(seq 1 254) ; do ( ping -n -c 1 -w 1 $s.$i 1>/dev/null 2>&1 && printf "%-16s %s\n" $s.$i responded ) & done ; wait ; echo )
                 responded
                 responded
                 responded
                 responded
                 responded

real    0m1.317s
user    0m0.004s
sys     0m0.084s
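
As a rough idea of how that one-liner might look once moved into a script, here is a sketch that splits it into two functions.  The function names are invented for illustration, and it only handles a /24 given as its first three octets.  Nothing is pinged until sweep is actually called.

```shell
#!/bin/sh
# List every usable host address in a /24, given its first three octets.
hosts_in_24() {
    prefix=$1
    i=1
    while [ "$i" -le 254 ]; do
        echo "$prefix.$i"
        i=$(( i + 1 ))
    done
}

# Ping each address once, all in parallel, and print the ones that answer.
sweep() {
    for host in $(hosts_in_24 "$1"); do
        # Each subshell runs in the background so the pings overlap.
        ( ping -n -c 1 -w 1 "$host" >/dev/null 2>&1 &&
          printf '%-16s %s\n' "$host" responded ) &
    done
    wait    # block until every background ping has finished
}
```

Calling sweep 192.168.0 from the script then behaves like the command above, minus the timing.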

TP-WN822N AP/Master Mode - Part 3

The new kernel worked, at least for a little while.  I had to radically rework the network configuration in order to get it functioning with minimal firewall changes.  So initially I had the 1 Gb on-board Ethernet wired to the internal network and a 100 Mb USB Ethernet adapter on the ADSL line, with the wifi provided by a bridge on the switch.  Now, the 1 Gb port provides the PPPoE interface and WiFi supplies the network.  I use the USB Ethernet adapter on its own subnet, connected directly to my laptop, as a management port.

The best way to avoid changing the firewall, other than a simple bit of 'sedding', was to create a bridge interface between what is now regarded as the physical interface and the wireless interface.  The physical interface doesn't actually exist any more, but it's there just in case; call it future proofing against my own change of mind.  So let's look at the network/interfaces configuration...

allow-hotplug eth4
iface eth4 inet manual

allow-hotplug wlan0
iface wlan0 inet manual

auto br0
iface br0 inet static
    bridge_ports eth4 wlan0
    address ...

The rest is pretty straightforward.  This creates the bridge interface.  Actually, nothing happens on boot, because there is no eth4 device and no wlan0 device, therefore no bridge device is created.  But once udevd detects the ath9k driver, a udev rule takes care of starting the hostapd daemon, and thus creates the bridge.  The script also sets up extra iptables rules to allow traffic on br0, since everything is implicitly dropped.  Having hostapd start on system boot is no good, even though it's the recommended route.  USB devices cannot be found until the USB hub driver has loaded, and by that point hostapd will already have failed because the devices weren't present.  Instead, I completely rewrote my own version of the hostapd script, to ensure that the right thing is done when a device comes online and goes offline.  I have tested this by repeatedly unplugging the USB adapter and plugging it back in.  Each time, udev unloads and loads the ath9k module and runs my script, which installs the firewall rules and runs hostapd.
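
The script itself isn't reproduced here, but the general shape of such a hotplug handler might look like the following sketch.  Everything in it is an assumption for illustration: the udev rule in the comment, the script path, the pid file and configuration paths and the single iptables rule are placeholders, not the actual script.  It runs in dry-run mode by default, printing the commands rather than executing them.

```shell
#!/bin/sh
# Hypothetical handler, meant to be called from a udev rule along the lines of:
#   ACTION=="add|remove", SUBSYSTEM=="net", KERNEL=="wlan0", RUN+="/usr/local/sbin/wlan-hotplug"
# udev exports ACTION (add/remove) to the program it runs.

# Dry run by default: commands are echoed.  Export HOTPLUG_RUNNER="" to execute them.
runner=${HOTPLUG_RUNNER-echo}

on_event() {
    case "$1" in
        add)
            # Placeholder firewall rule allowing traffic arriving on the bridge.
            $runner iptables -A FORWARD -i br0 -j ACCEPT
            # -B backgrounds hostapd, -P records its pid for later shutdown.
            $runner hostapd -B -P /var/run/hostapd.pid /etc/hostapd/hostapd.conf
            ;;
        remove)
            $runner pkill -F /var/run/hostapd.pid
            $runner iptables -D FORWARD -i br0 -j ACCEPT
            ;;
        *)
            echo "ignoring action: $1" >&2
            ;;
    esac
}

on_event "${ACTION:-add}"
```

Keeping the add and remove branches symmetrical is what makes repeated unplugging and replugging safe, since each removal undoes exactly what the previous add set up.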

Dilemma! (another one)

The 3.2 kernel I am using seems to have a bug in the Atheros driver.  After a period of the driver being loaded and the adapter being connected, sending and receiving traffic, the kernel crashes.  There are a few problems with this.  One is that the kernel crashes!!  Two is that the kernel panic doesn't make it to syslog, so I have no idea what happens.  Three, I need to create a serial connector for the hidden serial port on the NAS drive.  Hassle!

So next steps...  I have purchased a Nokia CA 42 lead from Amazon.  This has an inbuilt TTL, and considering it's only £4, it is far cheaper than the £30 for a proper USB TTL lead.  When it arrives, I will set to work on getting the serial console up and running so I can debug the driver crash.

In the meantime, I am going to build the 3.4 kernel using my scripts and investigate what changes have been made to the ath9k modules between 3.2, 3.4 and 3.6 kernels.  It's quite possible that I can backport a change that provides a solution, to my 3.2 kernel.
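
One way to go about that comparison, assuming a git checkout of the mainline kernel with its release tags, is to list the commits that touched the ath9k directory between two tags.  The helper name below is invented for illustration, and the sketch only prints the commands so they can be run from inside the kernel tree.

```shell
#!/bin/sh
# Location of the ath9k driver inside the kernel source tree.
ATH9K_DIR=drivers/net/wireless/ath/ath9k

# Print the git command that lists ath9k commits between two release tags.
ath9k_changes() {
    echo "git log --oneline v$1..v$2 -- $ATH9K_DIR"
}

ath9k_changes 3.2 3.4
ath9k_changes 3.4 3.6
```

Any commit that looks like a candidate for backporting can then be inspected individually with git show.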

Stay tuned for more progress...

Monday, 8 October 2012

TP-WN822N AP/Master Mode - Part 2

Kernel Building!

Okay, building the kernels did eventually become a bit tedious given the problems I was having.  Initially the 3.5.5 kernel I built worked, but was shortly followed by intermittent kernel crashes; no panics, no core dumps (disabled).  So I immediately looked to the 3.4.X kernel, which would appear to be more stable.  At this point I had opted for the cross compiling approach, given that waiting 7 hours for each build was getting painful.  To hopefully make the process of building kernels for the NAS more portable and easier, I created a script to do the job for me.  It automates the job of installing the appropriate build tools (cross compilers, utilities and libraries) as well as keeping these up-to-date by running through the install process on each invocation.

After some playing, I found a base configuration that worked well, based on my 2.6 kernel config.  At this point, I took a snapshot of it and placed it with the script.  The script now takes snapshots according to the kernel versions built out of the same directory.  I have the compilation time on my i7 down to 4 minutes.  That's incredible given it took 7 hours on a 200 MHz ARM CPU.  I suppose the biggest difference is the amount of memory available to me on the i7 and how many processes can be spawned.  I am running make with unlimited parallel processes, limited only by the system load.  I have this set to a load average of around 5.  So initially, there is a surge in memory usage and CPU usage, before it all settles down, with all 8 cores working at an average capacity of about 80%.  But to get an image and modules out in under 5 minutes is still astonishing!
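
With GNU make, "unlimited jobs, limited only by load" corresponds to -j with no number plus -l with the load ceiling.  The sketch below shows roughly what such an invocation might look like; it is not taken from the build-ukernel script.  The toolchain prefix is an assumption (use whatever your distribution installs) and the function name is made up.  It prints the command rather than starting a build.

```shell
#!/bin/sh
# Assumed cross toolchain prefix; override by exporting CROSS_COMPILE.
CROSS_COMPILE=${CROSS_COMPILE:-arm-linux-gnueabi-}
# Stop spawning new jobs once the load average reaches this value.
MAX_LOAD=5

# Print the make invocation for the given kernel targets.
# -j without a number means no fixed job limit; -l caps jobs by load average.
build_cmd() {
    echo "make ARCH=arm CROSS_COMPILE=$CROSS_COMPILE -j -l $MAX_LOAD $*"
}

build_cmd zImage modules
```

The load ceiling is what stops the unlimited job count from swamping the machine: make holds back new jobs while the load average is above it.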

Download the build-ukernel script

Stay tuned for more...

Friday, 5 October 2012

TP-WN822N AP/Master Mode

I have been running Debian on a hacked Buffalo LS500GL NAS for at least the past four years now and this little beast is still going strong.  I've since upgraded the HDD to a 1TB disk and use it for various things, such as Firewall, NIS server, NFS server, DLNA server, Web server and just about any other kind of server you can cram into the 128 MB of RAM.  It only has a 200 MHz ARM CPU, so it's no good for use as a transcoder, but for general services, it serves me well.

However, having just upgraded to a fibre connection, I thought I'd take advantage of the PPPoE availability and run a wireless access point, instead of having the switch plugged in.  The NAS uses some 11 watts, so is really economical.  Having the switch plugged in as well draws an extra 8 or 9 watts; since it is not really necessary, I may as well ditch it in favour of a USB access point.


There is of course a dilemma.  Most wireless drivers on Linux don't support access point mode.  The simple workaround is to use NDISwrapper to wrap native Windows kernel drivers, providing the Windows kernel API whilst interfacing with the Linux kernel.  However, this is not possible when you are not running a supported Windows architecture, or at least one that you can find drivers for.  In this case, there are no Windows drivers for ARM.


The solution is simple, so it would seem.  Find a device that has a Linux driver that supports AP mode.  These are few and far between and figuring out which of the devices will be suitable before you buy is a tough one.  Luckily I was able to find a few resources that indicated that the TP-WN822N USB adapter supports AP mode using the Atheros drivers (ath9k variant).  This driver is available in the 3.X.X kernels and onwards, so first port of call is to upgrade my 2.6.X kernel.

Cross compiling means running up a computer for hours on end while it builds, so I opt for building them on the NAS.  It takes a lot longer, given the memory restrictions and slow CPU, but you can log in, run up '/usr/bin/screen' and leave it running, while you go off and do something equally constructive for 7 hours, like sleep!

So here is my procedure:

ssh root@nasdrive
exec screen
cd /usr/src/linux-3.X.X
make menuconfig

Installation is slightly different given the U-Boot loader that runs out of the NAS flash.  For the NAS, you need to build the zImage and modify it slightly:

make zImage
mkimage -O linux -T kernel -C none -a 0x00008000 -e 0x00008000 -n 'linux' -d <( devio 'wl 0xe3a01c06,4' 'wl 0xe3811031,4'; cat arch/arm/boot/zImage ) uImage
mv uImage uImage-3.X.X
mv .config config-3.X.X

Once all this has been built, it can be copied to the /boot partition and the default links updated appropriately.  It's no big deal if the NAS doesn't come up on reboot.  You can dismantle it, remove the HDD, relink the old kernel and start over.  This doesn't happen often, but when it does, it is usually the result of not having built the uImage with the correct devio header.
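
As for what that devio header actually is: the two 4-byte words decode to the ARM instructions mov r1, #0x600 and orr r1, r1, #0x31, which are executed just before the zImage proper.  The ARM boot protocol expects the machine type number in r1, so the header presumably forces the value the kernel expects regardless of what the stock U-Boot passes.  A small sketch of the arithmetic (the variable name is mine):

```shell
#!/bin/sh
# 0xe3a01c06 = mov r1, #0x600
# 0xe3811031 = orr r1, r1, #0x31
# so r1 ends up holding 0x600 | 0x31.
machine_id=$(( 0x600 | 0x31 ))

printf 'machine id: 0x%x (%d)\n' "$machine_id" "$machine_id"
# prints: machine id: 0x631 (1585)
```

If the kernel is built for a different board, these two words would need to encode that board's machine type instead.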

So, let's leave it there for now.  I am going to set my kernel building and will come back and report the next stages of getting the ath9k driver loaded and getting the wlan interface up for the TP-LINK adapter.  Stay tuned...