Archive for June, 2009

My Experiences with a Virtualized pfSense Router

Sunday, June 7th, 2009

Here in SFU’s residences, we have mandatory internet provided for us through a service called ResNet (or RezNet, depending on where you look). It has been notoriously unreliable over the five semesters I have been at SFU, which I’ll discuss a little more later. The constant reliability issues over my first two semesters prompted me to get a second connection from Shaw so that I would almost always have internet connectivity.
But I got tired of manually swapping the connections between computers when the ResNet one wasn’t working properly. I happened to have a server of reasonable speed with three network interfaces and a license for VMware Workstation, so it naturally followed that I would use it to run a router. The issue was that I needed something that was good at load balancing, and in my research, Linux routers weren’t the best at it. Enter pfSense.

Initially, I had intended to set up NetBSD on my server as a Xen host and then run Linux on top of that as an OS I was familiar with. NetBSD would be the router host, with PF (the packet filter at the heart of a BSD router/firewall) handling the sharing of the two connections.
Unfortunately, I could not get NetBSD to play nicely with my hardware (in fact, neither would any Linux kernel older than 2.6.27), so I gave up and finally settled on Arch Linux because it is almost always up to date. However, due to my inexperience with VMware Workstation, I was still left with a setup where I had to manually change configuration files in order to switch internet connections.
After some reading, I finally figured out that Workstation already had a bunch of built-in networking functions, including bridging, which had been my stumbling block. Once I figured out how easy it was to bridge between physical and virtual connections, there wasn’t anything keeping me from using Workstation to host a router.
So I went in search of an OS that had strong load balancing capabilities, and finally settled on pfSense, which even had a ready-made, VMware-compatible virtual appliance.

So, with pfSense happily running on Workstation, and my two WAN-facing NICs set up so that the host OS would not grab IP addresses on them, I was ready to try pfSense out.
Before I continue, I think a little background on my network layout is probably helpful. I’ve got both internet connections fed into a smart-managed (read: cheap, but with some advanced functionality) switch and from there, through port-based VLANs, to my two onboard NICs in my server. Those are bridged to the WAN side of my pfSense VM (Virtual Machine), and then there’s another hardware NIC going from the LAN side of the VM to the rest of my physical switch, which then connects all my other various devices and computers together. It is a nice arrangement, and allows me to reconfigure my network at will in software rather than in hardware (actually moving cables between ports).

pfSense itself is based on FreeBSD with PF, and has a great web-configuration interface, as well as SSH access and normal console access. Setup itself was easy, based on a wiki article for Windows setup here that also works with Linux with some changes. Once I had it configured for one connection, I followed the tutorials for load balancing here, with the difference that I don’t have any routers between me and the internet.
A word of warning here: running virtualized is less secure than running pfSense on a standalone machine, but I think I’ve mitigated that as much as I can by making sure Linux passes traffic on my two WAN ports directly to the VM and doesn’t do anything else with it. Your mileage, as the saying goes, may vary in this department, and it is suggested that you have a NAT router in front of the VM. That is something I was trying to avoid, as I routinely kill $300 prosumer/SOHO routers when using them heavily, and I needed something that could take the load of my typical usage patterns, including over 1TB of transfer through it in some months.
So after I had the router configured, I proceeded to make a silly mistake (due to not quite understanding what I was doing in pfSense) and mis-set the failover rules so that they would never use the Shaw connection, which, given the packet loss my ResNet connection was suffering at the time, was annoying to say the least. Once I noticed that I’d set both failovers to go the same way (WAN1->WAN2) instead of one in each direction and corrected it, my internet experience got much better.
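To make the failover mistake concrete, here is a rough Python sketch of how an ordered failover pool picks a gateway. It is only a model of the idea, not pfSense's actual configuration format: the pool names, the `pick_gateway` helper, and the mapping of WAN1 to ResNet and WAN2 to Shaw are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def pick_gateway(pool: List[str], link_up: Dict[str, bool]) -> Optional[str]:
    """Return the first gateway in the pool (priority order) whose link is up."""
    for gateway in pool:
        if link_up.get(gateway, False):
            return gateway
    return None  # every gateway in the pool is down


# Assumption for illustration: WAN1 is ResNet and WAN2 is Shaw.
# The mistake: both pools list the gateways in the same order, so WAN2
# is only ever used when WAN1 is marked completely down.
broken_pools = {
    "prefer_wan1": ["WAN1", "WAN2"],
    "prefer_wan2": ["WAN1", "WAN2"],
}

# The fix: one pool per direction.
fixed_pools = {
    "prefer_wan1": ["WAN1", "WAN2"],
    "prefer_wan2": ["WAN2", "WAN1"],
}

# A lossy link that still answers the monitor counts as "up".
link_up = {"WAN1": True, "WAN2": True}

if __name__ == "__main__":
    for name, pools in (("broken", broken_pools), ("fixed", fixed_pools)):
        choice = pick_gateway(pools["prefer_wan2"], link_up)
        print(f"{name}: traffic that should prefer WAN2 goes out via {choice}")
```

With both links nominally up, the broken pools send everything out WAN1, which matches the symptom of the second connection never being used even though the first was dropping packets.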
pfSense has nice graphs of connection quality, and I thought I’d include one here to illustrate just how bad ResNet was around the time when I first got pfSense up and running. Additional graphs show processor utilization, states, transfer and many other interesting and useful statistics.

2-day quality graph generated using RRDtool through pfSense. The part that actually looks decent was a period of 100% packet loss, as the graphs are only generated to the gateway. This can be changed by manually editing a configuration file.

After I got the load balancing working, I set up some policy-based routing (as per the tutorial) so that traffic to SFU’s IP ranges automatically goes to the ResNet connection (which ostensibly is connected to SFU’s network through a high-speed link, although I’m unsure about that some days) and, when that is down, goes out through my Shaw connection. Additionally, SSL/HTTPS and other encrypted traffic is set to use my Shaw connection (which has had less than four hours of downtime in nine months) unless that is down. This keeps banking websites and the like from complaining, as they would if the connection were load balanced and hopped between physical connections (with different IPs). Finally, my Shaw email is set to only use the Shaw connection.
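The routing policy above can be sketched as a small first-match rule table. This is an illustrative Python model, not pfSense's rule syntax: the `Rule` class, the `route` helper, and the gateway names are hypothetical, the addresses are placeholders from the documentation ranges (not SFU's or Shaw's real ranges), and "encrypted traffic" is simplified to port 443 only.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    description: str
    dest_net: Optional[str]   # CIDR to match, or None for any destination
    dest_port: Optional[int]  # port to match, or None for any port
    gateways: List[str]       # gateways in order of preference


# Placeholder addresses only; real campus and mail server ranges would go here.
RULES = [
    Rule("campus traffic", "192.0.2.0/24", None, ["resnet", "shaw"]),
    Rule("Shaw email", "198.51.100.25/32", None, ["shaw"]),
    Rule("HTTPS", None, 443, ["shaw", "resnet"]),
]


def route(dest_ip: str, dest_port: int, link_up: Dict[str, bool]) -> Optional[str]:
    """Return the gateway for a new connection, assuming the first matching rule wins."""
    addr = ipaddress.ip_address(dest_ip)
    for rule in RULES:
        if rule.dest_net is not None and addr not in ipaddress.ip_network(rule.dest_net):
            continue
        if rule.dest_port is not None and dest_port != rule.dest_port:
            continue
        for gateway in rule.gateways:
            if link_up.get(gateway, False):
                return gateway
        return None  # the rule matched, but none of its gateways are up
    return "balanced"  # no policy rule matched: fall through to load balancing


if __name__ == "__main__":
    both_up = {"resnet": True, "shaw": True}
    print(route("192.0.2.10", 80, both_up))    # campus traffic
    print(route("203.0.113.5", 443, both_up))  # HTTPS to an outside site
    print(route("203.0.113.5", 80, both_up))   # everything else
```

Pinning HTTPS to one preferred gateway is what keeps a banking session on a single source IP; only when that link is down does the rule fall back to the other connection.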

Now for some of my actual experiences using pfSense over the past 5+ months.

I have had one issue, though I’m not sure whether it’s a pfSense issue or a VMware issue: one time when I restarted my server, pfSense had lost its configuration. Luckily, I had a configuration backed up and was able to restore it quickly and easily, but it was still an annoyance.
I’ve never had pfSense lock up or crash on me in a way that wasn’t the fault of either me directly or my messing with something on the host OS. The only other issue I have seen is a marked slowdown when it is initially starting, due to Snort hogging all of the system resources to start up. (The VM is running with 384MB of physical memory and 1 core of a Q6600 assigned to it.)
Performance is great. I have greatly increased the number of connections μTorrent allows BitTorrent transfers to use to take advantage of the two connections, allowing speeds to hit ~5.2MB/s down (~41.6Mb/s!) and around 250kB/s up (~2Mb/s). Even when running many torrent transfers (~50) with nearly 25,000 connections open, the router barely consumes more than 25% of a single core of the 2.4GHz Core 2 Quad Q6600 and stays under its maximum of 384MB of RAM. The thing that consumes the most resources seems to be the Snort plugin from the pfSense repos.

A speedtest showing how fast two internet connections can be when load balanced. 15Mb/s Shaw connection (possibly up to 35Mb/s with PowerBoost) and 10Mb/s SFU ResNet connection (possibly up to 100Mb/s, but I've never seen it go over about 50Mb/s on its own).

But the speed isn’t why I wanted two internet connections (though it is a nice bonus); the reliability is. I have never had a time when both connections were down simultaneously, and pfSense wouldn’t mind if I added a third connection from a different provider (say Rogers Portable Internet or something similar from a local WISP) for added redundancy.

All in all, pfSense does everything I need it to do and I’ve been very happy with the results. I highly recommend it for straight-up load balancing for those who need a higher speed connection (but beware, there are caveats, such as most things other than BitTorrent not running any faster than a single connection allows) or who have a single unreliable connection and don’t have the option of switching providers.
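The caveat about single transfers can be illustrated with a toy model. It assumes each connection is pinned to one WAN link for its lifetime and that new connections are handed out round-robin, which is a simplification of how per-connection balancing behaves; the link names and helper functions are hypothetical, and the 15Mb/s and 10Mb/s figures are the nominal rates of the two connections.

```python
from typing import Dict, List


def assign_flows(n_flows: int, links: List[str]) -> Dict[str, int]:
    """Hand out connections to links round-robin and count flows per link."""
    counts = {link: 0 for link in links}
    for i in range(n_flows):
        counts[links[i % len(links)]] += 1
    return counts


def total_throughput(n_flows: int, rates_mbps: Dict[str, float]) -> float:
    """Best-case aggregate rate: every link carrying at least one flow is saturated."""
    counts = assign_flows(n_flows, list(rates_mbps))
    return sum(rate for link, rate in rates_mbps.items() if counts[link] > 0)


# Nominal rates of the two connections, in Mb/s.
RATES = {"shaw": 15.0, "resnet": 10.0}

if __name__ == "__main__":
    for flows in (1, 2, 50):
        print(f"{flows} connection(s): up to {total_throughput(flows, RATES)} Mb/s")
```

A single download is one connection, so it can never use more than one link, while BitTorrent opens many connections and can fill both.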

pfSense Router Virtualization part 2 (ISP discussion)

Sunday, June 7th, 2009

And now for a little bit on my experiences with my ISPs themselves, to give a little more background about why I even needed a second connection.
SFU’s residence has a mandatory internet connection that costs $127.16 per semester (or ~$32/month). During my first year here (2007/2008 school year) it was being run by ConnectWest Networks, a subsidiary of Data Fortress, and there were many reliability issues with it all school year, the largest of which was caused by a virus attack compounding poor network planning. A friend and I diagnosed that attack and informed ResNet about it, but that’s a story for another post.
The unreliability was even worse in the fall semester of 2008: when the start of the semester rolled around, the connection was completely out for almost two weeks. According to monitoring software I was running for most of the fall semester (just a version of MTR with aggregate stats, actually), packet loss averaged about 20% for the semester, although it was mostly grouped into large blocks of heavy packet loss.

ResNet's quality since I installed pfSense. The little break is a period when I was having some issues with my server, and the large break is from when it was down while I wasn't at school. This is over a six-month period, showing twelve-hour averages. Over the period, the internet was never out for longer than a couple of hours, which is why there is no spike to 100% loss shown.

This contrasts starkly with my Shaw connection, which has been down for less than four hours in total while I’ve been monitoring it, on only three separate occasions. The longest outage, about three hours, was due to a car crash which took out a power pole and Shaw’s plant on it. I don’t know the causes of the other two, but one may have been due to someone’s improperly connected TV. Shaw wasn’t sure about that one either.
Oh, did I mention I pay about $35 a month for Shaw (through the student plan)? That is a more reliable connection for only ~$3/month more. Additionally, Shaw has offered to credit me every time I’ve called them about internet downtime, whereas for all the times ResNet has been down/slow/unreliable, I only received a $16 credit for what should have amounted to a full refund for the semester. (At home, Shaw credited us for two months of somewhat unreliable service when we were among the first people in town to upgrade to DOCSIS.)
ResNet is trying to make things better though, undergoing a large infrastructure upgrade to increase the local bandwidth, which was not nearly enough for the number of people using the network and possibly transferring files between each other. For comparison, I have a faster internal network for my few computers, and have deployed a faster network for a few hundred people, than they did for around 1800 people.
Additionally, the off-campus connection is a measly 200Mb/s (at least, last I knew) for 1800 people, which gives us a contention ratio of 90:1 (1800 people, each with a 10Mb/s connection, sharing 200Mb/s). That is much higher than Shaw, which is supposedly closer to 25:1-35:1 depending on neighborhood. This means that at peak times, the internet slows down quite a bit. I have never noticed a corresponding slowdown with my Shaw connection.
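For anyone who wants to check the contention arithmetic, here is a short Python sketch using the figures quoted above. The function names are made up for illustration, and the worst-case share assumes every resident is pulling data at the same moment, which is a deliberately pessimistic bound rather than a measurement.

```python
def contention_ratio(users: int, per_user_mbps: float, uplink_mbps: float) -> float:
    """Total bandwidth sold to users divided by the shared uplink capacity."""
    return users * per_user_mbps / uplink_mbps


def worst_case_share_mbps(users: int, uplink_mbps: float) -> float:
    """Each user's share of the uplink if everyone transfers at once."""
    return uplink_mbps / users


if __name__ == "__main__":
    ratio = contention_ratio(users=1800, per_user_mbps=10, uplink_mbps=200)
    share = worst_case_share_mbps(users=1800, uplink_mbps=200)
    print(f"contention ratio: {ratio:.0f}:1")
    print(f"worst-case share: {share:.2f} Mb/s per user")
```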

So, all in all, I’m happy with my Shaw connection. But my ResNet connection certainly isn’t worth what I’m paying for it, especially compared to my Shaw connection.
I do, however, appreciate the changes that the new ISP, Urban Networks, is making to improve the situation, but I still have at least one connection interruption a day on average, though they generally only last a few seconds. How the network performs come fall, when there are many more people on it, remains to be seen.