Here in SFU’s residences, we have mandatory internet provided through a service called ResNet (or RezNet, depending on where you look). It has been notoriously unreliable over the five semesters I have been at SFU, which I’ll discuss a little more later. The constant reliability issues over my first two semesters prompted me to get a second connection from Shaw so that I would almost always have internet connectivity.
But I got tired of manually swapping the connections between computers when the ResNet one wasn’t working properly. I happened to have a server of reasonable speed with three network interfaces and a license for VMware Workstation, so it naturally followed that I would use them to run a router. The catch was that I needed something that was good at load balancing, and in my research, Linux routers weren’t the best at it. Enter PFsense.
Initially, I had intended to set up NetBSD on my server as a Xen host and run Linux on top of that so I would have an OS I was familiar with. NetBSD would be the router host, with PF (the packet filter at the heart of a BSD router/firewall) handling the sharing of the two connections.
Unfortunately, I could not get NetBSD to play nicely with my hardware (nor, in fact, any Linux kernel older than 2.6.27), so I gave up and settled on Arch Linux because it is almost always up to date. However, because of my inexperience with VMware Workstation, it still sat in a configuration where I had to manually change configuration files to switch internet connections.
After some reading, I finally figured out that Workstation already had a bunch of built-in networking functions, including bridging, which had been my stumbling block. Once I saw how easy it was to bridge between physical and virtual connections, there was nothing keeping me from using Workstation to host a router.
So I went in search of an OS with strong load balancing capabilities and finally settled on PFsense, which even had a ready-made, VMware-compatible virtual appliance.
So, with PFsense happily running on Workstation, and my two WAN-facing NICs set up not to grab IP addresses for the host OS, I was ready to try PFsense out.
Before I continue, a little background on my network layout is probably helpful. Both internet connections are fed into a smart-managed (read: cheap, but with some advanced functionality) switch and from there, through port-based VLANs, to the two onboard NICs in my server. Those are bridged to the WAN side of my PFsense VM (virtual machine), and another hardware NIC goes from the LAN side of the VM to the rest of my physical switch, which then connects all my other devices and computers together. It is a nice arrangement, and it lets me reconfigure my network at will in software rather than in hardware (actually moving cables between ports).
PFsense itself is based on FreeBSD with PF, and has a great web configuration interface, as well as SSH access and normal console access. Setup was easy, based on a wiki article for Windows setup here that also works with Linux with some changes. Once I had it configured for one connection, I followed the tutorials for load balancing here, with the one difference that I don’t have any routers between me and the internet.
A word of warning here: running virtualized is less secure than running PFsense on a standalone machine, but I think I’ve mitigated that as much as I can by making sure Linux passes traffic on my two WAN ports directly to the VM and doesn’t do anything else with it. Your mileage, as the saying goes, may vary in this department, and it is suggested that you have a NAT router in front of the VM. That is something I was trying to avoid, as I routinely kill $300 prosumer/SOHO routers when using them heavily, and I needed something that could take the load of my typical usage patterns, including over 1TB of transfer through it in some months.
So after I had the router configured, I proceeded to make a silly mistake (due to not quite understanding what I was doing in PFsense) and mis-set the failover rules so that they would never use the Shaw connection, which, given the then-current levels of packet loss on my ResNet connection, was annoying to say the least. Once I noticed that I’d set both failovers to go the same way (WAN1->WAN2) instead of one in each direction and fixed it, my internet experience got much better.
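To make the mistake concrete, here is a rough sketch of the failover logic in Python. It is not how PFsense implements it; the pool names are made up for illustration, and I am assuming WAN1 is the ResNet link and WAN2 is the Shaw link, which is what the description above implies. Note that a lossy link that still answers counts as "up", which is why the broken setup never moved anything to Shaw.

```python
from typing import Dict, List, Optional

# Hypothetical failover pools: each lists gateways in order of preference.
# Assumption: WAN1 = ResNet, WAN2 = Shaw (inferred from the text).
BROKEN_POOLS: Dict[str, List[str]] = {
    "prefer_resnet": ["WAN1", "WAN2"],
    "prefer_shaw": ["WAN1", "WAN2"],  # the mistake: same direction as above
}
FIXED_POOLS: Dict[str, List[str]] = {
    "prefer_resnet": ["WAN1", "WAN2"],
    "prefer_shaw": ["WAN2", "WAN1"],  # one pool in each direction
}


def pick_gateway(pool: List[str], up: Dict[str, bool]) -> Optional[str]:
    """Return the first gateway in the pool that is currently up, else None."""
    for gateway in pool:
        if up.get(gateway, False):
            return gateway
    return None


if __name__ == "__main__":
    # Both links respond, even if WAN1 is dropping a lot of packets.
    status = {"WAN1": True, "WAN2": True}
    for label, pools in (("broken", BROKEN_POOLS), ("fixed", FIXED_POOLS)):
        chosen = {name: pick_gateway(order, status) for name, order in pools.items()}
        print(label, chosen)
```

With both links up, the broken pools send everything to WAN1, while the fixed pools actually give the Shaw-preferring traffic to WAN2.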
PFsense has nice graphs of connection quality, and I thought I’d include one here to illustrate just how bad ResNet was around the time when I first got PFsense up and running. Additional graphs show processor utilization, states, transfer and many other interesting and useful statistics.
Two-day quality graph generated using RRDtool through PFsense. The part that actually looks decent was a period of 100% packet loss, as quality is only measured as far as the gateway. This can be changed by manually editing a configuration file.
After I got the load balancing working, I set up some policy-based routing (as per the tutorial) so that traffic to SFU’s IP ranges automatically goes to the ResNet connection (which ostensibly is connected to SFU’s network through a high-speed link, although I’m unsure about that some days) and, when that is down, goes out through my Shaw connection. SSL/HTTPS and other encrypted traffic is set to use my Shaw connection (which has had less than four hours of downtime in nine months) unless that is down, to keep banking websites and the like from complaining, as they would if the connection were load balanced and hopped between physical connections (with different IPs). Finally, my Shaw email is set to use only the Shaw connection.
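The routing policy described above can be sketched roughly like this in Python. This is only a model of the decision order, not PFsense configuration. The campus range and mail host are placeholder addresses from the RFC 5737 documentation blocks (not SFU’s or Shaw’s real addresses), and the port list and rule names are my own illustrative assumptions.

```python
import ipaddress
from typing import Dict, Optional

# Placeholders from RFC 5737 documentation ranges, NOT real SFU/Shaw addresses.
CAMPUS_NETS = [ipaddress.ip_network("192.0.2.0/24")]
SHAW_MAIL_HOSTS = {ipaddress.ip_address("198.51.100.25")}
ENCRYPTED_PORTS = {22, 443, 993, 995}  # illustrative set of "encrypted" ports

# Gateway preference per rule; the first gateway that is up wins.
PREFERENCE = {
    "shaw_mail": ["Shaw"],            # Shaw only, no fallback
    "campus": ["ResNet", "Shaw"],     # ResNet first, Shaw as failover
    "encrypted": ["Shaw", "ResNet"],  # Shaw first, ResNet as failover
}


def classify(dst_ip: str, dst_port: int) -> str:
    """Pick the first matching rule for a destination, like a top-down rule list."""
    addr = ipaddress.ip_address(dst_ip)
    if addr in SHAW_MAIL_HOSTS:
        return "shaw_mail"
    if any(addr in net for net in CAMPUS_NETS):
        return "campus"
    if dst_port in ENCRYPTED_PORTS:
        return "encrypted"
    return "default"


def route(dst_ip: str, dst_port: int, up: Dict[str, bool]) -> Optional[str]:
    """Return the gateway to use, 'balanced' for load-balanced traffic, or None."""
    rule = classify(dst_ip, dst_port)
    if rule == "default":
        live = [gw for gw in ("ResNet", "Shaw") if up.get(gw, False)]
        if len(live) == 2:
            return "balanced"
        return live[0] if live else None
    for gateway in PREFERENCE[rule]:
        if up.get(gateway, False):
            return gateway
    return None


if __name__ == "__main__":
    both_up = {"ResNet": True, "Shaw": True}
    print(route("192.0.2.10", 80, both_up))    # campus traffic
    print(route("203.0.113.5", 443, both_up))  # HTTPS to an outside host
    print(route("203.0.113.5", 80, both_up))   # everything else
```

The mail rule is checked first because mail to that host may also use an encrypted port, and it should never fall back to ResNet.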
Now for some of my actual experiences using PFsense over the past five-plus months.
I have had one issue, though I’m not sure whether it was a PFsense issue or a VMware issue: one time when I restarted my server, PFsense had lost its configuration. Luckily, I had a configuration backed up and was able to restore it quickly and easily, but it was still an annoyance.
I’ve never had PFsense lock up or crash on me in a way that wasn’t the fault of either me directly or my messing with something on the host OS. The only performance issue I have seen is a marked slowdown when it is first starting, due to Snort hogging all of the system resources to start up. (The VM is running with 384MB of physical memory and 1 core of a Q6600 assigned to it.)
Performance is great. I have greatly increased the number of connections μTorrent allows BitTorrent transfers to use, to take advantage of the two connections, allowing speeds to hit ~5.2MB/s down (~41.6Mb/s!) and around 250kB/s up (~2Mb/s). Even when running many torrent transfers (~50) with nearly 25,000 connections open, the router barely consumes more than 25% of a single core of the 2.4GHz Core 2 Quad and stays under its maximum of 384MB of RAM. The thing that consumes the most resources seems to be the Snort plugin from the PFsense repos.
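For anyone wanting to check the byte-to-bit conversions above, here is a small Python sketch. It assumes decimal units (1 MB = 1,000 kB, 8 bits per byte); the function names are just for illustration.

```python
def megabytes_to_megabits(rate_mb_per_s: float) -> float:
    """Convert MB/s to Mb/s (8 bits per byte)."""
    return rate_mb_per_s * 8


def kilobytes_to_megabits(rate_kb_per_s: float) -> float:
    """Convert kB/s to Mb/s, assuming decimal units (1 Mb = 1,000 kb)."""
    return rate_kb_per_s * 8 / 1000


if __name__ == "__main__":
    print(f"down: {megabytes_to_megabits(5.2):.1f} Mb/s")
    print(f"up:   {kilobytes_to_megabits(250):.1f} Mb/s")
```

The download figure works out to 41.6Mb/s and the upload figure to 2.0Mb/s, matching the numbers quoted above.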
A speedtest showing how fast two internet connections can be when load balanced. 15Mb/s Shaw connection (possibly up to 35Mb/s with PowerBoost) and 10Mb/s SFU ResNet connection (possibly up to 100Mb/s, but I've never seen it go over about 50Mb/s on its own).
But the speed isn’t why I wanted two internet connections (though it is a nice bonus); the reliability is. I have never had a time when both connections were down simultaneously, and PFsense wouldn’t mind if I added a third connection from a different provider (say Rogers Portable Internet or something similar from a local WISP) for added redundancy.
All in all, PFsense does everything I need it to do and I’ve been very happy with the results. I highly recommend it for straight-up load balancing for those who need a higher-speed connection (but beware, there are caveats, such as most things other than BitTorrent not running faster than a single connection allows), or for those who have a single unreliable connection and don’t have the option of switching providers.
Tags: bsd, internet, linux, load-balance, pfSense, ResNet, RezNet, routing, SFU, Shaw
Nice write up. The quality graph made me laugh, though I’m sure it wasn’t too funny for you.
I think that’s the worst I’ve ever seen that wasn’t 100% down.
Glad you enjoy the project.
@Chris: we did sometimes joke that it was faster/more reliable when unplugged