Experimenting with ECMP on Netscreen

I have been experimenting with ECMP to balance outgoing connections over multiple internet connections. Configuring it is simple, but it has several side effects that are easy to overlook.

  • The one thing I expected to be a major problem was HTTP/HTTPS traffic, and that turned out to be the case. Several websites lock a session to a specific IP address. When requests are balanced over two entirely different public IP addresses, you can expect some very bizarre behavior. This includes the websites of banks and the management interfaces of several devices that I need to use. So it didn't take long before all web traffic was forced back over one internet connection.
  • Traceroute output is completely useless. The firewall sees each packet that traceroute sends as a new session, so the probes are sent out over both internet connections.
  • One system in the DMZ was somewhat slow. Tests quickly revealed that it was a DNS problem (surprise, surprise). While most systems use our own resolvers, this system was configured to use the DNS servers of both internet providers. Combined with ECMP this resulted in the following behavior:
    • The server sends a (recursive) DNS query to providerA. The firewall decides to send this packet via providerB. The DNS server of providerA refuses the recursive query because it arrived from an address owned by providerB.
    • The server repeats the DNS query, this time sending it to providerB. The firewall sends the packet via providerA. Same problem.
    • The server tries again and gets lucky: the query is sent to providerA and the firewall also routes the packet via providerA.
  • ...
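
The DNS behavior above can be sketched in a few lines of Python. This is only an illustration, not how the firewall is implemented: it assumes that a provider's resolver accepts a recursive query only when the query leaves via that provider's own link, and the function names are made up for this example.

```python
from itertools import product

PROVIDERS = ("providerA", "providerB")


def resolver_answers(resolver_owner: str, egress_link: str) -> bool:
    """Assumption: a provider's resolver only serves recursive queries
    that arrive from that provider's own address space."""
    return resolver_owner == egress_link


def outcomes() -> dict:
    """Enumerate every (resolver, egress link) pair the firewall could produce
    and record whether the query would be answered."""
    return {
        (resolver, link): resolver_answers(resolver, link)
        for resolver, link in product(PROVIDERS, PROVIDERS)
    }


if __name__ == "__main__":
    for (resolver, link), ok in outcomes().items():
        status = "answered" if ok else "refused"
        print(f"query to {resolver} routed via {link}: {status}")
```

Two of the four combinations are refused, which matches the retries the server needed before it got an answer. Pointing the system at our own resolvers, like the other systems, avoids the mismatch.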

That was all for today. I'm sure there will be other issues. Upgrading to a faster internet connection is so much easier, but a lot less fun ;)

Update: I ran into another problem when I enabled anti-spoofing functionality on the external interfaces. When a packet arrives on interface ext0, the firewall looks up the route back to the source IP. When a route exists via ext0, the packet is accepted. When the route points to another interface, this is detected as spoofing and the packet is dropped. The problem occurs when someone on the internal network tries to access a system in the DMZ. Systems in the DMZ have private addresses with 1-to-1 NAT on the firewall, and users usually connect to the public IP because that's how it is registered in DNS. Actually, the systems have two public addresses, one on interface ext0 and one on interface ext1, and both are configured as round-robin DNS entries.
What happens when a user connects to a public address on ext1? The firewall doesn't know it "owns" the destination IP, because the address is simply configured as a NAT rule on ext1, so it routes the traffic to the internet. If ext1 is chosen as the external interface, the firewall will detect that the packet loops back to itself and it will be delivered without a problem. If ext0 is chosen, however, the packet will be sent to the internet, eventually arriving back on ext1. At this point the anti-spoofing functionality kicks in and drops the packet, because the source IP is that of ext0 and, according to the routing tables, that address is located on ext0.
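
The anti-spoofing check described above boils down to a reverse-path lookup. Here is a minimal Python sketch of that logic, not the firewall's actual implementation. The prefixes are documentation ranges chosen for illustration (not my real addresses), the interface name int0 is hypothetical, and default routes are left out to keep it short.

```python
import ipaddress

# Hypothetical routing table: only the directly connected prefixes are modelled.
ROUTES = [
    (ipaddress.ip_network("192.0.2.0/24"), "ext0"),
    (ipaddress.ip_network("198.51.100.0/24"), "ext1"),
    (ipaddress.ip_network("10.0.0.0/8"), "int0"),
]


def route_lookup(address: str):
    """Return the interface of the longest-prefix route for address, or None."""
    ip = ipaddress.ip_address(address)
    matches = [(net, ifc) for net, ifc in ROUTES if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


def passes_antispoofing(source_ip: str, arrival_interface: str) -> bool:
    """Accept a packet only if the route back to its source
    points out of the interface it arrived on."""
    return route_lookup(source_ip) == arrival_interface


if __name__ == "__main__":
    # Internal user, source-NATed to an ext0 address, balanced out via ext0.
    # The packet travels over the internet and comes back in on ext1.
    print(passes_antispoofing("192.0.2.10", "ext1"))
    # The same source arriving on ext0 would be fine.
    print(passes_antispoofing("192.0.2.10", "ext0"))
```

In this sketch the first check fails because the route back to 192.0.2.10 points to ext0 while the packet arrived on ext1, which is the case where the packet gets dropped.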
