Rant: Network Application Firewalls

I often get into discussions about "next-generation firewalls" versus "traditional firewalls". All firewall vendors are moving in this same direction, adding deeper inspection to their devices.
For sales, it's a very easy pitch. Everyone is annoyed with the limited visibility L4 firewalls have, and proxies have always been a necessary evil. Along comes this one new box that will fix all these issues. Who wouldn't want one?

Marketing will say you no longer need to worry about TCP or UDP ports, just applications. But if you think about this for a few seconds, you'll know this isn't true. Applications can only be identified once a connection has been established and a certain amount of traffic has been passed back and forth. So if you want to allow HTTP on any port, you need to allow TCP connections to any port and block them once they turn out to be something other than HTTP. You just opened your entire network to port scans and all sorts of badness. This is a very easy configuration error to make - I've seen it happen several times already.
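
To make the problem concrete, here is a minimal sketch of how such a policy behaves before any payload has been seen. The Rule class and the function names are made up for illustration and do not correspond to any vendor's policy engine; the only assumption is that the application cannot be known during the TCP handshake, so only the port can be matched at that point.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical application rule; port=None means 'any port'."""
    app: str
    port: Optional[int] = None


def handshake_allowed(rules: List[Rule], dst_port: int) -> bool:
    # During the handshake no payload exists yet, so the application
    # is unknown and only the port part of a rule can be evaluated.
    return any(r.port is None or r.port == dst_port for r in rules)


def open_ports(rules: List[Rule], candidates: List[int]) -> List[int]:
    """Ports on which a scanner would see a completed handshake."""
    return [p for p in candidates if handshake_allowed(rules, p)]


app_only = [Rule("http")]          # "allow HTTP", no port restriction
app_and_port = [Rule("http", 80)]  # "allow HTTP on port 80"
probe = [22, 80, 443, 3389]

print(open_ports(app_only, probe))
print(open_ports(app_and_port, probe))
```

With the application-only rule every probed port completes a handshake, which is exactly what a port scanner is looking for.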

I was going to write about some of the interesting things you could do to trick the application-identification, but found an excellent presentation which pretty much sums up my thoughts:
DEFCON19 Presentation Slides
DEFCON19 Presentation Video
You may recognize the CLI - but this applies to all vendors. It's about the technology, not the individual products.

In summary:

  • application-identification only looks at the first bytes and can be tricked.
  • application-identification caches can be poisoned.
  • it's not all unicorns and rainbows, sorry.

Conclusion:
Does this mean application firewalls are useless? Absolutely not. You simply need to be aware of their limitations.

In my opinion, L7 firewalls are an additional layer of defense, not a replacement for stateful firewalls. You still need a stateful L4 box in front, to which you attach the network segments that don't require application-identification or that need better performance. This also reduces the risk of configuration errors on the L7 device.
On the L7 firewall, remember to only allow applications on the ports they need, and keep in mind that application-identification is an expensive operation, especially for certain UDP-based protocols.

Juniper SRX DNS ALG behavior

The Juniper SRX firewalls have a special application layer gateway (ALG) for the DNS protocol. This ALG performs the following tasks:

  1. Closes the session as soon as a DNS reply is seen.
  2. Limits the maximum message length.
  3. Drops packets when the domain name is longer than 255 bytes or a label is longer than 63 bytes.
  4. Decompresses compression pointers and detects pointer loops.
  5. Performs DNS Doctoring.
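
The name-length and pointer checks in the list can be sketched roughly as follows. This is not the SRX implementation, just a minimal decoder for DNS wire-format names that applies the same limits; the function name read_name and the error messages are my own.

```python
MAX_NAME_OCTETS = 255  # limit for a full name on the wire


def read_name(msg: bytes, offset: int) -> str:
    """Decode a DNS name at offset, following compression pointers."""
    labels = []
    visited = set()  # pointer targets already followed
    total = 0        # octets used by the name so far
    while True:
        if offset >= len(msg):
            raise ValueError("name runs past end of message")
        length = msg[offset]
        if length & 0xC0 == 0xC0:
            # Two-octet compression pointer: 14-bit offset into the message.
            if offset + 1 >= len(msg):
                raise ValueError("truncated compression pointer")
            target = ((length & 0x3F) << 8) | msg[offset + 1]
            if target in visited:
                raise ValueError("compression pointer loop")
            visited.add(target)
            offset = target
            continue
        if length & 0xC0:
            # Top bits 01 or 10 would mean a length above 63: not a valid label.
            raise ValueError("label longer than 63 bytes")
        if length == 0:
            break  # root label ends the name
        label = msg[offset + 1:offset + 1 + length]
        if len(label) != length:
            raise ValueError("truncated label")
        total += 1 + length
        if total + 1 > MAX_NAME_OCTETS:
            raise ValueError("name longer than 255 bytes")
        labels.append(label.decode("ascii", "replace"))
        offset += 1 + length
    return ".".join(labels)


sample = b"\x03www\x07example\x03com\x00\xc0\x04"
print(read_name(sample, 0))   # plain name
print(read_name(sample, 17))  # pointer back into the first name
```

A pointer that refers to itself, such as the two bytes 0xc0 0x00 at offset 0, is rejected with a loop error instead of spinning forever.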

Some of these functions are interesting to provide a little bit more security, but most of them have major drawbacks and their default settings are just plain stupid. The default maximum message size is 512 bytes, even though everyone has been using EDNS0 for years. DNS doctoring is enabled by default, very poorly documented and causes unexpected changes in DNS replies. I'll explain the issues in a bit more detail and then suggest what I would consider to be a better default configuration.
Feel free to skip the rant :)

Closing sessions when a response is received

Closing the session as soon as the DNS server responds sounds like a very good idea to reduce the number of sessions in use on the firewall. DNS transactions are typically just two packets, so waiting for the UDP session timeout every time would mean the firewall has to keep track of a lot more sessions. As most will probably know, DNS is the easiest way to kill stateful firewalls.
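
A back-of-the-envelope calculation shows how large the difference is. The numbers below are purely hypothetical (2000 queries per second, a 60 second UDP timeout, 50 ms for a reply to arrive); the only thing taken as given is that the average number of concurrent sessions is roughly the arrival rate times the session lifetime.

```python
def concurrent_sessions(queries_per_second: float, lifetime_s: float) -> float:
    """Average sessions held open: arrival rate times lifetime."""
    return queries_per_second * lifetime_s


QPS = 2000            # hypothetical query rate
UDP_TIMEOUT_S = 60.0  # hypothetical idle timeout without the ALG
REPLY_TIME_S = 0.05   # hypothetical time until the reply closes the session

print("wait for timeout:", concurrent_sessions(QPS, UDP_TIMEOUT_S))
print("close on reply:  ", concurrent_sessions(QPS, REPLY_TIME_S))
```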

The drawback here is that the ALG does not keep track of the DNS transaction IDs, it closes the session as soon as a reply packet is seen. If you have an ancient DNS server which sends all its requests from the same source port, this will cause problems when there is a lot of DNS traffic. The DNS server may send multiple requests in parallel to a remote server for different names, but as soon as the first response is received, the ALG closes the session and the subsequent responses are dropped.
In practice this isn't that much of an issue nowadays. Everyone should be using random source ports on their DNS servers, for security reasons. But years after the Kaminsky hype, there are still servers that haven't been patched.
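
The failure mode is easy to reproduce in a toy model. The class below is not how the SRX stores sessions; it is a minimal sketch that assumes sessions are keyed by the address and port tuple only, with no transaction ID, and that the first reply removes the session.

```python
class ToyDnsSessionTable:
    """Toy session table without DNS transaction-ID tracking."""

    def __init__(self):
        self.sessions = set()

    def outbound_query(self, src, sport, dst, dport=53):
        # A second query on the same tuple reuses the existing session.
        self.sessions.add((src, sport, dst, dport))

    def inbound_reply(self, src, sport, dst, dport):
        key = (dst, dport, src, sport)  # reply tuple reversed
        if key not in self.sessions:
            return "dropped"
        self.sessions.discard(key)      # session is closed on the first reply
        return "forwarded"


def simulate(source_ports):
    """Send one query per source port, then deliver all replies."""
    table = ToyDnsSessionTable()
    for sp in source_ports:
        table.outbound_query("10.0.0.53", sp, "192.0.2.1")
    return [table.inbound_reply("192.0.2.1", 53, "10.0.0.53", sp)
            for sp in source_ports]


print(simulate([53, 53]))        # fixed source port, parallel queries
print(simulate([40001, 40002]))  # randomized source ports
```

With a fixed source port the second reply no longer matches a session, while randomized ports give every query its own session.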

Limiting the message length to 512 bytes

I have no idea why Juniper even implemented this "feature", let alone made it the default behavior. When the first SRX was released, EDNS0 was already used by most DNS servers, and that requires the firewall to allow larger DNS packet sizes.

EDNS0 is an extension which allows for larger DNS packet sizes. The original RFC only allowed 512 bytes, but now that more and more domains are publishing IPv6 records and starting to use DNSSEC, responses are often larger than that. Nearly all DNS servers support EDNS0 and will attempt to use it. The problem is that the SRX will block the larger responses, so the DNS server waits for a timeout and then retries with a smaller advertised packet size, which does get a response. To the end user, it will appear as if "the internet is slow".
BIND-based name servers will generate a lot of these log messages:

named[666]: success resolving 'news.bbc.co.uk/AAAA' (in 'bbc.co.uk'?) after reducing the advertised EDNS UDP packet size to 512 octets
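
For reference, the "advertised EDNS UDP packet size" from that log message is just a 16-bit field in an OPT pseudo-record appended to the query. The sketch below builds such a query by hand; the helper names, the fixed transaction ID and the 4096 byte default are my own choices for illustration.

```python
import struct

TYPE_AAAA = 28
TYPE_OPT = 41
CLASS_IN = 1


def build_query(name: str, qtype: int = TYPE_AAAA,
                udp_size: int = 4096, txid: int = 0x1234) -> bytes:
    """Build a DNS query carrying an EDNS0 OPT record."""
    # Header: id, flags (RD set), 1 question, 0 answers, 0 authority,
    # 1 additional record (the OPT pseudo-record).
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, CLASS_IN)
    # OPT record: root name, type 41, CLASS carries the UDP payload size,
    # TTL carries extended flags (all zero here), empty RDATA.
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_size, 0, 0)
    return header + question + opt


def advertised_udp_size(query: bytes) -> int:
    """Read the size back, assuming the OPT record built above is last."""
    return struct.unpack("!H", query[-8:-6])[0]


query = build_query("news.bbc.co.uk")
print(len(query), advertised_udp_size(query))
```

A server that sees this field may send a UDP reply of up to that size, which is what the ALG then drops when its limit is still 512 bytes.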

Luckily this behavior can be fixed in configuration. I would recommend everyone using the DNS ALG to increase the permitted message size:
set security alg dns maximum-message-length 8192

DNS Doctoring

DNS doctoring is a feature where the firewall looks at DNS responses from your DNS servers for addresses that have a static NAT rule defined, and then changes the IP in the DNS response to the NAT address. This behavior is wrong in so many ways. There is very little documentation about this - as far as I know this behavior gets triggered when both the DNS server and the address in the response have a static NAT rule, but I may be wrong. If you think you need functionality like this, you should rethink your DNS infrastructure. Other than it being an extremely ugly kludge, it doesn't always work and will fail in the future if you decide to use DNSSEC.
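
Conceptually the rewrite is nothing more than a table lookup on every address in the answer. The sketch below is a guess at the idea, not at Juniper's implementation: the NAT table, the addresses and the direction of the translation are assumptions, and the SHA-256 digest merely stands in for a DNSSEC signature to show why a validator would notice the change.

```python
import hashlib


def doctor_answers(answers, static_nat):
    """Replace every answer address that has a static NAT mapping."""
    return [(name, static_nat.get(addr, addr)) for name, addr in answers]


def rrset_digest(answers):
    """Stand-in for a signature: any change to the data changes it."""
    canonical = "\n".join(f"{name} {addr}" for name, addr in sorted(answers))
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()


STATIC_NAT = {"10.1.1.10": "203.0.113.10"}  # hypothetical static NAT rule
original = [("www.example.com", "10.1.1.10"),
            ("mail.example.com", "198.51.100.7")]

doctored = doctor_answers(original, STATIC_NAT)
print(doctored)
print("digest unchanged:", rrset_digest(original) == rrset_digest(doctored))
```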

When this feature was first introduced, it couldn't even be disabled. In more recent JunOS releases it can be turned off with the following command, which leaves only the ALG's sanity checks active:

set security alg dns doctoring sanity-check

Known limitation: DDNS Updates

As documented on the Juniper KB, the SRX DNS ALG does not permit DDNS updates. This can be a problem, especially when the SRX is used as an internal firewall separating clients from their Active Directory DNS servers. Many people rely on DDNS to register client or server names in DNS. The SRX will drop these registrations, resulting in missing or outdated DNS records. So far, the only solution is to disable the DNS ALG:

set security alg dns disable
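
If you want to check whether DDNS traffic is actually crossing the firewall before disabling the ALG, dynamic updates are easy to recognize in a packet capture: they use opcode 5 in the DNS header. A minimal sketch, with helper names of my own and assuming you already have the raw DNS payload:

```python
import struct

OPCODE_QUERY = 0
OPCODE_UPDATE = 5  # dynamic update


def dns_opcode(msg: bytes) -> int:
    """Extract the 4-bit opcode from the DNS header flags."""
    if len(msg) < 12:
        raise ValueError("shorter than a DNS header")
    (flags,) = struct.unpack("!H", msg[2:4])
    return (flags >> 11) & 0xF


def is_dynamic_update(msg: bytes) -> bool:
    return dns_opcode(msg) == OPCODE_UPDATE


# Two hand-built headers: a standard query (RD set) and an update.
query_header = struct.pack("!HHHHHH", 1, 0x0100, 1, 0, 0, 0)
update_header = struct.pack("!HHHHHH", 2, OPCODE_UPDATE << 11, 1, 0, 1, 0)

print(is_dynamic_update(query_header), is_dynamic_update(update_header))
```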

ALG Statistics

There are no CLI commands to see how much traffic was dropped by the ALG, but the information can be obtained from the forwarding daemon directly:

admin@srx> request pfe execute target fwdd command "show usp alg dns stats"
SENT: Ukern command: show usp alg dns stats
GOT:
GOT: dns-alg init state: 1
GOT: dns-alg (total 34)
GOT:    pkts received                      32265728
GOT:    pkts received NULL jbuf                   0
GOT:    pkts received dup NULL jbuf               0
GOT:    pkts received ipfrag                      0
GOT:    pkts received V6                     181659
GOT:    pkts received V4                   32084069
GOT:    pkts invalid                              0
GOT:    pkts examine                       32119585
GOT:    pkts reply                         15771693
GOT:    pkts truncated message                 2448
GOT:    pkts oversize message                     1
GOT:    parse oversize name                       0
GOT:    parse oversize label                      0
GOT:    oversize compression pointer              0
GOT:    undersize compression pointer             1
GOT:    pkts parse quesion fail                   3
GOT:    pkts parse answers fail                   0
GOT:    pkts parse authority fail            227501
GOT:    pkts parse additional fail                3
GOT:    pkts parse fail                      227507
GOT:    pkts NAT need                       2599821
GOT:    pkts NAT V4 need                          0
GOT:    pkts NAT V6 need                          0
GOT:    pkts NAT xlated                           0
GOT:    pkts NATPT need                           0
GOT:    pkts DUP A query                          0
GOT:    pkts Dup A query fail                     0
GOT:    pkts receive A response                   0
GOT:    pkts receive DUP A response               0
GOT:    pkts xlate A2AAAA fail                    0
GOT:    pkts xlate A2AAAA                         0
GOT:    session interest                   13808061
GOT:    session not interest                  91325
GOT:    pkts update name offset                   0
GOT: max message length                        8192
LOCAL: End of file
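
Since this output is not meant for humans, a few lines of Python help when you want to track the counters over time. The parser below only assumes the "GOT: name value" layout shown above; the sample is a handful of lines copied from that output, and the function name is my own.

```python
import re

SAMPLE = """\
GOT:    pkts examine                       32119585
GOT:    pkts truncated message                 2448
GOT:    pkts oversize message                     1
GOT:    pkts parse authority fail            227501
GOT:    pkts parse fail                      227507
GOT: max message length                        8192
LOCAL: End of file
"""

LINE_RE = re.compile(r"GOT:\s+(.+?)\s+(\d+)\s*$")


def parse_alg_stats(output: str) -> dict:
    """Turn 'GOT: <counter name> <number>' lines into a dict."""
    stats = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


stats = parse_alg_stats(SAMPLE)
fail_ratio = stats["pkts parse fail"] / stats["pkts examine"]
print(f"parse failures: {fail_ratio:.2%} of examined packets")
```

In this sample nearly all parse failures come from the authority section, which is the kind of detail that gets lost when you only look at the total.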


TL;DR: Recommended Default Config

For branch SRX devices, I would use the following default configuration:

security {
    alg {
        dns {
            maximum-message-length 8192;
            doctoring {
                sanity-check;
            }
        }
    }
}

This maintains some of the ALG advantages and fixes the annoying default behavior.
When you need to allow DDNS through the firewall, the only option is to disable the ALG entirely:

security {
    alg {
        dns {
            disable;
        }
    }
}


Juniper SRX flow traceoptions: plugins

Looking at SRX flow traces, there are a lot of references to internal IDs. Without knowing what all these numbers mean, it's hard to tell which configuration or additional services may adversely affect a flow.
Let's use the following trace output as an example:

Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:<192.168.50.251/46800->216.239.36.10/53;17> matched filter foo:
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:packet [80] ipid = 37701, @423ad69a
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:---- flow_process_pkt: (thd 1): flow_ctxt type 14, common flag 0x0, mbuf 0x423ad480, rtbl_idx = 0
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT: flow process pak fast ifl 70 in_ifp fe-0/0/1.0
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT: find flow: table 0x4cbec950, hash 38561(0xffff), sa 192.168.50.251, da 216.239.36.10, sp 46800, dp 53, proto 17, tok 6
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:  flow_first_create_session
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:  flow_first_in_dst_nat: in <fe-0/0/1.0>, out <N/A> dst_adr 216.239.36.10, sp 46800, dp 53
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:  chose interface fe-0/0/1.0 as incoming nat if.
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:flow_first_rule_dst_xlate: DST no-xlate: 0.0.0.0(0) to 216.239.36.10(53)
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:flow_first_routing: vr_id 0, call flow_route_lookup(): src_ip 192.168.50.251, x_dst_ip 216.239.36.10, in ifp fe-0/0/1.0, out ifp N/A sp 46800, dp 53, ip_proto 17, tos 0
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:Doing DESTINATION addr route-lookup
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:  routed (x_dst_ip 216.239.36.10) from trust (fe-0/0/1.0 in 0) to fe-0/0/0.0, Next-hop: 84.196.0.1
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:  policy search from zone trust-> zone untrust (0x0,0xb6d00035,0x35)
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:  app 16, timeout 60s, curr ageout 60s
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:flow_first_src_xlate:  nat_src_xlated: False, nat_src_xlate_failed: False
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:flow_first_src_xlate: src nat returns status: 1, rule/pool id: 1/2, pst_nat: False.
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:  dip id = 2/0, 192.168.50.251/46800->84.196.14.21/22287 protocol 17
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:  choose interface fe-0/0/0.0 as outgoing phy if
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:is_loop_pak: No loop: on ifp: fe-0/0/0.0, addr: 216.239.36.10, rtt_idx:0
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:jsf sess interest check. regd plugins 19
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT: Allocating plugin info block for 20 plugin(s) from OL
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:-jsf int check: plugin id  2, svc_req 0x0. rc 4
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:-jsf int check: plugin id  3, svc_req 0x0. rc 4
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:-jsf int check: plugin id  5, svc_req 0x4. rc 3
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:Add plugid:5 to int table at :0, fill hole:0, holes:0
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:-jsf int check: plugin id  6, svc_req 0x0. rc 4
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:-jsf int check: plugin id  7, svc_req 0x0. rc 4
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:-jsf int check: plugin id  8, svc_req 0x0. rc 4
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:-jsf int check: plugin id 12, svc_req 0x0. rc 4
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:+++++++++++jsf_test_plugin_data_evh: 3
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:-jsf int check: plugin id 13, svc_req 0x0. rc 4
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:-jsf int check: plugin id 14, svc_req 0x0. rc 4
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:-jsf int check: plugin id 15, svc_req 0x0. rc 4
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:-jsf int check: plugin id 18, svc_req 0x0. rc 2
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:-jsf int check: plugin id 19, svc_req 0x0. rc 4
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT: Allocating plugin info block for 1 plugin(s) from OL
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:  Attaching plugin 5, at index 0
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT: Releasing plugin info block for 20 plugin(s) to OL
Feb 19 10:25:28 10:25:27.1062565:CID-0:RT:  Plugins enabled for session = 1 (frwk svcs mask 0x0), post_nat cnt 0
...

So this packet is permitted by the policy, but "plugin 5" has been enabled for the session. The question is of course, what is this plugin? To my knowledge this information is not available in the CLI, but luckily we can ask the forwarding daemon directly using the following command:

admin@srx> request pfe execute target fwdd command "show usp plugins"
SENT: Ukern command: show usp plugins
GOT:
GOT: Number of plugins: 19
GOT: Plugin: id: 1, name: junos-syn-term
GOT: Plugin: id: 2, name: junos-screen-adapter
GOT: Plugin: id: 3, name: junos-fwauth-adapter
GOT: Plugin: id: 4, name: junos-syn-init
GOT: Plugin: id: 5, name: junos-appid-packet
GOT: Plugin: id: 6, name: junos-appfw
GOT: Plugin: id: 7, name: junos-idp
GOT: Plugin: id: 8, name: junos-uf
GOT: Plugin: id: 9, name: junos-tcp-svr-emul
GOT: Plugin: id: 10, name: junos-ssl-term
GOT: Plugin: id: 12, name: junos-captive-portal
GOT: Plugin: id: 13, name: junos-test
GOT: Plugin: id: 14, name: junos-alg
GOT: Plugin: id: 15, name: junos-utm
GOT: Plugin: id: 16, name: junos-ssl-init
GOT: Plugin: id: 17, name: junos-tcp-clt-emul
GOT: Plugin: id: 18, name: junos-uac
GOT: Plugin: id: 19, name: junos-utm-udp
LOCAL: End of file

So this mysterious plugin turns out to be appid, which I did indeed activate for testing purposes the other day. In this case it didn't cause any problems, but when you see plugins being triggered, it's always good to know what they are used for, as they could indicate a configuration error.
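
If you do this often, the lookup can be scripted. The sketch below assumes you have saved both outputs as text; it relies only on the "Attaching plugin N" and "Plugin: id: N, name: X" patterns visible above, and the function names are my own.

```python
import re

PLUGINS_OUTPUT = """\
GOT: Plugin: id: 5, name: junos-appid-packet
GOT: Plugin: id: 6, name: junos-appfw
GOT: Plugin: id: 14, name: junos-alg
"""

TRACE_OUTPUT = """\
CID-0:RT:-jsf int check: plugin id  5, svc_req 0x4. rc 3
CID-0:RT:  Attaching plugin 5, at index 0
CID-0:RT:  Plugins enabled for session = 1 (frwk svcs mask 0x0), post_nat cnt 0
"""


def parse_plugins(output: str) -> dict:
    """Map plugin id to name from 'show usp plugins' output."""
    pairs = re.findall(r"Plugin: id: (\d+), name: (\S+)", output)
    return {int(plugin_id): name for plugin_id, name in pairs}


def attached_plugins(trace: str) -> list:
    """Plugin ids that a flow trace reports as attached to the session."""
    return [int(i) for i in re.findall(r"Attaching plugin (\d+),", trace)]


names = parse_plugins(PLUGINS_OUTPUT)
for plugin_id in attached_plugins(TRACE_OUTPUT):
    print(plugin_id, names.get(plugin_id, "unknown"))
```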


Juniper SRX: selectively disable TCP SYN or Sequence checking

SRX devices are stateful firewalls and only allow traffic that matches an existing session. A TCP session is created when a SYN packet is received and permitted by the security policy, and subsequent packets are checked against the expected sequence numbers. This of course means that the firewall needs to see both directions of a flow (client-server and server-client); otherwise the SYN and sequence checks will block legitimate packets.
Whenever possible it's best to ensure that asymmetric flows can't occur, but this is not always feasible. Therefore you can disable these checks globally on the SRX:

set security flow tcp-session no-syn-check
set security flow tcp-session no-sequence-check

Obviously this has a security impact, and because it is a global option, it applies to all traffic flowing through the device. That's unfortunate, as these checks typically only need to be disabled for a few policies. Luckily, recent JunOS releases allow these checks to be enabled on a per-policy basis, like this:

policy trust-to-untrust {
    match {
        source-address any;
        destination-address any;
        application any;
    }
    then {
        permit {
            tcp-options {
                syn-check-required;
                sequence-check-required;
            }
        }
    }
}

The problem here is that Juniper implemented "syn-check-required" and "sequence-check-required" options instead of "no-syn-check-required" and "no-sequence-check-required", which would be a lot more usable in the real world. But because this is JunOS, there are ways around this of course. To disable TCP SYN or sequence checking on one policy while enabling it on all other policies, an apply-group can be used. The idea here is the following:

  1. Globally disable SYN and sequence checking
  2. Use an apply-group to set "syn-check-required" and "sequence-check-required" on ALL security policies
  3. Use apply-groups-except to exclude the few policies where SYN or sequence checking is not desired from this apply-group

In JunOS configuration it looks like this:

groups {
    require_syn_seq_checking {
        security {
            policies {
                from-zone <*> to-zone <*> {
                    policy <*> {
                        then {
                            permit {
                                tcp-options {
                                    syn-check-required;
                                    sequence-check-required;
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
 
security {
    policies {
        apply-groups require_syn_seq_checking;
    }
}
 
security {
    policies {
        from-zone foo to-zone bar {
            policy one {
                apply-groups-except require_syn_seq_checking;
                ...
            }
        }
    }
}
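
The resulting behavior can be modeled in a few lines. This is only a mental model of the inheritance, not something JunOS runs: a policy enforces the checks if the global checks are still on, or if it inherits the "required" options from the apply-group. The function and policy names are made up.

```python
def effective_checks(policies, excluded, global_checks_disabled=True):
    """Return {policy name: True if SYN/sequence checks are enforced}."""
    result = {}
    for name in policies:
        # apply-groups-except stops the policy from inheriting the group.
        inherits_required = name not in excluded
        result[name] = inherits_required or not global_checks_disabled
    return result


all_policies = ["one", "two", "three"]

# Policy "one" carries apply-groups-except; the others inherit the group.
print(effective_checks(all_policies, excluded={"one"}))
```

Only the excluded policy ends up without the checks, which is the per-policy "no-syn-check" behavior that the configuration syntax does not offer directly.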

Hopefully Juniper will some day implement the "no-sequence-check" option at a per-policy level, but until then, this workaround can be used.
