BGP Additional-paths vs. Multipath

In this post we're going to take a look at these two features: what their purpose is and in which situations they can be useful. They both provide high availability, but only BGP Multipath provides load balancing. Additional-paths is used to avoid sub-optimal routing in a Route Reflector (RR) environment.

BGP Additional-paths

What's the problem with the topology below? We have a Route Reflector (R1) which has five clients (R3, R5, R7, R8 and R9), so we have five BGP sessions between the edge routers and the RR. The edge routers don't have BGP neighborships with each other; they only send prefixes to and receive prefixes from the RR. R10 advertises a single prefix, 10.10.10.10/32, from AS 65002 in this topology.

BGP Add-path topology

So the RR receives the prefix from R3 (10.3.3.3), R5 (10.5.5.5) and R7 (10.7.7.7):

R1#show bgp ipv4 unicast | begin Net
     Network          Next Hop            Metric LocPrf Weight Path
 * i  10.10.10.10/32   10.7.7.7                 0    100      0 65002 i
 * i                   10.5.5.5                 0    100      0 65002 i
 *>i                   10.3.3.3                 0    100      0 65002 i

The RR, just like any other BGP router, selects his best path to reach the destination. R1 chooses R3 as the exit point to reach AS 65002. Why did R1 choose R3? All of the other path attributes are the same, so the IGP metric to reach the next-hop is the tiebreaker here:

R1#show bgp ipv4 unicast 10.10.10.10/32
BGP routing table entry for 10.10.10.10/32, version 2
Paths: (3 available, best #3, table default)
  Advertised to update-groups:
     1          2         
  Refresh Epoch 1
  65002, (Received from a RR-client)
    10.7.7.7 (metric 5) from 10.7.7.7 (10.7.7.7)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  65002, (Received from a RR-client)
    10.5.5.5 (metric 4) from 10.5.5.5 (10.5.5.5)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  65002, (Received from a RR-client)
    10.3.3.3 (metric 3) from 10.3.3.3 (10.3.3.3)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0

The path with the lowest IGP metric (OSPF cost in this case) to the next-hop is preferred. R3 has the lowest metric (cost of 3), so R1 chooses 10.3.3.3 as the next-hop for his best path. What does R1 reflect to his RR clients? Only his best path with 10.3.3.3 as the next-hop:

R9#show bgp ipv4 unicast | begin Net
     Network          Next Hop            Metric LocPrf Weight Path
 *>i  10.10.10.10/32   10.3.3.3                 0    100      0 65002 i

R8#show bgp ipv4 unicast | begin Net
     Network          Next Hop            Metric LocPrf Weight Path
 *>i  10.10.10.10/32   10.3.3.3                 0    100      0 65002 i

R8 and R9 have no idea that they could also use R5 and R7 as exit points to reach AS 65002. This results in suboptimal routing: for example, if R9 wanted to reach AS 65002, it would use the path R9 -> R4 -> R2 -> R3. (In my case R4 and R2 don't have a route to 10.10.10.10/32 because they don't run BGP, so they would simply drop the packet. That's not the point here. Imagine that we run MPLS in the core and provide an L3VPN service for AS 65002: in that case we would switch labels in the core, so the P routers wouldn't need to have the customer routes.) The point is that the RR only reflects the best path from HIS perspective to the edge routers. Because the RR sends only ONE path, the other edge routers, R8 and R9, have no visibility of the other exit points, R5 and R7.
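
The behaviour described above can be sketched in a few lines of Python. This is only a simplified model, not how IOS implements it: it assumes every attribute other than the IGP metric ties, takes the metrics from the R1 output above, and uses the hypothetical helper names best_path and reflect.

```python
# Simplified model: all other attributes tie, so only the IGP metric
# to the next hop decides. Metrics are taken from R1's output above.
rr_paths = {"10.7.7.7": 5, "10.5.5.5": 4, "10.3.3.3": 3}  # next hop -> IGP metric


def best_path(paths):
    """Return the next hop with the lowest IGP metric (hypothetical helper)."""
    return min(paths, key=paths.get)


def reflect(paths, clients):
    """Without add-path, the RR sends each client only the RR's own best path."""
    best = best_path(paths)
    return {client: [best] for client in clients}


updates = reflect(rr_paths, ["R8", "R9"])
print(updates)  # {'R8': ['10.3.3.3'], 'R9': ['10.3.3.3']}
```

Both clients end up with the same single next hop, no matter how far away R3 is from their own point of view.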

With the Additional-paths capability the RR can send not just his best path, but also the other paths which he has in his BGP table. The additional-paths capability must be enabled both on the RR and on the edge routers as well. First let's enable it on the RR:

R1(config)#router bgp 65001
R1(config-router)#address-family ipv4 unicast 
R1(config-router-af)#neighbor 10.8.8.8 additional-paths ?
  disable  Disable additional paths for this neighbor
  receive  Receive additional paths from neighbors
  send     Send additional paths to this neighbor

R1(config-router-af)#neighbor 10.8.8.8 additional-paths send ?
  receive  Receive additional paths from this neighbor
  <cr>     <cr>

R1(config-router-af)#neighbor 10.8.8.8 additional-paths send
R1(config-router-af)#neighbor 10.9.9.9 additional-paths send

On the RR we need the send capability, while on the edge routers we're going to activate the receive capability. We could also enable both on each side, but that's not needed in this case. Let's enable receive on the edge routers:

R8(config)#router bgp 65001
R8(config-router)#address-family ipv4 unicast 
R8(config-router-af)#neighbor 10.1.1.1 additional-paths receive

R9(config)#router bgp 65001
R9(config-router)#address-family ipv4 unicast 
R9(config-router-af)#neighbor 10.1.1.1 additional-paths receive 

Here I do this for the IPv4 unicast address family, but we can enable this feature for other address families as well (VPNv4 for example). I also enabled the feature for a single neighbor, but we can enable it for the whole address family with the bgp additional-paths send|receive command. Also notice that capabilities are only exchanged in the BGP OPEN messages, so changing them resets the session:

BGP-5-ADJCHANGE: neighbor 10.8.8.8 Down Capability changed
BGP_SESSION-5-ADJCHANGE: neighbor 10.8.8.8 IPv4 Unicast topology base removed from session  Capability changed
BGP-5-ADJCHANGE: neighbor 10.8.8.8 Up 
BGP-5-ADJCHANGE: neighbor 10.9.9.9 Down Capability changed
BGP_SESSION-5-ADJCHANGE: neighbor 10.9.9.9 IPv4 Unicast topology base removed from session  Capability changed
BGP-5-ADJCHANGE: neighbor 10.9.9.9 Up 

Figure: BGP OPEN message. Now the RR can send additional paths besides his best path.

We can also verify the additional-paths capabilities with the following command:

R1#show ip bgp neighbors 10.8.8.8 | include Additional
  Additional Paths send capability: advertised
  Additional Paths receive capability: received
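
The negotiation rule behind this output can be sketched as follows. The sketch assumes the Send/Receive values defined in RFC 7911 (1 = receive, 2 = send, 3 = both); the class name AddPath and the helper can_send_additional_paths are hypothetical and exist only for illustration.

```python
from enum import IntFlag


class AddPath(IntFlag):
    """Send/Receive field of the ADD-PATH capability (values per RFC 7911)."""
    RECEIVE = 1  # able to receive multiple paths from the peer
    SEND = 2     # able to send multiple paths to the peer


def can_send_additional_paths(local, peer):
    """Local may send extra paths only if it offered SEND and the peer offered RECEIVE."""
    return bool(local & AddPath.SEND) and bool(peer & AddPath.RECEIVE)


r1 = AddPath.SEND     # neighbor 10.8.8.8 additional-paths send
r8 = AddPath.RECEIVE  # neighbor 10.1.1.1 additional-paths receive

print(can_send_additional_paths(r1, r8))  # True: RR -> R8 works
print(can_send_additional_paths(r8, r1))  # False: R8 -> RR was not negotiated
```

This matches the verification output: R1 advertised the send capability and received the receive capability from R8, so additional paths can flow in one direction only.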

Now we need to select which paths we want to advertise on the RR:

R1(config-router-af)#bgp additional-paths select ?
  all            Select all available paths
  backup         Select backup path
  best           Select best N paths
  best-external  Select best-external path
  group-best     Select group-best path

R1(config-router-af)#bgp additional-paths select all
R1(config-router-af)#neighbor 10.8.8.8 advertise additional-paths ?
  all         Select all available paths
  best        Select best N paths
  group-best  Select group-best paths

R1(config-router-af)#neighbor 10.8.8.8 advertise additional-paths all

With the all option we simply select all paths which are available (and valid) in the BGP table, as long as they have different next-hops. Alternatively, with the best option (best 2 or best 3) we could select the second best path, or the second and third best paths. So we can send one or two additional paths on top of the best path. The group-best option selects the best path among the paths received from each neighboring AS. The backup option is mainly used with PIC, and we'll take a closer look at best-external later. Now if we take a look at the BGP table of the RR:

R1(config-router-af)#do show bgp ipv4 unicast | begin Net
     Network          Next Hop            Metric LocPrf Weight Path
 * ia 10.10.10.10/32   10.7.7.7                 0    100      0 65002 i
 * ia                  10.5.5.5                 0    100      0 65002 i
 *>i                   10.3.3.3                 0    100      0 65002 i

R1(config-router-af)#do show bgp ipv4 unicast 10.10.10.10
BGP routing table entry for 10.10.10.10/32, version 6
Paths: (3 available, best #3, table default)
  Path advertised to update-groups:
     4         
  Refresh Epoch 1
  65002, (Received from a RR-client)
    10.7.7.7 (metric 5) from 10.7.7.7 (10.7.7.7)
      Origin IGP, metric 0, localpref 100, valid, internal, all
      rx pathid: 0, tx pathid: 0x1
  Path advertised to update-groups:
     4         
  Refresh Epoch 1
  65002, (Received from a RR-client)
    10.5.5.5 (metric 4) from 10.5.5.5 (10.5.5.5)
      Origin IGP, metric 0, localpref 100, valid, internal, all
      rx pathid: 0, tx pathid: 0x2
  Path advertised to update-groups:
     1          2          4         
  Refresh Epoch 1
  65002, (Received from a RR-client)
    10.3.3.3 (metric 3) from 10.3.3.3 (10.3.3.3)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0

We have an 'a' next to the two non-best paths, meaning these routes can be sent as additional paths. This doesn't necessarily mean that the router sends all three paths to all of his iBGP neighbors. Notice the different path IDs: this is what differentiates the paths and makes them unique. Now let's take a look at the routes we send to R8 (Adj-RIB-Out):

R1#show ip bgp neighbors 10.8.8.8 advertised-routes | begin Netw
     Network          Next Hop            Metric LocPrf Weight Path
 *>i  10.10.10.10/32   10.3.3.3                 0    100      0 65002 i
 * ia 10.10.10.10/32   10.7.7.7                 0    100      0 65002 i
 * ia 10.10.10.10/32   10.5.5.5                 0    100      0 65002 i

Total number of prefixes 3 
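
The selection options (all, best N, group-best) can be modelled roughly like this. It is a sketch under simplifying assumptions: paths are ranked by IGP metric only, group-best is taken to mean "best path per neighboring AS", and the names Path, ranked, select_all, select_best and select_group_best are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    neighbor_as: int
    igp_metric: int


# The three paths R1 holds for 10.10.10.10/32 (values from the output above).
paths = [
    Path("10.7.7.7", 65002, 5),
    Path("10.5.5.5", 65002, 4),
    Path("10.3.3.3", 65002, 3),
]


def ranked(candidates):
    """Assumption: everything else ties, so rank by IGP metric only."""
    return sorted(candidates, key=lambda p: p.igp_metric)


def select_all(candidates):
    return ranked(candidates)


def select_best(candidates, n):
    """'best N': the best path plus the next N-1 paths."""
    return ranked(candidates)[:n]


def select_group_best(candidates):
    """Keep the best path from each neighboring AS."""
    best_per_as = {}
    for p in ranked(candidates):
        best_per_as.setdefault(p.neighbor_as, p)
    return list(best_per_as.values())


print([p.next_hop for p in select_all(paths)])         # ['10.3.3.3', '10.5.5.5', '10.7.7.7']
print([p.next_hop for p in select_best(paths, 2)])     # ['10.3.3.3', '10.5.5.5']
print([p.next_hop for p in select_group_best(paths)])  # ['10.3.3.3']
```

Because all three paths come from AS 65002, group-best would leave only one path in this topology, which is why the all option is used here.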

As expected, the RR now advertises all three paths to R8. This is because we've issued the neighbor 10.8.8.8 advertise additional-paths all command. This is what the BGP Update looks like. Notice the different path IDs: they indicate that we use additional-paths.

Figure: BGP Update with path ID, sent by the RR to R8.
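
To make the path ID more concrete, here is a small sketch of how a prefix is encoded in the NLRI once additional-paths is negotiated. It assumes the RFC 7911 layout (a 4-byte path identifier, then the prefix length, then the prefix bytes); the function name encode_addpath_nlri is hypothetical.

```python
import ipaddress
import struct


def encode_addpath_nlri(path_id, prefix):
    """Encode one NLRI entry as: path ID (4 bytes) + prefix length (1 byte) + prefix."""
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8  # only the significant prefix bytes are sent
    header = struct.pack("!IB", path_id, net.prefixlen)
    return header + net.network_address.packed[:nbytes]


# Same prefix, two different path IDs: the receiver keeps both entries.
print(encode_addpath_nlri(1, "10.10.10.10/32").hex())  # 00000001200a0a0a0a
print(encode_addpath_nlri(2, "10.10.10.10/32").hex())  # 00000002200a0a0a0a
```

Without the path ID, the second update for 10.10.10.10/32 on the same session would simply replace the first one.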

If we take a look at the advertised routes to R9:

R1#show ip bgp neighbors 10.9.9.9 advertised-routes | begin Netw
     Network          Next Hop            Metric LocPrf Weight Path
 *>i  10.10.10.10/32   10.3.3.3                 0    100      0 65002 i

Total number of prefixes 1 

We can see that the RR only advertises a single path to R9. This is because we didn't issue the previous command for R9, so let's do that:

R1(config-router-af)#neighbor 10.9.9.9 advertise additional-paths all

Now R1 sends all three paths to both R8 and R9. But this doesn't necessarily mean that R8 and R9 can use these paths as backup paths, for example. If we check the BGP table of R8, we can see that R8 now receives all three paths:

R8#show bgp ipv4 unicast | begin Net
     Network          Next Hop            Metric LocPrf Weight Path
 * i  10.10.10.10/32   10.7.7.7                 0    100      0 65002 i
 *>i                   10.5.5.5                 0    100      0 65002 i
 * i                   10.3.3.3                 0    100      0 65002 i

That looks good: R8 now has full visibility, and instead of R3 he now chooses R5 as his exit point, because from R8 the IGP metric to 10.5.5.5 is lower than the metric to 10.3.3.3. But we can go one step further. At this point only a single route is installed into the FIB, with a next-hop of 10.5.5.5. We can also install a backup path into the FIB so that if R5 fails, the backup path can be used right away:

R8(config-router)#address-family ipv4 unicast 
R8(config-router-af)#bgp additional-paths select all
R8(config-router-af)#bgp additional-paths install

Now if we take a look at the BGP table we can see the 'a's and also a 'b' meaning that this path can be used as the backup path:

R8(config-router-af)#do show bgp ipv4 unicast | begin Net
     Network          Next Hop            Metric LocPrf Weight Path
 *bia 10.10.10.10/32   10.7.7.7                 0    100      0 65002 i
 *>i                   10.5.5.5                 0    100      0 65002 i
 * ia                  10.3.3.3                 0    100      0 65002 i

R8(config-router-af)#do show bgp ipv4 unicast 10.10.10.10/32
BGP routing table entry for 10.10.10.10/32, version 6
Paths: (3 available, best #2, table default)
  Additional-path-install
  Path not advertised to any peer
  Refresh Epoch 4
  65002
    10.7.7.7 (metric 3) from 10.1.1.1 (10.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, internal, backup/repair, all
      Originator: 10.7.7.7, Cluster list: 10.1.1.1
      rx pathid: 0x1, tx pathid: 0x1
  Path advertised to update-groups:
     6         
  Refresh Epoch 4
  65002
    10.5.5.5 (metric 3) from 10.1.1.1 (10.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 10.5.5.5, Cluster list: 10.1.1.1
      rx pathid: 0x2, tx pathid: 0x0
  Path not advertised to any peer
  Refresh Epoch 4
  65002
    10.3.3.3 (metric 4) from 10.1.1.1 (10.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, internal, all
      Originator: 10.3.3.3, Cluster list: 10.1.1.1
      rx pathid: 0x0, tx pathid: 0x2

This feature is called PIC (Prefix Independent Convergence), and it helps reduce the data plane convergence time. In addition to 10.5.5.5, 10.7.7.7 is also installed into the FIB and can be used instantly in case R5 fails. Notice that we don't use load balancing or Multipath at this point: 10.7.7.7 can only be used as the backup path.
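
The idea of a pre-installed backup path can be sketched like this. It is a toy model, not the actual FIB data structure: the FibEntry class and its next_hop method are hypothetical, and the primary and backup next hops are taken from R8's output above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FibEntry:
    """Toy FIB entry holding a primary and a pre-installed backup next hop."""
    primary: str
    backup: Optional[str] = None

    def next_hop(self, failed=frozenset()):
        """Return the usable next hop without waiting for BGP to reconverge."""
        if self.primary not in failed:
            return self.primary
        if self.backup is not None and self.backup not in failed:
            return self.backup
        return None


entry = FibEntry(primary="10.5.5.5", backup="10.7.7.7")
print(entry.next_hop())                     # 10.5.5.5
print(entry.next_hop(failed={"10.5.5.5"}))  # 10.7.7.7
```

The backup is never used while the primary is alive, which is the difference from Multipath, where both next hops would carry traffic at the same time.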

BGP best external

Let's say that we want to use an outbound policy: we want to use R3 as our primary exit-point within the whole AS. So on R3 I increase the Local Preference attribute (100 is the default, we need a higher value) for the prefix 10.10.10.10/32:

R3(config)#ip prefix-list R10LOOP permit 10.10.10.10/32
R3(config)#route-map SET_LP permit 10
R3(config-route-map)#match ip address prefix-list R10LOOP
R3(config-route-map)#set local-preference 101
R3(config-route-map)#exit
R3(config)#route-map SET_LP permit 20
R3(config-route-map)#exit
R3(config)#router bgp 65001
R3(config-router)#neighbor 172.3.10.10 route-map SET_LP in
R3(config-router)#do clear ip bgp * in
R3(config-router)#do clear ip bgp * out

But this change affects the whole AS, not just R3. Now R5 and R7 also prefer R3 (10.3.3.3) over their eBGP neighbors:

R5#show bgp ipv4 unicast | begin Net
     Network          Next Hop            Metric LocPrf Weight Path
 *>i  10.10.10.10/32   10.3.3.3                 0    101      0 65002 i
 *                     172.5.10.10              0             0 65002 i

R7#show bgp ipv4 unicast | begin Net
     Network          Next Hop            Metric LocPrf Weight Path
 *>i  10.10.10.10/32   10.3.3.3                 0    101      0 65002 i
 *                     172.7.10.10              0             0 65002 i

By default they can only advertise their best path. Since their best path is now an iBGP path, they don't advertise the paths through their eBGP neighbors (172.5.10.10 and 172.7.10.10). So the RR only has a single path, through R3, to reach 10.10.10.10/32:

R1#show ip bgp | begin Net
     Network          Next Hop            Metric LocPrf Weight Path
 *>i  10.10.10.10/32   10.3.3.3                 0    101      0 65002 i

The RR has no idea that he could also use R5 and R7 as next-hops to reach 10.10.10.10/32. With the advertise-best-external feature we can make R5 advertise his eBGP path to the RR, even if R5 doesn't use his eBGP peer to reach the prefix. It's a single command which can be configured under the address family:

R5(config)#router bgp 65001
R5(config-router)#address-family ipv4 unicast 
R5(config-router-af)#bgp advertise-best-external 
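
What this command changes can be sketched roughly as follows. This is a simplified model under stated assumptions: paths are compared on Local Preference only, an iBGP-learned best path is not passed on to iBGP peers (R5 is not an RR), and the names Path, best, best_external and advertise_to_ibgp are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    external: bool  # True if learned from an eBGP neighbor


# R5's two paths for 10.10.10.10/32. The eBGP path has no LocPrf in the
# output, so the default of 100 is assumed here.
r5_paths = [
    Path("10.3.3.3", 101, external=False),
    Path("172.5.10.10", 100, external=True),
]


def best(candidates):
    """Assumption: only Local Preference differs, so the highest value wins."""
    return max(candidates, key=lambda p: p.local_pref)


def best_external(candidates):
    externals = [p for p in candidates if p.external]
    return best(externals) if externals else None


def advertise_to_ibgp(candidates, best_external_enabled):
    """Return the path R5 would offer to the RR in this model, or None."""
    overall = best(candidates)
    if overall.external:
        return overall
    return best_external(candidates) if best_external_enabled else None


print(advertise_to_ibgp(r5_paths, best_external_enabled=False))  # None
print(advertise_to_ibgp(r5_paths, best_external_enabled=True).next_hop)  # 172.5.10.10
```
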

Now R5 advertises his eBGP path, even though R5 uses an iBGP path to reach the prefix:

R5#show ip bgp neighbors 10.1.1.1 advertised-routes 
BGP table version is 4, local router ID is 10.5.5.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
              t secondary path, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *b x 10.10.10.10/32   172.5.10.10              0             0 65002 i

Total number of prefixes 1 

The 'x' in front of the prefix marks the path as best-external. Now the RR receives the path from R5, and with the additional-paths feature it can also send it to R8, who can use it as the backup path:

R1#show ip bgp | begin Net
     Network          Next Hop            Metric LocPrf Weight Path
 * ia 10.10.10.10/32   10.5.5.5                 0    100      0 65002 i
 *>i                   10.3.3.3                 0    101      0 65002 i

R8#show ip bgp | begin Net
     Network          Next Hop            Metric LocPrf Weight Path
 *bia 10.10.10.10/32   10.5.5.5                 0    100      0 65002 i
 *>i                   10.3.3.3                 0    101      0 65002 i

Notice that R8 also uses R3 as the next-hop because of the higher Local Preference. This time the IGP metric isn't even compared: Local Preference is evaluated much earlier in the best-path algorithm than the IGP metric (on Cisco routers it's the second criterion, right after Weight).
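
The ordering of these two criteria can be illustrated with a small sort key. This is a minimal sketch that models only Local Preference and IGP metric and ignores every step in between; the IGP metrics are the ones R8 showed earlier (3 towards 10.5.5.5, 4 towards 10.3.3.3), and preference_key is a hypothetical name.

```python
# R8's two paths after the Local Preference change on R3.
r8_paths = [
    {"next_hop": "10.5.5.5", "local_pref": 100, "igp_metric": 3},
    {"next_hop": "10.3.3.3", "local_pref": 101, "igp_metric": 4},
]


def preference_key(path):
    """Higher Local Preference wins first; IGP metric only breaks ties."""
    return (-path["local_pref"], path["igp_metric"])


best = min(r8_paths, key=preference_key)
print(best["next_hop"])  # 10.3.3.3, despite the higher IGP metric
```
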