Implementing OSPF Sham-link

In this post we continue the inter-AS L3VPN Option C lab from the previous post, focusing on customer GREEN, which runs OSPF as the PE-CE routing protocol at both sites.

I shut down R14, R5, R11 and R15, since they aren't relevant to this lab. As configured in the inter-AS L3VPN Option C lab, we have an end-to-end LSP between R6 and R12. If we check the routing table of R13, we can see that it learned the routes of the remote site as 'O IA' inter-area routes:

R13#show ip route ospf | begin Gate
Gateway of last resort is not set

      16.0.0.0/32 is subnetted, 1 subnets
O IA     16.16.16.16 [110/2] via 172.2.6.6, 00:01:41, GigabitEthernet0/1
      172.2.0.0/16 is variably subnetted, 6 subnets, 2 masks
O IA     172.2.11.0/24 [110/2] via 172.2.6.6, 00:01:41, GigabitEthernet0/1
O IA     172.2.12.0/24 [110/2] via 172.2.6.6, 00:01:41, GigabitEthernet0/1

OSPF Domain-ID

If we use OSPF as the PE-CE routing protocol, we have two choices: we learn the remote prefixes either as Type-3 Summary LSAs or as Type-5 External LSAs, so we have either 'O IA' or 'O E2' routes in the RIB. We simply cannot learn these routes as intra-area routes (at least not without a sham-link, as we'll see later), even if the two sites are in the same area. So how is it decided whether they are inter-area or external? It depends on the OSPF Domain-ID, which is sent between the two PEs as a BGP extended community. If the Domain-IDs are the same, the receiving PE creates Type-3 Summary LSAs; if they are different, it creates Type-5 External LSAs. We can check the Domain-ID in the BGP table, where it shows up as an extended community attached by the originating PE:

R6#show bgp vpnv4 unicast all 16.16.16.16/32
BGP routing table entry for 16:2:16.16.16.16/32, version 20
Paths: (1 available, best #1, table GREEN)
Flag: 0x100
  Not advertised to any peer
  Refresh Epoch 1
  712, imported path from 16:22:16.16.16.16/32 (global)
    10.12.12.12 (metric 1) (via default) from 10.4.4.4 (10.4.4.4)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:16:20 OSPF DOMAIN ID:0x0005:0x000000020200 
        OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:172.2.12.12:0
      mpls labels in/out nolabel/12017
      rx pathid: 0, tx pathid: 0x0

Here it is shown in hex. If you take a pcap with Wireshark, it will actually be displayed in decimal, so don't be confused, they are the same value:

BGP Update with extended communities: OSPF Domain-ID
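To make the hex representation a bit more tangible, here is a minimal Python sketch (the function name is hypothetical) that splits the 'OSPF DOMAIN ID' string from the BGP output into its type and value and prints the Domain-ID in dotted-decimal form. It assumes, based on the outputs in this post, that the Domain-ID sits in the first four octets of the six-octet value; the last two octets are returned as raw hex and not interpreted.

```python
def decode_domain_id(ext_comm: str) -> dict:
    """Split an IOS 'OSPF DOMAIN ID' string, e.g. '0x0005:0x000000020200'.

    Assumption: the Domain-ID is the first four octets of the six-octet
    value; the remaining two octets are returned as raw hex only.
    """
    type_hex, value_hex = ext_comm.split(":")
    # int(..., 16) accepts the '0x' prefix; to_bytes() raises OverflowError
    # if the value does not fit into six octets.
    value = int(value_hex, 16).to_bytes(6, "big")
    return {
        "type": f"0x{int(type_hex, 16):04x}",
        "domain_id": ".".join(str(octet) for octet in value[:4]),
        "trailing_octets": value[4:].hex(),
    }


print(decode_domain_id("0x0005:0x000000020200"))
# {'type': '0x0005', 'domain_id': '0.0.0.2', 'trailing_octets': '0200'}
```

With process 2 and no explicit domain-id command, this yields 0.0.0.2, which matches the process-ID-based default described next.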

If not configured explicitly with the domain-id command, the Domain-ID is based on the OSPF process ID. In this case both sites have OSPF process 2 configured for the GREEN VRF, so the Domain-IDs are the same. Let's change the Domain-ID manually so that they'll be different:

R12(config)#router ospf 2 vrf GREEN
R12(config-router)#domain-id type ?
  0005  Type 0x0005
  0105  Type 0x0105
  0205  Type 0x0205
  8005  Type 0x8005
R12(config-router)#domain-id 0.0.0.1

We can also change the domain-id type, which is 0x0005 by default; as you can see, the type is the first field of the Domain-ID extended community. What these types mean doesn't really matter right now; just remember that 0x0005 is used by default on Cisco IOS devices, which means we use standard redistribution between BGP and OSPF. After changing the Domain-ID to 0.0.0.1 on R12, R6 receives the prefix with a different Domain-ID:

R6#show bgp vpnv4 unicast all 16.16.16.16/32
BGP routing table entry for 16:2:16.16.16.16/32, version 26
Paths: (1 available, best #1, table GREEN)
  Not advertised to any peer
  Refresh Epoch 1
  712, imported path from 16:22:16.16.16.16/32 (global)
    10.12.12.12 (metric 1) (via default) from 10.4.4.4 (10.4.4.4)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:16:20 OSPF DOMAIN ID:0x0005:0x000000010200 
        OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:172.2.12.12:0
      mpls labels in/out nolabel/12017
      rx pathid: 0, tx pathid: 0x0

Because the Domain-IDs no longer match, R6 creates Type-5 External LSAs for R13:

R13#show ip route ospf | begin Gate
Gateway of last resort is not set

      16.0.0.0/32 is subnetted, 1 subnets
O E2     16.16.16.16 [110/1] via 172.2.6.6, 00:00:59, GigabitEthernet0/1
      172.2.0.0/16 is variably subnetted, 6 subnets, 2 masks
O E2     172.2.11.0/24 [110/1] via 172.2.6.6, 00:00:59, GigabitEthernet0/1
O E2     172.2.12.0/24 [110/1] via 172.2.6.6, 00:00:59, GigabitEthernet0/1

Let's also change the Domain-ID on R6 to 0.0.0.1:

R6(config)#router ospf 2 vrf GREEN
R6(config-router)#domain-id 0.0.0.1

Now the Domain-IDs match again, so both PE routers create Type-3 Summary LSAs for the CE devices, and we're back to inter-area (O IA) routes on the customer side.
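The decision rule described at the start of this section can be modelled in a few lines. This is a simplified Python sketch with hypothetical function names, not how IOS implements it: it assumes, as the outputs above suggest, that the default Domain-ID is derived from the process ID (2 becomes 0.0.0.2), and it only covers routes that were intra- or inter-area at the originating site.

```python
import ipaddress
from typing import Optional


def effective_domain_id(process_id: int, configured: Optional[str] = None) -> str:
    """Domain-ID used by a PE's OSPF VRF process.

    Assumption (consistent with the outputs above): without the domain-id
    command the value is derived from the process ID, e.g. 2 -> 0.0.0.2.
    """
    if configured is not None:
        return str(ipaddress.IPv4Address(configured))
    return str(ipaddress.IPv4Address(process_id))


def route_code_on_ce(local_domain_id: str, received_domain_id: str) -> str:
    """Simplified rule for a route that was intra- or inter-area at the
    originating site: matching Domain-IDs -> Type-3 Summary LSA ('O IA'),
    different Domain-IDs -> Type-5 External LSA ('O E2' in this lab)."""
    if local_domain_id == received_domain_id:
        return "O IA"
    return "O E2"


# The three steps of this section, with R6 receiving R12's routes:
r6, r12 = effective_domain_id(2), effective_domain_id(2)
print(route_code_on_ce(r6, r12))                                # O IA
print(route_code_on_ce(r6, effective_domain_id(2, "0.0.0.1")))  # O E2
r6 = r12 = effective_domain_id(2, "0.0.0.1")
print(route_code_on_ce(r6, r12))                                # O IA
```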

OSPF Sham-link

Next we change the topology a bit: we deploy a new router (R17) and use it to create a backdoor link between the two CE routers:

OSPF Sham-link topology
We create a backdoor link between R13 and R16

Let's say we want to use this link just as a backup, with the MPLS infrastructure as the primary path. I configured R17 in area 0, just like the PE-CE links; basically, everything is in area 0 on every router. The problem is that the prefixes the CE routers learn from R17 are intra-area (Type-1 and Type-2 LSAs), whereas the prefixes the CEs learn from the PEs are inter-area (Type-3 Summary LSAs). Even if I increase the OSPF cost on the links between R13-R17 and R16-R17, the routers are going to prefer the intra-area route over the inter-area route because of the OSPF path selection rules:

R13(config)#int g0/2
R13(config-if)#ip ospf cost 65534
R13(config-if)#do show ip route ospf | begin Gateway
Gateway of last resort is not set

      16.0.0.0/32 is subnetted, 1 subnets
O        16.16.16.16 
           [110/65536] via 192.13.17.17, 00:01:34, GigabitEthernet0/2
      17.0.0.0/32 is subnetted, 1 subnets
O        17.17.17.17 
           [110/65535] via 192.13.17.17, 00:01:34, GigabitEthernet0/2
      172.2.0.0/16 is variably subnetted, 6 subnets, 2 masks
O        172.2.11.0/24 
           [110/65536] via 192.13.17.17, 00:01:34, GigabitEthernet0/2
O        172.2.12.0/24 
           [110/65536] via 192.13.17.17, 00:01:34, GigabitEthernet0/2
O     192.16.17.0/24 
           [110/65535] via 192.13.17.17, 00:01:34, GigabitEthernet0/2

As can be seen above, everything is an 'O' intra-area route via R17. We can't change the OSPF path selection rules: intra-area routes are always preferred over inter-area routes, even if they have a higher cost. What we have to do is make the routes from the MPLS cloud intra-area routes as well; then we can do some OSPF traffic engineering by modifying the costs. That is what the sham-link is used for.

The sham-link is similar to the OSPF virtual-link: it is a tunnel connecting the two PE routers, configured in the same area. The two PEs establish an OSPF adjacency over it and exchange LSAs with each other, so we basically extend area 0 through the MPLS core to the remote PE. The difference is that a virtual-link is configured between OSPF router IDs and spans a single transit area, whereas the source and destination of the sham-link are loopback addresses, in this case configured in the GREEN VRF.
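The path selection rule can be thought of as a sort key: route type first, cost second. Here is a minimal Python sketch of that idea (class and function names are hypothetical), using the costs from the outputs above; it deliberately ignores finer tie-breaking details.

```python
from dataclasses import dataclass

# Lower rank wins: the route type is compared before the cost.
ROUTE_TYPE_RANK = {"intra": 0, "inter": 1, "E1": 2, "E2": 3}


@dataclass(frozen=True)
class OspfPath:
    route_type: str  # "intra", "inter", "E1" or "E2"
    cost: int
    next_hop: str


def best_path(paths):
    """Pick the preferred path: route type first, then cost (ties ignored)."""
    return min(paths, key=lambda p: (ROUTE_TYPE_RANK[p.route_type], p.cost))


# 16.16.16.16 on R13 before the sham-link: the intra-area path over the
# backdoor wins even though its cost is far higher.
candidates = [
    OspfPath("inter", 2, "172.2.6.6"),
    OspfPath("intra", 65536, "192.13.17.17"),
]
print(best_path(candidates).next_hop)  # 192.13.17.17
```

Once the PE also offers an intra-area path, the comparison falls through to the cost, which is exactly what the sham-link makes possible.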

Sham-link configuration

First we create a new loopback interface on both PEs and associate it with the GREEN VRF. These loopbacks must be /32 addresses. Next we advertise the new loopbacks with BGP to the remote PE:

R6(config)#int lo1
R6(config-if)#vrf forwarding GREEN
R6(config-if)#ip addr 200.6.6.6 255.255.255.255
R6(config)#router bgp 16
R6(config-router)#address-family ipv4 unicast vrf GREEN
R6(config-router-af)#network 200.6.6.6 mask 255.255.255.255

R12(config)#int lo1
R12(config-if)#vrf forwarding GREEN
R12(config-if)#ip address 200.12.12.12 255.255.255.255
R12(config-if)#router bgp 712
R12(config-router)#address-family ipv4 unicast vrf GREEN
R12(config-router-af)#network 200.12.12.12 mask 255.255.255.255
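A quick sanity check for this step could look like the following Python sketch. The helper is hypothetical and only uses the standard ipaddress module: it verifies that the loopback is a /32 and that the BGP network statement names exactly the same prefix, since a network statement normally only announces a prefix that exactly matches a route in the (VRF) routing table.

```python
import ipaddress


def check_sham_link_loopback(address: str, mask: str,
                             bgp_network: str, bgp_mask: str) -> list:
    """Return a list of problems found (empty list = looks fine)."""
    problems = []
    iface = ipaddress.IPv4Interface(f"{address}/{mask}")
    if iface.network.prefixlen != 32:
        problems.append(f"{iface} is not a /32")
    # The BGP network statement normally only announces a prefix that exactly
    # matches a route in the VRF table, here the connected /32.
    advertised = ipaddress.IPv4Network(f"{bgp_network}/{bgp_mask}")
    if advertised != iface.network:
        problems.append(f"network {advertised} does not match {iface.network}")
    return problems


print(check_sham_link_loopback("200.6.6.6", "255.255.255.255",
                               "200.6.6.6", "255.255.255.255"))  # []
```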

Next we create the sham-link:

R6(config)#router ospf 2 vrf GREEN
R6(config-router)#area 0 sham-link ?
  A.B.C.D  IP addr associated with sham-link source

R6(config-router)#area 0 sham-link 200.6.6.6 ?
  A.B.C.D  IP addr associated with sham-link destination

R6(config-router)#area 0 sham-link 200.6.6.6 200.12.12.12 ?
  cost          Associate a cost with the sham-link
  ttl-security  TTL security check
  <cr>          <cr>

R6(config-router)#area 0 sham-link 200.6.6.6 200.12.12.12 cost 10

R12(config-router)#area 0 sham-link 200.12.12.12 200.6.6.6 cost 10

The source of the sham-link is the local PE's new loopback address, and the destination is the remote PE's loopback. We can also associate a cost with the sham-link if we want. The sham-link configuration on R12 is basically the mirror image: the source is 200.12.12.12 and the destination is 200.6.6.6. Now if you issue the show ip ospf sham-links command, the sham-link should be up. In my case it was not. Why? It's important to remember that these new addresses should ONLY be advertised by BGP; they SHOULD NOT be advertised into OSPF on the PEs. That was the problem in my case, since the PEs redistribute BGP into OSPF. So I created a route-map to filter these prefixes out of the redistribution:

R6(config)#ip prefix-list DENY_LO1 deny 200.6.6.6/32
R6(config)#ip prefix-list DENY_LO1 deny 200.12.12.12/32
R6(config)#ip prefix-list DENY_LO1 permit 0.0.0.0/0 le 32
R6(config)#route-map NO_LO1 permit 10
R6(config-route-map)#match ip addr prefix-list DENY_LO1
R6(config)#router ospf 2 vrf GREEN                       
R6(config-router)#redistribute bgp 16 subnets route-map NO_LO1  

R12(config)#ip prefix-list DENY_LO1 seq 5 deny 200.6.6.6/32
R12(config)#ip prefix-list DENY_LO1 seq 10 deny 200.12.12.12/32
R12(config)#ip prefix-list DENY_LO1 seq 15 permit 0.0.0.0/0 le 32
R12(config)#route-map NO_LO1 permit 10
R12(config-route-map)#match ip addr prefix-list DENY_LO1
R12(config)#router ospf 2 vrf GREEN
R12(config-router)#redistribute bgp 712 subnets route-map NO_LO1
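To see why this filter removes only the two loopbacks, here is a Python sketch of prefix-list and route-map evaluation as I understand the IOS semantics: entries are checked top-down, the first match decides, an entry without ge/le matches only the exact prefix, and both the prefix-list and the route-map end with an implicit deny. The class and function names are hypothetical, and ge is not modelled.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrefixListEntry:
    action: str                   # "permit" or "deny"
    prefix: ipaddress.IPv4Network
    le: Optional[int] = None      # ge is not modelled in this sketch

    def matches(self, route: ipaddress.IPv4Network) -> bool:
        if self.le is None:
            return route == self.prefix  # no ge/le: exact prefix match only
        return (route.subnet_of(self.prefix)
                and self.prefix.prefixlen <= route.prefixlen <= self.le)


def prefix_list_permits(entries, route: ipaddress.IPv4Network) -> bool:
    for entry in entries:  # evaluated top-down, first match wins
        if entry.matches(route):
            return entry.action == "permit"
    return False  # implicit deny at the end of every prefix-list


DENY_LO1 = [
    PrefixListEntry("deny", ipaddress.IPv4Network("200.6.6.6/32")),
    PrefixListEntry("deny", ipaddress.IPv4Network("200.12.12.12/32")),
    PrefixListEntry("permit", ipaddress.IPv4Network("0.0.0.0/0"), le=32),
]


def redistributed_into_ospf(route: str) -> bool:
    # route-map NO_LO1 permit 10 only matches routes the prefix-list permits;
    # everything else hits the route-map's implicit deny.
    return prefix_list_permits(DENY_LO1, ipaddress.IPv4Network(route))


for prefix in ("200.6.6.6/32", "200.12.12.12/32", "16.16.16.16/32", "172.2.12.0/24"):
    print(prefix, redistributed_into_ospf(prefix))
```

The sham-link endpoints come out as False and the customer prefixes as True, so only the two loopbacks stay out of OSPF.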

Now R17 doesn't know 200.6.6.6/32 and 200.12.12.12/32, and the PEs can reach each other's Loopback1 address only via BGP. At this point the sham-link should come up:

R6#show ip ospf sham-links 
Sham Link OSPF_SL0 to address 200.12.12.12 is up
Area 0 source address 200.6.6.6
  Run as demand circuit
  DoNotAge LSA allowed. Cost of using 10 State POINT_TO_POINT,
  Timer intervals configured, Hello 10, Dead 40, Wait 40,
    Hello due in 00:00:07
    Adjacency State FULL (Hello suppressed)
    Index 1/2/2, retransmission queue length 0, number of retransmission 2
    First 0x0(0)/0x0(0)/0x0(0) Next 0x0(0)/0x0(0)/0x0(0)
    Last retransmission scan length is 2, maximum is 2
    Last retransmission scan time is 0 msec, maximum is 0 msec

Since we now receive intra-area routes from the PE router, and we increased the cost of the backdoor link earlier, R13 reaches the remote site through the MPLS infrastructure:

R13#show ip route ospf | beg Gate
Gateway of last resort is not set

      16.0.0.0/32 is subnetted, 1 subnets
O        16.16.16.16 [110/13] via 172.2.6.6, 00:07:58, GigabitEthernet0/1
      17.0.0.0/32 is subnetted, 1 subnets
O        17.17.17.17 
           [110/65535] via 192.13.17.17, 00:30:51, GigabitEthernet0/2
      172.2.0.0/16 is variably subnetted, 6 subnets, 2 masks
O        172.2.11.0/24 [110/13] via 172.2.6.6, 00:07:58, GigabitEthernet0/1
O        172.2.12.0/24 [110/12] via 172.2.6.6, 00:07:58, GigabitEthernet0/1
O     192.16.17.0/24 
           [110/65535] via 192.13.17.17, 00:30:51, GigabitEthernet0/2
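The costs in this output add up as follows. This Python sketch runs Dijkstra over a tiny hand-built graph from R13's point of view. The per-link costs are assumptions: 1 for GigabitEthernet and loopback interfaces (the IOS default with the 100 Mbps reference bandwidth), plus the sham-link cost of 10 and the cost of 65534 on R13 Gi0/2 configured above. The graph is a simplification of the real LSDB.

```python
import heapq

# Outgoing-interface costs as seen from R13 (directed edges).
# Assumed: 1 for GigabitEthernet and loopback interfaces (IOS default with the
# 100 Mbps reference bandwidth); 10 for the sham-link and 65534 on R13 Gi0/2,
# as configured above. R16's loopback is modelled as a stub node.
GRAPH = {
    "R13": {"R6": 1, "R17": 65534},
    "R6": {"R12": 10},              # sham-link
    "R12": {"R16": 1},
    "R17": {"R16": 1},
    "R16": {"16.16.16.16/32": 1},
}


def shortest_path(graph, source, target):
    """Plain Dijkstra: return (total cost, list of nodes) to target."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    raise ValueError(f"{target} is unreachable from {source}")


print(shortest_path(GRAPH, "R13", "16.16.16.16/32"))
# (13, ['R13', 'R6', 'R12', 'R16', '16.16.16.16/32'])

# Without the path via R6, only the backdoor is left:
backdoor_only = {**GRAPH, "R13": {"R17": 65534}}
print(shortest_path(backdoor_only, "R13", "16.16.16.16/32")[0])  # 65536
```

Under these assumptions the two totals match the [110/13] and [110/65536] metrics in the routing tables above.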

We can also verify with a traceroute:

R13#traceroute 16.16.16.16 source lo0 numeric 
Type escape sequence to abort.
Tracing the route to 16.16.16.16
VRF info: (vrf in name/id, vrf out name/id)
  1 172.2.6.6 3 msec 2 msec 2 msec
  2 10.0.36.3 [MPLS: Labels 3015/12017 Exp 0] 5 msec 6 msec 4 msec
  3 10.0.13.1 [MPLS: Labels 1017/12017 Exp 0] 5 msec 5 msec 6 msec
  4 100.1.17.7 [MPLS: Labels 7017/12017 Exp 0] 5 msec 5 msec 5 msec
  5 10.7.9.9 [MPLS: Labels 9014/12017 Exp 0] 5 msec 5 msec 6 msec
  6 172.2.12.12 [MPLS: Label 12017 Exp 0] 5 msec 5 msec 5 msec
  7 172.2.12.16 6 msec *  5 msec

This confirms that the traffic goes through the MPLS cloud. Look at the VPN label: it is 12017 at every hop, so from the customer's perspective it looks like an intra-AS configuration, but behind the scenes we know that this is an inter-AS Option C implementation.
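If you want to check this programmatically, here is a small Python sketch (names are hypothetical) that pulls the label stacks out of the traceroute output with a regular expression. The top label changes hop by hop, while the bottom label stays 12017, the same value as the outgoing VPN label in the earlier BGP output.

```python
import re

# Hops 1-7 of the traceroute above, copied verbatim.
TRACEROUTE = """\
  1 172.2.6.6 3 msec 2 msec 2 msec
  2 10.0.36.3 [MPLS: Labels 3015/12017 Exp 0] 5 msec 6 msec 4 msec
  3 10.0.13.1 [MPLS: Labels 1017/12017 Exp 0] 5 msec 5 msec 6 msec
  4 100.1.17.7 [MPLS: Labels 7017/12017 Exp 0] 5 msec 5 msec 5 msec
  5 10.7.9.9 [MPLS: Labels 9014/12017 Exp 0] 5 msec 5 msec 6 msec
  6 172.2.12.12 [MPLS: Label 12017 Exp 0] 5 msec 5 msec 5 msec
  7 172.2.12.16 6 msec *  5 msec
"""

# Matches both "Labels 3015/12017" and "Label 12017".
LABELS_RE = re.compile(r"\[MPLS: Labels? ([\d/]+) Exp \d+\]")


def label_stacks(output: str):
    """Return the label stack (top label first) of every labelled hop."""
    stacks = []
    for line in output.splitlines():
        match = LABELS_RE.search(line)
        if match:
            stacks.append([int(label) for label in match.group(1).split("/")])
    return stacks


stacks = label_stacks(TRACEROUTE)
print([stack[0] for stack in stacks])   # top labels: [3015, 1017, 7017, 9014, 12017]
print({stack[-1] for stack in stacks})  # bottom (VPN) label: {12017}
```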