Implementing Site-to-Site IPsec VPN with crypto map (Part 2): Troubleshooting

We continue the previous IPsec lab with basic troubleshooting. If you haven't read part 1, read that post before you continue.

IPsec topology

Basic Troubleshooting and how to decode the debugs and show commands

So which show commands tell us whether our IPsec VPN is working properly? The most important show command for Phase 1 verification is the following:

R1#show crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst             src             state          conn-id status
100.23.0.3      100.12.0.1      QM_IDLE           1001 ACTIVE

If the peers have successfully built the Phase 1 (ISAKMP) tunnel, the state should be QM_IDLE. This indicates that the peers agreed on the four attributes (encryption, hashing, DH and authentication) and authenticated each other (Main Mode, six messages exchanged). But QM_IDLE does not necessarily mean that the Phase 2 tunnel has been built and that we can actually send encrypted data packets!
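If you collect this output from several routers, the check can be scripted. The following is a minimal sketch, not an official tool: it assumes the column layout shown above, and the function names parse_isakmp_sa and phase1_up are made up for this example.

```python
import re

# One data row of "show crypto isakmp sa": dst, src, state, conn-id, status.
# The layout is assumed to match the sample output in this post.
ROW = re.compile(
    r"^(\d+\.\d+\.\d+\.\d+)\s+(\d+\.\d+\.\d+\.\d+)\s+(\S+)\s+(\d+)\s+(.+?)\s*$"
)

def parse_isakmp_sa(text):
    """Return one dict per SA row; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            dst, src, state, conn_id, status = match.groups()
            rows.append({"dst": dst, "src": src, "state": state,
                         "conn_id": int(conn_id), "status": status})
    return rows

def phase1_up(rows, peer):
    """True if an SA with the given peer address is in QM_IDLE."""
    return any(r["state"] == "QM_IDLE" and peer in (r["dst"], r["src"])
               for r in rows)

SAMPLE = """\
IPv4 Crypto ISAKMP SA
dst             src             state          conn-id status
100.23.0.3      100.12.0.1      QM_IDLE           1001 ACTIVE
"""

if __name__ == "__main__":
    print(phase1_up(parse_isakmp_sa(SAMPLE), "100.23.0.3"))
```

Keep in mind that a True result only covers Phase 1; it says nothing about the Phase 2 SAs.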

If the output of this command doesn't show anything, that could indicate a routing issue. Remember that routing always happens first: the router does a lookup in the RIB before it reaches out to the tunnel endpoint. If R1 can't find a matching entry for subnet 172.16.2.0/24 in its RIB, the initiation of the IPsec tunnel won't be triggered. The same thing happens if the crypto map has not been applied to the correct outgoing interface, or if the ACL is incorrect.
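To make the "routing happens first" point concrete, here is a small sketch of a longest-prefix-match lookup. The routing table contents are assumptions for illustration only (the /24 mask on 100.12.0.0 and the next hop 100.12.0.2 do not come from the lab), and best_route is a hypothetical helper, not how IOS implements its lookup.

```python
import ipaddress

def best_route(rib, destination):
    """Return (prefix, next_hop) for the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in rib.items():
        network = ipaddress.ip_network(prefix)
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    if best is None:
        return None
    return str(best[0]), best[1]

# Hypothetical RIB for R1: only connected networks, no route to the remote LAN.
R1_RIB = {
    "192.168.1.0/24": "connected",
    "100.12.0.0/24": "connected",   # assumed mask
}

if __name__ == "__main__":
    # No match for R5 (172.16.2.5): the packet is dropped before the
    # crypto map on the outgoing interface is ever consulted.
    print(best_route(R1_RIB, "172.16.2.5"))
    # With a default route (assumed next hop) the lookup succeeds.
    R1_RIB["0.0.0.0/0"] = "100.12.0.2"
    print(best_route(R1_RIB, "172.16.2.5"))
```

A default route or a specific route for the remote /24 (the mask shown in the remote ident of show crypto ipsec sa) is enough, as long as it points out of the interface that carries the crypto map.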

If the state of the Phase 1 tunnel is QM_IDLE, we can go further. To verify Phase 2 use the following command:

R1#show crypto ipsec sa

interface: GigabitEthernet0/0
    Crypto map tag: S2S_VPN, local addr 100.12.0.1

   protected vrf: (none)
   local  ident (addr/mask/prot/port): (192.168.1.0/255.255.255.0/0/0)
   remote ident (addr/mask/prot/port): (172.16.2.0/255.255.255.0/0/0)
   current_peer 100.23.0.3 port 500
     PERMIT, flags={origin_is_acl,}
    #pkts encaps: 104, #pkts encrypt: 104, #pkts digest: 104
    #pkts decaps: 104, #pkts decrypt: 104, #pkts verify: 104
    #pkts compressed: 0, #pkts decompressed: 0
    #pkts not compressed: 0, #pkts compr. failed: 0
    #pkts not decompressed: 0, #pkts decompress failed: 0
    #send errors 0, #recv errors 0

     local crypto endpt.: 100.12.0.1, remote crypto endpt.: 100.23.0.3
     plaintext mtu 1452, path mtu 1500, ip mtu 1500, ip mtu idb GigabitEthernet0/0
     current outbound spi: 0x970D0032(2534211634)
     PFS (Y/N): N, DH group: none

     inbound esp sas:
          
     inbound ah sas:
      spi: 0x790108EE(2030110958)
        transform: ah-sha256-hmac ,
        in use settings ={Tunnel, }
        conn id: 1, flow_id: SW:1, sibling_flags 80004050, crypto map: S2S_VPN
        sa timing: remaining key lifetime (k/sec): (4335687/3377)
        replay detection support: Y
        Status: ACTIVE(ACTIVE)

     inbound pcp sas:

     outbound esp sas:

     outbound ah sas:
      spi: 0x970D0032(2534211634)
        transform: ah-sha256-hmac ,
        in use settings ={Tunnel, }
        conn id: 2, flow_id: SW:2, sibling_flags 80004050, crypto map: S2S_VPN
        sa timing: remaining key lifetime (k/sec): (4335687/3377)
        replay detection support: Y
        Status: ACTIVE(ACTIVE)

     outbound pcp sas:

Here we should have two SAs: an inbound and an outbound SA. These can be either ESP or AH SAs (depending on what we've defined in the transform-set), each with a unique SPI. Under the SPI we can see the transform-set we're using to protect the data packets. In this example we use AH with SHA-256 for hashing in tunnel mode. Also notice the #pkts encaps and #pkts decaps counters at the top of the output: these numbers should increase when packets are sent between the two sites.
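Watching the counters by eye works, but comparing two snapshots is less error-prone. A minimal sketch, assuming the output contains a single local/remote ident pair as in this lab; the helper names are made up, and the "after" values below are invented purely to show the arithmetic.

```python
import re

# Matches only the two counters discussed above; "#pkts decompressed"
# and friends are deliberately not captured.
COUNTER = re.compile(r"#pkts (encaps|decaps): (\d+)")

def pkt_counters(text):
    """Extract the encaps/decaps counters from 'show crypto ipsec sa' output."""
    return {name: int(value) for name, value in COUNTER.findall(text)}

def counter_deltas(before, after):
    """Return how much each counter grew between two snapshots."""
    old, new = pkt_counters(before), pkt_counters(after)
    return {name: new[name] - old[name] for name in ("encaps", "decaps")}

BEFORE = """\
    #pkts encaps: 104, #pkts encrypt: 104, #pkts digest: 104
    #pkts decaps: 104, #pkts decrypt: 104, #pkts verify: 104
    #pkts compressed: 0, #pkts decompressed: 0
"""

# Hypothetical second snapshot taken after five more pings.
AFTER = BEFORE.replace("104", "109")

if __name__ == "__main__":
    print(counter_deltas(BEFORE, AFTER))
```

If encaps grows while decaps stays flat, traffic is leaving encrypted but nothing protected is coming back, which usually points at the remote side or the return path.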

What if the transform-sets don't match?

Let's actually change the transform-set on one side to demonstrate this:

R3(config)#crypto ipsec transform-set BAD_TR esp-aes 256 esp-sha-hmac 
R3(config)#crypto map S2S_VPN 1 ipsec-isakmp 
R3(config-crypto-map)#set transform-set BAD_TR

On R3 I created a new transform-set that uses ESP for encryption and hashing (on R1 we only use AH), and I cleared the SAs:

R3#clear crypto sa

Now if R4 tries to ping R5:

R4#ping 172.16.2.5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.2.5, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

The pings fail. The output of show crypto isakmp sa still shows QM_IDLE:

R1#show crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst             src             state          conn-id status
100.23.0.3      100.12.0.1      QM_IDLE           1001 ACTIVE

Phase 1 is up, but we can't send data packets between the two sites. The first suspect in this situation should be the transform-sets: they must match on both peers. If they don't match, we won't have inbound/outbound SAs, just empty rows:

R1#show crypto ipsec sa | sec sas
     inbound esp sas:
     inbound ah sas:
     inbound pcp sas:
     outbound esp sas:
     outbound ah sas:
     outbound pcp sas:
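The same "are there any SAs at all?" check can be sketched in a few lines. This is only an illustration under the assumption that the section headings and spi: lines look like the outputs in this post; sa_spis and phase2_up are hypothetical names.

```python
import re

SECTION = re.compile(r"^\s*((?:inbound|outbound) (?:esp|ah|pcp)) sas:\s*$")
# Anchored at the start of the line so that the summary line
# "current outbound spi: ..." is not counted as an SA.
SPI = re.compile(r"^\s*spi: (0x[0-9A-Fa-f]+)")

def sa_spis(text):
    """Map each 'inbound/outbound esp/ah/pcp' section to the SPIs listed under it."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = SECTION.match(line)
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
            continue
        match = SPI.match(line)
        if match and current is not None:
            sections[current].append(match.group(1))
    return sections

def phase2_up(text):
    """True only if at least one inbound and one outbound SA exist."""
    sections = sa_spis(text)
    inbound = any(spis for name, spis in sections.items() if name.startswith("inbound"))
    outbound = any(spis for name, spis in sections.items() if name.startswith("outbound"))
    return inbound and outbound

EMPTY = """\
     inbound esp sas:
     inbound ah sas:
     inbound pcp sas:
     outbound esp sas:
     outbound ah sas:
     outbound pcp sas:
"""

if __name__ == "__main__":
    print(phase2_up(EMPTY))
```

Fed with the empty output above it reports that Phase 2 is down, even though Phase 1 still shows QM_IDLE.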

If we also run debug crypto isakmp (captured on R3 here, as the local address in the message shows), we'll see the following message, which explains very accurately what the problem is:

ISAKMP-ERROR: (1001):phase 2 SA policy not acceptable! (local 100.23.0.3 remote 100.12.0.1)

What if the PSKs don't match?

Let's demonstrate this. What if a sysadmin on one side made a typo and the pre-shared keys don't match?

R3(config)#no crypto isakmp key cisco123 address 100.12.0.1
R3(config)#crypto isakmp key cisco12 address 100.12.0.1
R3#clear crypto isakmp

Before going through the show run output and calling the remote site, let's use the debug to verify that the keys don't match. In this case we'll see many retransmission messages like these:

ISAKMP-PAK: (1002):received packet from 100.12.0.1 dport 500 sport 500 Global (R) MM_KEY_EXCH
ISAKMP: (1002):phase 1 packet is a duplicate of a previous packet.
ISAKMP: (1002):retransmitting due to retransmit phase 1
ISAKMP: (1002):retransmitting phase 1 MM_KEY_EXCH...
ISAKMP: (1002):: incrementing error counter on sa, attempt 4 of 5: retransmit phase 1
ISAKMP: (1002):retransmitting phase 1 MM_KEY_EXCH
ISAKMP-PAK: (1002):sending packet to 100.12.0.1 my_port 500 peer_port 500 (R) MM_KEY_EXCH
ISAKMP: (1002):Sending an IKE IPv4 Packet.

and messages like these:

ISAKMP-ERROR: (1002):deleting SA reason "Death by retransmission P1" state (R) MM_KEY_EXCH (peer 100.12.0.1)
ISAKMP-ERROR: (1002):deleting SA reason "Death by retransmission P1" state (R) MM_KEY_EXCH (peer 100.12.0.1) 
ISAKMP: (1002):Deleting the unauthenticated sa

"unauthenticated sa" indicates that something is wrong with the authentication. The MM_KEY_EXCH state indicates that the peers agreed on the four main attributes (encryption, authentication method [NOT the actual PSK!], hashing and DH), but they get stuck in this state and can't complete step 3 (messages five and six), where the peers authenticate each other. Once the SA has been deleted, the output of show crypto isakmp sa shows MM_NO_STATE:

R3#show crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst             src             state          conn-id status
100.23.0.3      100.12.0.1      MM_NO_STATE       1002 ACTIVE (deleted)

What if the four main Phase 1 attributes don't match?

Let's change the DH group on R3 and leave the other attributes unchanged:

R3(config)#crypto isakmp policy 10
R3(config-isakmp)#group 5

The debug messages are very straightforward here:

ISAKMP: (0):Checking ISAKMP transform 1 against priority 10 policy
ISAKMP: (0):      encryption AES-CBC
ISAKMP: (0):      keylength of 256
ISAKMP: (0):      hash SHA256
ISAKMP: (0):      default group 19
ISAKMP: (0):      auth pre-share
ISAKMP: (0):      life type in seconds
ISAKMP:      life duration (VPI) of  0x0 0x1 0x51 0x80 
ISAKMP-ERROR: (0):Diffie-Hellman group offered does not match policy!
ISAKMP-ERROR: (0):atts are not acceptable. Next payload is 0
ISAKMP-ERROR: (0):no offers accepted!
ISAKMP-ERROR: (0):phase 1 SA policy not acceptable! (local 100.23.0.3 remote 100.12.0.1)

ISAKMP-ERROR: (0):deleting SA reason "Phase1 SA policy proposal not accepted" state (R) MM_NO_STATE (peer 100.12.0.1)
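One field in the debug above is easy to misread: the "life duration (VPI)" line prints the lifetime as individual bytes. Read as one big-endian integer, 0x0 0x1 0x51 0x80 is 0x00015180, or 86400 seconds (24 hours). A short sketch of that conversion, with decode_vpi being a name made up for this example:

```python
def decode_vpi(field):
    """Convert a byte list such as '0x0 0x1 0x51 0x80' to an integer."""
    octets = bytes(int(token, 16) for token in field.split())
    return int.from_bytes(octets, "big")

if __name__ == "__main__":
    seconds = decode_vpi("0x0 0x1 0x51 0x80")
    print(seconds, "seconds =", seconds // 3600, "hours")  # 86400 seconds = 24 hours
```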

Remember that all four main attributes have to match; in this case I only changed the DH group on one side. The debug message "Diffie-Hellman group offered does not match policy!" indicates that we should check the DH groups in the Phase 1 ISAKMP policy.
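To summarise the three failure cases from this post, here is a rough lookup of the debug signatures we saw and what to check first. Treat it as a memory aid built from this lab only, not a complete diagnostic: the same messages can have other causes (for example, "Death by retransmission P1" can also show up when packets are simply lost), and diagnose is a hypothetical helper name.

```python
# (substring found in the debug output, first thing to check)
SIGNATURES = [
    ("Diffie-Hellman group offered does not match policy",
     "Phase 1: compare the DH group in the ISAKMP policies"),
    ("phase 1 SA policy not acceptable",
     "Phase 1: compare encryption, hash, DH group and auth method"),
    ("Death by retransmission P1",
     "Phase 1: if stuck in MM_KEY_EXCH, suspect the pre-shared key"),
    ("phase 2 SA policy not acceptable",
     "Phase 2: compare the transform-sets on both peers"),
]

def diagnose(debug_text):
    """Return the hints whose signature appears in the debug text, in table order."""
    return [hint for needle, hint in SIGNATURES if needle in debug_text]

if __name__ == "__main__":
    sample = ('ISAKMP-ERROR: (1002):deleting SA reason "Death by retransmission P1" '
              'state (R) MM_KEY_EXCH (peer 100.12.0.1)')
    for hint in diagnose(sample):
        print(hint)
```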