Archive for the 'Work' category

Manage FreeIPA certificates with ACME on vSphere (or not)

 | 4 May 2023 16:28

Some days ‘in the office’ are great, others … not so much. But we learn from both, right?!

While writing this article, I ended up ditching the original plan and gave up altogether on running an ACME client on vCenter. Instead, I’m going to use an IPA client to request the certificate and upload it through the REST API, just like I do for FreeIPA certificates for Palo Alto firewalls and Panorama. But, you’ll have to wait for the next article to see how that works out.

My attempts (original ramblings)

This article gave me the inspiration to use ACME to automate FreeIPA certificate renewal for vCenter (vSphere Client UI).

However, I ran into some issues, first with acme.sh, as I couldn’t work out how to get it to update the DNS record on FreeIPA. Let me explain the requirement for dns-01 when using an ACME client on vCenter.

ACME certificate verification requires a challenge-response to authorise the certificate request; otherwise anyone could request a certificate for any domain they liked, completely defeating the point of SSL as a mechanism for verifying source authenticity. FreeIPA ACME supports two mechanisms for this challenge-response: http-01 and dns-01. http-01 is based on the client hosting a temporary key on port 80 of the domain the certificate is being requested for, so the ACME client does not need to communicate with the IPA server, nor does the client need to be IPA enrolled. dns-01, on the other hand, requires the client to control a TXT record in the domain’s DNS zone. If IPA is the nameserver for this zone, then the client will need an IPA account and privileges to manage the DNS entries. The default means of authenticating is Kerberos.
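
For reference, this is roughly what a dns-01 hook has to do, expressed with the stock ipa CLI and a valid Kerberos ticket. The zone, host and token below are made-up placeholders:

# Publish the ACME challenge token as a TXT record (illustrative names/values)
kinit acme-dns
ipa dnsrecord-add domain.local _acme-challenge.vcenter --txt-rec="CHALLENGE_TOKEN"
# ... the ACME server validates the record ...
# Clean up once validation has completed
ipa dnsrecord-del domain.local _acme-challenge.vcenter --txt-rec="CHALLENGE_TOKEN"

Any hook script for acme.sh, dehydrated or Certbot is essentially automating these two calls.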

If not acme.sh, then what?

Well, acme.sh isn’t the only ACME client out there and over the last couple of years, I’ve actually shifted some deployments from acme.sh to dehydrated, which supports hooks. Enter ipa-dns-hook by Jimmy Hedman, which authenticates against IPA either with Kerberos (via the requests-kerberos library’s HTTPKerberosAuth) or with stored credentials. I’m not a fan of storing credentials in a text file, but IPA does have decent controls for delegating privileges, which should limit the impact if the account were ever compromised.
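
As an aside, delegating just enough privilege in FreeIPA is straightforward. A minimal sketch, assuming a dedicated account and the stock ‘DNS Administrators’ privilege (a custom permission scoped to TXT records in a single zone would be tighter still); the account and role names are made up:

# Illustrative account and role names
ipa user-add acme-dns --first=ACME --last=DNS
ipa role-add "ACME DNS Updater" --desc="TXT record management for ACME dns-01"
ipa role-add-privilege "ACME DNS Updater" --privileges="DNS Administrators"
ipa role-add-member "ACME DNS Updater" --users=acme-dns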

What about IPA principals for certificates?

It’s important to realise that traditional certificate requests (IPA web-UI, ipa-getcert etc.) require an IPA principal for the FQDN; ACME does not. Note that the validity lengths differ as well: ‘traditional’ IPA host and service certificates are valid for 2 years, while ACME certificates are valid for 90 days.

How can an ACME client create/change DNS records on a FreeIPA server?

Fraser Tweedale gives this example using a custom hook for Certbot, which uses the ipapython Python library to edit DNS records with Kerberos authentication. You can download his certbot-dns-ipa.py GitHub Gist here. The code is small; here it is in its entirety:

#!/usr/bin/python3
import os
from dns import resolver
from ipalib import api 
from ipapython import dnsutil

certbot_domain = os.environ['CERTBOT_DOMAIN']
certbot_validation = os.environ['CERTBOT_VALIDATION']
if 'CERTBOT_AUTH_OUTPUT' in os.environ:
    command = 'dnsrecord_del'
else:
    command = 'dnsrecord_add'

validation_domain = f'_acme-challenge.{certbot_domain}'
fqdn = dnsutil.DNSName(validation_domain).make_absolute()
zone = dnsutil.DNSName(resolver.zone_for_name(fqdn))
name = fqdn.relativize(zone)

api.bootstrap(context='cli')
api.finalize()
api.Backend.rpcclient.connect()

api.Command[command](zone, name, txtrecord=[certbot_validation], dnsttl=60)
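
Wired into Certbot, the Gist acts as both the auth and the cleanup hook. A minimal sketch, assuming FreeIPA’s ACME directory at its usual location and a made-up hostname:

# Illustrative invocation; the script reads the CERTBOT_* variables set by Certbot
certbot certonly \
  --server https://ipa-ca.domain.local/acme/directory \
  --preferred-challenges dns --manual \
  --manual-auth-hook /usr/local/bin/certbot-dns-ipa.py \
  --manual-cleanup-hook /usr/local/bin/certbot-dns-ipa.py \
  -d vcenter.domain.local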

And Jimmy Hedman kindly provides this hook for dehydrated, which makes JSON calls to IPA using either a user ID and password or Kerberos. The relevant section:

def _call_freeipa(json_operation):
    headers = {'content-type': 'application/json',
               'referer': 'https://%s/ipa' % IPA_SERVER}
    if IPA_USER:
        # Login and keep a cookie
        login_result = requests.post("https://%s/ipa/session/login_password" % IPA_SERVER,
                                     data="user=%s&password=%s" % (IPA_USER, IPA_PASSWORD),
                                     headers={'content-Type':'application/x-www-form-urlencoded',
                                              'referer': 'https://%s/ipa' % IPA_SERVER},
                                     verify='/etc/ipa/ca.crt')

        # No auth
        auth = None
        # Use cookies
        cookies=login_result.cookies
    else:
        # Use kerberos authentication
        auth = HTTPKerberosAuth(mutual_authentication=REQUIRED,
                                sanitize_mutual_error_response=False)
        # No cookies
        cookies = None

    result = requests.post("https://%s/ipa/session/json" % IPA_SERVER,
                           data=json_operation,
                           headers=headers,
                           auth=auth,
                           cookies=cookies,
                           verify='/etc/ipa/ca.crt')

    retval = result.json()

While these two examples are by no means exhaustive, they do shed light on what’s possible using different approaches.

The (not) solution?

dehydrated seemed an obvious candidate: I’ve used it elsewhere, it runs in plain bash, doesn’t require much and is extensible with hooks. But this is where the dream ended for me, because although dehydrated will run on vCenter, the hook won’t, as it requires the requests-kerberos Python library, which won’t build on my vCenter 7.0.

So while ipa-dns-hook allows dehydrated to be used on systems without native Kerberos support by falling back to a user ID and password, I’m not prepared to hack vCenter to the point where things might break or official support could end up being affected.

Install dehydrated on vCenter 7

If you want to try it yourself, then this is how to get dehydrated and the hook onto vCenter and install the requirements for ipa-dns-hook.

WARNING – try this at your own peril!

# Assumed login as root
yum install python3-pip
mkdir -p ~/dehydrated/hooks/ipa-dns && cd ~/dehydrated
wget https://github.com/dehydrated-io/dehydrated/raw/master/dehydrated
chmod +x dehydrated
wget -P hooks/ipa-dns https://github.com/HeMan/ipa-dns-hook/raw/master/{ipa-dns-hook.py,requirements.txt}
pip install -r hooks/ipa-dns/requirements.txt
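
If the requirements had installed cleanly, dehydrated would then have been pointed at the hook via its config file. A minimal sketch with made-up values (FreeIPA publishes its ACME directory at https://ipa-ca.DOMAIN/acme/directory):

# ~/dehydrated/config (illustrative)
CA="https://ipa-ca.domain.local/acme/directory"
CHALLENGETYPE="dns-01"
HOOK="${BASEDIR}/hooks/ipa-dns/ipa-dns-hook.py"
CONTACT_EMAIL="hostmaster@domain.local"

# Make the hook executable, register once, then request/renew
chmod +x hooks/ipa-dns/ipa-dns-hook.py
./dehydrated --register --accept-terms
./dehydrated -c -d vcenter.domain.local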

The problem

pip needs to build wheels for the gssapi and krb5 dependencies of requests-kerberos, and that’s where things break catastrophically: vCenter doesn’t ship a compiler. It didn’t break vCenter for me, but we don’t end up with requests-kerberos either.

root@vcenter [ ~/dehydrated ]# pip3 install -r hooks/ipa-dns/requirements.txt
Requirement already satisfied: requests>=2.21.0 in /usr/lib/python3.7/site-packages (from -r hooks/ipa-dns/requirements.txt (line 1)) (2.24.0)
Collecting requests-kerberos>=0.12.0
  Downloading requests_kerberos-0.14.0-py2.py3-none-any.whl (11 kB)
Requirement already satisfied: urllib3>=1.25.2 in /usr/lib/python3.7/site-packages (from -r hooks/ipa-dns/requirements.txt (line 3)) (1.25.11)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/lib/python3.7/site-packages (from requests>=2.21.0->-r hooks/ipa-dns/requirements.txt (line 1)) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/lib/python3.7/site-packages (from requests>=2.21.0->-r hooks/ipa-dns/requirements.txt (line 1)) (2.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3.7/site-packages (from requests>=2.21.0->-r hooks/ipa-dns/requirements.txt (line 1)) (2018.8.24)
Collecting pyspnego[kerberos]
  Downloading pyspnego-0.9.0-py3-none-any.whl (132 kB)
     |████████████████████████████████| 132 kB 10.6 MB/s
Requirement already satisfied: cryptography>=1.3 in /usr/lib/python3.7/site-packages (from requests-kerberos>=0.12.0->-r hooks/ipa-dns/requirements.txt (line 2)) (2.8)
Requirement already satisfied: six>=1.4.1 in /usr/lib/python3.7/site-packages (from cryptography>=1.3->requests-kerberos>=0.12.0->-r hooks/ipa-dns/requirements.txt (line 2)) (1.12.0)
Requirement already satisfied: cffi!=1.11.3,>=1.8 in /usr/lib/python3.7/site-packages (from cryptography>=1.3->requests-kerberos>=0.12.0->-r hooks/ipa-dns/requirements.txt (line 2)) (1.11.5)
Requirement already satisfied: pycparser in /usr/lib/python3.7/site-packages (from cffi!=1.11.3,>=1.8->cryptography>=1.3->requests-kerberos>=0.12.0->-r hooks/ipa-dns/requirements.txt (line 2)) (2.18)
Collecting krb5>=0.3.0
  Downloading krb5-0.5.0.tar.gz (220 kB)
     |████████████████████████████████| 220 kB 14.0 MB/s
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
    Preparing wheel metadata ... done
Collecting gssapi>=1.6.0
  Downloading gssapi-1.8.2.tar.gz (94 kB)
     |████████████████████████████████| 94 kB 3.9 MB/s
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
    Preparing wheel metadata ... done
Collecting decorator
  Downloading decorator-5.1.1-py3-none-any.whl (9.1 kB)
Building wheels for collected packages: gssapi, krb5
  Building wheel for gssapi (PEP 517) ... error
  ERROR: Command errored out with exit status 1:
   command: /bin/python3 /usr/lib/python3.7/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmptj6zqrpb
       cwd: /tmp/pip-install-rgl5qe_t/gssapi_84b8dab848e24790aee0f6754af32bc2
  Complete output (58 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-cpython-37
  creating build/lib.linux-x86_64-cpython-37/gssapi
  copying gssapi/sec_contexts.py -> build/lib.linux-x86_64-cpython-37/gssapi
  copying gssapi/names.py -> build/lib.linux-x86_64-cpython-37/gssapi
  copying gssapi/mechs.py -> build/lib.linux-x86_64-cpython-37/gssapi
  copying gssapi/exceptions.py -> build/lib.linux-x86_64-cpython-37/gssapi
  copying gssapi/creds.py -> build/lib.linux-x86_64-cpython-37/gssapi
  copying gssapi/_win_config.py -> build/lib.linux-x86_64-cpython-37/gssapi
  copying gssapi/_utils.py -> build/lib.linux-x86_64-cpython-37/gssapi
  copying gssapi/__init__.py -> build/lib.linux-x86_64-cpython-37/gssapi
  creating build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/named_tuples.py -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/__init__.py -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  creating build/lib.linux-x86_64-cpython-37/gssapi/raw/_enum_extensions
  copying gssapi/raw/_enum_extensions/__init__.py -> build/lib.linux-x86_64-cpython-37/gssapi/raw/_enum_extensions
  creating build/lib.linux-x86_64-cpython-37/gssapi/tests
  copying gssapi/tests/test_raw.py -> build/lib.linux-x86_64-cpython-37/gssapi/tests
  copying gssapi/tests/test_high_level.py -> build/lib.linux-x86_64-cpython-37/gssapi/tests
  copying gssapi/tests/__init__.py -> build/lib.linux-x86_64-cpython-37/gssapi/tests
  copying gssapi/py.typed -> build/lib.linux-x86_64-cpython-37/gssapi
  copying gssapi/raw/types.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/sec_contexts.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/oids.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/names.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/misc.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/message.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/mech_krb5.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_set_cred_opt.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_s4u.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_rfc6680_comp_oid.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_rfc6680.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_rfc5801.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_rfc5588.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_rfc5587.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_rfc4178.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_password_add.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_password.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_krb5.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_iov_mic.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_ggf.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_dce_aead.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_dce.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_cred_store.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/ext_cred_imp_exp.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/exceptions.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/creds.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  copying gssapi/raw/chan_bindings.pyi -> build/lib.linux-x86_64-cpython-37/gssapi/raw
  running build_ext
  building 'gssapi.raw.misc' extension
  creating build/temp.linux-x86_64-cpython-37
  creating build/temp.linux-x86_64-cpython-37/gssapi
  creating build/temp.linux-x86_64-cpython-37/gssapi/raw
  x86_64-unknown-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -O2 -g -fPIC -Igssapi/raw -I./gssapi/raw -I/usr/include/python3.7m -c gssapi/raw/misc.c -o build/temp.linux-x86_64-cpython-37/gssapi/raw/misc.o
  error: command 'x86_64-unknown-linux-gnu-gcc' failed: No such file or directory: 'x86_64-unknown-linux-gnu-gcc'
  ----------------------------------------
  ERROR: Failed building wheel for gssapi
  Building wheel for krb5 (PEP 517) ... error
  ERROR: Command errored out with exit status 1:
   command: /bin/python3 /usr/lib/python3.7/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmp63v6q_gf
       cwd: /tmp/pip-install-rgl5qe_t/krb5_1498451d36dc4f77a08707fabdd34711
  Complete output (76 lines):
  Using krb5-config at 'krb5-config'
  Using /usr/lib/libkrb5.so as Kerberos module for platform checks
  Compiling src/krb5/_ccache.pyx
  Compiling src/krb5/_ccache_mit.pyx
  Compiling src/krb5/_ccache_match.pyx
  Compiling src/krb5/_ccache_support_switch.pyx
  Compiling src/krb5/_cccol.pyx
  Compiling src/krb5/_context.pyx
  Compiling src/krb5/_context_mit.pyx
  Compiling src/krb5/_creds.pyx
  Compiling src/krb5/_creds_opt.pyx
  Skipping src/krb5/_creds_opt_heimdal.pyx as it is not supported by the selected Kerberos implementation.
  Compiling src/krb5/_creds_opt_mit.pyx
  Compiling src/krb5/_creds_opt_set_in_ccache.pyx
  Compiling src/krb5/_creds_opt_set_pac_request.pyx
  Compiling src/krb5/_exceptions.pyx
  Compiling src/krb5/_keyblock.pyx
  Compiling src/krb5/_kt.pyx
  Compiling src/krb5/_kt_mit.pyx
  Skipping src/krb5/_kt_heimdal.pyx as it is not supported by the selected Kerberos implementation.
  Compiling src/krb5/_kt_have_content.pyx
  Compiling src/krb5/_principal.pyx
  Skipping src/krb5/_principal_heimdal.pyx as it is not supported by the selected Kerberos implementation.
  Compiling src/krb5/_string.pyx
  Compiling src/krb5/_string_mit.pyx
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-cpython-37
  creating build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/__init__.py -> build/lib.linux-x86_64-cpython-37/krb5
  running egg_info
  writing src/krb5.egg-info/PKG-INFO
  writing dependency_links to src/krb5.egg-info/dependency_links.txt
  writing top-level names to src/krb5.egg-info/top_level.txt
  reading manifest file 'src/krb5.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  warning: no previously-included files found matching '.coverage'
  warning: no previously-included files found matching '.gitignore'
  warning: no previously-included files found matching '.pre-commit-config.yaml'
  warning: no previously-included files matching '*.pyc' found under directory 'tests'
  adding license file 'LICENSE'
  writing manifest file 'src/krb5.egg-info/SOURCES.txt'
  copying src/krb5/_ccache.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_ccache_match.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_ccache_mit.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_ccache_support_switch.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_cccol.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_context.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_context_mit.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_creds.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_creds_opt.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_creds_opt_heimdal.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_creds_opt_mit.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_creds_opt_set_in_ccache.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_creds_opt_set_pac_request.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_exceptions.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_keyblock.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_kt.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_kt_have_content.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_kt_heimdal.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_kt_mit.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_principal.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_principal_heimdal.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_string.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/_string_mit.pyi -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/py.typed -> build/lib.linux-x86_64-cpython-37/krb5
  copying src/krb5/python_krb5.h -> build/lib.linux-x86_64-cpython-37/krb5
  running build_ext
  building 'krb5._ccache' extension
  creating build/temp.linux-x86_64-cpython-37
  creating build/temp.linux-x86_64-cpython-37/src
  creating build/temp.linux-x86_64-cpython-37/src/krb5
  x86_64-unknown-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -O2 -g -fPIC -Isrc/krb5 -I/usr/include/python3.7m -c src/krb5/_ccache.c -o build/temp.linux-x86_64-cpython-37/src/krb5/_ccache.o
  error: command 'x86_64-unknown-linux-gnu-gcc' failed: No such file or directory: 'x86_64-unknown-linux-gnu-gcc'
  ----------------------------------------
  ERROR: Failed building wheel for krb5
Failed to build gssapi krb5
ERROR: Could not build wheels for gssapi, krb5 which use PEP 517 and cannot be installed directly
root@vcenter [ ~/dehydrated ]#

The solution?

Use the REST API to install certificates. I figure that an additional benefit is that upgrades of vCenter won’t break or remove the ACME client. A downside is that the key will be stored on the IPA client machine.

Automating FreeIPA certificates on Palo Alto devices

 | 1 May 2023 10:12

pan_getcert is a script that uses ipa-getcert to request a certificate from the FreeIPA CA, uploads it via the Palo Alto XML API to a Palo Alto firewall or Panorama, optionally updates one or two SSL/TLS Profiles with the new certificate, and commits to activate the changes.

It’s hosted on GitHub: https://github.com/dmgeurts/getcert_paloalto


Introduction

Palo Alto SSL/TLS Profiles

Some uses of FreeIPA certificates on a Palo Alto firewall or Panorama:

  • Global Protect Gateway
  • Global Protect Portal
  • Management UI

Should one use an internal certificate for an external service?

There’s no need to get a publicly signed certificate as long as all Global Protect clients trust the FreeIPA (root) CA. A nice bonus is not having to permit inbound HTTP-01 traffic, which in Let’s Encrypt’s case comes from cloud-hosted infrastructure (what else is hosted there?), or exposing internal domains; see Terence Eden’s Blog – Should you use Let’s Encrypt for internal hostnames?

FreeIPA CA

FreeIPA with Dogtag PKI supports certificate requests and renewals from Certmonger via ipa-getcert and since FreeIPA v4.9 also via ACME.

ACME vs Certmonger

Neither Palo Alto firewalls nor Panorama natively supports ACME, nor would I expect them to. For my lab environment, ipa-getcert is a natural choice, as the server used for certificate management is already FreeIPA enrolled, hence I have no need for anonymous ACME.

Prerequisites

pan_getcert uses ipa-getcert, so the requirements are identical as far as FreeIPA is concerned:

  • An enrolled FreeIPA client with reachability to the Palo Alto firewall or Panorama.
    • Test with: nc -zv fw-mgmt.domain.local 443
    • pan-python installed
  • A manually added host (with the Service hostname).
    • The manual host must be ‘managed by’ the host on which pan_getcert will be executed.
  • A Service Principal for the service domain.
    • The Service Principal must be ‘managed by’ the host on which pan_getcert will be executed (see the example IPA commands after this list).
  • An API key for the Palo Alto firewall or Panorama.
    • Ideally tie the API key to a system account; users tend to churn, which breaks their API keys.
    • Store the API key in /etc/ipa/.panrc
  • IPA CA root certificate manually installed on the Palo Alto firewall or Panorama, as a trusted CA.
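
A minimal sketch of the host, Service Principal and ‘managed by’ setup described above, with made-up names (gp.domain.com is the certificate FQDN, ipaclient.domain.local is the host that will run pan_getcert):

# Run as an IPA admin; --force skips the DNS check if no A/AAAA record exists in IPA
ipa host-add gp.domain.com --force
ipa service-add HTTP/gp.domain.com
ipa host-add-managedby gp.domain.com --hosts=ipaclient.domain.local
ipa service-add-host HTTP/gp.domain.com --hosts=ipaclient.domain.local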

To install pan-python on Ubuntu 22.04:

sudo apt install python3-pip
sudo pip install pan-python

To generate the API key, first create a user account for/on the Palo Alto and then run this from the Linux host:

panxapi.py -h PAN_MGMT_IP_OR_FQDN -l USERNAME:'PASSWORD' -k

Copy the key and paste it into /etc/ipa/.panrc as follows:

api_key=C2M1P2h1tDEz8zF3SwhF2dWC1gzzhnE1qU39EmHtGZM=

And secure the file:

sudo chmod 600 /etc/ipa/.panrc

To install getcert_paloalto:

wget https://github.com/dmgeurts/getcert_paloalto/raw/master/pan_{get,inst}cert
chmod +x pan_{get,inst}cert
sudo cp pan_{get,inst}cert /usr/local/bin/
rm pan_{get,inst}cert

Automating the two

Now that the scope is clear, it’s time to explain how getcert_paloalto automates certificate requests, deployment and renewals.

Certmonger supports pre- and post-save commands; these can be used to run things like systemctl restart apache, but also more complex commands such as a script with its various options. It’s this post-save option that automates the renewal process, but it also makes certificate deployment interesting, as the same command is executed when the certificate is first created (the first save).

This is the reason there are two scripts for the solution. Also note that, as opposed to ACME, for example, no crontab entry is needed for the renewal of FreeIPA certificates. Certmonger takes care of monitoring certificates and renewing them before they expire.
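
For context, this is roughly the kind of request pan_getcert ends up issuing under the hood; a hand-rolled sketch using the file locations and post-save command from the example further down (the exact command pan_getcert builds may differ):

ipa-getcert request \
  -K HTTP/gp.domain.com -D gp.domain.com \
  -k /etc/ssl/private/gp.domain.com.key \
  -f /etc/ssl/certs/gp.domain.com.crt \
  -C "/usr/local/bin/pan_instcert -c gp.domain.com -n gp.domain.com -Y -p GP_PORTAL_PROFILE -s GP_EXT_GW_PROFILE fw01.domain.local"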

pan_getcert – Introduction

pan_getcert uses ipa-getcert (part of freeipa-client) to request a certificate from the IPA CA and then sets the post-save command to pan_instcert, with options based on the parameters passed to pan_getcert. The nodes involved and the processes used are as follows:

Process overview: FreeIPA <-- Linux --> Palo Alto.

*) My OS of choice is Ubuntu hence the Ubuntu logo for the FreeIPA client machine.

pan_getcert – Options

The bare minimum options to pass are the certificate Subject (aka Common Name) [-c] and the Palo Alto device hostname.

Note: pan_getcert must be run as root, ideally using sudo. This enables admins to give certificate managers the delegated privilege to run pan_getcert as root rather than all the commands in it that require elevated privileges.

Optional arguments:

  • -n Certificate name in the Palo Alto configuration; if none is given, the Certificate Subject will be used.
  • -Y Postfixes the certificate name with a four-digit year, which prevents the existing certificate from being replaced, e.g. <certificate.name_2023>.
  • -p Name of the ‘primary’ SSL/TLS Profile, which will see the currently configured certificate replaced with the new certificate. If none is given, no SSL/TLS Profile will be updated.
  • -s Name of the ‘secondary’ SSL/TLS Profile, which will see the currently configured certificate replaced with the new certificate. Requires [-p] to be set.

The secondary Profile option is useful in cases where the same certificate must be updated on two different SSL/TLS Profiles. It is not possible to request more than one certificate using pan_getcert from FreeIPA.

Usage: pan_getcert [-hv] -c CERT_CN [-n CERT_NAME] [-Y] [OPTIONS] FQDN
This script requests a certificate from FreeIPA using ipa-getcert and calls a partner
script to deploy the certificate to a Palo Alto firewall or Panorama.

    FQDN              Fully qualified name of the Palo Alto firewall or Panorama
                      interface. Must be reachable from this host on port TCP/443.
    -c CERT_CN        REQUIRED. Common Name (Subject) of the certificate (must be a
                      FQDN). Will also present in the certificate as a SAN.

OPTIONS:
    -n CERT_NAME      Name of the certificate in PanOS configuration. Defaults to the
                      certificate Common Name.
    -Y                Parsed to pan_instcert to append the current year '_YYYY' to
                      the certificate name.

    -p PROFILE_NAME   Apply the certificate to a (primary) SSL/TLS Service Profile.
    -s PROFILE_NAME   Apply the certificate to a (secondary) SSL/TLS Service Profile.

    -h                Display this help and exit.
    -v                Verbose mode.

pan_getcert – Actions

  1. Uses the privileges set in FreeIPA (managed by) to call ipa-getcert and request a certificate from FreeIPA.
  2. ipa-getcert will automatically renew a certificate when it’s due, as long as the FQDN DNS record resolves, and the host and Service Principal still exist in FreeIPA.
  3. Sets the post-save command to pan_instcert with the same parameters as issued to pan_getcert, for automated installation of renewed certificates.
    • Post-save will run on the first certificate save, using pan_instcert for certificate installation.

pan_instcert – Introduction

pan_instcert uses panxapi.py from pan-python and can be used on its own, for example if the certificate was created without pan_getcert. Or one might choose to use it as the post-save command of a certificate already monitored by Certmonger.

Note: pan_instcert must be run as root, ideally using sudo. This enables admins to give certificate managers the delegated privilege to run pan_instcert as root rather than all the commands in it that require elevated privileges.

pan_instcert – Options

pan_instcert options and arguments are deliberately identical to pan_getcert.

Usage: pan_instcert [-hv] -c CERT_CN [-n CERT_NAME] [OPTIONS] FQDN
This script uploads a certificate issued by ipa-getcert to a Palo Alto firewall
or Panorama and optionally adds it to up to two SSL/TLS Profiles.

    FQDN              Fully qualified name of the Palo Alto firewall or Panorama
                      interface. Must be reachable from this host on port TCP/443.
    -c CERT_CN        REQUIRED. Common Name (Subject) of the certificate, to find
                      the certificate and key files.

OPTIONS:
    -n CERT_NAME      Name of the certificate in PanOS configuration. Defaults to the
                      certificate Common Name.
    -Y                Append the current year '_YYYY' to the certificate name.

    -p PROFILE_NAME   Apply the certificate to a (primary) SSL/TLS Service Profile.
    -s PROFILE_NAME   Apply the certificate to a (secondary) SSL/TLS Service Profile.

    -h                Display this help and exit.
    -v                Verbose mode.

pan_instcert – Actions

  1. Randomly generates a certificate passphrase using “openssl rand”.
  2. Creates a temporary, password-protected PKCS12 cert file /tmp/getcert_pkcs12.pfx from the private key and certificate files issued by ipa-getcert (sketched after this list).
  3. Uploads the temporary PKCS12 file to the firewall using the randomly-generated passphrase.
    • (Optionally) adds a year (in 4-digit notation) to the certificate name.
  4. Deletes the temporary PKCS12 certificate from the Linux host.
  5. (Optionally) applies the certificate to up to two SSL/TLS Profiles.
    • Single SSL/TLS Profile: For example for the Management UI SSL/TLS profile.
    • Two SSL/TLS Profiles: For example for GlobalProtect Portal and GlobalProtect Gateway SSL/TLS Profiles.
  6. Commits the candidate configuration (synchronously) and reports the commit result.
  7. Logs all output to `/var/log/pan_instcert.log`.
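
Steps 1 to 4 boil down to something like the sketch below. The values are made up and the XML API parameter names reflect my understanding of the PAN-OS import call, so treat them as assumptions and verify against your PAN-OS version:

# Random passphrase (hex keeps it URL-safe) and temporary PKCS12 bundle
PASS=$(openssl rand -hex 16)
openssl pkcs12 -export -out /tmp/getcert_pkcs12.pfx \
  -inkey /etc/ssl/private/gp.domain.com.key \
  -in /etc/ssl/certs/gp.domain.com.crt -passout "pass:$PASS"
# Upload certificate and key via the XML API (API key taken from /etc/ipa/.panrc)
API_KEY=$(grep '^api_key=' /etc/ipa/.panrc | cut -d= -f2-)
curl -sk -F "file=@/tmp/getcert_pkcs12.pfx" \
  "https://fw01.domain.local/api/?type=import&category=certificate&certificate-name=gp.domain.com_2023&format=pkcs12&passphrase=$PASS&key=$API_KEY"
curl -sk -F "file=@/tmp/getcert_pkcs12.pfx" \
  "https://fw01.domain.local/api/?type=import&category=private-key&certificate-name=gp.domain.com_2023&format=pkcs12&passphrase=$PASS&key=$API_KEY"
# Remove the temporary PKCS12 bundle
rm -f /tmp/getcert_pkcs12.pfx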

Command execution

The expected output when requesting a certificate with pan_getcert is:

$ sudo pan_getcert -v -c gp.domain.com -Y -p GP_PORTAL_PROFILE -s GP_EXT_GW_PROFILE fw01.domain.local
Certificate Common Name: gp.domain.com
  verbose=1
  CERT_CN: gp.domain.com
  CERT_NAME: gp.domain.com_2023
  PAN_FQDN: fw01.domain.local
Primary SSL/TLS Profile name: GP_PORTAL_PROFILE
Secondary SSL/TLS Profile name: GP_EXT_GW_PROFILE
New signing request "20230427151532" added.
Certificate requested for: gp.domain.com
  Certificate issue took 6 seconds, waiting for the post-save process to finish.
  Certificate install and commit by the post-save process on: fw01.domain.local took 84 seconds.
FINISHED: Check the Palo Alto firewall or Panorama to check the commit succeeded.

And pan_instcert will log to /var/log/pan_instcert.log.

[2023-04-27 15:15:33+00:00]: START of pan_instcert.
[2023-04-27 15:15:33+00:00]: Certificate Common Name: gp.domain.com
[2023-04-27 15:15:34+00:00]: XML API output for crt: <response status="success"><result>Successfully imported gp.domain.com_2023 into candidate configuration</result></response>
[2023-04-27 15:15:35+00:00]: XML API output for key: <response status="success"><result>Successfully imported gp.domain.com_2023 into candidate configuration</result></response>
[2023-04-27 15:15:35+00:00]: Finished uploading certificate: gp.domain.com_2023
[2023-04-27 15:15:37+00:00]: Starting commit, please be patient.
[2023-04-27 15:17:01+00:00]: commit: success: "Configuration committed successfully"
[2023-04-27 15:17:01+00:00]: The commit took 84 seconds to complete.
[2023-04-27 15:17:01+00:00]: END - Finished certificate installation to: fw01.domain.local

Both pan_getcert and pan_instcert will report back how long it took to do certain tasks:

  • pan_getcert
    • Time spent waiting for the certificate to be issued.
    • Time spent waiting for pan_instcert to complete.
  • pan_instcert
    • Time spent waiting for the commit to finish.

Logrotate for pan_instcert.log

This log file shouldn’t grow quickly unless there’s a problem or the number of monitored certificates grows very large. By default, IPA certificates have a two-year validity, so the log will average about 0.375 lines per month per certificate (9 lines per renewal / 24 months). Generally speaking, you should not expect this file to grow by more than a few lines a year.

The bigger risk is misconfiguration or changes to ipa-getcert and the Palo Alto API. And from experience, I can confirm that getcert is very persistent in retrying failed post-save commands: a few times while testing my code the log file grew to >20GB within minutes, containing only repeated pan_instcert usage instructions.

Count yourself warned! A suggestion for a logrotate.d file is:

# /etc/logrotate.d/pan_instcert
/var/log/pan_instcert.log { 
	missingok
	rotate 5
	yearly
	size 50M
	notifempty
	create
}

Verify Certificate Status and Post-save Command

Use ipa-getcert to check that the certificate ended up in status MONITORING and that the post-save command is set according to the parameters and values passed to pan_getcert:

$ sudo ipa-getcert list
Number of certificates and requests being tracked: 2.
Request ID '20230427151532':
        status: MONITORING
        stuck: no
        key pair storage: type=FILE,location='/etc/ssl/private/gp.domain.com.key'
        certificate: type=FILE,location='/etc/ssl/certs/gp.domain.com.crt'
        CA: IPA
        issuer: CN=Certificate Authority,O=IPA.LOCAL
        subject: CN=gp.domain.com,O=IPA.LOCAL
        issued: 2023-04-27 16:15:33 BST
        expires: 2025-04-27 16:15:33 BST
        dns: gp.domain.com
        principal name: HTTP/gp.domain.com@IPA.LOCAL
        key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
        eku: id-kp-serverAuth,id-kp-clientAuth
        pre-save command:
        post-save command: /usr/local/bin/pan_instcert -c gp.domain.com -n gp.domain.com -Y -p GP_PORTAL_PROFILE -s GP_EXT_GW_PROFILE fw01.domain.local
        track: yes
        auto-renew: yes
Request ID '20230428001508':
        status: MONITORING
        stuck: no
        key pair storage: type=FILE,location='/etc/ssl/private/fw01.domain.local.key'
        certificate: type=FILE,location='/etc/ssl/certs/fw01.domain.local.crt'
        CA: IPA
        issuer: CN=Certificate Authority,O=IPA.LOCAL
        subject: CN=fw01.domain.local,O=IPA.LOCAL
        issued: 2023-04-27 16:15:33 BST
        expires: 2025-04-27 16:15:33 BST
        dns: fw01.domain.local
        principal name: HTTP/fw01.domain.local@IPA.LOCAL
        key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
        eku: id-kp-serverAuth,id-kp-clientAuth
        pre-save command:
        post-save command: /usr/local/bin/pan_instcert -c fw01.domain.local -n fw01.domain.local -Y -p MGMT_UI_PROFILE fw01.domain.local
        track: yes
        auto-renew: yes

Note the status, location of the saved files and the post-save command. Tracking and auto-renew are enabled by default by ipa-getcert.

pan_instcert will only log to stdout when executed directly. However, it will always log to /var/log/pan_instcert.log. When requesting certificates, it can be helpful to run a tail to see the post-save command logging in real-time. If the log file doesn’t yet exist the tail will fail.

sudo touch -a /var/log/pan_instcert.log
sudo tail -f /var/log/pan_instcert.log

Wrong SSL/TLS profile name

If the SSL/TLS Service Profile doesn’t exist it will be created, but the following error will be shown in /var/log/pan_instcert.log and the commit will fail:

commit: success: "Validation Error:
 ssl-tls-service-profile -> Test_profile  is missing 'protocol-settings'
 ssl-tls-service-profile is invalid"

Verify the API calls on the Palo Alto Firewall or Panorama

Check the following locations on the Palo Alto firewall for additional confirmation: Monitor >> Logs >> Configuration. There should be 3-5 operations shown, depending on whether or not the SSL/TLS service profile(s) are being updated.

  1. A web upload to /config/shared/certificate.
  2. A web upload to /config/shared/certificate/entry[@name=’FQDN(_YYYY)’], under the FreeIPA root CA certificate.
  3. One or more web “set” commands to /config/shared/ssl-tls-service-profile/entry[@name=’YOUR_PROFILE(S)’]
  4. And a web “commit” operation.

To see all API actions, filter by the admin username used for the API key: ( admin eq [api-admin] )

Under Device >> Certificate Management >> Certificates the new certificate should be shown with a valid status and under the manually imported IPA CA root certificate.

If any SSL/TLS Profiles were passed, then under Device >> Certificate Management >> SSL/TLS Service Profile the respective profiles should show that the new certificate has replaced the previous certificate.

Recent script changes

[2023-05-02] I’m looking into issuing a subordinate certificate from FreeIPA for a Palo Alto firewall, for user VPN certificates and ideally SSL interception. I found that I needed to specify a Certificate Profile in the ipa-getcert command, so I went about adding option -T to pan_getcert.

Furthermore, I also added the following options:

  • -b Key bit length, 2048 is the default but it’s good to be able to request 3072 and 4096 length RSA certificates.
  • -G Certificate type, currently only RSA is supported by FreeIPA. But it’s good to be ready for when EC and ECDSA certificates can be issued from FreeIPA. I don’t think this will be any time soon, but the code is there now.
  • -S Service type. The subordinate certificate isn’t for an HTTP service; in fact it’s best suited to being tied to a host rather than a service. So now there’s an option to specify a service type; if omitted, HTTP is assumed.
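
Putting the new options together, a hypothetical invocation might look like this (the certificate profile and SSL/TLS Profile names are placeholders):

# Request a 3072-bit RSA certificate with a custom certificate profile and a
# host (non-HTTP) service type, then apply it to the management UI profile
sudo pan_getcert -v -c fw01.domain.local -T SubCA_profile -b 3072 -S host -p MGMT_UI_PROFILE fw01.domain.local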

Palo Alto OSPF route filtering

 | 11 Apr 2023 14:41

To reduce the impact of lab testing bits and pieces, I decided to refresh my home network. The requirement I set myself is to reduce some of my family’s frustration when things go sideways, as inevitably happens when trying out something new. Thus the point of the exercise is not just to refresh my memory or learn something new, but to create a clear separation between the home and the ‘lab’ networks.

The core of the domestic network will still be the same core switch and servers. But, any failures on the lab side should not affect the home network’s connection to the internet. Or as my kids would put it, the ‘WiFi’ must not go down. To my shame, and despite my many admonitions, our spawn persists in calling an internet connection “WiFi”.

In summary, my home network consists of a Ruckus ICX 7250 layer-3 capable switch. Not as polished as a Cisco, but it’s an affordable second-hand switch with 48 PoE ports and 8x 10GE SFP+ ports. The bulk of the compute and storage is provided by two HPE DL380 servers, one gen8 (LFF with RAID SAS storage) and the other gen9. I later added vSAN using Intel NVMe and 8TB SAS disks. It’s an organically grown mess, but it works and runs the latest version of vSphere 7 very well. Both servers are connected with 2x10GE (LACP LAG) for vSAN and VMs and 2x2GE for management.

Our FTTP terminates on a Palo Alto VM-50, and this is where vSAN and vMotion shine as this virtual firewall can move between the two ESXi hosts without missing a beat.

Initial setup and problem

At first, each VLAN was individually connected to the firewall, which worked great until the firewall went down (only for a software upgrade of course) and I could no longer connect to vCenter from my home network.

My solution was to introduce a Layer-3 link between the switch and the firewall. This made the switch the default gateway for the home network and I could then bring up a VLAN interface on the switch to restore access to the management subnet if the firewall was unavailable. Routing between the switch and the firewall was static, which worked well enough; issues were only sporadic, after all.

But then a DNS server on my network died and I figured I could do better, by removing lab/dev elements from the home network and giving myself direct access to them rather than traversing the firewall.

Current redesign

I had already used VRF-lite to configure the p2p routed link between the switch and the firewall for the home network. Doubling this for a work VRF was a simple progression while also ensuring that local traffic between work (access, server and management) subnets could flow freely without traversing the firewall. This should avoid management reachability issues for me the next time I upgrade the firewall. Additionally, I’ll be the only user connecting this way and there will be no direct wireless exposure of this VRF, limiting exposure of vSphere via potentially compromised clients on the home network.

I’m still undecided about which is harder to teach regarding cyber security, clients or family members…

Static or dynamic routing?

Moving subnets around becomes easier with dynamic routing protocols. This is what I needed to do and I figured it would be a good Palo Alto routing refresher. While the Ruckus switch supports RIP and OSPF, I went with OSPF as I figured it to be more relevant. Does anyone still use RIP these days? Please don’t answer that; there are too many things in tech that just won’t die…

OSPF

As I set out to add a second VRF to the switch and configure OSPF on it I found a couple of things that one should be aware of:

vSphere

  • When adding a new Distributed Port Group (DPG) on a vSwitch with a LAG uplink, ensure the advanced settings are changed to reflect this. This is obvious when initially configuring vSphere, but is easy to forget. So if you find that the switch doesn’t see an OSPF neighbor and the firewall does (stuck in init mode), check the settings of the DPG…

Ruckus

  • Simple straightforward configuration from the CLI, but there is no way to see or configure anything OSPF from the web interface.
  • With no need to filter advertised prefixes I didn’t check or test for prefix filtering. In fact I used area 0 (0.0.0.0) for all three routing instances.
    • Switch: Enabled OSPF on vrf LAN
    • Switch: Enabled OSPF on vrf WORK
    • Firewall: Enabled OSPF on the default virtual router
    • Set vlan ports to ospf-passive to include them in OSPF rather than redistribute connected.
      • This reduces network chatter, as no neighbor advertisements take place on passive ports.

Palo Alto

  • Only advertise a default route. As I’m using the backbone area (0.0.0.0) I can’t use a stub area for the switch, which would make it easier to advertise only a default route.
    • The solution is to use Redistribution Profiles. But, as there’s no comprehensive prefix-list with prefix length matching, one can’t match a default-route as follows:
      • 0.0.0.0/0 le 0
    • Instead, the 0.0.0.0/0 redist needs to be preceded (by priority) by a ‘no-redist’ object containing:
    128.0.0.0/1
    64.0.0.0/2
    32.0.0.0/3
    16.0.0.0/4
    8.0.0.0/5
    4.0.0.0/6
    2.0.0.0/7
    1.0.0.0/8
Palo Alto – Virtual Router – Redistribution Profile – Redist Default

The second requirement for filtering redistributed prefixes on Palo Alto is to add both Export Rules under the virtual router OSPF configuration. It took me some time to figure out that creating the Redistribution Profiles does nothing on its own; applying them to the routing process is what makes them take effect. It makes absolute sense once you notice.
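
The same configuration expressed via the PAN-OS CLI looks roughly like the sketch below. The object names are mine and the syntax is from memory, so treat it as an assumption and check it against your PAN-OS version rather than pasting it in blindly:

set network virtual-router default protocol redist-profile No-Default priority 1 action no-redist
set network virtual-router default protocol redist-profile No-Default filter destination [ 128.0.0.0/1 64.0.0.0/2 32.0.0.0/3 16.0.0.0/4 8.0.0.0/5 4.0.0.0/6 2.0.0.0/7 1.0.0.0/8 ]
set network virtual-router default protocol redist-profile Redist-Default priority 2 action redist
set network virtual-router default protocol redist-profile Redist-Default filter destination 0.0.0.0/0
set network virtual-router default protocol ospf export-rules No-Default
set network virtual-router default protocol ospf export-rules Redist-Default new-path-type ext-2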

OSPF Ruckus configuration

The following is the OSPF configuration of the Ruckus, including floating statics for the default routes (belts and braces):

    vrf LAN
     rd 65000:100
     ip router-id 192.168.100.1
     address-family ipv4
     ip route 0.0.0.0/0 192.168.199.0 15 distance 254 name fw01
     exit-address-family
    exit-vrf
    !
    vrf WORK
     rd 65000:101
     ip router-id 192.168.101.1
     address-family ipv4
     ip route 0.0.0.0/0 192.168.199.2 15 distance 254 name fw01
     exit-address-family
    exit-vrf
    !
    router ospf vrf WORK
    area 0.0.0.0
    !
    router ospf vrf LAN
    area 0.0.0.0
    !
    interface loopback 1
     vrf forwarding WORK
     ip address 192.168.199.253 255.255.255.255 ospf-passive
     ip ospf area 0.0.0.0
     ip ospf active
    !
    interface ve 100
     port-name LAN
     vrf forwarding LAN
     ip address 192.168.100.1 255.255.255.0
     ip ospf area 0.0.0.0
     ip ospf passive
    !
    interface ve 101
     port-name WORK-user-VLAN
     vrf forwarding WORK
     ip address 192.168.101.1 255.255.255.0 ospf-passive
     ip ospf area 0.0.0.0
     ip ospf passive
    !
    interface ve 198
     port-name WORK-Uplink-to-FW
     vrf forwarding WORK
     ip address 192.168.199.3 255.255.255.254
     ip ospf area 0.0.0.0
     ip ospf active
     ip ospf network point-to-point
    !
    interface ve 199
     port-name Uplink-to-FW
     vrf forwarding LAN
     ip address 192.168.199.1 255.255.255.254
     ip ospf area 0.0.0.0
     ip ospf active
     ip ospf network point-to-point
    !
    interface ve 400
     port-name Servers
     vrf forwarding WORK
     ip address 192.168.104.1 255.255.255.0 ospf-passive
     ip ospf area 0.0.0.0
     ip ospf passive
    !
    interface ve 700
     port-name Management
     vrf forwarding WORK
     ip address 192.168.107.1 255.255.255.0 ospf-passive
     ip ospf area 0.0.0.0
     ip ospf passive

Checking OSPF

Once the Palo Alto and both VRFs on the switch were configured, and the vSphere Distributed Port Group issue was resolved, both neighborships came up:

Palo Alto – Virtual Router – Runtime Stats – OSPF Neighbors

I left a couple of floating statics in place on the firewall; they’re shown below with metric 255.

Palo Alto – Virtual Router – Runtime Stats – Route Table

But as can be seen from the forwarding table, the OSPF-learned prefixes are preferred:

Palo Alto – Virtual Router – Runtime Stats – Forwarding Table

And this is what things look like on the Ruckus:

    SSH@sw01#sh ip ospf vrf LAN neighbor
    Number of Neighbors is 1, in FULL state 1
    
    Port Address Pri State Neigh Address Neigh ID Ev Opt Cnt
    v199 192.168.199.1 1 FULL/OTHER 192.168.199.0 192.168.199.0 5 66 0
    
    SSH@sw01#sh ip ospf vrf WORK neighbor
    Number of Neighbors is 1, in FULL state 1
    
    Port Address Pri State Neigh Address Neigh ID Ev Opt Cnt
    v198 192.168.199.3 1 FULL/OTHER 192.168.199.2 192.168.199.0 5 66 0
    
    SSH@sw01#sh ip route vrf LAN
    Total number of IP routes: 9
    Type Codes - B:BGP D:Connected O:OSPF R:RIP S:Static; Cost - Dist/Metric
    BGP Codes - i:iBGP e:eBGP
    OSPF Codes - i:Inter Area 1:External Type 1 2:External Type 2
    Destination Gateway Port Cost Type Uptime
    1 0.0.0.0/0 192.168.199.0 ve 199 110/2 O2 5d4h
    2 192.168.100.0/24 DIRECT ve 100 0/0 D 13d4h
    3 192.168.101.0/24 192.168.199.0 ve 199 110/12 O 2m40s
    4 192.168.104.0/24 192.168.199.0 ve 199 110/12 O 2m40s
    5 192.168.107.0/24 192.168.199.0 ve 199 110/12 O 2m40s
    6 192.168.199.0/31 DIRECT ve 199 0/0 D 13d4h
    7 192.168.199.2/31 192.168.199.0 ve 199 110/11 O 5d4h
    8 192.168.199.253/32 192.168.199.0 ve 199 110/12 O 2m40s
    9 192.168.199.254/32 192.168.199.0 ve 199 110/11 O 5d4h
    
    SSH@sw01#sh ip route vrf WORK
    Total number of IP routes: 9
    Type Codes - B:BGP D:Connected O:OSPF R:RIP S:Static; Cost - Dist/Metric
    BGP Codes - i:iBGP e:eBGP
    OSPF Codes - i:Inter Area 1:External Type 1 2:External Type 2
    Destination Gateway Port Cost Type Uptime
    1 0.0.0.0/0 192.168.199.2 ve 198 110/2 O2 2m50s
    2 192.168.100.0/24 192.168.199.2 ve 198 110/12 O 2m50s
    3 192.168.101.0/24 DIRECT ve 101 0/0 D 4d21h
    4 192.168.104.0/24 DIRECT ve 400 0/0 D 5d4h
    5 192.168.107.0/24 DIRECT ve 700 0/0 D 5d3h
    6 192.168.199.0/31 192.168.199.2 ve 198 110/11 O 2m50s
    7 192.168.199.2/31 DIRECT ve 198 0/0 D 6d22h
    8 192.168.199.253/32 DIRECT loopback 1 0/0 D 5d21h
    9 192.168.199.254/32 192.168.199.2 ve 198 110/11 O 2m50s

Summary

Though I am now violating a stated requirement by advertising connected prefixes between the two VRFs, I’m not that bothered by this. The firewall filters traffic between the VRFs, and matching my requirement would have meant creating two stub areas, adding complexity. While possible, for my home network the current setup will do just fine.

Equally, any prefixes which do not need internet connectivity are simply not added to the OSPF process and thus remain unknown outside the VRF. And prefixes which need more security, like the DMZ, guest WiFi and IoT, remain directly connected to the firewall.

Conclusion

All in all, deploying OSPF between Palo Alto and a Ruckus ICX 7250 proved to be an easy and successful exercise, and no licenses were required on either device to support dynamic layer-3 routing. Both devices were at their latest available firmware.

  • Palo Alto: 11.0.0
  • Ruckus ICX 7250: 8.0.90j

    SSH@sw01>sh ver
      Copyright (c) Ruckus Networks, Inc. All rights reserved.
        UNIT 1: compiled on Jan  5 2021 at 21:08:45 labeled as SPR08090j
          (33554432 bytes) from Primary SPR08090j.bin (UFI)
            SW: Version 08.0.90jT213
          Compressed Primary Boot Code size = 786944, Version:10.1.18T215 (spz10118)
           Compiled on Mon Jul 13 09:53:15 2020
    
      HW: Stackable ICX7250-48-HPOE
    ==========================================================================
    UNIT 1: SL 1: ICX7250-48P POE 48-port Management Module
          Serial  #:DUK3849N0JS
          Software Package: ICX7250_L3_SOFT_PACKAGE   (LID: fwmINJOpFlu)
          Current License: l3-prem-8X10G
          P-ASIC  0: type B344, rev 01  Chip BCM56344_A0
    ==========================================================================
    UNIT 1: SL 2: ICX7250-SFP-Plus 8-port 80G Module
    ==========================================================================
     1000 MHz ARM processor ARMv7 88 MHz bus
     8192 KB boot flash memory
     2048 MB code flash memory
     2048 MB DRAM
    STACKID 1  system uptime is 13 day(s) 7 hour(s) 45 minute(s) 46 second(s)

Cisco LACP config for Aruba AP

 | 8 Jul 2015 17:30

Don’t we all love it when we find that a standard requirement states one thing and what has been implemented elsewhere to date doesn’t comply? Dual active uplinks for a premium office standard is one of those requirements I found. Now I haven’t seen the standard Cisco wireless deployment for premium sites, but in light of vendor ‘diversity’ Aruba is deployed instead of Cisco.

Motivation aside, the dual uplink raises an interesting question for lightweight access points (LWAPs). Aruba (by default) GRE tunnels all client traffic to the wireless LAN controller (WLC) for processing, filtering and forwarding there, like Cisco, and as is common in corporate environments. The alternatives are split tunneling or no tunneling, which normally come at the cost of losing corporate controls. The QoS trade-offs and headaches of tunneling WLAN traffic to WLCs are food for another post entirely.

LACP

Using AP225 APs, I found I had LACP at my disposal. Cheaper models (< AP220) don’t do LACP and only have STP for redundancy. Some of my first concerns:

  • Standard Cisco LACP is mostly configured unconditionally, which means the ports don’t come up if LACP isn’t detected on the link. How is an AP meant to get its profile from a WLC if it can’t get there? Remember, I don’t want to reconfigure the switch ports after an AP has connected and obtained its profile (configuration) from the WLC.
  • Aruba documentation and forums (Airheads) didn’t say much about Cisco switch port configuration. What I did find was that LACP is supported and needs switch configuration for it to work.
  • A single GRE tunnel using 2 etherchannel members?! LACP uses an IP hash table to select which member link to forward packets on. An AP only has a single IP address and without LACP the WLC also only has a single IP address for termination of LWAP GRE tunnels. Surely all GRE tunnels would only use a single LACP bundle-member, restricting maximum throughput to 1 Gbps. If so, what’s the point?

Reading up, I found the following helpful information:

  • Aruba solves the LACP IP hash table problem by using a second WLC IP address to terminate a second GRE tunnel. This second tunnel uses the 2nd member-link. Each GRE tunnel serves a radio, 2.4GHz and 5GHz; this does not enable more than 1 Gbps for 5GHz, but at least 2.4GHz traffic won’t eat into the uplink speed available to 5GHz traffic. The Aruba config for LACP centres around “AP LACP GRE striping IP” (see Google for more info).
  • “no port-channel standalone-disable”: this port-channel configuration gem permits link members to come up as individual links. This allows a LWAP to connect to the network, get an IP via DHCP, find the WLC and pull its configuration. Once provisioned by the WLC, LACP kicks in.

Caveats

Beware of the LACP hash algorithm; the Cisco switch default is src-mac. In an edge-routed design the source MAC will be the MAC of the switch SVI towards the WLC. The switch terminating the LWAPs is the same as the one terminating the WLC, and the WLC also uses LACP to connect to the LAN. For my deployment the solution was src-ip, as the GRE sessions towards the LWAPs have a distinct WLC IP address (must be odd/even). Traffic destined for the WLC is also src-ip based, which is good: the load-balancing will then be based on the clients’ targets, whether internet or LAN based, and it works as long as corporate clients don’t all hit the same target at the same time. I think in most situations the resulting restriction of a single LAN source towards wireless clients to 1 Gbps is beneficial to the fair sharing of bandwidth between LAN-based services.
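
On Catalyst switches the load-balancing method is a global setting; switching it to src-ip and then verifying looks roughly like this (keyword availability varies by platform):

WLAN-SW01(config)#port-channel load-balance src-ip
WLAN-SW01(config)#end
WLAN-SW01#show etherchannel load-balance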

The AP225 only pulls PoE over a single link. If the link providing PoE goes down, it will reboot and come up on the remaining link.

Though the dual links provide extra bandwidth, if your NOC doesn’t monitor these links, either via WLC management or switch trap/port monitoring, a single link failure won’t be noticed. I think this is no different to the issue of APs losing their physical link and continuing in mesh connectivity, which is great as a last resort but not when the situation isn’t resolved before things get really bad.

Cisco config

This is the LWAP switch port config that worked for me:

    WLAN-SW01(config)#int range g1/0/1,g2/0/1
     description WLAN-AP01
     switchport access vlan 4
     switchport mode access
     channel-group 1 mode active
     !
    WLAN-SW01(config)#int po1
     description WLAN-AP01
     switchport access vlan 4
     switchport mode access
     no port-channel standalone-disable
     !
     exit
    WLAN-SW01#sh eth 1 sum
    Flags: D - down P - bundled in port-channel
           I - stand-alone s - suspended
           H - Hot-standby (LACP only)
           R - Layer3 S - Layer2
           U - in use f - failed to allocate aggregator
    
           M - not in use, minimum links not met
           u - unsuitable for bundling
           w - waiting to be aggregated
           d - default port
    ...
    Group  Port-channel  Protocol    Ports
    ------+-------------+-----------+-----------------------------------------------
    1      Po1(SU)         LACP      Gi1/0/1(P) Gi2/0/1(P)

When the LWAP hasn’t fetched its configuration, the Flags show either (D) for down or (I) when the port is up but LACP is inactive. As long as LACP is inactive, the AP’s MAC address will hop between the two ports and a MAC flap warning is reported by the switch.

    Jul  8 2015 08:33:59.259 UTC: %SW_MATM-4-MACFLAP_NOTIF: Host 94b4.0f50.47f0 in vlan 4 is flapping between port Gi2/0/1 and port Gi1/0/1

Another error I’ve seen is about PoE. What happens is that both member ports offer PoE but the AP only signals acceptance on a single port. The switch doesn’t seem to understand the lack of response, calls the AP rude, turns off PoE on that port and logs the ‘error’.

    Jul 8 2015 17:08:39.030 UTC: %ILPOWER-7-DETECT: Interface Gi1/0/2: Power Device detected: IEEE PD
    Jul 8 2015 17:08:41.202 UTC: %ILPOWER-5-IEEE_DISCONNECT: Interface Gi2/0/2: PD removed
    Jul 8 2015 17:08:41.203 UTC: %ILPOWER-3-CONTROLLER_PORT_ERR: Controller port error, Interface Gi2/0/2: Power given, but Power Controller does not report Power Good
    Jul 8 2015 17:08:41.885 UTC: %ILPOWER-7-DETECT: Interface Gi2/0/2: Power Device detected: IEEE PD
    Jul 8 2015 17:08:42.995 UTC: %ILPOWER-5-POWER_GRANTED: Interface Gi2/0/2: Power granted
    Jul 8 2015 17:08:50.035 UTC: %LINK-3-UPDOWN: Interface GigabitEthernet2/0/2, changed state to up
    Jul 8 2015 17:08:50.187 UTC: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/2, changed state to up
    Jul 8 2015 17:08:55.025 UTC: %ILPOWER-5-IEEE_DISCONNECT: Interface Gi1/0/2: PD removed
    
    WLAN-SW01#sh power inline
    Module Available Used Remaining
     (Watts) (Watts) (Watts)
    ------ --------- -------- ---------
    1 1110.0 200.2 909.8
    Interface Admin  Oper       Power   Device              Class Max
                                (Watts)
    --------- ------ ---------- ------- ------------------- ----- ----
    Gi1/0/1   auto   on         15.4    Ieee PD             4     30.0
    Gi2/0/1   auto   off        0.0     n/a                 n/a   30.0

Check LACP from the WLC

Some great LACP-related WLC CLI tools I found on Airheads:

Check if the GRE striping IP has been set: “show ap system-profile <profile-name>”

    (WLAN-WLC01) #show ap system-profile LACP
    
    AP system profile "LACP"
    ------------------------
    Parameter                     Value
    ---------                     -----
    RF Band                       g
    RF Band for AM mode scanning  all
    ...
    LMS IP                        10.20.30.10
    Backup LMS IP                 N/A
    LMS IPv6                      N/A
    Backup LMS IPv6               N/A
    LMS Preemption                Disabled
    LMS Hold-down Period          600 sec
    LMS ping interval             20
    GRE Striping IP               10.20.30.11

    Check if an AP’s LACP has come up: “show ap debug lacp ap-name <ap-name>”

    (WLAN-WLC01) #show ap debug lacp ap-name WLAN-AP01
    
    AP LACP Status
    --------------
    Link Status  LACP Rate  Num Ports  Actor Key  Partner Key  Partner MAC
    -----------  ---------  ---------  ---------  -----------  -----------
    Up           slow       2          17         1            88:90:8d:d9:b8:00
    Slave Interface Status
    ----------------------
    Slave I/f Name  Permanent MAC Addr  Link Status  Member of LAG  Link Fail Count
    --------------  ------------------  -----------  -------------  ---------------
    eth0            94:b4:0f:c2:83:b2   Up           Yes            0
    eth1            94:b4:0f:c2:83:b3   Up           Yes            0
    ...
    

    Check if GRE tunnels are being created to both the LMS IP address and the GRE striping IP address configured in the AP system profile: “show datapath session | include <ap-ip>”

    (WLAN-WLC01) #show datapath session | include "30.29   10.20.30.1"
    ...
    10.20.30.29   10.20.30.10   17   4500  4500   0/0  0    0   0   pc1         119  70         71872      FC           
    10.20.30.29   10.20.30.11   47   0     0      0/0  0    0   1   pc1         c    0          0          FC           
    ...

    There you have it: LACP between an Aruba AP and a Cisco switch. Kudos to Abi over at Airheads for this article about LACP on the Aruba AP225 and ArubaOS 6.3. I was working on 6.4; YMMV with different versions.

    Temporary London CCIE LABs

     | 25 Jun 2014 15:23

    Just received an email about Cisco’s mobile CCIE LAB coming to London:

    Mobile CCIE Lab Available in London, United Kingdom, from October 6 to October 14, 2014

    To address the urgent need for certified IT professionals, and to offer more convenient testing, Cisco has developed the Mobile CCIE Lab for qualified candidates who are ready to take their CCIE Routing and Switching exam or CCIE Security exam.

    We encourage you to take advantage of the mobile lab scheduled in London, United Kingdom, from October 6 to October 14, 2014. The CCIE mobile testing lab will allow qualified candidates to more easily and quickly take the exam, reducing the waiting time, effort, and costs accrued by having to travel to take the exam. The eight-hour lab exam tests your ability to configure actual equipment and get the network running in a timed-test situation. There will be 42 seats available for the CCIE Routing and Switching exam and 7 seats available for the CCIE Security exam.

    Apart from the Cisco Certified Architect (CCAr) certification, the Cisco CCIE certifications are the highest level of achievement for network professionals. Less than 3 percent of all Cisco certified professionals earn their CCIE certification.

    Click here to register for the CCIE Routing and Switching exam or CCIE Security exam in London.

    For information on registering for a Mobile CCIE Lab event or for additional information about the Mobile CCIE Lab program, visit the Cisco Learning Network.

    SSL Intercept headaches

     | 17 Jun 2014 22:50


    A recent proxy upgrade has seen me working many hours, fixing things that weren’t broken before. It was intended to be a drop-in replacement, but somebody couldn’t resist the opportunity to specify ‘a few minor’ new requirements:

    • 1 year log retention of all traffic
    • SSL interception to enable data leakage protection for all traffic types

    The first doesn’t sound like a big issue; however, it turned out we underestimated the logging volume for 8000 concurrent users. The reseller hadn’t flagged the issue either; I’m ‘sure’ they’ll pay more attention next time… As for SSL interception: it broke a host of things. Some lessons learned:

    • Bluecoat ProxySG devices come with root CA certificates installed, but many site admins fail to install their intermediate SSL certificates. That not only slows down session set-up, it also meant we had to install many intermediates ourselves, as the proxy does not go looking for them. In practice this meant manually finding and installing certs whenever users called the help-desk because they weren’t allowed to access sites with untrusted certificates.
    • Commercial sites using self-signed certificates. Bad practice, but sadly it’s not always up to engineers or consultants whether such a site should be honoured with business-critical status.
    • Applications tunnelling proprietary protocols over TCP:443, some encrypted, some not so much. The ProxySG was configured to detect the protocol and deny all unrecognised traffic, which breaks Adobe Creative Cloud, for example. Skype is another hot potato.

    Skype in particular proved to be a big time-waster. As you may well know, Skype uses proprietary protocols and tries very hard to remain hidden from prying eyes. Since Skype was in use before the migration and the ‘as-is’ rule lingered, there was some pressure to get it working. The short of it is that I got it working without globally turning off SSL intercept, well, to a degree anyway…


    Cisco Voice-VLAN (VVLAN) inconsistencies

     | 12 Nov 2012 12:41

    First off, I’d like to say that this is just a minor issue, relating more to routers versus switches; I’m still a lot happier with how Cisco implements config and features compared to most, if not all, of their competitors…

    At a customer I’ve recently had to commit a grave operational sin: connecting a small switch at the end of a floor patch. These things are normally operational nightmares, as they tend to bring an entire LAN environment to its knees when such a ‘switch’ gets connected to the network twice. Always by accident, but having management kick you for something someone else did is nobody’s idea of fun. I won’t go into the underlying principles here, as I’m assuming most who frequent my blog know about broadcast storms, their causes and the tools and solutions available to mitigate the risks.

    Our justification to Operations was that we wanted a few more local LAN ports for testing VoIP devices than we had available through floor patches; a calculated choice to segregate our testing from the rest of the LAN while keeping it as realistic as possible. Using the means available meant making do with a Cisco 1801: a single routed interface and 8 switched interfaces. Think of it as a router with one Ethernet interface and an 8-port HWIC-ESW nailed to it. I didn’t need the ATM or WiFi it has.

    So I set out: disabled IP routing, admin-downed all non-Ethernet ports and set up the VLAN database (old style, remember?); I did not want this baby to participate in VTP, and in fact I don’t think it even can! It’s limited to 8 VLANs. I pulled two cables to it: one switched port trunked with some data and voice VLANs, and the routed interface configured for management access.
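
    Something along these lines; a rough sketch only, with placeholder interface names, addresses and VLAN IDs rather than the exact config I used:

    ! VLANs defined locally from exec mode, no VTP involved
    vlan database
     vlan 10 name DATA
     vlan 20 name VOICE
     exit
    !
    configure terminal
    no ip routing
    ip default-gateway 192.0.2.1
    !
    interface ATM0
     shutdown
     ! repeat for the other unused non-Ethernet interfaces
    !
    interface FastEthernet0
     description Management via the routed interface
     ip address 192.0.2.2 255.255.255.0
    !
    interface FastEthernet1
     description Uplink trunk carrying the data and voice VLANs
     switchport mode trunk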

    All sweet and dandy: I tested the BPDU-guard functionality prior to installation by connecting an access port to the LAN. Clunk! It went down as desired; result, I thought… Then, during installation, the LAN port wouldn’t come up. Doh! I’d missed that the 1801 doesn’t send BPDUs until a VLAN becomes active; I’d checked whether spanning tree was operational, and it wasn’t until I brought an interface up. So I disabled STP for all VLANs in the VLAN database. Now my laptop received an IP address and the data VLANs all worked.

    So, time to connect a Mitel phone. No dice: it received its first DHCP response with VLAN information, then just sat announcing it was waiting for a DHCP response. Dang, I’d configured the voice VLAN, so why did the switch not detect the phone and enable trunking so the phone could send its DHCP request on the voice VLAN?

    It was only when I started reading up on HWIC-ESW voice-VLAN config that I noticed Cisco hasn’t implemented the auto-enabling of dot1q trunking when a phone is detected… The solution is to add two lines of config: “switchport trunk native vlan xyz” and “switchport mode trunk”. The crux is that this platform is at heart a router, not a native switch…
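
    Put together, the phone-facing port ends up looking something like the sketch below. The interface name and VLAN IDs are placeholders, and the voice-vlan line assumes the usual switchport voice vlan syntax; the two trunk lines are the ones that actually made the Mitel phone work:

    interface FastEthernet2
     description Mitel phone test port
     switchport trunk native vlan 10
     switchport mode trunk
     switchport voice vlan 20
     ! untagged (PC) traffic lands in VLAN 10, the phone tags its frames into VLAN 20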

    Cisco documentation

    Twittertools is dead

     | 14 Jun 2012 16:12

    Long live Social http://alexking.org/blog/2012/05/22/social-2-5

    Testing automatic posting to Twitter and Facebook from my blog, sorry if you consider this spam.

    To CCIE or not to CCIE?

     | 15:30

    This one has been coming for a while… Last time I went for my CCIE was Q3 2007 and it’s been pretty quiet here since.

    In the meantime I’ve moved country, changed jobs and moved again, though not as far. So how is my CCIE doing? Well, to be honest, it’s not. You see, I came from a sponsored certification track (which, incidentally, I forged with my employer at the time) and thus had to sort out a transfer sum from my new employer. No problem for the Dutch employment market, at least at the time. But here in the UK I had no such luck; suffice it to say that financially it was not a good idea to move country. My new employer expressed that they would be happy to support attaining CCIE status. Little did I know that we had completely different concepts of ‘support’.

    Loving a new challenge, I quickly settled into my new position, only to find two years had gone by before the question of attaining CCIE popped up again. No wonder really, when you have a 100% billable target and plenty of overtime involved in the projects at hand. Then, when things settled down and I moved to another customer, I only found myself further and further away from (CCIE-related) technology.

    Yes, I still design and do have a consulting role, but I’m in no way challenged or kept on my toes protocol-wise. It took a few technical interviews to fully realise the impact of this, which brought me to this point: either I retrain myself towards CCIE in my own time, hoping my employer will pay for the lab and favour me with a mere week off for full-time study, or I let go of the desire to attain CCIE and trust that there are employers out there smart enough not to be blindsided by the highly praised numbers of my peers.

    My choice has been to no longer pursue CCIE. I can’t ask my family for a long and hefty sacrifice once again; we’ve been there, done that. I’m now one kid and two cancers (promisestoday.com /@maizymoo tweet) further on and value my own time more than my career. If this means UK employers don’t like me any more, then so be it. I know what I’m capable of; I just wish I was better at convincing prospective employers…

    [Above edited, below added – Jan 27th 2014]

    The sad thing is that the CCIEs we find ourselves hiring often do not have the consultancy skills needed to satisfy our customers’ needs. To the HR managers out there: hiring a CCIE does not mean you get good communication skills, customer-facing skills or even basic networking experience, for that matter. Trust me, I know; I’ve had to replace a number of CCIEs (single, dual and a quad) on various projects where things had gone so badly that we were about to lose all future business.

    Don’t get me wrong, I value certifications and see them as a good means of career progression. I do take issue, though, with the UK market putting so much pressure on individuals to develop themselves, alienating them completely from a corporate drive to improve through shared responsibility. UK employer–employee relationships have become too one-sided… For more on this, see a recent article of mine.

    Alcatel 7210 port mirroring

     | 12:52

    Recently I’ve been doing more on Alcatel, as I’m working in O2’s test-bed down in Slough, slaving away at testing aspects of their new LLU broadband core and new BT 21CN wholesale connectivity. I’ve not been able to write a lot in recent years due to working for an integrator rather than an ISP; I’m mostly not allowed, or it’s unwise, to divulge what I’m working on…

    However, it’s common knowledge that many providers use Alcatel and they seem to do pretty well in the ‘booming’ broadband market. Hence I thought I’d share a little snippet of an annoyance I recently encountered.

    When using an Alcatel 7210 to sniff traffic and interconnect different media (1Gbps copper and 10Gbps fibre), I found that sniffing is counter-intuitive for people only trained on Cisco. A few pointers:

    1. Port mirror destinations are defined in configuration
    2. Port mirror sources are set through debug commands
    3. When mirroring VPLS ports (I needed an e-pipe/Layer-2 tunnel), I found that egress sources did not work; only ingress did, and only one ingress port can be set per mirror session. It did not matter whether I used the port or the SAP as the source.

    I was left to sniff in two places to capture both up- & down-stream traffic. YMMV as a 7750 will be different, but I don’t have one available to me to test on…

    Commands used:

    #--------------------------------------------------
    echo "Mirror Configuration"
    #--------------------------------------------------
      mirror
        mirror-dest 4 create
          sap 1/1/4 create
          exit
          no shutdown
        exit
        mirror-dest 11 create
          sap 1/1/11 create
          exit
          no shutdown
        exit
      exit

    And the debug command:

    *A:<hostname># debug mirror-source 4 port ?
    - no port ...
    - port <port-id> egress ingress
    - port <port-id> egress
    - port <port-id> ingress
    - port lag ...
    
    *A:<hostname># debug mirror-source 4 sap ?
    - no sap <sap-id> [ingress]
    - sap <sap-id> {[ingress] }
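
    For reference, attaching a source looks something like this; mirror-dest 11 and port 1/1/26 match the show output further down, though as noted only the ingress direction ever delivered packets for me:

    *A:<hostname># debug mirror-source 11 port 1/1/26 egress ingress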

    As can be seen above, capturing by SAP is only supported at ingress. Using the port or the SAP as source yielded the same result: only ingress packets were ever sent to the destination port, despite show mirror stating both Egr & Ing.

    *A:<hostname># show mirror mirror-dest 11
    ===============================================================================
    Mirror Service
    ===============================================================================
    Service Id       : 11                   Type          : Ether
    Description      : (Not Specified)
    Admin State      : Up                   Oper State    : Up
    Forwarding Class : be                   Remote Sources: No
    Slice            : 0
    Destination SAP  : 1/1/11               Egr QoS Policy: 1
    -------------------------------------------------------------------------------
    Local Sources
    -------------------------------------------------------------------------------
    Admin State      : Up
    -Port                                   1/1/26                          Egr Ing
    ===============================================================================

    Python rocks

     | 2 Mar 2009 11:21

    I’m doing a network upgrade of 28 sites and 60 services, from roughly 160 old devices down to just 60. Python has become my friend for reading configs, creating CSV files and verification. I’ll be posting my scripts later; I’m sure there are clever people out there who can tell me where I went wrong or what I could be doing better, but time fails me to post them right now.
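
    To give a flavour of the kind of thing I mean, here is a fresh sketch for illustration rather than one of my actual scripts; the directory layout and CSV fields are made up:

    #!/usr/bin/env python3
    """Illustrative only: pull hostnames and interface counts out of a
    directory of saved configs and write them to a CSV for checking."""
    import csv
    import glob
    import re

    rows = []
    for path in glob.glob('configs/*.txt'):  # hypothetical directory of saved configs
        with open(path) as fh:
            text = fh.read()
        hostname = re.search(r'^hostname (\S+)', text, re.MULTILINE)
        interfaces = re.findall(r'^interface (\S+)', text, re.MULTILINE)
        rows.append({
            'file': path,
            'hostname': hostname.group(1) if hostname else 'UNKNOWN',
            'interface_count': len(interfaces),
        })

    with open('inventory.csv', 'w', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=['file', 'hostname', 'interface_count'])
        writer.writeheader()
        writer.writerows(rows)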

    F5 certification

     | 11 Feb 2008 11:32

    I’ve not posted much recently; it’s time to add some more meat…

    Last week I passed the F5 BIG-IP LTM v9.0 Essentials exam. It was easier than I thought; however, tomorrow I may be humbled further as I go for the advanced version of the F5 BIG-IP LTM v9.0 test. Studying with the training books from 4 days’ worth of training, but without further hands-on, is not what I call fun. Admittedly it’s a lot easier than CCIE, so what am I really moaning about? I suppose it’s the fact that I’m not studying for my next lab…

    Humbling experience

     | 11:27

    Ouch. Humility is a painful process…

    I’ve just been taught a lesson in humility and authorship. My post about Cisco’s NTP authentication implementation received a comment from a certain Frank. I’ve added my comment and need to verify my statement as soon as I can get some hands-on time in a lab again (I’ve not got Dynamips set up yet)…

    Changing jobs

     | 26 Nov 2007 21:19

    Finally, some news to write about. Not that I haven’t been busy, just not with CCIE. I’ve had too much on my mind, and finally I can write about it.

    Today I’ve resigned from my position at Easynet and signed with nscglobal as a Consultant. The first thing to do while writing here is to thank Easynet for my time there, and for the opportunities and trust given to me. Despite moving my desk from Rotterdam to Amsterdam within six months of employment, I have thoroughly enjoyed working for Easynet. Seeing it mature from an internationally fragmented company into an up-and-coming global enterprise player has been both challenging and inspiring.

    For those who don’t know nscglobal, they’re a UK integrator and I’ll be working in one of their London offices. This means moving from the Netherlands to the United Kingdom, which we’ll be doing physically in January. We’re looking for a rental house for multiple reasons, but for this site the relevant one is my CCIE. Once in the UK I need to start focussing on my CCIE again, and I hope to be able to do so with the support of some of the CCIEs nscglobal has. So be ready for some new CCIE updates starting January.

    Till then I’ll probably blog about whatever technology comes my way during the move. For example, the UK mobile operator 3 has an offer with free Skype calls, so I might be looking into UMTS coverage in London, Hitchin and the rest of the extended north of London. I’ll also be looking for a hosting location for my two servers; offers are welcome, and offers for dedicated hosting too, as that might help the uptime of things… My VoIP setup will have to change as well, though only slightly, as I’ll only be adding an FXO port at home.

    Anyway, just keep reading and you’ll find out. The great thing is that I no longer have to keep silent about what is going on, and that we’re looking forward to new things.

    No IP unreachables (and Cacti)

     | 11 Oct 2007 15:49

    *Sigh* It took me an hour or two to figure this one out. Cacti now does a ping before actually polling a device for stats. I’m running a small Cacti site which had been neglected for a long time. After updating Cacti and cleaning up some mess, I was confused as to why one router did get polled while the other’s graphs remained a dumb “nan”.

    I debugged and pinged, and even installed hping3 to do UDP pings. I don’t want to run Cacti as root, especially not on a vhost, so the UDP ping had to work. The pings arrived, but still no replies.

    Getting sidetracked, I noticed that the one router that did work was being hit by SSH login attempts and its CPU was spiking. An ACL took care of the break-in attempts, but then I noticed that directed broadcasts were being made to my server’s segment. So I nailed that down, along with proxy ARP, at which point I noticed that the router which had worked before was now causing errors in Cacti as well.
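
    Roughly this sort of clean-up, with a hypothetical ACL number, subnet and interface rather than my real ones:

    access-list 5 permit 192.0.2.0 0.0.0.255
    !
    line vty 0 4
     access-class 5 in
     ! only the management subnet gets to attempt SSH logins
    !
    interface FastEthernet0/0
     no ip directed-broadcast
     no ip proxy-arp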

    Tracking back, I noticed that the UDP ping ‘replies’ were ICMP unreachables rather than echo replies (doh, how obvious!). I re-enabled IP unreachables on both routers and was done. It’s amazing how blind one can be at times to the blatantly obvious…
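
    For anyone hitting the same thing, re-enabling them is a one-liner per polled interface (the interface name is a placeholder):

    interface FastEthernet0/0
     ip unreachables
     ! lets the router answer Cacti's UDP ping with an ICMP port-unreachable again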