Palo Alto Troubleshooting
Examples of common errors you may encounter when working on the Palos.
Problem - Not able to connect to ansible host
Example error
##[error]Terraform command 'apply' failed with exit code '1'.: timeout - last error: dial tcp 51.137.145.88:22: i/o timeout
Troubleshooting tip
This is most likely caused by re-running failed steps in the pipeline. On each run, a firewall rule is added to allow the Azure DevOps agent to connect to the Ansible agent, and the rule is removed at the end of the pipeline. When you re-run only the failed steps, the rule is not added back in, so the agent is unable to connect and you get the above error. Running the whole pipeline again, rather than only the failed steps, adds the rule back.
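To confirm that the firewall rule is what is missing, it can help to pull the address out of the error and test the port from the agent. The following is a minimal sketch, not part of the pipeline: the function names parse_dial_timeout and can_connect are hypothetical, and it assumes the error always contains the "dial tcp <ip>:<port>: i/o timeout" fragment shown above.

```python
import re
import socket

# Matches the "dial tcp <ip>:<port>: i/o timeout" fragment of the Terraform error.
DIAL_TIMEOUT = re.compile(r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+): i/o timeout")


def parse_dial_timeout(log_line):
    """Return (host, port) if the line contains a dial timeout, otherwise None."""
    match = DIAL_TIMEOUT.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

Calling can_connect with the host and port returned by parse_dial_timeout, from the same agent that runs the pipeline, returns False while the firewall rule is absent.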
Problem - Ansible has failed to apply the palo configuration
This error is usually caused by the Palo’s XML config failing to apply because of malformed XML or incorrect references to objects.
Example error
module.hub-infra.module.firewall.null_resource.ansible-runs (remote-exec): fatal: [51.11.xxx.xxx]: FAILED! => {
...
...
...
module.hub-infra.module.firewall.null_resource.ansible-runs (remote-exec): "msg": "Failed commit: Commit failed"
module.hub-infra.module.firewall.null_resource.ansible-runs (remote-exec): }
Troubleshooting tip
Follow the steps in the connecting guide and log into one of the Palos in the environment where your changes are failing to apply.
Once you’re logged in you will see a commit button in the top right-hand side of your screen like below:
Commit Button
A dialog box will pop up giving you the option to validate the most recent configuration. Click on Validate Commit.
Validate Button
This will run for a few seconds. Once it’s complete, you’ll be presented with a status page giving you a clearer idea of what the problem is.
In this case we can see that the configuration is in fact invalid. The errors tell us that the log-setting value azure_log_analytics_out is incorrect. After looking through the code, we found that azure_log_analytics_out is a nonprod log setting that isn’t available in Production, and removing the line fixed the issue.
Validate Example
Make your changes in the rdo-terraform-hub-dmz repository and run the pipeline again.
Problem - Debugging connectivity issues
You may come across issues with connections being dropped and need to investigate where and why it’s happening.
Troubleshooting tip
The Palos have a built-in traffic monitoring tool that can help you find the issue and tell you the reason for the drop.
- Log into both of the Palo Altos for the environment
- Click on the Monitor tab at the top of the page
- Start with the ( action eq deny ) filtering rule to view connections being dropped by the Palo.
- Add any extra filters such as (addr.src in a.a.a.a) or (addr.dst in b.b.b.b) to help filter out logs you don’t need to see.
- Make sure you check both Palo Altos. There’s a load balancer in front, so traffic could go to either or both.
Filtering Example
For more information on the types of filters you can apply, check out Basics of traffic monitor filtering.
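If you find yourself typing the same filters repeatedly, a small helper can assemble the string to paste into the filter bar. This is only a sketch: build_filter is a hypothetical name, the addresses in the usage note are placeholders, and it assumes clauses are combined with "and", which is how the traffic monitor joins filter expressions.

```python
def build_filter(action="deny", src=None, dst=None):
    """Build a traffic monitor filter string from optional parts."""
    # Always start from the action clause, as in the steps above.
    clauses = [f"( action eq {action} )"]
    if src:
        clauses.append(f"( addr.src in {src} )")
    if dst:
        clauses.append(f"( addr.dst in {dst} )")
    # Every clause must match, so join them with "and".
    return " and ".join(clauses)
```

For example, build_filter(src="10.0.0.4") produces ( action eq deny ) and ( addr.src in 10.0.0.4 ), which narrows the log to denied connections from that source address.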
Problem - Disk space full messages
If the management UI becomes unresponsive, there is a chance the VM has run out of disk space and is struggling. You may see errors in the System log such as "Disk usage for / exceeds limit, 100 percent in use, cleaning filesystem". The / means root. The error will not always show /, as some other path may be presented instead.
Disk usage for / exceeds limit
When this happens, a clean-up exercise is needed, with a hard reboot as a last resort.
Troubleshooting tip
- Log onto the VPN and ssh into the VM in question. There is a good chance that the management UI is inaccessible at this point.
- Check that no disk partition is maxed out
show system disk-space
Disk usage for / exceeds limit
- Verify that aggressive clean-up is enabled
show system state | match aggressive-cleaning
At the moment, this setting is only available via the terminal. It may be lost between major version upgrades or VM restarts.
- If the above command gives no output, aggressive cleaning needs to be enabled on each VM. Run the command below in the terminal and choose the y option to remove all old files. Syslog information will already have been sent to the Panorama log collector.
debug software disk-usage aggressive-cleaning enable
Enable cleanup
- Set the threshold to 90% or lower, e.g. 85%
debug software disk-usage cleanup deep threshold 90
Disk cleanup setting
You should see info stating usage has been adjusted
- Perform other clean up tasks described in the documentation links below
Link with further details
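When checking several VMs, it can be quicker to paste the disk-space output into a script than to read it by eye. The sketch below assumes the output is laid out like df -h, with the usage percentage in the fifth column and the mount point in the sixth. The SAMPLE text is made up for illustration and is not output from a real firewall, and partitions_over_threshold is a hypothetical helper name.

```python
# Illustrative output only, assumed to follow the df -h column layout.
SAMPLE = """Filesystem      Size  Used Avail Use% Mounted on
/dev/root       6.9G  6.9G     0 100% /
none            2.0G   60K  2.0G   1% /dev
/dev/sda5        16G  1.2G   14G   8% /opt/pancfg
/dev/sda6       7.9G  7.2G  300M  97% /opt/panrepo
"""


def partitions_over_threshold(disk_space_output, threshold=90):
    """Return (mount_point, percent_used) for partitions at or above threshold."""
    over = []
    for line in disk_space_output.splitlines():
        fields = line.split()
        # Skip blank or short lines that do not have all six columns.
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        try:
            used = int(fields[4].rstrip("%"))
        except ValueError:
            # The header row ("Use%") lands here and is ignored.
            continue
        if used >= threshold:
            over.append((fields[5], used))
    return over
```

With the sample text and the default threshold of 90, the helper reports the / and /opt/panrepo partitions, which are the ones to clean up first.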