Palo Alto Troubleshooting

Examples of common errors you may face when having to work on the Palos

Problem - Not able to connect to ansible host

Example error

##[error]Terraform command 'apply' failed with exit code '1'.:  timeout - last error: dial tcp 51.137.145.88:22: i/o timeout

Troubleshooting tip

This is most likely down to retrying steps in the pipeline. Each run a firewall rule is added to allow the Azure DevOps agent to connect to the ansible agent and at the end of the pipeline it is removed, so when you re-run failed steps in the pipeline you are not adding that rule back in, therefore the agent is unable to connect and you get the above error.

Problem - Ansible has failed to apply the palo configuration

This error is usually caused by the Palo’s XML config not applying due to malformed xml or incorrect references to objects etc.

Example error

module.hub-infra.module.firewall.null_resource.ansible-runs (remote-exec): fatal: [51.11.xxx.xxx]: FAILED! => {
...
...
...
module.hub-infra.module.firewall.null_resource.ansible-runs (remote-exec):     "msg": "Failed commit: Commit failed"
module.hub-infra.module.firewall.null_resource.ansible-runs (remote-exec): }

Troubleshooting tip

Follow the steps in the connecting guide and log into one of the Palos in the environment where your changes are failing to apply.
Once you’re logged in you will see a commit button in the top right-hand side of your screen like below:

Commit Button
A dialog box will pop up giving you the option to validate the most recent configuration, click on validate commit.

Validate Button
This will run for a few seconds, once it’s complete you’ll be presented with a status page giving you a clearer idea of what the problem is.

In this case we can see that the configuration is in fact invalid. The errors tell us that the log-setting value azure_log_analytics_out is incorrect, after looking through the code the azure_log_analytics_out log setting is a nonprod setting that isn’t available in Production and just removing the line fixed the issue.

Validate Example
Make your changes in the rdo-terraform-hub-dmz repository and run the pipeline again.

Problem - debugging connectivity issues

You may come across issues with connections being dropped and need to investigate where and why its happening.

Troubleshooting tip

The Palos have a built in traffic monitoring tool that could help you find the issue and tell you what the reason is for the drop.

Log into both of the Palo Alto’s for the environment
Click on the monitoring tab at the top of the page
Start with the ( action eq deny ) filtering rule to view connections being dropped by the Palo.
Add any extra filters such as (addr.src in a.a.a.a) or (addr.dst in b.b.b.b) to help filter out logs you don’t need to see.
Make sure you check both Palo Alto’s as there’s a load balancer in front, traffic could go to either or both.

Filtering Example

For more information on types of filters you can apply check out Basics of traffic monitor filtering.

Disk space full messages

If the management UI becomes unresponsive, there is a chance the VM has ran out of disk space and is struggling. You may see errors in the System log such as Disk usage for / exceeds limit, 100 percent in use, cleaning filesystem the / means root. Some other path may be presented in the error not always /.

Disk usage for / exceeds limit

When this happens then there needs to be a cleaning exercise or a hard reboot as a last resort.

Troubleshooting tip

Log unto the VPN and ssh into the vm in questions. Good chance that at this point the management UI is inaccessible
Check that there aren’t any disk partition maxed out cmd show system disk-space

Disk usage for / exceeds limit

Verify the aggressive clean up is enabled cmd show system state | match aggressive-cleaning At the moment, setting is only available via the terminal. There is a possibility that this gets lost between major version upgrades or vm restarts
In above command give no output then there is need to enable it per vm. chose y option to remove all old file. SysLog information would have already been sent to Panorama log collector, run the below command in the terminal cmd debug software disk-usage aggressive-cleaning enable

Enable cleanup

Set to threshold to 90% or lower e.g. 85% cmd debug software disk-usage cleanup deep threshold 90

Disk cleanup setting

You should see info stating usage has been adjusted

Perform other clean up tasks described in the documentation links below

Link with further details

This page was last reviewed on 12 July 2024. It needs to be reviewed again on 12 January 2025 by the page owner platops-build-notices .

This page was set to be reviewed before 12 January 2025 by the page owner platops-build-notices. This might mean the content is out of date.