
Troubleshooting

Examples of common errors you may face when working on the Palos.

Problem - Not able to connect to ansible host

Example error

##[error]Terraform command 'apply' failed with exit code '1'.:  timeout - last error: dial tcp 51.137.145.88:22: i/o timeout

Troubleshooting tip

This is most likely caused by re-running failed steps in the pipeline. On each run, a firewall rule is added to allow the Azure DevOps agent to connect to the ansible agent, and the rule is removed at the end of the pipeline. When you re-run only the failed steps, the rule is not added back, so the agent is unable to connect and you get the above error. Starting a new pipeline run, rather than retrying the failed steps, should add the rule back.
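To confirm that the symptom really is a blocked SSH port, a quick TCP reachability check from the agent's network can help. The following is a minimal sketch in Python, not part of the pipeline; the function name can_reach and the 5 second default timeout are assumptions made for illustration.

```python
import socket


def can_reach(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


# Example call, using the address from the example error above:
#   can_reach("51.137.145.88", 22)
```

A False result from the agent's network is consistent with the firewall rule being absent, although it does not rule out other causes such as the host being down.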

Problem - Ansible has failed to apply the palo configuration

This error is usually caused by the Palo’s XML config failing to apply because of malformed XML or incorrect references to objects.

Example error

module.hub-infra.module.firewall.null_resource.ansible-runs (remote-exec): fatal: [51.11.xxx.xxx]: FAILED! => {
...
...
...
module.hub-infra.module.firewall.null_resource.ansible-runs (remote-exec):     "msg": "Failed commit: Commit failed"
module.hub-infra.module.firewall.null_resource.ansible-runs (remote-exec): }

Troubleshooting tip

  1. Follow the steps in the connecting guide and log into one of the Palos in the environment where your changes are failing to apply.

  2. Once you’re logged in, you will see a Commit button at the top right-hand side of your screen, like below:

    Commit Button
  3. A dialog box will pop up giving you the option to validate the most recent configuration. Click Validate Commit.

    Validate Button
  4. This will run for a few seconds. Once it’s complete, you’ll be presented with a status page giving you a clearer idea of what the problem is.

    In this case we can see that the configuration is in fact invalid. The errors tell us that the log-setting value azure_log_analytics_out is incorrect. After looking through the code, we found that azure_log_analytics_out is a nonprod log setting that isn’t available in Production, and removing the line fixed the issue.

    Validate Example
  5. Make your changes in the rdo-terraform-hub-dmz repository and run the pipeline again.
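Malformed XML can also be caught locally before the pipeline is run again. The sketch below uses Python's standard XML parser to report the first syntax error in a config snippet; the function name xml_syntax_error is an assumption for illustration, and how the XML is stored in the repository is not covered here.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xml_syntax_error(xml_text: str) -> Optional[str]:
    """Return None if xml_text is well-formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The message includes the line and column of the first problem.
        return str(exc)
    return None
```

This only checks that the XML is well-formed. An incorrect reference, such as the log-setting value in the example above, still parses cleanly and will only show up in the Validate Commit results on the Palo.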

Problem - Debugging connectivity issues

You may come across issues with connections being dropped and need to investigate where and why it’s happening.

Troubleshooting tip

The Palos have a built-in traffic monitoring tool that can help you find the issue and tell you the reason for the drop.

  1. Log into both of the Palo Altos for the environment
  2. Click on the Monitor tab at the top of the page
  3. Start with the ( action eq deny ) filter to view connections being dropped by the Palo.
  4. Add any extra filters such as (addr.src in a.a.a.a) or (addr.dst in b.b.b.b) to help filter out logs you don’t need to see.
  5. Make sure you check both Palo Altos. There is a load balancer in front of them, so traffic could go to either or both.
Filtering Example Traffic Filtering

For more information on types of filters you can apply check out Basics of traffic monitor filtering.
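If you build the same filters repeatedly, the clauses from the steps above can be combined into a single filter string with the and operator. This is a small sketch of that string building only; the helper name build_filter is hypothetical, and a.a.a.a and b.b.b.b remain placeholders for real addresses.

```python
def build_filter(*clauses: str) -> str:
    """Wrap each clause in parentheses and join the clauses with 'and'."""
    return " and ".join(f"( {c.strip()} )" for c in clauses if c.strip())


query = build_filter("action eq deny", "addr.src in a.a.a.a", "addr.dst in b.b.b.b")
print(query)
# ( action eq deny ) and ( addr.src in a.a.a.a ) and ( addr.dst in b.b.b.b )
```

The resulting string can be pasted into the filter box of the traffic log view on each Palo.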

Disk space full messages

If the management UI becomes unresponsive, there is a chance the VM has run out of disk space and is struggling. You may see errors in the System log such as "Disk usage for / exceeds limit, 100 percent in use, cleaning filesystem". Here / means the root partition. The error may name a different path, not always /.

Disk usage for / exceeds limit Disk usage

When this happens, you need to run a clean-up or, as a last resort, hard reboot the VM.

Troubleshooting tip

  1. Log on to the VPN and SSH into the VM in question. There is a good chance that the management UI is inaccessible at this point.
  2. Check whether any disk partition is maxed out:

    show system disk-space

Disk usage for / exceeds limit Disk usage result

  3. Verify that aggressive clean-up is enabled:

    show system state | match aggressive-cleaning

    At the moment, this setting is only available via the terminal. It may be lost between major version upgrades or VM restarts.
  4. If the above command gives no output, aggressive clean-up needs to be enabled on each VM. Run the below command in the terminal and choose the y option to remove all old files. Syslog information will already have been sent to the Panorama log collector.

    debug software disk-usage aggressive-cleaning enable

Enable cleanup Disk usage result

  5. Set the threshold to 90% or lower, e.g. 85%. The following command sets it to 90%:

    debug software disk-usage cleanup deep threshold 90

Disk cleanup setting Disk usage result

You should see info stating usage has been adjusted

  6. Perform other clean-up tasks described in the documentation links below

Link with further details
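When several VMs need checking, the output of show system disk-space can be scanned for partitions close to the limit. The sketch below assumes the command prints df-style columns with the usage percentage in the fifth column; the sample text is made up for illustration and was not captured from a real firewall, and the function name full_partitions is hypothetical.

```python
# Illustrative sample only, assuming df-style output.
SAMPLE = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       6.9G  6.9G     0 100% /
none            7.9G   68K  7.9G   1% /dev
/dev/sda5        16G  3.1G   12G  21% /opt/pancfg
"""


def full_partitions(output: str, threshold: int = 90) -> list:
    """Return (mount_point, percent_used) pairs at or above threshold."""
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have six columns, the fifth being a number followed by '%'.
        if len(fields) == 6 and fields[4].endswith("%") and fields[4][:-1].isdigit():
            used = int(fields[4][:-1])
            if used >= threshold:
                flagged.append((fields[5], used))
    return flagged


print(full_partitions(SAMPLE))
# [('/', 100)]
```

The default threshold of 90 mirrors the clean-up threshold set in the steps above; pass a lower value to get earlier warning.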

Support Tickets with Palo Alto

Pre-req

You need to have a Palo Alto Customer Support Portal (CSP) account already set up. If you do not have one, you can create one in the CSP. An existing user will need to assign you to the account; reach out to the team on the #platform-operations Slack channel to get this done.

You will also need Google Authenticator or a similar 2FA app to log in to the CSP.

Creating a new ticket

Go to Support -> Get Help, then select the type of ticket you want to create and provide the necessary information.

You can also add attachments to the ticket after creating it, so if you have a tech support file, or any other relevant information that has been requested, you can attach it there.

Step one
Step two
Step three

Things to Note

  • There is a good chance they will ask for the tech support file, so make sure you have it ready. See the generating a tech support file section in the Palo Alto Software Upgrade Prerequisite guide.
  • You will need to factor in timezone differences when arranging a call with them, so be specific about your timezone.
  • They may ask for details like serial number, software version, model etc. Make sure you have that information to hand. See Prerequisite for how to get that information.
  • You can only see your own case history. To be notified about or participate in an open case, the person who opened it needs to add you to the Subscribers list.
  • You can reply via the email you get in your inbox, and your reply will be added to the case history.
  • Sometimes you are raising a case for the entire infrastructure, but you can only provide one serial number when prompted. Pick any one so you can progress with the ticket creation, then provide more context in the ticket itself.

Support escalation and contacts

contact support details

Escalation contacts within the MoJ account:

Service Delivery Leader - Charles Kingston (ckingston@paloaltonetworks.com)

Major Account Manager - Lauren Verby (lverby@paloaltonetworks.com & 07833247488) & David Woods (dawoods@paloaltonetworks.com)

Technical Solutions Consultant - Lee Harrigan-Green (lharrigangre@paloaltonetworks.com)

Focused Service Engineer - Maria Vasileiadi

This page was last reviewed on 12 July 2024. It needs to be reviewed again on 12 January 2025 by the page owner platops-build-notices.