Troubleshooting
Examples of common errors you may encounter when working on the Palos.
Problem - Not able to connect to the Ansible host
Example error
##[error]Terraform command 'apply' failed with exit code '1'.: timeout - last error: dial tcp 51.137.145.88:22: i/o timeout
Troubleshooting tip
This is most likely caused by retrying individual steps in the pipeline. On each run, a firewall rule is added to allow the Azure DevOps agent to connect to the Ansible host, and that rule is removed at the end of the pipeline. When you re-run only the failed steps, the rule is not added back in, so the agent is unable to connect and you get the error above. Re-running the whole pipeline adds the rule again.
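To confirm that the missing firewall rule is the cause, you could check from the agent whether TCP port 22 on the Ansible host is reachable. The Python sketch below is illustrative only and is not part of the pipeline. The `port_reachable` name is hypothetical, and the IP in the usage note is the one from the example error.

```python
import socket


def port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    A False result for port 22 from the Azure DevOps agent is consistent with the
    firewall rule not having been added (the "dial tcp ...:22: i/o timeout" error).
    """
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (socket.timeout subclasses OSError), refused
        # connections and unreachable networks.
        return False
```

For example, if `port_reachable("51.137.145.88")` returns `False` when run from the agent, the rule is probably missing.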
Problem - Ansible has failed to apply the Palo configuration
This error is usually caused by the Palo’s XML configuration failing to apply because of malformed XML, incorrect references to objects, or similar problems.
Example error
module.hub-infra.module.firewall.null_resource.ansible-runs (remote-exec): fatal: [51.11.xxx.xxx]: FAILED! => {
...
...
...
module.hub-infra.module.firewall.null_resource.ansible-runs (remote-exec): "msg": "Failed commit: Commit failed"
module.hub-infra.module.firewall.null_resource.ansible-runs (remote-exec): }
Troubleshooting tip
Follow the steps in the connecting guide and log in to one of the Palos in the environment where your changes are failing to apply.
Once you’re logged in, you will see a Commit button in the top right-hand corner of your screen, as shown below:
Commit Button
A dialog box will pop up giving you the option to validate the most recent configuration. Click on Validate Commit.
Validate Button
This will run for a few seconds. Once it’s complete, you’ll be presented with a status page that gives you a clearer idea of what the problem is.
In this case we can see that the configuration is in fact invalid. The errors tell us that the `log-setting` value `azure_log_analytics_out` is incorrect. After looking through the code, it turned out that `azure_log_analytics_out` is a nonprod log setting that isn’t available in Production, and simply removing the line fixed the issue.
Validate Example
Make your changes in the rdo-terraform-hub-dmz repository and run the pipeline again.
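The Palo rejects references to objects that don’t exist in that environment. A quick pre-check before re-running the pipeline could therefore catch cases like the `azure_log_analytics_out` one above. This is a minimal Python sketch and not an existing tool in the repository. The function name is hypothetical, and the XML layout it expects (rules referencing `<log-setting>NAME</log-setting>`, profiles defined as `log-settings/profiles/entry[@name]`) is a simplified assumption. Adjust the paths to match the real config templates.

```python
import xml.etree.ElementTree as ET


def undefined_log_settings(config_xml: str) -> set[str]:
    """Return log-setting names that are referenced but never defined.

    Assumed (simplified, hypothetical) layout:
      references:  <log-setting>NAME</log-setting>
      definitions: <log-settings><profiles><entry name="NAME"/></profiles></log-settings>
    """
    root = ET.fromstring(config_xml)
    # Each <log-setting> element's text is a reference to a profile name.
    referenced = {
        el.text.strip()
        for el in root.iter("log-setting")
        if el.text and el.text.strip()
    }
    # Each <entry> under log-settings/profiles defines a profile name.
    defined = {
        entry.get("name")
        for entry in root.findall(".//log-settings/profiles/entry")
        if entry.get("name")
    }
    return referenced - defined
```

Suppose a production config only defines a `default` profile but one rule still references `azure_log_analytics_out`. This check would then return `{'azure_log_analytics_out'}`, pointing at the line to remove.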
Problem - Debugging connectivity issues
You may come across issues with connections being dropped and need to investigate where and why it’s happening.
Troubleshooting tip
The Palos have a built-in traffic monitoring tool that can help you find where connections are being dropped and why.
- Log in to both of the Palo Altos for the environment.
- Click on the Monitor tab at the top of the page.
- Start with the `( action eq deny )` filter to view connections being dropped by the Palo.
- Add any extra filters, such as `(addr.src in a.a.a.a)` or `(addr.dst in b.b.b.b)`, to filter out logs you don’t need to see.
- Make sure you check both Palo Altos: there’s a load balancer in front of them, so traffic could go to either or both.
Filtering Example
data:image/s3,"s3://crabby-images/5d6ec/5d6ecdfb0013c3f5a349849cf5b7844672ba5073" alt="Traffic Filtering"
For more information on the types of filters you can apply, see Basics of traffic monitor filtering.
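When you combine several filters, each clause is wrapped in parentheses and joined with `and` or `or`. The small Python helper below is hypothetical and is not part of PAN-OS. It only shows how such a filter string is put together before you paste it into the Monitor filter box.

```python
def build_filter(*clauses: str, joiner: str = "and") -> str:
    """Combine traffic-monitor filter clauses into one filter expression.

    Each clause is wrapped in parentheses, e.g. "action eq deny" becomes
    "( action eq deny )", and the clauses are joined with `joiner`.
    """
    if joiner not in ("and", "or"):
        raise ValueError("joiner must be 'and' or 'or'")
    return f" {joiner} ".join(f"( {clause.strip()} )" for clause in clauses)
```

For example, `build_filter("action eq deny", "addr.src in a.a.a.a")` produces `( action eq deny ) and ( addr.src in a.a.a.a )`. This shows only denied traffic from that source address.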
Disk space full messages
If the management UI becomes unresponsive, there is a chance the VM has run out of disk space and is struggling. You may see errors in the System log such as `Disk usage for / exceeds limit, 100 percent in use, cleaning filesystem`. The `/` means root; the error may show some other path, not always `/`.
Disk usage for / exceeds limit
data:image/s3,"s3://crabby-images/370d4/370d473d12a4a923277a4375871eb1cf26ea7853" alt="Disk usage"
When this happens, the disk needs to be cleaned up, with a hard reboot as a last resort.
Troubleshooting tip
- Log on to the VPN and `ssh` into the VM in question. There’s a good chance that at this point the management UI is inaccessible.
- Check whether any disk partitions are maxed out:
  ```bash
  show system disk-space
  ```
  Disk usage for / exceeds limit
- Verify that aggressive cleanup is enabled:
  ```bash
  show system state | match aggressive-cleaning
  ```
  At the moment, this setting is only available via the terminal, and it may get lost between major version upgrades or VM restarts.
- If the above command gives no output, aggressive cleanup needs to be enabled on each VM. Run the command below in the terminal and choose the `y` option to remove all old files. This is safe because the syslog information will already have been sent to the Panorama log collector.
  ```bash
  debug software disk-usage aggressive-cleaning enable
  ```
  Enable cleanup
- Set the threshold to 90% or lower, e.g. 85%. The following command sets it to 90%:
  ```bash
  debug software disk-usage cleanup deep threshold 90
  ```
  Disk cleanup setting
  You should see a message stating that usage has been adjusted.
- Perform the other cleanup tasks described in the documentation links below.
Link with further details
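If you save the output of `show system disk-space` from your SSH session, a short script can pick out the partitions at or above the cleanup threshold. This is a minimal Python sketch. It assumes the output is the usual df-style table (Filesystem, Size, Used, Avail, Use%, Mounted on). The function name is hypothetical, and the default of 90 simply mirrors the threshold suggested above.

```python
def partitions_over_threshold(df_output: str, threshold: int = 90) -> list[tuple[str, int]]:
    """Return (mount point, use %) for partitions at or above `threshold` percent.

    Assumes `df_output` is a df-style table: a header row followed by rows
    ending in "<use>% <mount point>".
    """
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Skip rows that are not full usage rows, e.g. a long filesystem
        # name that df wrapped onto its own line.
        if len(fields) < 5 or not fields[-2].endswith("%"):
            continue
        use_pct = int(fields[-2].rstrip("%"))
        if use_pct >= threshold:
            flagged.append((fields[-1], use_pct))
    return flagged
```

Anything this returns is a candidate for the cleanup steps above. A result that includes `/` matches the `Disk usage for / exceeds limit` error.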
Support Tickets with Palo Alto
Pre-req
You need to have a Palo Alto Customer Support Portal (CSP) account already set up. If you don’t have one, you can create it in the CSP. An existing user will then need to add you to the account; reach out to the team on the #platform-operations Slack channel to get this done.
You will also need Google Authenticator or a similar 2FA app to log in to the CSP.
Creating a new ticket
Go to Support -> Get Help, then select the type of ticket you want to create and provide the necessary information.
You can also add attachments to the ticket after creating it, such as a tech support file or any other information they request.
Step one
data:image/s3,"s3://crabby-images/e6c23/e6c23f9a18b1ca8ddd0dd1179e8293de5e2ec9a5" alt="Step one"
Step two
data:image/s3,"s3://crabby-images/23905/239051343c1e4b574c1f02157022d3294b45226e" alt="Step two"
Step three
data:image/s3,"s3://crabby-images/06cb5/06cb505f43021cd455d83cd3895372143484c497" alt="Step three"
Things to Note
- There is a good chance they will ask for the tech support file, so make sure you have it ready. See the generating a tech support file section in the Palo Alto Software Upgrade Prerequisite guide.
- You will need to factor in timezone differences when arranging a call with them, so make sure you are specific about your timezone.
- They may ask for details like serial number, software version, model etc. Make sure you have that information to hand. See Prerequisite for how to get that information.
- You can only see your own case history. To be notified of, or participate in, an open case, the person who opened it needs to add you to the case’s Subscribers list.
- You can reply to the email you receive in your inbox, and your reply will be added to the case history.
- Sometimes you will be raising a case for the entire infrastructure. When prompted for a serial number you can only provide one, so pick any one to progress with the ticket creation, then provide more context in the ticket itself.
Support escalation and contacts
Escalation contacts within the MoJ account:
Service Delivery Leader - Charles Kingston (ckingston@paloaltonetworks.com)
Major Account Manager - Lauren Verby (lverby@paloaltonetworks.com & 07833247488) & David Woods (dawoods@paloaltonetworks.com)
Technical Solutions Consultant - Lee Harrigan-Green (lharrigangre@paloaltonetworks.com)
Focused Service Engineer - Maria Vasileiadi