Atlassian Infrastructure
Atlassian Apps - Jira, Confluence and Crowd
New infrastructure was deployed for all Atlassian apps as part of the work to migrate from single server to flexible server. The atlassian-infrastructure repository contains all the IaC as well as the automation scripts.
Environments built:
* Production - this replaced “PRD”
* Staging - this replaced “PRP”
Virtual Machines
Please take a look at this documentation on Confluence, which has a detailed list of the VMs, the existing setup and the diagram.
There are 9 VMs in total in the production environment: 3 for Jira, 2 for Confluence, 1 for Crowd and 3 for GlusterFS.
All resources are stored in either atlassian-prod-rg or atlassian-nonprod-rg.
Prod
#### Jira VMs
* atlassian-prod-jira-01
* atlassian-prod-jira-02
* atlassian-prod-jira-03
#### Confluence VMs
* atlassian-prod-confluence-02
* atlassian-prod-confluence-04
#### Crowd VM - this is used to handle the authentication.
* atlassian-prod-crowd-01
#### GlusterFS VMs - GlusterFS is used to store file attachments.
* atlassian-prod-gluster-01
* atlassian-prod-gluster-02
* atlassian-prod-gluster-03
#### Postgres Flexible Server
atlassian-prod-flex-server
Staging / non-prod VMs follow the same naming convention, but with "nonprod" instead of "prod".
Please note that there are other important resources like Application Gateways, Load Balancer, NSGs, VNETs, SendGrid etc. which are not mentioned here.
PlatOps responsibility
The internal tooling team have access to the servers and databases in both production and staging, as well as in-depth knowledge of how the Atlassian apps work. This means that self-service is normally possible.
What can't the internal team do?
The team does not have access to Azure or to the automation scripts, so PlatOps will need to provide assistance in these areas. This includes any network-related issues or features.
Accessing the VMs
1. Connect to the F5 VPN - available at https://portal.platform.hmcts.net/
2. Download the private key “test-private-key” from the key vault atlassian-nonprod-kv, and save it to a file.
3. Set the correct permissions on the file by running `chmod 600 <privatekeyfilename>`
4. SSH to the desired VM with `ssh -i <privatekeyfilename> atlassian-admin@<VM-IP>`
NOTE: as the VMs are restores of a previous production environment, the local hostname will be different from the Azure display name. Verify you are on the right host by its local IP address!
For example, Jira node 1 in staging and in production will both have the same hostname, “PRDATL01AJRA01.cp.cjs.hmcts.net”.
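Because the hostname cannot be trusted, a quick IP check after logging in can help. The following is a minimal sketch, not part of the existing automation: the function names `ip_in_list` and `verify_host` are hypothetical, and it assumes a Linux host where `hostname -I` is available. Pass the private IP shown for the VM in Azure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the expected IP is in the given list.
# Usage: ip_in_list <expected-ip> <ip> [<ip> ...]
ip_in_list() {
  local expected="$1" candidate
  shift
  for candidate in "$@"; do
    [ "$candidate" = "$expected" ] && return 0
  done
  return 1
}

# Hypothetical helper: check the current host against an expected IP.
# `hostname -I` prints the host's IP addresses separated by spaces.
verify_host() {
  local expected="$1"
  # Word splitting of the command substitution is intentional here.
  # shellcheck disable=SC2046
  if ip_in_list "$expected" $(hostname -I); then
    echo "OK: this host has IP $expected"
  else
    echo "WARNING: this host does not have IP $expected" >&2
    return 1
  fi
}
```

Running `verify_host <VM-IP>` straight after connecting gives a clear warning if the session has landed on a different node than intended.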
Auto SSL renewal process
The environments have been built to use Let's Encrypt SSL certificates via the acmebot.
This means that the certificates renew regularly (every 3 months), so automation has been created to automatically pick up the renewed certs within the environment.
The bash function responsible for this is check_and_replace_cert, which does the following:
- Query if the latest cert from the KeyVault has a newer expiry date than the cert already on the servers.
- If it does, it will update the certificate on the servers
- If the current time is within an agreed window (before 8am), the services will be restarted automatically to pick up the new cert.
- If the latest cert has the same expiry date as the current cert, no action will be taken.
The Atlassian pipeline runs every Monday morning at 7am to poll for any certificate updates.
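The decision logic described above can be sketched as follows. This is an illustration only and not the real check_and_replace_cert: the function name `decide_cert_action`, its arguments and the returned labels are all hypothetical, and treating a Key Vault cert with an older expiry as "no action" is an assumption, since only the newer and equal cases are described above.

```shell
#!/usr/bin/env bash
# Sketch of the decision logic only; not the real check_and_replace_cert.
# Arguments (all hypothetical):
#   $1  expiry of the Key Vault certificate, in seconds since the epoch
#   $2  expiry of the certificate on the server, in seconds since the epoch
#   $3  current hour of the day (0-23)
decide_cert_action() {
  local kv_expiry="$1" local_expiry="$2" hour="$3"
  local restart_before_hour=8   # the agreed window: before 8am

  if [ "$kv_expiry" -le "$local_expiry" ]; then
    # Same expiry (or, by assumption, an older Key Vault cert): do nothing.
    echo "no_action"
  elif [ "$hour" -lt "$restart_before_hour" ]; then
    # Newer cert and inside the window: replace it and restart the services.
    echo "update_and_restart"
  else
    # Newer cert but outside the window: replace it, restart later.
    echo "update_only"
  fi
}
```

With the pipeline running at 7am, a newer certificate would fall inside the window and give `update_and_restart`. A manual run later in the day would give `update_only`, leaving the restart to be done separately.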
FAQ
How are services restarted?
You can restart the services when logged into the servers with
systemctl restart jira
Replace “jira” with the name of the service you want to restart, such as Crowd or Confluence.
You can also set the automation to restart the services by adjusting this environment variable.
What do the automation scripts do?
On each VM, there is a terraform remote-exec that triggers bash scripts, which complete various parts of the setup on the VMs. These scripts do the following:
* Configure staging with a different base URL - staging.tools.hmcts.net
* Grant permissions to app users, such as the jira user
* Update the local hosts file
* Run SSL renewal checks
* Point the app to the correct database
* Set the content of a robots.txt file
* Optionally restart the app service
How was the staging env built?
Troubleshooting
1. If you are unable to access the VMs, make sure you are connected to the F5 VPN, are using the correct private key, and that the private key is in the correct format.
2. If you see errors in the application, make sure the GlusterFS shares are mounted correctly on the VMs. For example, jira_shared should be mounted at /var/atlassian/application-data/jira/shared. Use the `mount -a` command to mount them correctly.
There was a problem where the share was not mounted correctly after auto shutdown. There is now a cron job on the staging VMs that runs every hour: the script checks whether the share is mounted and, if it is not, attempts to mount it.
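A mount check of this kind could look roughly like the sketch below. It is not the actual script on the staging VMs: the function name `ensure_mounted` is hypothetical, and it assumes the share has an entry in /etc/fstab so that `mount -a` can mount it.

```shell
#!/usr/bin/env bash
# Sketch of an hourly mount check; not the actual script on the staging VMs.
# Usage: ensure_mounted <mount-point>
ensure_mounted() {
  local target="$1"

  # mountpoint -q exits 0 when the path is a mount point.
  if mountpoint -q "$target"; then
    echo "already mounted: $target"
    return 0
  fi

  # Not mounted: try to mount everything listed in /etc/fstab, then re-check.
  mount -a
  if mountpoint -q "$target"; then
    echo "mounted: $target"
  else
    echo "failed to mount: $target" >&2
    return 1
  fi
}
```

For the Jira share, this would be called as `ensure_mounted /var/atlassian/application-data/jira/shared`. It needs to run as root for `mount -a` to succeed.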