
How to reindex when Elasticsearch and the database are out of sync

This page provides instructions for reindexing an Elasticsearch index when the data in Elasticsearch and the database are out of sync.

The instructions are kept as generic as possible, with placeholders to fill in values as needed.

A change request should be raised for production work, which will be scheduled for out of hours. As part of the change, get confirmation of the case_type_id and the name of the index. The requesting service team will need to raise the change request on Halo: https://hmcts.haloitsm.com/

Example implementation plan: https://tools.hmcts.net/confluence/x/57W2c

Example index name: civil_cases-000001
Example case_type_id: CIVIL
Example temporary table name: civil_tmp_09102025

Some changes will include both non-production and production environments.

Prerequisite steps

Most changes in production will have a few prerequisite steps to complete before starting the reindex.

1) Increase the logstash pod replica count to 2

This assists with logstash stability and performance during the reindex. Make sure this is done in hours, before the change, as the CCD team needs to approve.

This is done by raising a PR to change the number of replicas for the relevant environment in the cnp-flux-config repo. Some teams run their own instance of logstash, so confirm with them whether a specific instance needs to be changed.

2) Once the pod count is increased, raise the PR to lower the replicas back to 1.

Again, because the CCD team needs to approve, it is best to raise this in hours, before the change, so it can be merged once the change is complete.

3) Take a count of the DLQ for the case_type_id to be reindexed

curl --header "Content-Type: application/json" 'http://localhost:9200/.logstash_dead_letter/_count' -d '{"query": {"match_phrase": {"failed_case": "\"case_type_id\":\"<your case type id>\""}}}'

4) Get a count of the index for the jurisdiction to be reindexed

curl --silent localhost:9200/_cat/indices | awk '{print $3, $7}' | grep <your index name>

5) Get a count of the cases in the database for the case_type_id

SELECT count(*) FROM case_data WHERE case_type_id = '<your case type id>';

The database, index, and DLQ counts are useful for estimating how long the reindex will take.
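As a rough sketch of that estimate, assuming batches of 10,000 and roughly 20 minutes per batch (both figures are illustrative and vary by service), the database count can be turned into a batch count and duration:

```shell
# Rough reindex duration estimate. All values here are illustrative
# assumptions -- substitute the real counts and the figures agreed
# with the service team.
DB_COUNT=250000        # from the SELECT count(*) query above
BATCH_SIZE=10000       # batch size used in the UPDATE commands
MINUTES_PER_BATCH=20   # rough per-batch time, service dependent

# Number of batches, rounded up
BATCHES=$(( (DB_COUNT + BATCH_SIZE - 1) / BATCH_SIZE ))
echo "Batches needed: ${BATCHES}"
echo "Rough duration: $(( BATCHES * MINUTES_PER_BATCH )) minutes"
```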

Steps to complete reindex

1) Request JIT access for the relevant environment's bastion and administrative access to the relevant database (both via https://myaccess.microsoft.com/).

2) Make sure you are connected to the F5 VPN.

3) Connect to the bastion server:

az ssh config --ip \*.platform.hmcts.net --file ~/.ssh/config

Non-prod: ssh bastion-nonprod.platform.hmcts.net
Prod: ssh bastion-prod.platform.hmcts.net

Additional instructions for accessing the bastions can be found here: https://tools.hmcts.net/confluence/spaces/DTSPO/pages/1482333577/Access+HMCTS+Bastions

4) Find the IP address of the Elasticsearch node you wish to connect to. For a list of IP addresses for each environment, go to:

DEMO: https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/1c4f0704-a29e-403d-b719-b90c34ef14c9/resourceGroups/ccd-elastic-search-demo/providers/Microsoft.Network/networkSecurityGroups/ccd-cluster-nsg/networkInterfaces

PERFTEST: https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/7a4e3bd5-ae3a-4d0c-b441-2188fee3ff1c/resourceGroups/ccd-elastic-search-perftest/providers/Microsoft.Network/networkSecurityGroups/ccd-cluster-nsg/networkInterfaces

AAT: https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/1c4f0704-a29e-403d-b719-b90c34ef14c9/resourceGroups/ccd-elastic-search-aat/providers/Microsoft.Network/networkSecurityGroups/ccd-cluster-nsg/networkInterfaces

Production: https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/8999dec3-0104-4a27-94ee-6588559729d1/resourceGroups/ccd-elastic-search-prod/providers/Microsoft.Network/networkSecurityGroups/ccd-cluster-nsg/networkInterfaces

Connect to the Elasticsearch node from the relevant bastion:

ssh -p22 elkadmin@<ip address> -i <logstash ssh key>

5) Service team shutter the service being impacted.

6) Get a count of the index

curl --silent localhost:9200/_cat/indices | awk '{print $3, $7}' | grep <your index name>

7) Get a count of the Dead Letter Queue on Elasticsearch for the case_type_id

curl --header "Content-Type: application/json" 'http://localhost:9200/.logstash_dead_letter/_count' -d '{"query": {"match_phrase": {"failed_case": "\"case_type_id\":\"<your case type id>\""}}}'

8) Change the index to read only so it can be cloned in case of rollback.

curl -X PUT "localhost:9200/<Your index name>/_settings?pretty" -H 'Content-Type: application/json' -d' {"settings": {"index.blocks.write": true}}'

9) Clone the index for backup in case of rollback

When naming the clone please use the normal index name with “cloned-” in front of it for ease of use and consistency.

curl -X POST "localhost:9200/<Your index name>/_clone/cloned-<Your index name>?pretty"

10) Set original index back to write mode

curl -X PUT "localhost:9200/<Your index name>/_settings?pretty" -H 'Content-Type: application/json' -d' {"settings": {"index.blocks.write": false}}'

11) Check the DB count for the service

Connect to the ccd-data-store database from the relevant bastion (live or non-live).

To connect to the database use the following commands:

DB_HOST=ccd-data-store-api-postgres-db-v15-<environment you are reindexing>.postgres.database.azure.com
DB_NAME=ccd_data_store
DB_USER="DTS\ Platform\ Operations\ PostgreSQL\ Admin\ Access\ SC"
PGPASSWORD=$(az account get-access-token --resource-type oss-rdbms --query accessToken -o tsv)
psql "sslmode=require host=${DB_HOST} dbname=${DB_NAME} user=${DB_USER} port=5432 password=${PGPASSWORD}"

Once connected, run the following query to get a count of the cases to be reindexed.

SELECT count(*) FROM case_data WHERE case_type_id = '<your case type id>';

12) Create a temp table as the source for reindexing

The temp table step is only needed above 10K cases, but it is still good practice to follow for smaller case counts.

create table <your case type id>_tmp_<date> as select id from case_data where case_type_id = '<your case type id>' and marked_by_logstash = true;

Running the command below should confirm the temp table was created as expected.

SELECT count(*) from <your case type id>_tmp_<date>;

13) Delete the index ahead of reindexing

curl -X DELETE localhost:9200/<Your index name>;

14) The service team merges the CCD definition file, which creates the new index.

15) Take another count of the index to confirm it has been recreated

curl --silent localhost:9200/_cat/indices | awk '{print $3, $7}' | grep <your index name>

16) Start the reindex process by running the following command on the database

UPDATE case_data SET marked_by_logstash = false WHERE id IN (SELECT id FROM <your case type id>_tmp_<date> order by 1 offset 0 limit 10000);

This will update 10K cases at a time from the temporary table to be reindexed. For jurisdictions with larger or more complex cases, the 10K may need to be reduced to 5K or even 1K; the service team should be able to advise on this.

The first batch of 10K will usually take slightly longer than average, completing within 15-20 minutes.

17) Check the status of the reindex on Elasticsearch

curl --silent localhost:9200/_cat/indices | awk '{print $3, $7}' | grep <your index name>

Running this on the Elasticsearch node should show the index count increasing as the reindex progresses. It will likely take about 10-15 minutes for the cases to appear in Elasticsearch after the update command has completed on the database.
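Rather than re-running the count by hand, a small helper can poll until the count reaches a target. This is a sketch: the command passed in is whatever prints the current count (for example the curl/awk pipeline above), and the 60-second interval is an arbitrary choice.

```shell
# Poll until a count command reports at least the target value.
# $1: target count; remaining args: a command that prints the current count.
wait_for_count() {
  target=$1; shift
  while true; do
    current=$("$@")
    echo "current count: ${current} (target ${target})"
    [ "${current:-0}" -ge "$target" ] && break
    sleep 60   # arbitrary poll interval
  done
}

# Example usage against the local Elasticsearch node (index name is a placeholder):
# wait_for_count 250000 sh -c \
#   "curl --silent localhost:9200/_cat/indices | awk '/<your index name>/ {print \$7}'"
```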

18) Repeat the update command, increasing the offset each time, until all cases have been reindexed.

Do not kick off the next batch until you have confirmed the previous batch has been indexed in Elasticsearch.

UPDATE case_data SET marked_by_logstash = false WHERE id IN (SELECT id FROM <your case type id>_tmp_<date> order by 1 offset 10000 limit 10000);

This example assumes batches of 10K; adjust the offset and limit if smaller batches are required.
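The batch sequence can be generated rather than typed by hand. This sketch only prints the UPDATE statements for review; the total, batch size, and table name are placeholders, and each statement should still be run one at a time after the previous batch has been confirmed in Elasticsearch.

```shell
# Print one UPDATE per batch; this does not execute anything against the database.
TOTAL=250000                 # database count for the case type (placeholder)
BATCH=10000                  # batch size agreed with the service team (placeholder)
TABLE="civil_tmp_09102025"   # your temporary table name (placeholder)

offset=0
while [ "$offset" -lt "$TOTAL" ]; do
  printf "UPDATE case_data SET marked_by_logstash = false WHERE id IN (SELECT id FROM %s ORDER BY 1 OFFSET %d LIMIT %d);\n" \
    "$TABLE" "$offset" "$BATCH"
  offset=$(( offset + BATCH ))
done
```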

19) Once all cases have been reindexed, confirm the counts match between the database and Elasticsearch.

Database count: SELECT count(*) FROM case_data WHERE case_type_id = '<your case type id>';

Elasticsearch count: curl --silent localhost:9200/_cat/indices | awk '{print $3, $7}' | grep <your index name>

If the counts do not match, check the DLQ for any cases that failed to index, using the commands in the troubleshooting section below.
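When comparing the two figures repeatedly, a tiny helper keeps the arithmetic honest. This is a sketch; the arguments are simply the numbers the queries above returned.

```shell
# Compare the database and index counts and report any difference.
# $1: database count, $2: Elasticsearch index count (plain integers).
check_counts() {
  db=$1; es=$2
  if [ "$db" -eq "$es" ]; then
    echo "counts match (${db})"
  else
    echo "MISMATCH: database=${db} index=${es} missing=$(( db - es ))"
  fi
}

# Example: check_counts 250000 249990 reports 10 missing documents,
# pointing you at the DLQ checks.
```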

20) The service team completes its checks and unshutters the service.

Post Implementation steps

1) Lower the logstash pod replicas back to 1 by merging the PR raised in the prerequisite steps.

2) Clean up the temp table and cloned index.

Temp table clean up command:

DROP TABLE IF EXISTS <your case type id>_tmp_<date>;

Cloned index clean up command:

Ensure the cloned index is deleted and not the live index.

curl -X DELETE localhost:9200/cloned-<Your index name>;

Rollback steps

If at any point you need to roll back the changes the following steps should be followed:

1) Delete the new index that was created as part of the reindex

curl -X DELETE localhost:9200/<Your index name>;

2) Clone the backup index back to the original name

curl -X POST "localhost:9200/cloned-<Your index name>/_clone/<Your index name>?pretty"

3) Verify that the index has been cloned back correctly

curl --silent localhost:9200/_cat/indices | awk '{print $3, $7}' | grep <your index name>

You should see the clone alongside the original index with the original count.

4) Set the index to write mode

curl -X PUT "localhost:9200/<Your index name>/_settings?pretty" -H 'Content-Type: application/json' -d' {"settings": {"index.blocks.write": false}}'

5) Set the index alias on the restored index

Check with service team what the alias should be if unsure.

curl -X POST "localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d' { "actions": [ { "add": { "index": "<your index name>", "alias": "<your index alias>" } } ] } '

Run this command to verify the alias has been set correctly

curl -X GET "localhost:9200/_alias/<your index alias>?pretty"

6) Service teams to verify if the rollback has been successful

7) If rollback is successful clean up the cloned index and temp table.

Temp table clean up command:

DROP TABLE IF EXISTS <your case type id>_tmp_<date>;

Cloned index clean up command:

curl -X DELETE localhost:9200/cloned-<Your index name>;

Troubleshooting

Cases are missing from the reindex

If, after running the update command on the database and waiting the expected time, some cases are still missing from the index, they may have fallen into the dead letter queue. Below are commands to check the dead letter queue and pull cases out if necessary.

Count the DLQ for specific jurisdictions:

curl --header "Content-Type: application/json" 'http://localhost:9200/.logstash_dead_letter/_count' -d '{"query": {"match_phrase": {"failed_case": "\"case_type_id\":\"<your case type id>\""}}}'

Pull cases from the DLQ for specific jurisdictions:

curl --header "Content-Type: application/json" 'localhost:9200/.logstash_dead_letter/_search' -d '{"query": {"query_string": {"query": "<case_type_id>"}}}' -o output.txt

If there is a large number of cases in the DLQ, you may need to limit the output; the following command should work for this purpose.

curl --header "Content-Type: application/json" 'localhost:9200/.logstash_dead_letter/_search' -d '{"from": 0, "size": Y, "query": {"query_string": {"query": "<case_type_id>"}}}' -o output.txt

Where Y is the number of cases you want to pull from the DLQ.
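The _search response is raw JSON; if jq is not available on the node, the document IDs can be pulled out with grep and sed. The sample response in the heredoc below is illustrative; against a live node you would pipe output.txt through the function instead.

```shell
# Extract the "_id" values from an Elasticsearch _search response on stdin.
# Normal usage: extract_ids < output.txt
extract_ids() {
  grep -o '"_id":"[^"]*"' | sed 's/"_id":"\([^"]*\)"/\1/'
}

# Illustrative sample response (shape matches a standard _search reply):
cat <<'EOF' | extract_ids
{"hits":{"total":{"value":2},"hits":[{"_id":"101","_source":{}},{"_id":"102","_source":{}}]}}
EOF
```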

Cases are taking longer than expected to reindex

If cases are taking too long to reindex, there's a chance the logstash instance has hit an out-of-memory error. While unlikely, thanks to the limits on the updates, it can happen. Running the following commands should tell you whether this is the case.

kubectl get pods -n ccd | grep logstash

All of our logstash pods live in the ccd namespace so this should show all logstash pods.

If one of the pods has recently restarted, describe the pod to check why it restarted; if it was out of memory, lower the batch size for the update command. This command should work for describing the pod:

kubectl describe pod <logstash pod name> -n ccd

Our logstash pods are set up to query the DB constantly, so to confirm a pod is still working as expected you can check its logs with the following command.

kubectl logs <logstash pod name> -n ccd

If you see something like this the pod is healthy.

[2025-10-16T09:13:22,101][INFO ][logstash.inputs.jdbc     ][main][eb3d7b3b9b9a180615b49300b1c73b9aa68faa44355c7a8f4e27e7c00dc6acdf] (0.009306s) UPDATE case_data SET marked_by_logstash = true WHERE marked_by_logstash = false AND jurisdiction != 'DIVORCE' AND jurisdiction != 'CIVIL' AND jurisdiction != 'PUBLICLAW' AND jurisdiction != 'CMC' AND jurisdiction != 'PROBATE' AND jurisdiction != 'SSCS' AND jurisdiction != 'EMPLOYMENT' RETURNING id, created_date, last_modified, jurisdiction, case_type_id, state, last_state_modified_date, data::TEXT as json_data, data_classification::TEXT as json_data_classification, reference, security_classification, supplementary_data::TEXT as json_supplementary_data

The pod will auto restart if an out of memory error is hit, so all you need to do is lower the batch size for the update command.

This page was last reviewed on 9 October 2025. It needs to be reviewed again on 9 October 2026 by the page owner platops-build-notices .