Fixing invalid HTML Wagtail pages that error

John Carroll edited this page Aug 4, 2022 · 14 revisions

After upgrading from Wagtail 1.x to 2.x, we began having issues with the new rich text editor (RTE), Draftail, which cannot parse invalid HTML. As a result, it is sometimes necessary to shell into the app and fix a particular page's data so that the page can be opened for editing in Wagtail again. Always run these commands in dev first and, once they succeed, repeat them in prod.

Separate documentation about how to shell into a cloud.gov app can be found here: https://cloud.gov/docs/apps/using-ssh/

Below are instructions for accessing and replacing page data with either the Python shell (Option 1) or PSQL (Option 2).

# Target the space

cf target -s [environment]


# Shell into the app

cf ssh [app]


# Configure ssh session environment to run python

export DEPS_DIR=/home/vcap/deps
for f in /home/vcap/profile.d/*.sh; do source "$f"; done

# Go to the app fec directory

cd app/fec

# Option 1) Use the Django Python API to access models

 ./manage.py shell


# From the custom home models, import the model you would like to edit. The models are defined here: https://github.com/fecgov/fec-cms/blob/develop/fec/home/models.py

from home.models import [ModelClassName]


# Get the page object by ID

page = [ModelClassName].objects.get(id=[ID#])


# Access the page's body raw_data

page.body.raw_data
(raw_data is read-only, so it can't be used to write data to the field)
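
Before editing anything, it can help to narrow down which block holds the offending HTML. The sketch below is one rough way to do that with Python's standard `html.parser`. It only reports unbalanced tags and does not replicate Draftail's own parsing. It assumes `raw_data` behaves like a list of dicts with `type` and `value` keys, and `sample_raw_data` is a hypothetical stand-in for `page.body.raw_data`. The helper names are illustrative and not part of fec-cms.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records mismatched closing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def find_tag_problems(html):
    """Return a list of human-readable tag-balance problems for one HTML string."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + [f"unclosed <{tag}>" for tag in checker.stack]


def problems_by_block(raw_data):
    """Map block index -> problems, checking only blocks whose value is a string."""
    report = {}
    for index, block in enumerate(raw_data):
        value = block.get("value")
        if isinstance(value, str):
            problems = find_tag_problems(value)
            if problems:
                report[index] = problems
    return report


# Hypothetical stand-in for page.body.raw_data
sample_raw_data = [
    {"type": "paragraph", "value": "<p>fine</p>"},
    {"type": "paragraph", "value": "<p>broken <b>bold</p>"},
]
report = problems_by_block(sample_raw_data)
```

In the Django shell, `problems_by_block(page.body.raw_data)` would be the equivalent call, under the same assumption about the shape of `raw_data`.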


# Clear the field so the page can be opened in the Wagtail editor and recreated manually (or see below for how to replace the field's data with corrected JSON/HTML)

page.body = []

# Save the page

page.save()
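
As an alternative to clearing the field, the corrected data can in principle be written back from the Python shell. The sketch below builds a corrected JSON string from a copy of the raw data. The helper name `with_corrected_block` and the sample data are hypothetical. Assigning a JSON string to the StreamField, shown in the trailing comments, is an assumption about Wagtail's behavior that should be confirmed in dev before relying on it.

```python
import copy
import json


def with_corrected_block(raw_data, index, corrected_value):
    """Return a JSON string of raw_data with one block's value replaced.

    Works on a deep copy, so the original data is left untouched.
    """
    blocks = copy.deepcopy(list(raw_data))
    blocks[index]["value"] = corrected_value
    return json.dumps(blocks)


# Hypothetical stand-in for page.body.raw_data
sample_raw_data = [
    {"type": "paragraph", "value": "<p>Broken <b>markup</p>"},
]
corrected_json = with_corrected_block(
    sample_raw_data, 0, "<p>Fixed <b>markup</b></p>"
)

# In the Django shell (assumption: the StreamField accepts a JSON string):
# page.body = with_corrected_block(page.body.raw_data, 0, "<p>...</p>")
# page.save()
```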

# Exit python shell

Control-D 

----------------

# Option 2) Or replace the field's data with corrected JSON/HTML in PSQL CLI:
(Note: "raw_data" exported above in the Python shell often must have its single quotes either escaped or converted to double quotes. It is therefore better to export the data in PSQL as shown below, so that you don't have to do any extra formatting beyond correcting the offending HTML.)
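
If the data does get exported from the Python shell, the quoting problem comes from the difference between Python's `repr` and JSON. The sketch below illustrates that difference with `json.dumps`, using a hypothetical sample in place of `page.body.raw_data` and assuming the raw data can be treated as a list of dicts.

```python
import json

# Hypothetical stand-in for list(page.body.raw_data)
sample_raw_data = [
    {"type": "paragraph", "value": "<p>It's a \"joint\" committee</p>"},
]

# What the shell echoes: Python repr, with single-quoted keys and strings.
as_repr = repr(sample_raw_data)

# Valid JSON: double-quoted keys and strings, inner double quotes escaped.
as_json = json.dumps(sample_raw_data)

print(as_repr)
print(as_json)
```

The JSON form round-trips through `json.loads`, while the `repr` form generally does not.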

# Access PSQL

./manage.py dbshell 
(./manage.py dbshell will only work locally. To access PSQL in one of the cloud environments, see: https://docs.google.com/document/d/12PrFIq8EEP7Ws35T24JAmVB6Vntl4Y6PZCuj089X9Dw/edit#heading=h.miokqzykqbmw)

# Once in PSQL, your command prompt should look something like this:
   cfdm_cms_test=# 

# Copy the body field's data to a file:
\copy (SELECT body from public.home_examplepage where page_ptr_id=9868) To 'body_9868.csv';

# To copy the entire page record (with CSV headers for insert statement):
\copy (SELECT * from public.home_resourcepage  where page_ptr_id=11182) To '11182.csv' csv header;

# To replace the body field with corrected data:

 - BEGIN;

 -  update public.home_examplepage
    set body='[{"type": "paragraph", "value": "<p>A joint fundraising committee is 
    ...etc etc...
    }]'
    where page_ptr_id=9868;

-  If successful, you should get this confirmation: 
   UPDATE 1

- COMMIT;

- If you get an error, type ABORT; and then start again with BEGIN;
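
Because the body is pasted inside a single-quoted SQL string, any single quote in the corrected HTML (an apostrophe, for example) has to be doubled, or the statement will end early. The sketch below shows one way to generate the statement text locally before pasting it into PSQL. The helper `build_update_sql` is hypothetical, the table name and ID are taken from the example above, and the sketch assumes PostgreSQL's default `standard_conforming_strings = on`, so that backslashes in the JSON are treated literally.

```python
import json


def build_update_sql(table, page_id, blocks):
    """Build an UPDATE statement with the body JSON as an SQL string literal.

    Single quotes inside the JSON are doubled, which is how a quote is
    escaped inside a standard SQL string literal.
    """
    body_json = json.dumps(blocks)
    escaped = body_json.replace("'", "''")
    return (
        f"update {table} set body='{escaped}' "
        f"where page_ptr_id={int(page_id)};"
    )


# Hypothetical corrected data for illustration
corrected_blocks = [{"type": "paragraph", "value": "<p>It's fixed</p>"}]
statement = build_update_sql("public.home_examplepage", 9868, corrected_blocks)
print(statement)
```

The generated statement is still meant to be run between BEGIN; and COMMIT; so that it can be abandoned if the confirmation is not UPDATE 1.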

# Exit PSQL

Control-D