In June 2017, we (Go Free Range) spiked on the idea of converting the app to use the Publishing API as its canonical/only persistence mechanism. This work was captured in the `spike-fully-migrated-wrt-publishing-api` branch. The following notes record our discoveries and highlight potential problems.
- Working OK
- Working OK
- Working OK
- This had to change a bit, because the attachments are embedded in sections in the local database. When we switch to using the Publishing API for persistence, we can no longer rely on the local database, and so we have to save the attachment's section to the Publishing API.
- Currently never read from the Publishing API, so no changes needed.
- Currently uses a custom attribute in the details hash passed through the Publishing API; it does not use either of the official Publishing API mechanisms.
- We've solved this in the spike when loading all manuals by using `get_content_items` in conjunction with the `link_organisations` option, and when loading a single manual by using `get_content` and checking that the manual's organisation matches that of the user.
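  A rough sketch of how this might look, assuming a `GdsApi::PublishingApiV2` client is available (shown here via a hypothetical `Services.publishing_api` helper) and that the user's organisation content id is to hand; parameter values are illustrative, and the organisation link is read via `get_links` purely for clarity:

  ```ruby
  organisation_id = current_user.organisation_content_id

  # Manuals index: only fetch manuals linked to the user's organisation.
  manuals = Services.publishing_api.get_content_items(
    document_type: "manual",
    link_organisations: organisation_id,
  )["results"]

  # Single manual: load it, then check its organisation link matches the user's.
  manual = Services.publishing_api.get_content(content_id).to_hash
  links = Services.publishing_api.get_links(content_id).to_hash
  permitted = Array(links.dig("links", "organisations")).include?(organisation_id)
  ```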
- It's not completely obvious where this logic should live, but we're confident we can make it work.
- It looks as if we can keep this as it is for now, although that will mean keeping the local database in order to persist the instances of `ManualPublishTask` which represent the publish jobs queued by `Manual::QueuePublishService` and processed by `PublishManualWorker`.
- We think this is currently done asynchronously because we need to call `patch_links`, `put_content` and `publish` for the manual and each of its sections, which could easily be a lot of calls to the Publishing API.
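  For reference, a simplified sketch of that call sequence (the `links_for` and `payload_for` helpers are hypothetical placeholders, and we assume the manual exposes its sections):

  ```ruby
  api = Services.publishing_api

  # Push links and content for the manual and every section, then publish each one.
  ([manual] + manual.sections).each do |document|
    api.patch_links(document.content_id, links: links_for(document))
    api.put_content(document.content_id, payload_for(document))
    api.publish(document.content_id, update_type)
  end
  ```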
- Once we're using the Publishing API for persistence, we suspect we may not need to call `patch_links` or `put_content`, only `publish`, and this may mean it's no longer so important to do this asynchronously. But it's probably simplest to keep it as it is, although see the issue about `VersionMismatchError` below.
- We think `Manual#version_number` is always set to 0 at the moment in the spike, i.e. we don't think it's being set from the Publishing API. This is incorrect, and the only reason we don't see a `VersionMismatchError` is because `0 == 0`!
- It looks as if there is a bug in `master` which means that this exception is never raised, so in a way we don't need to do anything in the spike to replicate this behaviour!
- If we need to, we can obtain the version number from the Publishing API via the `state_history` by finding the highest version number key in the Hash. However, it's worth noting that it looks as if the version number is not incremented if you keep saving a draft. We don't know whether it's a coincidence, but this behaviour actually matches what's happening in `master` with the version number stored in the local database.
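  A minimal sketch of deriving the version number from `state_history`, which is a Hash keyed by version number (e.g. `{ "1" => "published", "2" => "draft" }`):

  ```ruby
  def version_number_from(content_item)
    state_history = content_item["state_history"] || {}
    # Keys may come back as strings, so normalise them before taking the maximum.
    state_history.keys.map(&:to_i).max || 0
  end
  ```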
- As we mentioned in the GitHub issue (see above), we think the best way to fix this is to prevent edits to a draft which is queued for publication.
- Are we loading these timestamps correctly, and/or do we need to load them correctly?
- Publishing API: `first_published_at`, `public_updated_at`
- Manual: `originally_published_at`, `use_originally_published_at_for_public_timestamp?`
- As far as we can see, although at one point the app used Kaminari to do pagination, it doesn't currently.
- The `get_content_items` method we plan to use for the manuals index page paginates results by default (50 per page).
- At a minimum, we will need to ensure that all manuals are listed, even if there are more than 50, because that is the current behaviour.
- The simplest way to do this may be to set the `per_page` option on `get_content_items` to a very large number, much bigger than the number of manuals in production. We have verified that this works and there doesn't appear to be any maximum value as far as the Publishing API is concerned.
- We've tried to load the manuals index page with recent production data and it times out because the SQL query in the Publishing API takes ~9 seconds. So we think we're going to need to re-think this and probably introduce pagination of some sort.
- It looks as if Specialist Publisher does pagination using Kaminari and passes the various pagination options on to the Publishing API.
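  If we go down that route, passing pagination options through might look roughly like this (the option names and the pagination fields in the response are assumptions about how the `/v2/content` endpoint behaves):

  ```ruby
  page = params.fetch(:page, 1).to_i

  response = Services.publishing_api.get_content_items(
    document_type: "manual",
    page: page,
    per_page: 25,
  )

  manuals = response["results"]
  total_pages = response["pages"] # used to render the pagination controls
  ```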
- As mentioned above, when we run the whole system with production-like data, we start running into timeouts with calls to the Publishing API. We've seen this when trying to load a lot of manuals on the index page (see above), but also when trying to create a new section on a draft manual. In this case, at first glance, the slowest queries we could see were ones to do with loading or updating links.
- One optimisation we can imagine: if we stop putting the association between manuals and sections (and vice versa) in the details hash, and instead rely entirely on the links in the Publishing API, we could avoid some operations, e.g. there would be no need to save the manual's content when adding or removing a section; we could just update the links.
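  As a rough illustration, adding or removing a section could then be a single links update rather than a `put_content` of the whole manual (the `sections` link type is assumed from the existing link structure):

  ```ruby
  Services.publishing_api.patch_links(
    manual_content_id,
    links: { sections: updated_section_uuids },
  )
  ```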
- However, this could all be a bit of a blocker. We can imagine needing to tune the queries in the Publishing API, which could take a lot of time.
- An alternative solution might be to add more attributes to the details hash so we can avoid/reduce calls to get links.
- Also we might be able to avoid reloading a bunch of data when processing a form submission if we store extra data as hidden fields in the form.
- Given that we'll be building the same domain model objects in memory even though we'll have read the data from the Publishing API, we think the existing validation will continue to work.
- However, we suspect it could probably be improved in order to ensure we avoid as many Publishing API exceptions as possible.
- Alternatively, we might want to remove all validation from Manuals Publisher and instead rely on exceptions from the Publishing API.
- We'd prefer to keep the duplication of having the validation in the Manuals Publisher app, because we think it will give a better user experience.
- Since we should only be adding methods which read from the Publishing API, we think it's safe to assume that we won't encounter any API errors unless something very out of the ordinary has occurred, in which case an unhandled exception is probably OK and will be seen in Errbit.
- The only possible exception to this is when content items are not found.
- Including "published with new draft" & "withdrawn with new draft".
- We've implemented this in the spike based on this document in the Specialist Publisher repo. In particular we've copied the `StateHelper` class over in its entirety, because we're using almost all of its methods. The `StateHelper#state_for_frontend` method does what we need as long as we supply it with a document with a `state_history`, which is available from calls to `get_content_items` and `get_content`.
- For now we've implemented this using the existing `PublishingAdapter#all` method and then checking all the slugs in memory.
- It would probably be more efficient to do the query as part of a custom call to `get_content_items`, i.e. using the `search_in` and `q` options for the `base_path` attribute.
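  A hedged sketch of that query, assuming manuals live under `/guidance/` and that `search_in` and `q` can be passed straight through as query parameters (the exact parameter format is unverified):

  ```ruby
  base_path = "/guidance/#{slug}"

  results = Services.publishing_api.get_content_items(
    document_type: "manual",
    search_in: ["base_path"],
    q: base_path,
    fields: %w[content_id base_path],
  )["results"]

  slug_taken = results.any? { |item| item["base_path"] == base_path }
  ```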
- We've successfully updated the implementation of `PublishingAdapter#find` to load data for all the sections in a single call to `get_content_items` and then use the `section_uuids` obtained from `get_links` to order them appropriately.
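  The ordering step might look roughly like this (the `get_content_items` call for the sections is omitted; the `sections` link type is assumed from the existing link structure):

  ```ruby
  section_uuids = Services.publishing_api
    .get_links(manual_content_id)
    .to_hash
    .dig("links", "sections") || []

  # index_by comes from ActiveSupport; key each fetched section by its content_id,
  # then re-order them to match the manual's links.
  sections_by_id = sections_response["results"].index_by { |item| item["content_id"] }
  ordered_sections = section_uuids.map { |uuid| sections_by_id.fetch(uuid) }
  ```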
- Via the current user interface, this is only allowed for a section, not a manual. The way it currently seems to work is that `discard_draft` is called for the section on the Publishing API. Our understanding is that this will only have an effect if a draft of the section exists, and that it will replace the draft section with the latest published section in the Draft Content Store. When the manual is next published, the `unpublish` method is called on the Publishing API for sections which have been withdrawn (`Manual#removed_sections`).
- It's not going to be straightforward to replicate this behaviour when using the Publishing API for persistence. In order to do so, we think we'd need to somehow mark the sections as "removed" in the Publishing API, perhaps using the details hash, and then only actually unpublish them when publishing their manual. It might make sense for this behaviour to live within the Publishing API, but then that's going to be even more work.
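  One possible shape for the "mark as removed, unpublish on manual publish" idea; the `removed` flag in the details hash, the `payload_for` helper and the unpublish type are all assumptions rather than existing behaviour:

  ```ruby
  # On withdrawal: flag the section as removed in its details hash.
  payload = payload_for(section)
  payload[:details] = payload.fetch(:details, {}).merge(removed: true)
  Services.publishing_api.put_content(section.content_id, payload)

  # On manual publish: only now actually unpublish the flagged sections.
  removed_section_uuids.each do |uuid|
    Services.publishing_api.unpublish(uuid, type: "redirect", alternative_path: manual_base_path)
  end
  ```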
- An alternative is to un-publish sections immediately on withdrawal, i.e. don't wait until we next publish the manual. However, we suspect this might not be acceptable to users, because a manual might be in an inconsistent state over an extended period.
- Before switching over to using the Publishing API as the canonical persistence mechanism, it would be advisable to check that all the data in the local database for existing content is already in the Publishing API.
- We don't think it's safe to assume that older content has been saved to the Publishing API using the same attribute hash, and it's possible some content items pre-date the Publishing API and were never added. Thus it would probably be sensible to re-publish all documents before switching over.
- This is not currently possible via the user interface, although it is probably one of the Rake tasks which we will need to keep in case e.g. the content schema changes.
- At first glance, the only complication we can see is how to implement `Manual#current_versions`, which finds the latest published and latest draft versions if either exists, but we're fairly confident that should be possible.
- We can't find any documentation for the `update_type` supplied to `put_content` or `publish`, especially what it means to set it to "republish". Our best guess is that it just means using the timestamps from the previous edition, so that it looks as if it was saved/published at the time it was originally saved/published, but we're not sure about that.
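  Our working assumption is that a republish would look something like this (payload construction omitted; whether both calls need the "republish" update type is exactly the open question above):

  ```ruby
  Services.publishing_api.put_content(content_id, payload.merge(update_type: "republish"))
  Services.publishing_api.publish(content_id, "republish")
  ```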
- We would suggest removing the code from these and simply displaying a message suggesting the user contact whoever is switching the app over to use the Publishing API as the canonical/only persistence mechanism.
- As mentioned above, we probably need to retain some kind of republishing capability. We believe Specialist Publisher has a Rake task for republishing everything.
- This could involve quite a bit of work, although it should mostly be a matter of removing database setup and replacing it with stubs on the `PublishingAdapter`.
- No attempt was made to do this in the spike branch.
- In parallel with the spike, we made a start on actually implementing the switchover for attachments. We tried to do this incrementally by firstly changing the app to read from the Publishing API instead of the database, but still saving the attachments to the database; and secondly removing the code which saves the attachments to the database. The idea was that this would reduce the size of the commits, particularly the changes needed in the specs/features. However, it turned out that this introduced complications and we came to the conclusion it would probably be better to make the change in a single step.
- Although it should be relatively straightforward, there would be quite a bit of persistence-related code to remove. Essentially the domain models should end up as `ActiveModel` classes with little, if any, behaviour, i.e. mainly just attributes & validation. Also there will probably be no need to model manual/section editions separately.
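  As a sketch of the kind of slimmed-down domain model this implies (attribute names are illustrative):

  ```ruby
  class Section
    include ActiveModel::Model

    attr_accessor :content_id, :title, :summary, :body

    validates :title, :summary, presence: true
  end
  ```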
- Once the switchover has happened, it would be good to remove the relevant tables from the database.