Forms migration not updating form 10-334 #18898

Closed
4 tasks
jilladams opened this issue Aug 9, 2024 · 6 comments
Assignees
Labels
Defect Something isn't working (issue type) Drupal engineering CMS team practice area Find a form CMS managed product, owned by Public Websites team Public Websites Scrum team in the Sitewide crew sitewide

Comments

@jilladams
Contributor

jilladams commented Aug 9, 2024

User Story or Problem Statement

On July 18, we identified that Form 10-334 was displaying a future "Form updated" date on VA.gov: April 2028.

CMS data

Form node: https://prod.cms.va.gov/find-forms/about-form-10-334
Revision date: April 2028
Revision logs: no recent revision logs from the CMS migrator

CMS screenshot showing Forms DB data with future revision date

Screenshot 2024-08-09 at 4 24 40 PM

Front end

FE page: https://www.va.gov/find-forms/about-form-10-334/

FE screenshot

Screenshot 2024-08-09 at 4 26 07 PM

Data updates / migration

On July 31, the VHA forms manager said they had updated the future date in the Forms DB for both the internet and intranet.
When I checked the page on 8/9, the future date still appeared and the node still didn't show any CMS migrator activity.

However, the Forms migration from 8/9 shows a Revision date for this form of Apr-23 (4/1/2023).
va_forms_data (8).csv
Screenshot 2024-08-09 at 4 23 17 PM

I don't know what the InstallationDate field means in the migration (the field map says "Sometimes corresponds to issue_date (2016-09-20 08:07:10)"), but it's timestamped 7/30/2024 2:31:10 PM. That is about when the Forms manager said they'd updated the Forms DB for the internet (the intranet update came later, but we don't really care about that one).

The mystery: why doesn't this revision date update appear on the CMS form node?
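
As a rough illustration of a sanity check for this kind of problem, here is a minimal Python sketch that flags rows in a CSV snapshot whose revision date is in the future. The column names (`FormNumber`, `RevisionDate`) and the sample rows are assumptions made up for the example; only the `Apr-23` style date format comes from the migration data described above.

```python
import csv
import io
from datetime import date, datetime


def future_revision_rows(csv_text, today,
                         number_col="FormNumber", date_col="RevisionDate"):
    """Return (form number, parsed date) for rows dated after `today`.

    Column names are hypothetical; dates are assumed to look like "Apr-23".
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(date_col) or "").strip()
        if not raw:
            continue
        try:
            # "%b-%y" parses "Apr-28" as 2028-04-01.
            parsed = datetime.strptime(raw, "%b-%y").date()
        except ValueError:
            continue  # skip values that are not in the assumed format
        if parsed > today:
            flagged.append((row.get(number_col), parsed))
    return flagged


# Invented sample data, for illustration only.
sample = "FormNumber,RevisionDate\n10-334,Apr-28\nEXAMPLE-1,Apr-23\n"
print(future_revision_rows(sample, today=date(2024, 8, 9)))
```

Passing `today` explicitly keeps the check repeatable when it is run against older snapshots.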

Acceptance Criteria

  • Latest migration data for VA forms is correctly migrated and saved on VA form nodes, including:
    • Issue date
    • Revision date
    • Title
@jilladams jilladams added Needs refining Issue status Defect Something isn't working (issue type) Public Websites Scrum team in the Sitewide crew sitewide Drupal engineering CMS team practice area Find a form CMS managed product, owned by Public Websites team labels Aug 9, 2024
@jilladams
Contributor Author

We do have evidence that the Migration is making CMS migrator revision logs on other nodes:
https://prod.cms.va.gov/find-forms/about-form-21p-509 - 8/9
https://prod.cms.va.gov/find-forms/about-form-21-526ez - 8/9
https://prod.cms.va.gov/find-forms/about-form-22-10215 - 8/8

@FranECross FranECross removed the Needs refining Issue status label Aug 14, 2024
@jilladams
Contributor Author

Daniel is still digging; we can't reproduce the problem so far.

@dsasser
Contributor

dsasser commented Sep 9, 2024

Summary of Findings: Drupal Migration Bug Investigation

Overview

Over the past few days, I have been investigating a potential bug in the VA Forms migration process. The bug is related to detecting changes in CSV files fetched over HTTP, specifically those that contain form data with a revision_date field change. The issue was originally noted in the production environment, but I was unable to reproduce the problem locally or in Tugboat.

Key Findings

HTTP Fetch Consistency: The migration process reliably fetched the updated CSV files over HTTP, and changes to the revision_date, issue_date, and title fields were reflected in both environments. This suggests that the HTTP data fetcher is functioning as expected in both environments.

Cache Behavior: I initially suspected a caching issue in production, where the migration might be pulling cached versions of the CSV file, preventing updates from being applied. However, after clearing caches, resetting migration statuses, and closely monitoring cache behavior, I ruled this out as the cause.

Migration Map Tables: The migration map tables (migrate_map_*) store state information about previously processed rows, which prevents Drupal from reprocessing unchanged data. These tables were correctly reset during testing, and data was consistently updated when triggered by upstream changes.
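
To make the idea of skipping unchanged rows concrete, here is a minimal Python sketch of hash-based change detection. This is not Drupal's implementation, and it assumes the migration tracks changes by hashing each source row, which this investigation does not confirm. The names `row_hash`, `rows_to_process`, and the `row_id` key are hypothetical.

```python
import hashlib
import json


def row_hash(row):
    """Hash a source row in a stable way (keys sorted before hashing)."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rows_to_process(source_rows, stored_hashes, key="row_id"):
    """Return rows that are new or whose hash differs from the stored one.

    `stored_hashes` stands in for the state kept in a map table:
    a dict of source id -> hash recorded on the previous run.
    """
    changed = []
    for row in source_rows:
        if stored_hashes.get(row[key]) != row_hash(row):
            changed.append(row)
    return changed
```

Under this model, a row whose revision_date changed upstream produces a new hash and is reprocessed, while an identical row is skipped. If the stored state and the fetched data ever disagree about what was last seen, an update could be missed, which is why resetting the map tables was part of testing.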

Reproducibility: Despite extensive testing, I could not reproduce the bug locally. All changes to the CSV file, including revision_date, were detected without issue when testing in my local environment.

Issue Description

The migration process is intended to update VA Forms in Drupal whenever changes are made to upstream Forms Database CSV files, which are fetched from a remote URL using the http data fetcher. The focus was on changes in the revision_date field, which should trigger updates to the associated content in Drupal, though there are other fields that should do this as well including but not limited to the issue_date and title.

When the revision_date and other fields were modified in the upstream Forms Database CSVs, Drupal's migration did not consistently reflect those updates in production. However, when I replicated the environment locally, the migration process worked as expected, detecting and applying changes to the revision_date field without issue.

Environment Details

Production Environment: The bug was observed in the live environment, where the CSVs are hosted on a local filesystem but are still fetched through an HTTP server. The data is fetched using Drupal's url source plugin and the http data fetcher from Migrate Plus.

Local Environment: When replicating the production setup locally, the migration consistently detected changes in the CSV file, including updates to the revision_date field.

Potential Causes

While I was not able to fully reproduce the issue locally, there are a few potential causes that could explain the inconsistency:

Environment-Specific Factors: The issue could be tied to environmental differences between production and other environments, such as file system caching, network latency, or configuration differences in server settings.

Intermittent Connectivity Issues: In the production environment, temporary connectivity issues between the HTTP server hosting the CSV and the Drupal instance could result in failed updates, but this would not consistently explain the behavior.

File Integrity: There could be differences in file integrity between what is fetched locally and what is fetched in production, though no concrete evidence of this was observed during testing.
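
One way the file integrity question could be checked is by comparing digests of the CSV bytes received in each environment. The Python sketch below assumes the two payloads have already been fetched and saved by some other means; the function names are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a payload."""
    return hashlib.sha256(data).hexdigest()


def same_payload(local_bytes: bytes, prod_bytes: bytes) -> bool:
    """True when both environments received byte-identical content."""
    return sha256_of(local_bytes) == sha256_of(prod_bytes)
```

Recording the digest alongside each migration run would also give a cheap history of when the fetched file actually changed.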

Resolution Steps

Network and Server Checks: Investigate any network or server configuration issues that may prevent proper fetching or caching in production.

Continue monitoring the migration process in production and compare logs with the local environment to pinpoint any subtle differences.

If the issue persists, consider setting up a staging environment more closely mirroring production to try to reproduce the issue under more controlled conditions.

Conclusion

While the exact cause of the bug in the production environment remains elusive, my investigation ruled out several potential causes: caching, Drupal migration processing logic, and, assuming the problem is persistent or pervasive, fetching inconsistencies. Further investigation is needed to determine whether environment-specific factors are contributing to the problem, and that would require involvement from someone on the CMS team.

Appendix

Overwrite Properties - Fields that when updated overwrite existing Drupal field data. These fields are:
- changed
- field_va_form_administration
- field_va_form_deleted
- field_va_form_deleted_date
- field_va_form_issue_date
- field_va_form_num_pages
- field_va_form_number
- field_va_form_revision_date
- field_va_form_row_id
- field_va_form_title
- field_va_form_url
- new_revision
- revision_default
- revision_log
- revision_timestamp
- revision_uid
- title
- uid

@jilladams
Contributor Author

These notes are excellent, thanks so much for showing your work. Sounds like we might end up hitting up the CMS team for info on devops / env stuff -- holler if you need any support, if you go that route.

@jilladams
Contributor Author

BLUF: We can't repro the bug, and should close this as can't repro. But: we can opt to add some logging or do some stakeholder outreach with Forms Managers to rule out a more systemic issue, short term. Notes below that need a product call. cc @FranECross @mmiddaugh for product input.

From 1:1 with @dsasser :

  • He was able to test on Tugboat: provide an alternate CSV, make a CSV change, run the migration, and see that change get picked up by Drupal CMS successfully. We officially cannot repro this issue.
  • We've effectively ruled out everything from Forms, CSV, migration, caching (file system, Drupal).
  • We believe that this may have been a one-time fluke potentially at the network layer. We can't prove that without a history of the migration CSV, to compare times / network logs, etc.
  • One thing we will document: There is an override from the migration, in PHP settings files. That's not obvious up front, so Daniel will update https://github.com/department-of-veterans-affairs/va.gov-cms/blob/main/READMES/migrations-forms.md to note it for future.

Some ideas about what to do next:

  1. We could ask the Forms Managers to let us know when they update forms for a period of time, to be able to monitor whether Drupal picks up changes we have been notified about. I could send that email, if we think this is an ok minimal approach. Looking at the history of Form edits, it looks like we get updates from the migrator every couple days, so it might not take long to increase our confidence that we are getting most changes successfully in prod.
  2. We could store some history of CSVs (1 week / 1 month?) in S3, in a bucket where we are only charged for accessing data to minimize cost, in order to have a record of prior Forms DB changes. That would allow us to programmatically compare the contents of CSVs from different days, to know when a change was introduced and should have been made by the migration, and then inspect if any network logs / issues might be affecting the CSV migration. We would likely need some CMS permissions / buy in to go this route.
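
For idea 2, the comparison between stored snapshots could look something like the Python sketch below. It assumes each snapshot is available as CSV text and that rows can be keyed by a single column; `FormNumber` is a made-up column name, and retrieval from S3 is out of scope here.

```python
import csv
import io


def index_rows(csv_text, key):
    """Map each row's key column value to the full row dict."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def diff_snapshots(old_text, new_text, key="FormNumber"):
    """Report rows added, removed, and changed between two CSV snapshots."""
    old, new = index_rows(old_text, key), index_rows(new_text, key)
    changed = {}
    for k in old.keys() & new.keys():
        # Collect (old value, new value) for every field that differs.
        fields = {
            f: (old[k].get(f), new[k].get(f))
            for f in new[k]
            if old[k].get(f) != new[k].get(f)
        }
        if fields:
            changed[k] = fields
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": changed,
    }
```

Comparing the day a field changed in the snapshots against the day a CMS migrator revision log appeared on the node would show whether a change was missed, and roughly when.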

Either of those should likely be a new ticket, depending on how we want to approach it. Leaving this ticket open til we get a chance to discuss / decide if we'll follow up in a new ticket.

@FranECross

FranECross commented Sep 17, 2024

@jilladams @dsasser Let's go with #1 for now. Closing this ticket; a new ticket has been created for this option. If nothing shakes out, and this occurs again in the future, we can then explore #2. cc @mmiddaugh


3 participants