Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow entities to be deleted/cleared locally #6012

Closed
6 tasks done
Tracked by #5991
seadowg opened this issue Mar 12, 2024 · 10 comments · Fixed by #6311
Closed
6 tasks done
Tracked by #5991

Allow entities to be deleted/cleared locally #6012

seadowg opened this issue Mar 12, 2024 · 10 comments · Fixed by #6311
Assignees
Milestone

Comments

@seadowg
Copy link
Member

seadowg commented Mar 12, 2024

Blocked by #6011
Blocked by #5623
Blocked by #6245

Entities that are created/locally should be deletable like other data in the app. Users should not be able to delete an instance that has created/updated an entity (to prevent data inconsistencies with the server) until it has been sent, but should be able to clear all saved forms/entities out at once from settings.

Acceptance

  • Given I've created an entity locally
    When I reset "Saved forms and entities" (in "Project management" settings)
    Then the entity no longer appears in follow-up forms
    And the entity does not appear in "View local entities"

  • Given I've created an entity locally
    When I navigate to "Delete saved form"
    Then the instance that created the entity does not appear in the list

  • Given I've created an entity locally
    And submitted the form that created it
    When I delete the instance that created it via "Delete saved forms"
    Then the entity still appears in follow-up forms

  • Given I've updated an entity locally
    And a previous version of the entity was in an entity list
    When I reset "Saved forms and entities" (in "Project management" settings)
    Then the previous version is shown in follow-up forms
    And the previous version appears in "View local entities"

  • Given I've updated an entity locally
    When I navigate to "Delete saved form"
    Then the instance that updated the entity does not appear in the list

  • Given I've updated an entity locally
    And submitted the form that updated it
    When I delete the instance that updated it via "Delete saved forms"
    Then the entity still appears in follow-up forms

Questions

  1. Where/how should we store entities? This feels like the right place to make that decision.
    • We're going to use a SQLite DB stored in metadata alongside the other databases.
@seadowg seadowg mentioned this issue Mar 12, 2024
24 tasks
@seadowg seadowg changed the title Entities can't be cleared and aren't deleted when deleting a project Allow entities to be deleted/cleared locally Mar 12, 2024
@seadowg seadowg added this to the v2024.2 milestone Mar 12, 2024
@seadowg
Copy link
Member Author

seadowg commented Mar 12, 2024

@lognaturel what happens on instance deletion isn't something we've actually discussed before! It feels fairly clear to me how it should behave, but have a look over the acceptance to check that you agree with the scenarios or if you think I'm missing something.

@lognaturel
Copy link
Member

Given I've updated an entity locally
And the entity had not appeared in an entity list
When I delete the instance that updated it
Then the entity no longer appears in follow up forms
And the entity does not appear in "View local entities"

Let's say I created EntityA locally with SubmissionA. Then I updated EntityA locally with SubmissionA1. If I delete SubmissionA1, at most I would expect the update to be rolled back, not the whole entity creation. If we delete the entity, then no further follow up is possible. I think this violates some of the goals we described for entity updates: we'd like users in the field to continue being able to make progress even if they make a mistake.

Given I've updated an entity locally
And a previous version of the entity was in an entity list
When I delete the instance that updated it
Then the previous version of the entity appears in follow up forms

I think this is going to be difficult to do well. What if you had 3 offline updates and you delete the submission responsible for the second one? I don't think it's worth the complexity to keep track of all those revisions and come up with a process for rolling back one in the middle.

This is a tricky area indeed!

I don't think we want to enable individual entity deletion ever. It will eventually be possible for form submissions to archive entities which will have the effect of hiding them locally but I don't think we want to do anything like that outside of form submissions.

The issue is that form submissions are the way we apply changes to entities and those can currently be deleted before they're sent. If a user deletes a submission without that having any impact on the local entity representation, then the server and client can get irrecoverably out of sync.

Here are some directions we could go in:

  • Filter finalized and unsent entity-related submissions out of the delete view. Group deleting local/"dirty" entities (or maybe more simply all entities) with the option to delete "instance data" from settings. That means the only way you can delete entity-related submissions is to also delete the updated entities.
  • Like in Central, document that once an entity change is applied, the submission-entity link is severed. That means if you delete a submission that had an effect on an entity, that is not expected to have an effect on the local entity representation. The problem with this is that it leaves the door open to clients having "ghost" entities that can never exist on the server because the corresponding submissions will never make it there. But maybe we also document that we strongly recommend hiding submission deletion when using entities and/or we do it for Central QR codes.

@seadowg
Copy link
Member Author

seadowg commented Mar 22, 2024

@lognaturel ooooft good thinking!

Filter finalized and unsent entity-related submissions out of the delete view. Group deleting local/"dirty" entities (or maybe more simply all entities) with the option to delete "instance data" from settings. That means the only way you can delete entity-related submissions is to also delete the updated entities.

This feels like a good starting point and would allow us to layer on more complex "instance with entity" deletion behaviour later. I think your point about the "ghost" entities convinces me we don't want to allow entities to exist after their instance is deleted.

If you're happy with that I'll update the acceptance.

@lognaturel
Copy link
Member

I'll update the acceptance.

Sounds good!

@seadowg
Copy link
Member Author

seadowg commented Mar 22, 2024

@lognaturel I've updated the acceptance so all "Saved forms and entities" can be reset from project settings (replacing the current "Saved forms" option), but saved forms that have created an entity that's currently dirty/offline can't be deleted.

@lognaturel
Copy link
Member

I'm liking allowing deletes only as all-or-nothing and grouping form instances and entities! It feels easy to explain and I'm not finding any obvious way to get into a bad state.

Given I've updated an entity locally
And a previous version of the entity was in an entity list
When I reset "Saved forms and entities" (in "Project management" settings)
Then the previous version is shown in follow up forms
And the previous version appears in "View local entities"

I want to explicitly call out that this implies that conceptually, "offline/dirty" entities live alongside the online entities they're based on. This potentially adds quite a bit of complexity. As we drive out the offline data model, I think we should carefully consider whether there are other user-facing requirements that lead to this need. For this need only, we might be able to do something like delete all entities from the entity list and reload from CSV (and then we'd need another answer when we add progressive updates).

@seadowg
Copy link
Member Author

seadowg commented Mar 26, 2024

As we drive out the offline data model, I think we should carefully consider whether there are other user-facing requirements that lead to this need. For this need only, we might be able to do something like delete all entities from the entity list and reload from CSV (and then we'd need another answer when we add progressive updates).

Yeah I agree. I've been mulling this over as it feels like a big add to complexity, yet I'm not clear on what the alternative would be. I don't think we can end up in a scenario where the updated entity is missing until the next update. Is there another possibility you can think of?

I think your idea on implementation is going to be what we end up with: we delete our entity storage and "reingest" from the CSVs.

@seadowg
Copy link
Member Author

seadowg commented Jul 8, 2024

I've been having a think about different ways to store entities, and far as I can tell we're definitely looking at a SQLite database (per project). Given that, it feels like there are two options for how to structure it: a single table for all entities (with a string column for the entity list and another for properties that uses JSON), or a multi table version with a table for entity lists and then a table for each entity list (with a column for each property and a foreign key linking it to the list). The former is definitely simpler as we avoid anything "dynamic": we don't have to create tables on the fly or migrate them when new properties appear.

We haven't actually done it yet, but we will probably need to query properties in the future (for #5623 for instance). This makes the single table option risky as we don't know how performant the SQLite JSON functions will be. Given that I'm going to start #5623 next (probably to just get to a place where understand how it will be structured) and then do an experiment to investigate the JSON performance after. My reasoning for not going straight into the JSON experiment is that there's still an open question around how exactly we'll deal with the fact that we need to be able to query datasets and return references rather than data. That should give us more info to continue discussing this. In the meantime, let me know if there are any alternatives I might have missed or anything that doesn't make sense.

cc/ @grzesiek2010 @lognaturel

@lognaturel
Copy link
Member

lognaturel commented Jul 16, 2024

In Collect on a real Android device, do some quick checks with jsonb column vs normalized:

  • 1 million rows, 20 properties with uuid values, 1 property has the same value 100 times (or maybe generate numbers between 1-100000 and naturally get duplicates)
    • Query for a repeated value
    • Sort on one of the properties
    • With and without indexes

@seadowg
Copy link
Member Author

seadowg commented Jul 18, 2024

It looks like the JSON functionality is not available in the version of SQLite that ships with Android until Android 14 which makes using it larger hurdle (even if the performance is good enough). Given that, I reckon we're most likely looking at using the "multi table" implementation. Here's my dodgy sketch of how that could look:

Image

And just to clarify, I'm implying that entities would be linked to datasets (which we'd probably refer to as "lists" now) via a basic foreign key relationship. A few things to point out about this kind of implementation:

  • We'll need to perform CREATE TABLE (creating the entity fields and properties as columns) when a call to EntitiesRepository#save is made with an entity which has a list that doesn't exist yet.

  • We'll need to perform an ALTER TABLE ADD COLUMN whenever we encounter a property that doesn't currently have a column in #save as well.

  • Querying entities is going to be a little complex due to the needs of index (although we should be able to use row_number() on Android 11+). We'll likely need a hacky nested query like:

    SELECT (SELECT COUNT(*) FROM entities r WHERE e.index > r._id) rowNumber, * 
    FROM entities e
    

I think another thing to decide is where to actually store the .db file we'll end up with for this. I'm thinking that for the moment we should just store it in metadata with the other DB files and then look at having a discussion about moving all of those to internal storage further down the line. This will mean that the user can potentially tamper with the file, but I think we should treat that as "voiding the warranty" and not account for it.

Let me know if there are any thoughts on any of that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: done
Development

Successfully merging a pull request may close this issue.

2 participants