Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(app): Update robots from USB flash drive #13923

Merged
merged 8 commits into from
Nov 7, 2023

Conversation

sfoster1
Copy link
Member

@sfoster1 sfoster1 commented Nov 6, 2023

This PR adds the ability to update a Flex's software from a USB flash drive inserted in the front (or I guess rear) port containing an update file. This closes a functionality gap where Flexes can be really hard to update with absolutely no app connectivity, given that the OT-2's fallback of "pull and reflash the SD card" isn't available.

It also fixes a bug where updates downloaded from the internet on the ODD wouldn't work because of old code we'd added when

These are a lot of changes, so I separated them by commit:

  • a235409 introduces the filesystem watch for USB mass storage devices
  • b976819 adds error handling for the readstream we use to post files from the app shell
  • 7981c06 is a minor app-side fix to actually add update polling to the ODD
  • 87fd827 is the node side of providing system updates from USB, with a lot of changes to app-shell-odd/system-updates (and a bonus bugfix for an issue that caused the ODD to never in practice prompt for an update (RQA-1602)
  • 574a1d9 is the react side and is actually mostly about disentangling the robot update code from the onboarding flow

Risk assessment

Medium; this is a little-used flow on the ODD but it is pretty annoying to test, and it did touch onboarding

Testing

  • Updates during onboarding should still work
    • You should still go to the check-update screen from both the wifi connection successful screen and the ethernet connection successful screen
    • If no update is available (do this by running onboarding on the most recent version) then that screen should move you to estop connection
    • If the update succeeds, you should come up after reboot to the estop screen
    • If you cancel the update or there's an error you should go to the estop screen
  • Updates not during onboarding should never accidentally drop you into onboarding
    • If you update, when the robot boots you should not be in the estop screen
    • If you cancel the update, you should end up back at the robot settings screen
  • Updates should still, like, function
  • If no update is available from the internet, no prompt should be shown
    • If you then insert a USB drive, you should get a prompt for the update on the drive
      • even if it's for a lower version than you're currently running
    • If you then remove the USB drive the prompt should go away
      • If you remove the USB drive while viewing the update details screen the prompt should go away
  • If an update is available from the internet, a prompt should be shown (this is the little bonus bugfix)
    • The update's details screen should now actually have release notes
    • If you plug in a USB drive you should still have a notification, and the details screen should say it's from USB
    • If you remove the USB drive you should be back at the notification and release notes for the internet update
    • If you do an update with a USB drive you should get the update from the USB drive
  • If you start an update from a USB drive and then pull it out while the update is "downloading", you should more or less immediately get an error that says something about os errors (this is the http.ts thing)
  • If you start an update from a USB drive and then pull it out while the update is "validating" or later it should be fine

Closes RAUT-829
Closes RQA-1602

@sfoster1 sfoster1 requested a review from a team November 6, 2023 19:51
@sfoster1 sfoster1 requested review from a team as code owners November 6, 2023 19:51
@sfoster1 sfoster1 requested review from shlokamin and removed request for a team November 6, 2023 19:51
Copy link

codecov bot commented Nov 6, 2023

Codecov Report

Merging #13923 (c3a6a0f) into edge (52dc84a) will decrease coverage by 0.31%.
Report is 1 commits behind head on edge.
The diff coverage is 0.00%.

❗ Current head c3a6a0f differs from pull request most recent head c269170. Consider uploading reports for the commit c269170 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             edge   #13923      +/-   ##
==========================================
- Coverage   70.82%   70.51%   -0.31%     
==========================================
  Files        2478     1604     -874     
  Lines       69729    53501   -16228     
  Branches     8469     3686    -4783     
==========================================
- Hits        49384    37728   -11656     
+ Misses      18261    15125    -3136     
+ Partials     2084      648    -1436     
Flag Coverage Δ
app 38.84% <0.00%> (-29.96%) ⬇️
components 63.34% <ø> (ø)
labware-library 49.17% <ø> (ø)
protocol-designer 45.63% <ø> (ø)
step-generation 84.95% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
app-shell-odd/src/system-update/release-files.ts 18.75% <0.00%> (+0.38%) ⬆️
app-shell-odd/src/main.ts 0.00% <0.00%> (ø)
app-shell-odd/src/update.ts 36.58% <0.00%> (-1.88%) ⬇️
app-shell-odd/src/http.ts 23.07% <0.00%> (-1.93%) ⬇️
app-shell-odd/src/usb.ts 0.00% <0.00%> (ø)
app-shell-odd/src/system-update/index.ts 0.00% <0.00%> (ø)

... and 875 files with indirect coverage changes

Copy link
Contributor

@mjhuff mjhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Thanks for fixing the update check logic, too!

The Flex operating system automatically mounts the filesystems of
well-formatted USB drives (FAT and ext4 and maybe ntfs but that's a bit
iffy) to /media when those USB drives are inserted on the robot. In
theory it will in fact do this for _any_ kind of media that presents a
filesystem interface.

To that end, add a node task that will use a node filesystem watch to
keep an eye on /media, and
- when something that looks like a USB drive (/media/sd\w\d+) appears,
notify via redux actions
   - then enumerate all the files on it and notify those via redux
   actions
- when something we were keeping an eye on disappears, notify via
redux actions

The redux actions don't alter state and so don't need new reducers or
selectors; they exist because it's a handy mechanism to talk between our
components.

This code is very tightly coupled to the way the node fs interfaces work
and so I don't see a lot of point in unit tests for it; it's almost
entirely fs calls originating everything and providing all of the data,
and all the complexity is from working around weirdnesses in those calls
and in the underlying system. For instance,
- There's a little bit of time in between when the fs watch on /media
fires and when you can actually find the contents of the newly-present
directory; if you readdir before that you'll get an empty list, so we
wait a second
- The node fs.watch interface looks very fully features but is
absolutely chock-full of warnings about various features not being
reliable. A lot of that unreliability is _probably_ across systems and
everything works as we expect on linux, but just in case we have a lot
of fallbacks for if our callback doesn't get filepaths, etc
We have our custom http interface that wraps around node-fetch that
provides things like "doing your own read stream when posting a file",
and "mapping everything into the promise interface", which is nice,
but has an issue specifically for that read stream: we don't monitor
errors on it. Read streams surface errors by emitting an 'error' event;
we hook up a listener to that error event _while we're creating the
stream_, but then we disconnect it. So if you have an error in the
stream - for instance, you're reading from a file on a USB flash drive
and the user unplugs the flash drive - then the error will never get
surfaced.

Unfortunately the fix to this is a bit fiddly. We can hook up an error
listener fine, but it needs to do something; specifically, it needs to
turn the error from a callback into a promise rejection. That means it
needs to have a promise to reject that has the same lifetime as the
stream itself. http.post didn't provide that because it returns a whole
big promise chain, and each time you move a link in that chain the old
promise is gone and a new one happens, so we'd need to move the listener
around.

Since promises are monadic, a better fix is to have post return a single
promise and do all the promise chaining _inside_ that promise; then, the
read stream error handler can reject the outer promise directly, while
relying on promises bubbling up rejections to preserve error handling
capability for the promises in the internal chain.
Though we have everything set up to automatically fetch, prompt for, and
execute robot updates from the ODD, we weren't actually _checking_ for
those updates except once on boot (which then wouldn't work if the robot
wasn't internet-connected during boot). This means in particular that
the software updates during onboarding were guaranteed to fail.

We can use the same hook in the ODD app root that we do in the desktop
app route, but if we're going to do that then we better remove a log
message that suddenly becomes extremely spammy.
Adds the capability to provide system updates from flash drives to the
ODD app-shell.

These are "system updates" in that the app-shell determines their
availability and provides it to the app, rather than the user indicating
the presence of a file alongside their intent to update. The app-shell
will advertise the flash drive updates in the same way it advertises
internet-discovered updates, with a RobotUpdateInfo redux message; since
those now provide the path to the file they mean, it will be easy for
the app to specify the system update to load.

We can duplicate the logic that we use for system updates by adding a
second let cache for the "current update"; the system-updates code will
then prefer an update in the mass storage update cache to an update in
the old system updates cache, and send new robot update info messages in
all the state changes between neither cache being full; either cache
being full; and both caches being full.

The determination that a flash drive system update is present is
triggered by a mass storage enumerated message; when that flash drive
gets removed, we'll get a removal message.

To figure out whether updates are actually present, we can the list of
files that just got enumerated for things that end with .zip, and then
try to open them as zip files and read the VERSION.json information out
of them. This is a somewhat fraught process; the file could not be a zip
file, it could be a zip file but corrupted, it could be a zip file but
not an update, it could be an update but it's for an OT-2,  and we need
to handle all that, so there's a pretty excessive amount of error
handling in here. Once we're sure that there are one or more zip files
containing robot system updates, we can provide something to redux; we
provide the highest-version update present.

There is one way in which updates from flash drives differ from system
updates found on the internet, however: plugging in a flash drive
requires user intent, while checking for updates on the internet
doesn't. Therefore, if the user plugs in a flash drive with an update
file, we always want to make that update file available no matter the
relative versions of the robot and the update file. So we can add a bool
to the system update message (and then to the update state) that shows
that this is a "forced notification" update, and the app can know to
display it without caring about the upgrade/downgrade/reinstall state.

Since there's a lot of duplication, we can also factor out some common
logic to make it feel a little better.

That process of duplication also fixes a bug that would have prevented
the ODD from ever prompting for updates. The function that gets
information about updates used the same promise to read the release
notes and provide the update information; but we overrode the downloaded
release files to null out the release notes, meaning that promise would
always fail, and we'd never get the notification. We no longer override
the release notes to be null, and we also treat reading the release
notes separately from reading the rest of the update.
Now that the odd app-shell provides us with notifications of updates
from USB flash drives, we can allow the user to install them. While the
redux mechanisms allow this pretty easily - a system update is a system
update, after all, and with the force mechanism the app wouldn't even
know if the update was a downgrade or anything - we ran into a problem
where the general robot update machinery in the ODD was very tightly
bound with the onboarding experience for the ODD, since that's the
context in which it was developed.

This commit extracts the robot update mechanisms from onboarding by
- Hoisting onboarding-related logic out of lower level components and
instead injecting that logic into the organisms code from the top level
page
- Moving the current update page to a new one that is focused on
onboarding at a new route, and copying just the update-related code to
a generic RobotUpdate page

This means that the two pages - RobotUpdate and
RobotUpdateDuringOnboarding - share most of the same code but are bound
to different routes and can have different top level behavior by
injecting different contexts to the finish and error handling behaviors
of the update. RobotUpdateDuringOnboarding sets the unfinished
onboarding page breadcrumbs appropriately, and uses display language
appropriate to the update being just a component of the larger workflow,
and moves on to estop handling when cancelled; RobotUpdate doesn't touch
any of that, and goes back to the settings page when cancelled, and uses
wording more appropriate to being its own topline flow.

Closes RAUT-829
- Fix a state issue where we didn't clear sessions; we now clear them on
retry or cancel
- Handle surprise unmounts that leave /media populated by also watching
/dev
- Fix a bug where the initial usb scan used full paths instead of device
names; now everything uses full paths
@sfoster1 sfoster1 force-pushed the raut-829-add-update-from-usb-stick branch from 07994f9 to c3a6a0f Compare November 7, 2023 16:05
@sfoster1 sfoster1 merged commit 30425f7 into edge Nov 7, 2023
20 checks passed
@sfoster1 sfoster1 deleted the raut-829-add-update-from-usb-stick branch November 7, 2023 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants