Spreadsheet viewer has trouble displaying large tabular files #20

jggautier · 2023-01-24T14:47:53Z

A depositor reported last week that the spreadsheet viewer is having trouble viewing the CSV file they uploaded to the Harvard Dataverse Repository.

Because the file is not published, I can't share it publicly, but the depositor said I could share it privately with any colleagues who want to do more digging. In the meantime, the depositor wrote that they'll add a note in the dataset or file metadata to explain the situation with the file previewer.

The file is 17.4 MB, with 10 columns and 134 rows. The cells in one of the columns have a lot of text. Once the spreadsheet viewer is able to load the preview, it doesn't display all of the columns right away, and there's no indication that the viewer is still trying to load parts of the file. This made the depositor think that the viewer would never display all of the columns.

Questions
I assume that how quickly the viewer can show the entire tabular file depends at least partly on the user's internet speed and/or computer. Are those two factors?

Recommendations

Let users know that the previewer is trying to display the file. This way users know if the viewer has finished trying and has failed to display all or parts of the file. Sometimes I do see an error graphic in the Preview tab indicating that the preview failed to load, but I don't with the 17.4 MB file and other larger files I've looked at.
Let installations set a byte size limit specifically for the spreadsheet viewer.
- If each installation is allowed to set the byte size limit for the spreadsheet viewer, then installation admins would have to figure out which limit to set, maybe by doing some performance testing to answer questions like what's the largest tabular file that the spreadsheet viewer can display using the "average" computer and "average" internet speed (assuming that those are factors in how quickly the previewer can display tabular files).
Make the spreadsheet viewer show only a certain number of rows and columns, and let users know that only a certain number of rows and columns are being shown.
- This way the size of the file doesn't matter as much.
Let depositors turn off the previewer for certain files.
- This solution might scale best if enough depositors are aware of the previewer and aware of how to turn it off when they don't like how it displays their files. So in addition to testing this functionality with users before it's implemented, after it's implemented we would need to review the number of depositors who turn off the previewer versus the number of files that are too large to display quickly, to see if most depositors have turned off the previewer for files that cannot be displayed on "average" computers and over "average" internet speeds (if those are two factors).
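
As a rough aid for the performance-testing question above, here is a back-of-the-envelope sketch of the network part only. It assumes that 17.4 MB means 17,400,000 bytes, and the bandwidth figures are hypothetical placeholders rather than measured "average" speeds. It says nothing about the time the browser needs to parse and render the file, which may well dominate.

```python
def download_seconds(size_bytes: int, megabits_per_second: float) -> float:
    """Idealized transfer time: file size in bits divided by link speed in bits per second."""
    return (size_bytes * 8) / (megabits_per_second * 1_000_000)


# Hypothetical link speeds, chosen only for illustration.
for mbps in (5, 25, 100):
    seconds = download_seconds(17_400_000, mbps)
    print(f"{mbps} Mbit/s: {seconds:.1f} s")
```

If measured load times are much longer than these estimates, that would suggest the user's computer (parsing and rendering) matters more than the connection.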

claudiodsf · 2023-01-25T10:09:44Z

Hi, I was going to post on this same problem today, when I saw this new issue 🙃

We have the same problem on a not-yet-published dataset, which I cannot share, but I found an example on Harvard Dataverse (89.5 MB - 145 Variables, 56200 Observations).

https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/D1N0GO/3NK9D8

I agree with the proposal that there should be a limit on file size (in bytes or number of observations) that the admin could configure at install time.

pdurbin · 2023-01-25T13:33:50Z

It's a somewhat longstanding problem so thank you to @jggautier and @claudiodsf for getting the discussion going here. 😄

My first thought is that the next version of Dataverse (5.13 probably) will include a new feature for the external tools framework whereby tools can express "requirements" that they need to operate. Here's an example...

  "requirements": {
    "auxFilesExist": [
      {
        "formatTag": "NcML",
        "formatVersion": "0.1"
      }
    ]
  }

... from this pull request:

add NcML previewer #17 #18

What's going on here is that the NcML preview tool has a requirement that a certain auxiliary file be present for the eyeball to show up (to offer a preview, that is).

Perhaps, like @jggautier suggested with "let installations set a byte size limit specifically for the spreadsheet viewer" each tool could express a size limit, something like this:

  "requirements": {
    "sizeLimitInBytes": 8388608
  }

The idea would be to simply not show the eyeball for large files.
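
A minimal sketch of what that check might look like, assuming the hypothetical `sizeLimitInBytes` key from the snippet above. This is not an existing option in the external tools framework, and the function name `preview_allowed` is made up for illustration.

```python
from typing import Any, Mapping


def preview_allowed(file_size_bytes: int, tool_manifest: Mapping[str, Any]) -> bool:
    """Return True if the tool's (hypothetical) size requirement permits a preview."""
    requirements = tool_manifest.get("requirements") or {}
    # "sizeLimitInBytes" is the proposed key, not one the framework supports today.
    limit = requirements.get("sizeLimitInBytes")
    if limit is None:
        return True  # no limit declared, so always offer the preview
    return file_size_bytes <= limit


manifest = {"requirements": {"sizeLimitInBytes": 8388608}}  # 8 MiB
print(preview_allowed(5_000_000, manifest))   # small file: preview offered
print(preview_allowed(17_400_000, manifest))  # the 17.4 MB file: preview hidden
```

With an 8 MiB limit, both the 17.4 MB file and the 89.5 MB example mentioned earlier would simply not get the eyeball.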

We could get fancier, of course, as suggested above (preview only some rows) and maybe the logic should be in the spreadsheet viewer itself, but I thought I'd at least mention this new "requirements" feature.
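
To make the "preview only some rows" idea concrete, here is a rough sketch in Python. The real spreadsheet viewer runs in the browser, so this only illustrates the logic; `truncated_preview`, `max_rows`, and `max_cols` are hypothetical names, and the header line counts as a row here.

```python
import csv
import io
from typing import Iterable, List, Tuple


def truncated_preview(
    lines: Iterable[str], max_rows: int, max_cols: int
) -> Tuple[List[List[str]], bool]:
    """Read at most max_rows rows and max_cols columns; report whether anything was cut."""
    rows: List[List[str]] = []
    truncated = False
    for row in csv.reader(lines):
        if len(rows) >= max_rows:
            truncated = True  # at least one more row exists beyond the limit
            break
        if len(row) > max_cols:
            truncated = True  # this row has columns that are not shown
        rows.append(row[:max_cols])
    return rows, truncated


sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9\n")
rows, truncated = truncated_preview(sample, max_rows=2, max_cols=2)
print(rows)
if truncated:
    print("Only part of the file is shown.")
```

The `truncated` flag is what would drive a "only the first N rows and M columns are shown" notice. Note that a row limit alone would not help much with a file like the 17.4 MB one above, where 134 rows carry very long cell text; a limit on cell length or total bytes read might also be needed.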

For now, docs are here (look for "requirements"): http://preview.guides.gdcc.io/en/develop/api/external-tools.html

It was added in this PR:

extract metadata (NcML XML) from NetCDF/HDF5 files, new "requirements" option for external tools IQSS/dataverse#9239
