Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Reporting] Not able to generate CSV report on object fields; Discover is able #25068

Closed
tsullivan opened this issue Nov 2, 2018 · 15 comments · Fixed by #88303
Closed

[Reporting] Not able to generate CSV report on object fields; Discover is able #25068

tsullivan opened this issue Nov 2, 2018 · 15 comments · Fixed by #88303
Labels
bug Fixes for quality problems that affect the customer experience Feature:Reporting:CSV Reporting issues pertaining to CSV file export impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. needs-team Issues missing a team label

Comments

@tsullivan
Copy link
Member

Kibana version: First reported with 6.2.2, reproducible in 6.4.2

Describe the bug:

Steps to reproduce:

  • Create an object field, either using repro from the bug above, or mapping a field as disabled
  • CSV report will fail with error field [fieldName] isn't a leaf field

Representing the field value works in Discover, presumably because Discover has a request body that uses the source option. CSV generation fails, presumably because the request body uses stored_fields in the request body.

Asking for stored fields is currently potentially problematic because not all fields have stored fields. To quote @colings86

Asking for stored fields on an object field itself will never work because you can't enable a stored field at that level hence why we throw an error.

@tsullivan tsullivan added bug Fixes for quality problems that affect the customer experience (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead labels Nov 2, 2018
@tsullivan tsullivan self-assigned this Nov 2, 2018
@tsullivan
Copy link
Member Author

It looks like Discover uses a completely different search (_msearch) to render the table and doesn't make any filters on which fields to retrieve:

POST /_msearch
{"index":"test_index","ignore_unavailable":true,"timeout":30000,"preference":1541460732434}
{"version":true,"size":500,"sort":[{"_score":{"order":"desc"}}],"_source":{"excludes":[]},"stored_fields":["*"],"script_fields":{},"docvalue_fields":[],"query":{"bool":{"must":[{"match_all":{}}],"filter":[],"should":[],"must_not":[]}},"highlight":{"pre_tags":["@kibana-highlighted-field@"],"post_tags":["@/kibana-highlighted-field@"],"fields":{"*":{}},"fragment_size":2147483647}}

That means that the columns are shown/hidden possibly in some part of the browser memory, getting intialized from the visState data for saved searches. That would not work for CSV export.

CSV export gets the its search request body passed from the Discover app (the getSharingData() method), which [gets it from the courier SearchSource](: https://github.com/elastic/kibana/blob/d7da13b/src/core_plugins/kibana/public/discover/controllers/discover.js#L366)

Here's an example of the search request body generated for a test index, when the Discover table only has the goofy field picked:

POST /test_index/_search
{
  "stored_fields": [
    "goofy"
  ],
  "query": {
    "bool": {
      "filter": [],
      "must_not": [],
      "should": [],
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "script_fields": {},
  "_source": {
    "excludes": [],
    "includes": [
      "goofy"
    ]
  },
  "docvalue_fields": [],
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "version": true
}

The bad part is the stored_fields. Just removing it fixes the problem, since goofy is already in the _source.includes part.

Really ugly workaround: remove the stored_fields part after copying the POST URL. Example, where the part to remove is highlighted:
image

@tsullivan
Copy link
Member Author

@gammon @kobelb @elastic/kibana-discovery I've been going back where stored_fields is used and it looks like it goes way, way back in git history. Any thoughts about the impact of removing it from

It seems like it's enough to just use _source.includes in this context, and that's already getting added to the request.

@kobelb
Copy link
Contributor

kobelb commented Nov 8, 2018

I unfortunately don't have any memory of this aspect of CSV reporting, so I'm afraid I won't be much help here.

@Bargs
Copy link
Contributor

Bargs commented Nov 8, 2018

I don't know what all the ramifications of removing it would be, but we do present stored fields to the user so removing it would likely be a breaking change.

@tsullivan
Copy link
Member Author

@Bargs is there a way to test the code you linked against data with an object field, like the kind described in #23611 ?

@rayafratkina rayafratkina added the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Dec 16, 2018
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app

@Bargs
Copy link
Contributor

Bargs commented Dec 19, 2018

@tsullivan maybe if one of the sub-fields in the object is marked as stored:true in the mappings.

@tsullivan
Copy link
Member Author

@joelgriffith this should go into the test plan for #34571

@timroes timroes added Team:Stack Services and removed Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Jul 18, 2019
@lukasolson
Copy link
Member

I took a look at this today. I just wanted to jot down my findings as well as my suggested solution.

Currently, Discover doesn't do any sort of "filtering" which fields are queried and returned from Elasticsearch. Discover requests all of the stored fields (those inside _source as well as those which have "stored": true in their mapping). This happens because SearchSource, by default, sets the stored_fields property in the request to ['*']:

flatData.body.stored_fields = computedFields.storedFields;

Then, in Discover, we build the field list/selector based on those found inside _source or those which are stored.

However, when you generate a CSV report, things work a little differently. Since we don't need to display any sort of field list/selector (and the fields are already determined), we explicitly call setField('fields', fields) on the SearchSource to filter the list of fields that are returned from Elasticsearch. There is some additional logic that handles what to set stored_fields, as well as docvalue_fields, script_fields, and _source.includes:

// if we only want to search for certain fields
const fields = flatData.fields;
if (fields) {
// filter out the docvalue_fields, and script_fields to only include those that we are concerned with
flatData.body.docvalue_fields = _.intersection(flatData.body.docvalue_fields, fields);
flatData.body.script_fields = _.pick(flatData.body.script_fields, fields);
// request the remaining fields from both stored_fields and _source
const remainingFields = _.difference(fields, _.keys(flatData.body.script_fields));
flatData.body.stored_fields = remainingFields;
_.set(flatData.body, '_source.includes', remainingFields);
}

The bug is here inside these lines of code. The code currently looks at which fields are scripted fields, and which are docvalue fields, then takes the remaining list of fields and sets both _source.includes and stored_fields to this list. This causes problems for two possible scenarios:

  1. When the field is inside _source and not inside a stored_field and it is a field that is an array inside an object that is not a leaf node (CSV reports fail with no error if a saved search has only a single array property #23611).
  2. When _source is disabled and the field has "store": true set in the mapping (see https://gist.github.com/lukasolson/fab166393fab86243dfa0736f046085d).

The only place this code is currently called happens is inside getSharingData (which is used for generating CSV):

searchSource.setField('fields', searchFields);

My proposed solution is, when we fetch the field list when creating an index pattern or refreshing an index pattern, to add additional metadata to the field about whether or not it is stored. (This is available via the mapping as the store property.) Then, we can add fields that are stored to the list of stored_fields to query, and others to the _params.include, similarly to how we recently added the esTypes metadata to the field.

Elasticsearch documentation:

@tsullivan tsullivan removed their assignment Sep 30, 2019
@bmcconaghy bmcconaghy added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) and removed Team:Stack Services labels Dec 12, 2019
@bmcconaghy bmcconaghy added Team:Reporting Services and removed Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Dec 20, 2019
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-reporting-services (Team:Reporting Services)

@tsullivan
Copy link
Member Author

tsullivan commented Jan 8, 2020

I'm coming back to this after responding to https://discuss.elastic.co/t/cannot-generate-csv-on-nested-field/214062/2

My proposed solution is, when we fetch the field list when creating an index pattern or refreshing an index pattern, to add additional metadata to the field about whether or not it is stored. (This is available via the mapping as the store property.) Then, we can add fields that are stored to the list of stored_fields to query, and others to the _params.include, similarly to how we recently added the esTypes metadata to the field.

I am in favor of seeing this bug solved however necessary. @lukasolson when you find that the CSV export flow uses a lot of code that isn't used elsewhere (getSharingData), that seems like a problem to me.

I have a big concern that, we're using Courier helper code to generate the job parameters created for the report generation request, and that code has a chance of going away or being repurposed with any future changes to Discover's architecture.

This bug is related to that concern, so I'd like to ask if there is a way we could simplify the work being done in the browser for these parameters, and move more of that work towards common utilities available on the server?

I'm brainstorming ideas about how we would build a CSV export architecture that is API powered so it can be used in automation? If we have the most modern architecture for Discover that we need, what would this integration look like?

It has been hard for me to come up with ideas and chase bug fixes on this, because so much of the work needed to make the data formatted properly and mapped into a data table is only accessible in UI code, which is unavailable for automation purposes.

@tsullivan
Copy link
Member Author

I commented on this @elastic/kibana-app-arch issue: #65069 (comment)

Reporting needs the enhancement of having the SearchSource module available on the server, so that our generation API can simply include the ID of the saved search. We should be able to generate the same query that is used in the browser without using any helper code in Reporting. It seems like that issue will start to get us there.

@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app-services (Team:AppServices)

@tsullivan
Copy link
Member Author

Reporting needs the enhancement of having the SearchSource module available on the server, so that our generation API can simply include the ID of the saved search. We should be able to generate the same query that is used in the browser without using any helper code in Reporting. It seems like that issue will start to get us there.

This is in-progress: #88303

@tsullivan tsullivan added Team:Reporting Services impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. labels Feb 25, 2021
@tsullivan
Copy link
Member Author

This will be fixed in #88303 but only when "Read fields from _source" is OFF

image

@sophiec20 sophiec20 added Feature:Reporting:CSV Reporting issues pertaining to CSV file export and removed (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead (Deprecated) Team:Reporting Services labels Aug 21, 2024
@botelastic botelastic bot added the needs-team Issues missing a team label label Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience Feature:Reporting:CSV Reporting issues pertaining to CSV file export impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. needs-team Issues missing a team label
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants