Provide header level details for a scan: Enhance scancode to include a log or history with useful statistics #211

DennisClark · 2016-02-24T00:56:40Z

A run of scancode should generate a log file with meaningful statistics, including such things as:

the version number of scancode that was executed
start date/time
end date/time
elapsed time
name of the library that was scanned
number of files scanned
{{anything else that makes sense and is useful}}

balusarakesh · 2016-03-01T23:07:22Z

This log file can also contain the reason for the failure if the scan is interrupted partway through.

pombredanne · 2017-10-04T22:03:52Z

See also aboutcode-org/aboutcode#7

sschuberth · 2017-11-20T09:08:23Z

As discussed in #840, having a summary of errors (e.g. in the header of the regular JSON output file) would also be beneficial.

pombredanne · 2018-02-07T15:38:57Z

There are some improvements in develop after #885. More work is still needed on this, though.

* This is a new data structure as designed in aboutcode-org/aboutcode#7
* For now, the old header-level data have been kept

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

This is the original attribute name we had agreed to Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

As suggested by @sschuberth in aboutcode-org/aboutcode#7 (comment) Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Remove the top level attributes scancode_notice, scancode_version, etc., and move the top level files_count as an extra_data header attribute
* Update all outputs and tests accordingly
* Other minor refactorings
* Rename plugincode.output.OutputPlugin.get_results to get_files
* Remove scancode.resource.Codebase.get_headings, now obsolete

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

pombredanne · 2018-11-29T07:26:26Z

GH closed this automatically.... reopening!

pombredanne · 2018-11-29T07:26:43Z

Here is what we have now:
$ ./scancode -clip -n4 --summary --json-pp j.son samples

{
  "headers": [
    {
      "tool_name": "scancode-toolkit",
      "tool_version": "2.9.7.post183.795fcc4",
      "options": {
        "input": "samples",
        "--copyright": true,
        "--info": true,
        "--json-pp": "j.son",
        "--license": true,
        "--package": true,
        "--processes": "4",
        "--summary": true
      },
      "notice": "Generated with ScanCode and provided on an \"AS IS\" BASIS, WITHOUT WARRANTIES\nOR CONDITIONS OF ANY KIND, either express or implied. No content created from\nScanCode should be considered or used as legal advice. Consult an Attorney\nfor any legal advice.\nScanCode is a free software code scanning tool from nexB Inc. and others.\nVisit https://github.com/nexB/scancode-toolkit/ for support and download.",
      "start_timestamp": "2018-11-29T072242.399469",
      "end_timestamp": "2018-11-29T072250.772025",
      "message": null,
      "errors": [],
      "extra_data": {
        "files_count": 33
      }
    }
  ],
...

And then the same file, reprocessed:

$ ./scancode --from-json j.son --only-findings --json-pp j2.son --csv j2.csv

{
  "headers": [
    {
      "tool_name": "scancode-toolkit",
      "tool_version": "2.9.7.post183.795fcc4",
      "options": {
        "input": "samples",
        "--copyright": true,
        "--info": true,
        "--json-pp": "j.son",
        "--license": true,
        "--package": true,
        "--processes": "4",
        "--summary": true
      },
      "notice": "Generated with ScanCode and provided on an \"AS IS\" BASIS, WITHOUT WARRANTIES\nOR CONDITIONS OF ANY KIND, either express or implied. No content created from\nScanCode should be considered or used as legal advice. Consult an Attorney\nfor any legal advice.\nScanCode is a free software code scanning tool from nexB Inc. and others.\nVisit https://github.com/nexB/scancode-toolkit/ for support and download.",
      "start_timestamp": "2018-11-29T072242.399469",
      "end_timestamp": "2018-11-29T072250.772025",
      "message": null,
      "errors": [],
      "extra_data": {
        "files_count": 33
      }
    },
    {
      "tool_name": "scancode-toolkit",
      "tool_version": "2.9.7.post183.795fcc4",
      "options": {
        "input": "j.son",
        "--csv": "j2.csv",
        "--from-json": true,
        "--json-pp": "j2.son",
        "--only-findings": true
      },
      "notice": "Generated with ScanCode and provided on an \"AS IS\" BASIS, WITHOUT WARRANTIES\nOR CONDITIONS OF ANY KIND, either express or implied. No content created from\nScanCode should be considered or used as legal advice. Consult an Attorney\nfor any legal advice.\nScanCode is a free software code scanning tool from nexB Inc. and others.\nVisit https://github.com/nexB/scancode-toolkit/ for support and download.",
      "start_timestamp": "2018-11-29T072338.656792",
      "end_timestamp": "2018-11-29T072338.691748",
      "message": null,
      "errors": [],
      "extra_data": {
        "files_count": 0
      }
    }
  ],
...
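The original request asked for the elapsed time, which the header does not store directly but which can be derived from start_timestamp and end_timestamp. Here is a minimal sketch, assuming the timestamp format is the one seen in the sample values above (the format string and the helper name elapsed_seconds are illustrative and not part of ScanCode):

```python
from datetime import datetime

# Assumed format, inferred from sample values such as "2018-11-29T072242.399469".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H%M%S.%f"


def elapsed_seconds(header):
    """Return the duration of one header record in seconds (hypothetical helper)."""
    start = datetime.strptime(header["start_timestamp"], TIMESTAMP_FORMAT)
    end = datetime.strptime(header["end_timestamp"], TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


# Two header records, trimmed to the fields used here.
headers = [
    {
        "tool_name": "scancode-toolkit",
        "start_timestamp": "2018-11-29T072242.399469",
        "end_timestamp": "2018-11-29T072250.772025",
        "extra_data": {"files_count": 33},
    },
    {
        "tool_name": "scancode-toolkit",
        "start_timestamp": "2018-11-29T072338.656792",
        "end_timestamp": "2018-11-29T072338.691748",
        "extra_data": {"files_count": 0},
    },
]

for header in headers:
    print(header["tool_name"], header["extra_data"]["files_count"], elapsed_seconds(header))
```

With a real output file, the headers list would come from json.load() on the scan file instead of being written inline.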

pombredanne · 2018-11-29T07:36:01Z

The only question left is about ordering: for now the topmost header item is the oldest, not the newest. Might it be better to order them the other way?

pombredanne · 2018-11-29T07:37:36Z

@sschuberth we now also have a global errors attribute in it. See this example:
$ ./scancode -clipeu --json-pp - --timeout 0.000001 --verbose tests/scancode/data/failing/patchelf.pdf

{
  "headers": [
    {
      "tool_name": "scancode-toolkit",
      "tool_version": "2.9.7.post183.795fcc4",
      "options": {
        "input": "tests/scancode/data/failing/patchelf.pdf",
        "--copyright": true,
        "--email": true,
        "--info": true,
        "--json-pp": "-",
        "--license": true,
        "--package": true,
        "--timeout": "1e-06",
        "--url": true,
        "--verbose": true
      },
      "notice": "Generated with ScanCode and provided on an \"AS IS\" BASIS, WITHOUT WARRANTIES\nOR CONDITIONS OF ANY KIND, either express or implied. No content created from\nScanCode should be considered or used as legal advice. Consult an Attorney\nfor any legal advice.\nScanCode is a free software code scanning tool from nexB Inc. and others.\nVisit https://github.com/nexB/scancode-toolkit/ for support and download.",
      "start_timestamp": "2018-11-29T073446.380441",
      "end_timestamp": "2018-11-29T073448.617701",
      "message": null,
      "errors": [
        "Path: patchelf.pdf\n  ERROR: for scanner: info:\n  ERROR: Processing interrupted: timeout after 0 seconds.\n  ERROR: for scanner: licenses:\n  ERROR: Processing interrupted: timeout after 0 seconds.\n  ERROR: for scanner: copyrights:\n  ERROR: Processing interrupted: timeout after 0 seconds.\n  ERROR: for scanner: packages:\n  ERROR: Processing interrupted: timeout after 0 seconds.\n  ERROR: for scanner: emails:\n  ERROR: Processing interrupted: timeout after 0 seconds.\n  ERROR: for scanner: urls:\n  ERROR: Processing interrupted: timeout after 0 seconds."
      ],
      "extra_data": {
        "files_count": 1
      }
    }
...
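For the error summary requested earlier in this thread, a consumer could gather the errors lists from all header records. This is a rough sketch only: collect_errors is a hypothetical helper, and the inline scan dict is an abridged copy of the output above.

```python
def collect_errors(scan):
    """Return (tool_name, start_timestamp, error) tuples across all headers.

    Hypothetical helper; it only relies on the "headers" and "errors"
    attributes shown in the sample output.
    """
    found = []
    for header in scan.get("headers", []):
        for error in header.get("errors", []):
            found.append((header.get("tool_name"), header.get("start_timestamp"), error))
    return found


# Abridged version of the sample output, with a shortened error message.
scan = {
    "headers": [
        {
            "tool_name": "scancode-toolkit",
            "start_timestamp": "2018-11-29T073446.380441",
            "errors": [
                "Path: patchelf.pdf\n  ERROR: for scanner: info:\n"
                "  ERROR: Processing interrupted: timeout after 0 seconds."
            ],
        }
    ]
}

for tool_name, started, error in collect_errors(scan):
    print(tool_name, started)
    print(error)
```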

sschuberth · 2018-11-29T08:07:38Z

> The only question left is about ordering

I don't really think it matters, as to be on the safe side you should always sort by start_timestamp anyway. But what about the data following the header? How do you know which data belongs to which header? Or will there always only be data from the last run in the file?
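Sorting by start_timestamp could look like the sketch below. It assumes the timestamp format seen in the sample output; the format string and the helper name sort_headers are illustrative, not ScanCode API.

```python
from datetime import datetime

# Assumed format, inferred from sample values such as "2018-11-29T072242.399469".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H%M%S.%f"


def sort_headers(headers, newest_first=False):
    """Return a new list of header records ordered by start_timestamp."""
    return sorted(
        headers,
        key=lambda h: datetime.strptime(h["start_timestamp"], TIMESTAMP_FORMAT),
        reverse=newest_first,
    )


# Deliberately out of order to show that file order does not matter.
headers = [
    {"options": {"input": "j.son"}, "start_timestamp": "2018-11-29T072338.656792"},
    {"options": {"input": "samples"}, "start_timestamp": "2018-11-29T072242.399469"},
]

for header in sort_headers(headers):
    print(header["start_timestamp"], header["options"]["input"])
```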

pombredanne · 2018-11-29T08:40:23Z

@sschuberth re

> But what about the data following the header? How do you know which data belongs to which header? Or will there always only be data from the last run in the file?

The data in the file is the data as it is from the last run. Tracking actual changes is something to do outside.
Here the headers are just a way to document the fact that a file was created through multiple tools touching it, such as multiple scancode runs, editing in aboutcode manager, matching against an index, etc.

sschuberth · 2018-11-29T09:27:29Z

> Here the headers is just a way to document the fact a file was created through multiple tools touching it

I see. Another idea to make this more clear would be to always only keep one top-level header, and move headers from previous processing steps e.g. to the existing extra_data field.

pombredanne · 2018-12-11T08:28:34Z

@sschuberth re

> Another idea to make this more clear would be to always only keep one top-level header, and move headers from previous processing steps e.g. to the existing extra_data field.

I am not inclined to go that way: this would mean that each tool that updates the header would need to move several data bits around instead of just appending a whole new record. I would prefer to keep this simpler approach unless you feel strongly about it.
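To illustrate the append-only approach, here is a rough sketch of what a tool touching the file would do. The helper append_header and the tool name "some-other-tool" are made up for this example and are not part of ScanCode; the notice attribute is left out for brevity.

```python
def append_header(scan, tool_name, tool_version, options,
                  start_timestamp, end_timestamp,
                  message=None, errors=None, extra_data=None):
    """Append one new header record and leave existing records untouched.

    Hypothetical helper mirroring the attributes of the sample output.
    """
    header = {
        "tool_name": tool_name,
        "tool_version": tool_version,
        "options": options,
        "start_timestamp": start_timestamp,
        "end_timestamp": end_timestamp,
        "message": message,
        "errors": list(errors or []),
        "extra_data": dict(extra_data or {}),
    }
    scan.setdefault("headers", []).append(header)
    return header


# An existing scan with one header record, trimmed for the example.
scan = {
    "headers": [
        {"tool_name": "scancode-toolkit", "start_timestamp": "2018-11-29T072242.399469"}
    ]
}

append_header(
    scan,
    tool_name="some-other-tool",  # made-up name
    tool_version="0.0.0",
    options={"input": "j.son"},
    start_timestamp="2018-11-29T072338.656792",
    end_timestamp="2018-11-29T072338.691748",
)

print([h["tool_name"] for h in scan["headers"]])
```

Nesting previous headers under extra_data would instead require each tool to read, move and rewrite the older records before adding its own.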

sschuberth · 2018-12-11T08:31:20Z

> I would prefer keep this simpler way unless you feel strongly about it

No, not strongly enough 😉

pombredanne · 2018-12-11T08:33:57Z

@sschuberth thanks!

pombredanne · 2018-12-11T10:51:36Z

I am closing this at last, as it is now merged in develop.
Thank you all for the help and review.

DennisClark added the enhancement label Feb 24, 2016

DennisClark assigned jdaguil Feb 24, 2016

jdaguil added this to the v2.0 milestone Mar 2, 2016

pombredanne modified the milestones: v2.0, v2.1 Aug 5, 2016

pombredanne added the easy label Feb 28, 2017

pombredanne modified the milestones: v2.1, v2.3 Oct 4, 2017

pombredanne changed the title ~~Enhance scancode to generate a log file with useful statistics~~ Provide header level details for a scacn: Enhance scancode to generate a log file with useful statistics Oct 4, 2017

pombredanne changed the title ~~Provide header level details for a scacn: Enhance scancode to generate a log file with useful statistics~~ Provide header level details for a scan: Enhance scancode to generate a log file with useful statistics Oct 17, 2017

pombredanne mentioned this issue Oct 30, 2017

Adding "stats" to "results" in ScanCode output #828

Closed

pombredanne changed the title ~~Provide header level details for a scan: Enhance scancode to generate a log file with useful statistics~~ Provide header level details for a scan: Enhance scancode to include a log or history with useful statistics Oct 30, 2017

pombredanne mentioned this issue Nov 1, 2017

RFC: Should the next version be 3.0 or 2.3? #832

Closed

sschuberth mentioned this issue Nov 20, 2017

--only-findings should not omit scan_errors from JSON output #840

Closed

pombredanne added a commit that referenced this issue Jul 11, 2018

Add new codebase-level log entries #211

9f35fbf

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

pombredanne modified the milestones: v2.3, v3.0 Nov 4, 2018

pombredanne unassigned jdaguil Nov 4, 2018

pombredanne mentioned this issue Nov 6, 2018

Identify that information are from scancode or another tool #513

Closed

pombredanne added must have Priority: high and removed easy labels Nov 11, 2018

pombredanne mentioned this issue Nov 14, 2018

Add new "headers" top level attribute #1285

Merged

pombredanne added a commit that referenced this issue Nov 14, 2018

Correct failiing tests on Windows and macOS #211

72165b9

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

pombredanne added a commit that referenced this issue Nov 27, 2018

Correct failiing tests on Windows and macOS #211

feb4b3e

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

pombredanne added a commit that referenced this issue Nov 27, 2018

Use headers attribute, not history_log #211

bda62f7

This is the original attribute name we had agreed to Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

pombredanne added a commit that referenced this issue Nov 27, 2018

Rename headers.tool to tool_name #211

8858d5c

As suggested by @sschuberth in aboutcode-org/aboutcode#7 (comment) Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

pombredanne closed this as completed in #1285 Nov 28, 2018

pombredanne reopened this Nov 29, 2018

pombredanne closed this as completed Dec 11, 2018

mjherzog mentioned this issue Jul 20, 2021

RFC: Specify how header-level data are returned in ABCD aboutcode-org/aboutcode#7

Closed

pombredanne mentioned this issue Feb 2, 2022

Create common header for ABC Data. #2841

Open
