Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Clearly define "entity" in common principles #947

Merged
merged 10 commits into from
Jun 8, 2022

Conversation

Lestropie
Copy link
Collaborator

Just realised why the "entity" terminology when referring to key-values in file names hadn't clicked for me: it's missing from the list of definitions!

While this is a relatively trivial addition, I am nevertheless listing as a draft PR. I have noticed that in subsequent text, both "entity / entities" and "key-value(s) / key-value pairs" are used interchangeably. If the definition of "entity" is presented unambiguously in the specification, then it is worth considering the prospect of enforcing this nomenclature throughout the specification, and I think this is an appropriate location for that discussion.

Copy link
Collaborator

@Remi-Gau Remi-Gau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am generally in favor of this addition.

But I think it would be worth to link to the appendix:

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this proposal, I am also in favor, and I agree with Remi that links to the entity pages should be supplied.

If the definition of "entity" is presented unambiguously in the specification, then it is worth considering the prospect of enforcing this nomenclature throughout the specification

I think replacing mentions of "key value pairs" with entity (and back referencing to the definition) is a good idea.

@tsalo
Copy link
Member

tsalo commented Dec 7, 2021

Just a thought, but I think it's probably worth it to have a schema file for the common principles, which could then be rendered into a list with a macro.

@Lestropie
Copy link
Collaborator Author

But I think it would be worth to link to the appendix:

Done (00f9cea); but feel free to propose alternative phrasing.

I think replacing mentions of "key value pairs" with entity (and back referencing to the definition) is a good idea.

I can give that a go, but I might try it in a separate branch based off of this one and generate a dedicated PR for that proposal.

it's probably worth it to have a schema file for the common principles, which could then be rendered into a list with a macro.

Were you thinking of such for specifically the definitions within the common principles? Regardless, might be better documented as a standalone Issue so it doesn't get lost in here.

@Lestropie
Copy link
Collaborator Author

For anyone interested in the prospect of adopting the "entity" term more broadly across the specification, please see Lestropie#1.

@Remi-Gau
Copy link
Collaborator

Remi-Gau commented Dec 8, 2021

Actually read this too quickly and now I have an existential question.

Is an entity, just the key or is is the key-value pair?

I had pretty much understood entity as being the key and would use label when referring to the value.

Seems that defining entity will be good thing if even a maintainer is not clear on its meaning.

@sappelhoff
Copy link
Member

Is an entity, just the key or is is the key-value pair?

I understood entity as the key value pair :-D

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, even better together with Lestropie#1

although as you mention in #950, the

allowed, OPTIONAL or REQUIRED

should maybe be amended

@Lestropie
Copy link
Collaborator Author

Is an entity, just the key or is is the key-value pair?

Yeah, crossed my mind too.

Given it is used strictly in reference to file names, where keys cannot appear in the absence of a value, I think that from a utilitarian perspective having the term refer to the pair is preferable. It does make things slightly muddy when e.g. referring to appendices that relate to entities; but that documentation doesn't only specify a list of keys, they also specify for each the requirements for the corresponding value in order to form a valid entry in the file name. So one can kinda still point to a "list of entities", even if the list is indexed by "entity keys", as the expectations for the entity values in order to form a valid entity is still there in the list content.

Is a GitHub poll needed? :-P

@sappelhoff
Copy link
Member

Is a GitHub poll needed? :-P

maybe we first ask @bids-standard/maintainers who have not seen this, and then decide whether we need to discuss this more widely.

[FIX] Standardise use of "entity" definition across specification
@Remi-Gau
Copy link
Collaborator

I think the only thing that feels weird by defining the key-value pair as an entity is that it makes weird to read the schema that seems to imply that it is the key. Or it is just my reading of it.

Either way I am sure I can adjust my mental model but some clarification would be good.

@VisLab
Copy link
Member

VisLab commented Dec 13, 2021

Sorry, I have always been confused about entities, and I am more confused after the discussion.

The section called Entity table might be helped by some clarification.

Are T1w and eeg entities? They don't appear as key value pairs in the file names. They are parsed as "suffixes" for files. Yet, the first line of the appendix says entities (key-value pairs).

As far as I can tell T1w never is part of a key-value pair. Another issue is that events are also suffixes but they are not treated in the same way as modalities such as eeg.

@tsalo
Copy link
Member

tsalo commented Dec 13, 2021

FWIW, I've always considered the entity to be the key only (e.g., "run entity"), with the combination of entity and label/index being the "key-value pair" we refer to in the specification, but I'm not heavily invested in that distinction. I will note that I think folks say "the run entity" more often than they say "the run entity key" or "the run-index entity", though.

I think the only thing that feels weird by defining the key-value pair as an entity is that it makes weird to read the schema that seems to imply that it is the key.

@Remi-Gau I mostly took descriptions from the specification, but I can't guarantee that my own interpretation didn't bleed into any changes I made in the schema and in the appendices. I think saying "entity keys" in the Entities appendix would make things clearer in any case.

Are T1w and eeg entities? They don't appear as key value pairs in the file names. They are parsed as "suffixes" for files. Yet, the first line of the appendix says entities (key-value pairs).

@VisLab, T1w and eeg are both modalities (and thus suffixes), rather than entities. Also, eeg (when it's the folder, rather than part of the filename) is a datatype as well.

Another issue is that events are also suffixes but they are not treated in the same way as modalities such as eeg.

Could you expand on this? Some modalities/suffixes are sort of "auxiliary" in that they are associated with some other file containing imaging data, such as events files or physio files associated with a BOLD scan, but they can also exist on their own. In the case of events files, if you wanted to have those files in the absence of imaging files, you would have to have them in the beh folder. When they're in that folder, I think they operate just like other "primary" suffixes.

@VisLab
Copy link
Member

VisLab commented Dec 13, 2021

The Entity table lists the things that I thought of as "modalities" (or "suffixes") in the Entity column. I think that re-titling and re-captioning this table as well as the appendix itself could improve clarity.

@tsalo: This was a useful observation:

In the case of events files, if you wanted to have those files in the absence of imaging files, you would have to have them in the beh folder. When they're in that folder, I think they operate just like other "primary" suffixes.

@Lestropie
Copy link
Collaborator Author

There's actually three options here, not two.

Which of the following is an example of an entity?

  1. "Subject"
  2. sub
  3. sub-01

From subsequent text in "Common principles" - "File name structure":

Note that sub-<label> corresponds to the subject entity because it has the sub- "key" and<label> "value"

This implies option 1, which has been thus far omitted from the discussion...

@VisLab
Copy link
Member

VisLab commented Dec 14, 2021

Note: Option 1 seems to be what the title of the Entity Table in the appendix is using. (Though I think that most people reading the table would think that the Entity title refers to items in column 1 and not to the meaning of the other titles in the header.)

@VisLab
Copy link
Member

VisLab commented Dec 14, 2021

One other comment.... when I think of "entity", I can see "subject" and "task" being entities. I find it hard to think of a key or a key-value pair having some grounding in reality, which I think the word "entity" implies. This may be part of the source of the confusion.

@satra
Copy link
Collaborator

satra commented Dec 14, 2021

at a generic level, one could borrow the description of entity from the prov ontology: https://www.w3.org/TR/prov-dm/#term-entity . as we are working on a prov extension for bids, aligning with such a description, and potentially augmenting it with bids/neuroimaging examples, may also lay some semantic groundwork for the future.

Copy link
Collaborator

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Minor suggestion.

RE @satra's suggestion that we derive from "An entity is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary." I don't see how that fits with our usage, which is closer to the concept of "morpheme", IMO.

@@ -105,6 +105,12 @@ misunderstanding we clarify them here.
The modality may overlap with, but should not be confused with
the **data type**.

1. **Entity** - a portion of a file name, consisting of a **key** and corresponding
**value** separated by a hyphen. Supported entities are defined in
[Appendix IX](99-appendices/09-entities.md); further, whether they are allowed,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of a case that is allowed but neither OPTIONAL or REQUIRED, so we should avoid introducing a term that implies a distinction.

Suggested change
[Appendix IX](99-appendices/09-entities.md); further, whether they are allowed,
[Appendix IX](99-appendices/09-entities.md); further, whether they are

@Lestropie
Copy link
Collaborator Author

And the existential descent continues... 😬

"Morpheme" is rather obscure and doesn't convey the 2-tuple nature. "Attribute" may be closer? Would however require fairly extensive changes to substitute such a core terminology entirely.

@VisLab
Copy link
Member

VisLab commented Dec 14, 2021 via email

@Lestropie
Copy link
Collaborator Author

Lestropie commented Dec 21, 2021

Following from my prior comment:

  • Appendix IV implies that "Subject" is an entity, with "sub-<label>" being the "format" for that entity.

  • Appendix IX implies that "sub" is an entity, with "Subject" being the "full name" of that entity.

I'm happy to have a go at redressing this myself, but it's going to be relatively laborious to adopt a common definition across the entire specification, so I don't really want to be generating hypothetical PRs for all of the different alternatives.


Let's see if this works...




Edit: I think you just click on your preferred option and the poll result updates, but I've never used it before...

@Lestropie
Copy link
Collaborator Author

Bumping seeking maintainer feedback on alternative proposal in comment above; I'm unable to progress on this proposal based on poll results.

@sappelhoff
Copy link
Member

I like the proposal from #947 (comment). Let's collect some more opinions

cc @bids-standard/maintainers @hoechenberger @adam2392

@Lestropie
Copy link
Collaborator Author

Invested / interested parties please see Lestropie#4, which is intended as an augmentation of this PR based on prior discussion here.

Lestropie and others added 4 commits April 26, 2022 13:56
Requests made in: #4.
Note that this is a repeat of commit 3972dd7, but with corrected multi-author attribution.

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
Common principles: More elaborate "entity" definition
Resolves bids-standard#947 against:
- bids-standard#962 (21b7725)
- bids-standard#1044 (d9eeb6d)
- bids-standard#867 (5fe26e1)
- bids-standard#998 (e47283a)

Conflicts:
	src/02-common-principles.md
	src/schema/README.md
	src/schema/objects/entities.yaml
	src/schema/objects/metadata.yaml
@Lestropie Lestropie marked this pull request as ready for review April 29, 2022 00:53
@codecov
Copy link

codecov bot commented Apr 29, 2022

Codecov Report

Merging #947 (fd3682c) into master (a4a03dd) will increase coverage by 0.96%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #947      +/-   ##
==========================================
+ Coverage   70.75%   71.71%   +0.96%     
==========================================
  Files           9        9              
  Lines         937      937              
==========================================
+ Hits          663      672       +9     
+ Misses        274      265       -9     
Impacted Files Coverage Δ
tools/schemacode/schemacode/_version.py 38.90% <0.00%> (+2.59%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9a146fa...fd3682c. Read the comment docs.

@Lestropie
Copy link
Collaborator Author

Lestropie#4 has been merged, and 699b74e brings this up-to-date with master (see commit message for conflicts). PR is no longer flagged as draft.

@sappelhoff sappelhoff dismissed Remi-Gau’s stale review April 29, 2022 06:43

requested changes were addressed

@sappelhoff sappelhoff requested review from effigies and removed request for Remi-Gau May 2, 2022 09:47
Copy link
Collaborator

@ericearl ericearl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@sappelhoff sappelhoff merged commit 5f97a79 into bids-standard:master Jun 8, 2022
@sappelhoff
Copy link
Member

Thanks @Lestropie and sorry for the delay with reviews and merging.

@Lestropie Lestropie deleted the define_entity branch June 8, 2022 11:44
@effigies effigies changed the title [FIX] Common principles: Add "entity" to list of definitions [ENH] Clearly define "entity" in common principles Jul 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants