Added The Device Modeling Language (DML) #7009

Open: wants to merge 1 commit into base: main

@TSonono TSonono commented Aug 22, 2024

  • Add The Device Modeling Language (DML) to the Linguist project.
  • Add .dml as the file extension for the DML language.

Description

The Device Modeling Language (DML) is an open source domain-specific language for writing fast functional or transaction-level device models for virtual platforms. DML provides high-level abstractions suitable for functional device models, including constructs like register banks, registers, bit fields, event posting, interfaces between models, and logging. DML code is compiled by the DML Compiler (DMLC), producing C code with API calls tailored for a particular simulator.

Currently, the compiler supports building models for the Intel® Simics® simulator, but other back-ends may be added in the future.

DMLC is maintained by Intel.

  • I am adding a new language.
    • The extension of the new language is used in hundreds of repositories on GitHub.com.
      • Search results for each extension:
    • I have included a real-world usage sample for all extensions added in this PR:
    • I have included a syntax highlighting grammar: https://github.com/intel/device-modeling-language
    • I have added a color
      • Hex value: #0068B5
      • Rationale: Intel logo and Simics logo are blue.
    • I have updated the heuristics to distinguish my language from others using the same extension.

@TSonono TSonono requested a review from a team as a code owner August 22, 2024 09:37
Member

@lildude lildude left a comment

  • For some reason I see 0 repositories. While it's unclear to me if there are 200 unique public repositories.

That will be because the search only returns the count of repos with "dml 1.4" in their name.

  • Intel has a lot of code written in DML internally, but it is unfortunately in private repositories. The same is the case for customers using Intel Simics to write their device models in DML.

We rely on public usage only as there is no way to query private usage... cos it's private, even to GitHub 😁

Regarding your search query:

This search is quite precise. I assume 1.4 is the version. What about earlier versions? A quick search shows there are some files with dml 1.0, dml 1.2, dml 1.4. Are these the same language?

Your specific search means there are not enough files for this PR to meet our usage requirements (also referenced in the CONTRIBUTING.md file).

If we remove your qualifier, we get a lot more files and things look a bit more promising, however a lot of the files returned don't appear to be the same language as you're proposing with this PR as we can see if we exclude the dml 1 string. As a result, these will all be misidentified as this language if this PR is merged as Linguist doesn't know of any other languages with the .dml extension.

There are far more of those other languages than this language if dml 1.? always has to be in the file, so at least one of those other languages needs to be identified and added at the same time in this PR. Your (relaxed?) heuristic will then be used to differentiate the two.

lib/linguist/languages.yml
- extensions: ['.dml']
  rules:
  - language: Device Modeling Language
    pattern: '(^dml\s1.4;)'
Member

This will have no effect as Linguist doesn't know of any other language that uses the .dml extension. See my primary comment for more details.

Author

Ah, I was under the impression that if this regex didn't match, and given that no other languages in Linguist have a .dml extension, the language of the file would simply not be recognized.

Member

Nope. Linguist works like a funnel... everything goes in at the top and then works through each of the strategies in decreasing order of specificity until a single language is identified:

linguist/lib/linguist.rb

Lines 51 to 71 in 5fad8d5

# Internal: The strategies used to detect the language of a file.
#
# A strategy is an object that has a `.call` method that takes two arguments:
#
#   blob - An object that quacks like a blob.
#   languages - An Array of candidate Language objects that were returned by the
#               previous strategy.
#
# A strategy should return an Array of Language candidates.
#
# Strategies are called in turn until a single Language is returned.
STRATEGIES = [
  Linguist::Strategy::Modeline,
  Linguist::Strategy::Filename,
  Linguist::Shebang,
  Linguist::Strategy::Extension,
  Linguist::Strategy::XML,
  Linguist::Strategy::Manpage,
  Linguist::Heuristics,
  Linguist::Classifier
]

As Linguist would only know of the .dml extension for this language, it would stop the process there, which, as you can see, is before the heuristics strategy. It would, however, also apply this language to all .dml files found on GitHub.
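
To make the funnel concrete, here is a toy model of it in Ruby. It is only a sketch and not Linguist's actual code: the two-strategy list, the `EXTENSIONS` table, `toy_detect`, and the `CALLS` log are all hypothetical names invented for illustration. It shows that when a single language claims an extension, the extension step already returns one candidate, so the heuristic never runs.

```ruby
# Toy model of the "funnel": strategies run in order and detection stops
# as soon as exactly one candidate language remains.
# Every name below is hypothetical; this is not Linguist's real code.

# Hypothetical extension table: only one language claims ".dml".
EXTENSIONS = { ".dml" => ["Device Modeling Language"] }.freeze

# Hypothetical heuristic pattern (dot escaped for this sketch).
HEURISTIC = /^dml\s1\.4;/

# Records which strategies actually ran.
CALLS = []

extension_strategy = lambda do |name, _data, _candidates|
  CALLS << :extension
  EXTENSIONS.fetch(File.extname(name), [])
end

heuristics_strategy = lambda do |_name, data, candidates|
  CALLS << :heuristics
  data.match?(HEURISTIC) ? candidates : []
end

TOY_STRATEGIES = [extension_strategy, heuristics_strategy].freeze

# Call each strategy in turn until a single language is returned.
def toy_detect(name, data)
  candidates = []
  TOY_STRATEGIES.each do |strategy|
    result = strategy.call(name, data, candidates)
    return result.first if result.size == 1
    candidates = result if result.size > 1
  end
  nil
end

# A .dml file that is clearly not Device Modeling Language:
puts toy_detect("schema.dml", "CREATE TABLE t (id INT);")
puts CALLS.inspect
```

In this toy model the SQL-like file is still labelled "Device Modeling Language" and `CALLS` contains only `:extension`, which mirrors the point made above: with one known language per extension, the heuristic is never consulted.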

There is an exception to this: very generic extensions which we add to the generic.yml file. These are checked against the heuristics really early in the extension strategy.

Thinking about this further, I think .dml is a good candidate to add to generic.yml too, which would mean this heuristic is used and the other languages are left unaffected.

Please can you add this extension to the generic.yml file. This heuristic can also be improved (we don't need the capturing group and the tiny overhead it brings with it):

Suggested change
- pattern: '(^dml\s1.4;)'
+ pattern: '^dml\s1.4;'
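
A quick Ruby check of how the suggested pattern behaves may be useful here. This is a sketch under the assumption that the heuristic is evaluated as an ordinary Ruby regular expression; the `ESCAPED` variant and the sample strings are hypothetical and not part of the PR. Note that the unescaped `.` matches any character, and that `^` in Ruby anchors at the start of any line, not only the start of the file.

```ruby
# The pattern as suggested in the review: the "." is unescaped.
SUGGESTED = /^dml\s1.4;/
# Hypothetical stricter variant with the dot escaped.
ESCAPED = /^dml\s1\.4;/

samples = [
  "dml 1.4;",                 # plain version line
  "dml 1x4;",                 # not a version, but "." matches "x"
  "// header\ndml 1.4;\n",    # version line after a comment line
  "  dml 1.4;"                # indented, so "^dml" does not match
]

samples.each do |s|
  puts "#{s.inspect}: suggested=#{s.match?(SUGGESTED)} escaped=#{s.match?(ESCAPED)}"
end
```

Only the second sample distinguishes the two patterns, so escaping the dot is a small tightening rather than a change in behaviour for real version lines.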

@TSonono
Author

TSonono commented Aug 22, 2024

This search is quite precise. I assume 1.4 is the version. What about earlier versions? A quick search shows there are some files with dml 1.0, dml 1.2, dml 1.4. Are these the same language?

Yes, they are the same language. However, the syntax is quite different between 1.2 and 1.4, and the grammar file only covers 1.4. Do you have any suggestion regarding how this can be handled?

If we remove your qualifier, we get a lot more files and things look a bit more promising, however a lot of the files returned don't appear to be the same language as you're proposing with this PR as we can see if we exclude the dml 1 string. As a result, these will all be misidentified as this language if this PR is merged as Linguist doesn't know of any other languages with the .dml extension.

There are far more of those other languages than this language if dml 1.? always has to be in the file, so at least one of those other languages needs to be identified and added at the same time in this PR. Your (relaxed?) heuristic will then be used to differentiate the two.

Would you say that if we include 1.0, 1.2 and 1.4, and also add another language with a .dml extension, there is a chance that support for DML (as in Device Modeling Language) will be added to GitHub Linguist? Or is it not popular enough anyway?

@TSonono TSonono requested a review from lildude August 22, 2024 11:49
@lildude
Member

lildude commented Aug 22, 2024

Would you say that if we include 1.0, 1.2 and 1.4, and also add another language with a .dml extension, there is a chance that support for DML (as in Device Modeling Language) will be added to GitHub Linguist? Or is it not popular enough anyway?

Even if you add a single entry for all .dml files that contain dml 1, you're still a long way off. When adding support for a language, all languages, filenames and extensions being added in the PR need to meet requirements.

@kbrunham-intel

We (Altera) have plans to publicly release repos with Simics code in the near future and would greatly benefit from having DML support in Linguist. The repos we plan to release would be FPGA designs with their associated Simics models, in addition to specific training examples on using Simics and modeling.

I am aware of at least 4 major engineering companies that use both GitHub and Simics, where this linguist support would benefit them too.

Please consider this a very strong endorsement of this PR.

Member

@lildude lildude left a comment

I've checked the usage again for this precise version and for the more lax dml 1, and we're still a long way off from meeting our usage requirements.

I still recommend considering loosening the version you're targeting, at the expense of some possible syntax highlighting anomalies, as I see a lot of "older" versions associated with Simics based on the paths in the search results. This will speed up meeting our usage requirements.
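
As an illustration of what a loosened heuristic could look like, here is a sketch in Ruby. The `RELAXED` pattern and the `dml_version` helper are hypothetical; the sketch assumes that the older versions mentioned in this thread (1.0, 1.2) use the same `dml 1.x;` version line that the 1.4 pattern targets, which the PR does not confirm.

```ruby
# Hypothetical relaxed heuristic: accept any "dml 1.<digit>;" version line
# instead of only "dml 1.4;". Not part of the PR as submitted.
RELAXED = /^dml\s(1\.\d);/

# Return the declared DML version as a String, or nil if none is found.
def dml_version(source)
  match = source.match(RELAXED)
  match && match[1]
end

puts dml_version("dml 1.2;\ndevice example;").inspect
puts dml_version("dml 1.4;\ndevice example;").inspect
puts dml_version("CREATE TABLE t (id INT);").inspect
```

The capturing group is kept here only so the helper can report which version it saw; a pattern used purely for matching would not need it.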

I know this introduces a chicken-and-egg situation, but we have these requirements because we get a lot of requests to add support for hobby and very new languages that don't amount to anything over time. Making an exception for one means an exception must be made for all which defeats the purpose of having this requirement. Removing this requirement will lead to a very bloated and hard to manage and maintain project with no real benefit to the wider GitHub user base.

I review usage of all PRs pending popularity each quarter when I prepare the Linguist release so this won't be forgotten. I will not comment each time as there are waaay too many PRs to do this for each.
