Add common extensions to Motorola 68k Assembly #4637
This pull request has been automatically marked as stale because it has not had recent activity, and will be closed if no further activity occurs. If this pull request was overlooked, forgotten, or should remain open for any other reason, please reply here to call attention to it and remove the stale status. Thank you for your contributions.
- ".asm"
- ".i"
- ".inc"
- ".s"
Are there any registers or opcodes unique to Motorola we can use to disambiguate assembly files with?
We're definitely going to need some heuristics for `.asm` and `.inc`, the latter of which is particularly important because it sees very general use across a range of (unrelated) languages…
M68k assembly is easily distinguished from other assembly languages, both by registers and opcodes.
How would such a heuristic look?
A regular expression; I'm happy to write it for you, provided you give me the names of substrings guaranteed not (or at least highly unlikely) to appear in the source code of any other assembler language.
(?im:moveq\b.*?d\d|move\.[bwl]\s+.*\b[ad]\d|movem\.[bwl]\b|btst\b|dbra\b)
Would it be reasonable to limit the `moveq` heuristic to match two registers (one address, one data)? From what I hear, 68k is unique in differentiating between the two.
If so, we could try this:
(?xi)
# Mnemonic
\b moveq (\.l)? \s+
# Address
\#( \$ -? [0-9a-f]{1,3}
| % [0-1]{1,8}
| -? [0-9]{1,3}
)
, \s*
# Register
d[0-7] \b
When writing heuristics, it's best to be as specific as possible; anything which doesn't match is passed down to the (less accurate) classification techniques.
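As a quick sanity check of that idea, here is a minimal Python sketch that compiles the pattern above and runs it against a few lines. Python's `re` module stands in for the Ruby engine Linguist uses (assumed to behave the same for this particular pattern), the inline `#` comments are dropped, and the sample lines are made up for illustration.

```python
import re

# The moveq pattern from above, compiled with the verbose and
# case-insensitive flags instead of the inline (?xi) prefix.
MOVEQ = re.compile(
    r"""
    \b moveq (\.l)? \s+
    \# ( \$ -? [0-9a-f]{1,3}
       | % [0-1]{1,8}
       | -? [0-9]{1,3}
       )
    , \s*
    d[0-7] \b
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Hypothetical sample lines (not taken from any real repository),
# mapped to whether the pattern is expected to match them.
samples = {
    "        moveq   #0,d0": True,      # decimal immediate
    "        MOVEQ.L #$7F,D1": True,    # hex immediate, upper case
    "        moveq   #%1010,d7": True,  # binary immediate
    "        moveq   #0,a0": False,     # destination is not d0-d7
    "        mov     eax, 0": False,    # x86-style mnemonic
}

results = {line: MOVEQ.search(line) is not None for line in samples}
for line, matched in results.items():
    print(f"{matched!s:5}  {line.strip()}")
```

Lines that fail this check are not rejected outright; they simply fall through to whatever pattern or classifier comes next.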
Credit for this expression belongs to @zerkman, since it's taken from the `language-m68k` grammar we're using to highlight 68k on GitHub (I did clean it up and remove some redundant syntax for clarity).
I've amended the other parts of that expression to use other bits of that grammar, bringing us down to:
(?xim:
# Mnemonic
\b moveq (\.l)? \s+
# Address
\#( \$ -? [0-9a-f]{1,3}
| % [0-1]{1,8}
| -? [0-9]{1,3}
)
, \s*
# Register
d[0-7] \b
| ^ \s* move (\.[bwl])? \s+ (sr|usp), \s* [^\s]+
| ^ \s* movem \.[bwl] \b
| ^ \s* move[mp] (\.[wl])? \b
| ^ \s* btst \b
| ^ \s* dbra \b
)
Notice that I've anchored the remaining parts to match at the beginning of a line (with or without indentation). This reduces the risk of incorrectly matching part of a comment in an unrelated file. For the same reason, you'll notice I avoid using wildcards (`.*`) when possible.
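To illustrate the anchoring point, the sketch below compares an unanchored `btst` pattern with the anchored one. It is written in Python rather than Ruby, and both input strings are invented for the example.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

unanchored = re.compile(r"\bbtst\b", FLAGS)
anchored = re.compile(r"^\s*btst\b", FLAGS)

# Hypothetical inputs: a 68k snippet, and an unrelated x86-style file
# whose comment happens to mention the mnemonic.
m68k_source = "wait:\n        btst    #6,$bfe001\n        bne.s   wait\n"
x86_source = "        test    al, 1   ; there is no btst, use test instead\n"

hits = {
    "unanchored/m68k": unanchored.search(m68k_source) is not None,
    "unanchored/x86": unanchored.search(x86_source) is not None,
    "anchored/m68k": anchored.search(m68k_source) is not None,
    "anchored/x86": anchored.search(x86_source) is not None,
}
print(hits)
```

The trade-off is that the anchored form also skips lines where a label precedes the mnemonic on the same line (e.g. `wait: btst #6,$bfe001`), so it relies on at least one instruction sitting at the start of its own line.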
@idrougge If the above revisions look good to you, then the changes to make to `heuristics.yml` are below.
I'll still need to test them thoroughly on my end, as well as investigate any possible formats using the `.i` extension that we've not registered yet.
--- heuristics.yml 2019-10-03 21:45:48.000000000 +1000
+++ heuristics.yml 2019-10-03 22:30:25.000000000 +1000
@@ -49,8 +49,12 @@
   rules:
   - language: ActionScript
     pattern: '^\s*(package\s+[a-z0-9_\.]+|import\s+[a-zA-Z0-9_\.]+;|class\s+[A-Za-z0-9_]+\s+extends\s+[A-Za-z0-9_]+)'
   - language: AngelScript
+- extensions: ['.asm']
+  rules:
+  - language: Motorola 68K Assembly
+    named_pattern: m68k
 - extensions: ['.asc']
   rules:
   - language: Public Key
     pattern: '^(----[- ]BEGIN|ssh-(rsa|dss)) '
@@ -191,8 +195,10 @@
     pattern: '\A\s*[{\[]'
   - language: Slice
 - extensions: ['.inc']
   rules:
+  - language: Motorola 68K Assembly
+    named_pattern: m68k
   - language: PHP
     pattern: '^<\?(?:php)?'
   - language: SourcePawn
     pattern: '^public\s+(?:SharedPlugin(?:\s+|:)__pl_\w+\s*=(?:\s*{)?|(?:void\s+)?__pl_\w+_SetNTVOptional\(\)(?:\s*{)?)'
@@ -383,8 +389,12 @@
   - language: Rust
     pattern: '^(use |fn |mod |pub |macro_rules|impl|#!?\[)'
   - language: RenderScript
     pattern: '#include|#pragma\s+(rs|version)|__attribute__'
+- extensions: ['.s']
+  rules:
+  - language: Motorola 68K Assembly
+    named_pattern: m68k
 - extensions: ['.sc']
   rules:
   - language: SuperCollider
     pattern: '(?i:\^(this|super)\.|^\s*~\w+\s*=\.)'
@@ -486,7 +496,15 @@
   - '^[ \t]*(private|public|protected):$'
   - 'std::\w+'
   fortran: '^(?i:[c*][^abd-z]| (subroutine|program|end|data)\s|\s*!)'
   key_equals_value: '^[^#!;][^=]*='
+  m68k:
+  - '(?im)\bmoveq(?:\.l)?\s+#(?:\$-?[0-9a-f]{1,3}|%[0-1]{1,8}|-?[0-9]{1,3}),\s*d[0-7]\b'
+  - '(?im)^\s*move(?:\.[bwl])?\s+(?:sr|usp),\s*[^\s]+'
+  - '(?im)^\s*move\.[bwl]\s+.*\b[ad]\d'
+  - '(?im)^\s*movem\.[bwl]\b'
+  - '(?im)^\s*move[mp](?:\.[wl])?\b'
+  - '(?im)^\s*btst\b'
+  - '(?im)^\s*dbra\b'
   objectivec: '^\s*(@(interface|class|protocol|property|end|synchronised|selector|implementation)\b|#import\s+.+\.h[">])'
   perl5: '\buse\s+(?:strict\b|v?5\.)'
   perl6: '^\s*(?:use\s+v6\b|\bmodule\b|\b(?:my\s+)?class\b)'
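For readers who want to see how these rules would play out on a `.inc` file, here is a rough Python sketch. It assumes that a list under `named_patterns` is treated as a set of alternatives and that the first matching rule wins; neither is confirmed in this thread. It also uses Python's `re` in place of Ruby's engine, so `(?m)` here means multi-line anchors. `classify_inc` and the sample strings are hypothetical, and only the first two `.inc` rules are modelled.

```python
import re

# The seven m68k patterns from the diff, compiled individually.
M68K = [re.compile(p) for p in (
    r"(?im)\bmoveq(?:\.l)?\s+#(?:\$-?[0-9a-f]{1,3}|%[0-1]{1,8}|-?[0-9]{1,3}),\s*d[0-7]\b",
    r"(?im)^\s*move(?:\.[bwl])?\s+(?:sr|usp),\s*[^\s]+",
    r"(?im)^\s*move\.[bwl]\s+.*\b[ad]\d",
    r"(?im)^\s*movem\.[bwl]\b",
    r"(?im)^\s*move[mp](?:\.[wl])?\b",
    r"(?im)^\s*btst\b",
    r"(?im)^\s*dbra\b",
)]
# PHP rule from the same .inc block; MULTILINE mimics Ruby's "^".
PHP = re.compile(r"^<\?(?:php)?", re.MULTILINE)


def classify_inc(text):
    """Hypothetical helper: return the first matching language, else None."""
    if any(p.search(text) for p in M68K):
        return "Motorola 68K Assembly"
    if PHP.search(text):
        return "PHP"
    return None  # would fall through to later rules or the classifier


# Made-up sample files.
m68k_inc = "; sprite helpers\nclear_sprites:\n        moveq   #0,d0\n        dbra    d1,clear_sprites\n"
php_inc = "<?php\n$config = array('debug' => true);\n"
nasm_inc = "%define BUFSIZE 64\n        mov     eax, BUFSIZE\n"

for name, text in [("m68k", m68k_inc), ("php", php_inc), ("nasm", nasm_inc)]:
    print(name, "->", classify_inc(text))
```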
I tested the `^ \s* move (\.[bwl])? \s+ (sr|usp), \s* [^\s]+` line, and it only catches moves with `sr` or `usp` as the first operand, not moves involving any given register. I feel that the Motorola syntax of `move.size` with source or destination as a register named Dn or An is sufficiently dissimilar to other assembly syntaxes to avoid confusion with them, while also catching even the shortest snippet.
Testing may prove the totality of heuristics to still be sufficient to catch all m68k assembly sources.
`.i`, like `.inc`, `.asm` or `.s`, is used by assemblers on most platforms AFAIK.
In that case, the third line of the `m68k` heuristic becomes:
- '(?im)^\s*move\.[bwl]\s+.*\b[ad]\d'
I'll update the diff I just posted.
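A small Python sketch of the difference, with made-up sample lines: the `sr`/`usp` pattern only fires when one of those two registers is the first operand, while the broader pattern accepts any `move.b`/`.w`/`.l` line that mentions a `Dn` or `An` register. Python's `re` is used as a stand-in for Ruby here.

```python
import re

sr_usp = re.compile(r"(?im)^\s*move(?:\.[bwl])?\s+(?:sr|usp),\s*[^\s]+")
move_reg = re.compile(r"(?im)^\s*move\.[bwl]\s+.*\b[ad]\d")

# Hypothetical sample lines, one instruction each.
lines = [
    "        move.w  sr,d0",
    "        move.l  d0,a1",
    "        move.w  (a0)+,d1",
    "        move.l  #1,counter",
    "        mov     eax, ebx",
]

# One (sr_usp matched, move_reg matched) pair per line.
table = [
    (sr_usp.search(line) is not None, move_reg.search(line) is not None)
    for line in lines
]
for line, row in zip(lines, table):
    print(row, line.strip())
```

A `move` with no register operand, like the fourth line, is still missed, so detection relies on other lines in the same file matching.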
> `.i`, like `.inc`, `.asm` or `.s` is used by assemblers on most platforms AFAIK.
There are currently 11,273,157 `.i` files publicly indexed on GitHub. Surely there must be other formats hidden out there...
That regex looks fine for heuristics.
A quick glance at those results indicates that a lot of `.i` files are SWIG files, which may need a language definition of their own.
Christ. Going through the search results for @idrougge Are these files Motorola 68k assembly? If so, we can add What makes me unsure is that when I googled
@Alhadis The files you linked are all Motorola 6809 assembly, which is an 8-bit processor totally distinct from 68000. I agree that
Ah, my bad. I took We'll need to add I can push these changes to your branch later, as I'll need to run Linguist on the files I harvested (I'm not currently on a computer where I'm able to do so). Thanks for your input and patience, I realise this PR's been left hanging open for some time. 👍
Indeed 68k is 68000…68999, but 6809 < 68000. ;)
Erp. So it is. This just keeps getting better. 😅 I should stop talking now.
Alright. I've pushed the aforementioned changes to your branch; running Linguist locally on the harvested All that's left now is for @lildude to give a final 👍 before we can merge.
Thanks!
* Add common extensions to Motorola 68k
* Revert ACE mode for m68k assembly
* Add heuristics for Motorola 68K Assembly
* Add SWIG language and `.i` Assembly extension

Co-authored-by: John Gardner <gardnerjohng@gmail.com>
* add .4dm extensons
* no language for the moment
* change the source of syntax highlighting for Agda (#4768)
* Add interpreters 'csh' and 'tcsh' for language 'Tcsh' (#4760)
* Update languages.yml
* Create regtest_nmmnest.csh
  Source: https://github.com/barlage/WRF-kill/blob/master/tools/regtest_nmmnest.csh
* Register `.bibtex` as a BibTeX file-extension (#4764)
* Register `.dof` as an INI file-extension (#4766)
* Register `.epsi` as a PostScript file-extension (#4763)
* Add common extensions to Motorola 68k Assembly (#4637)
* Add common extensions to Motorola 68k
* Revert ACE mode for m68k assembly
* Add heuristics for Motorola 68K Assembly
* Add SWIG language and `.i` Assembly extension
  Co-authored-by: John Gardner <gardnerjohng@gmail.com>
* Add file extension for SnakeMake (#3953)
* Add file extension for SnakeMake
  Previously a file name was defined for [SnakeMake[(snakemake-wrappers.readthedocs.io): #1834 Currently, the canonical extension is `smk` (see [this discussion](https://groups.google.com/forum/#!topic/Snakemake/segLE-RlV_s) with the author (@johanneskoester) of SnakeMake, and the [FAQ](http://snakemake.readthedocs.io/en/stable/project_info/faq.html#how-do-i-enable-syntax-highlighting-in-vim-for-snakefiles)).
* Adding two Snakemake (smk) example files
* add .4dm extensons
* no language for the moment
* add lang-4d tmLanguage
* link syntax highliting
* typo

Co-authored-by: Guillaume Brunerie <guillaume.brunerie+github@gmail.com>
Co-authored-by: friedc <52925889+friedc@users.noreply.github.com>
Co-authored-by: John Gardner <gardnerjohng@gmail.com>
Co-authored-by: Iggy Drougge <idrougge@mac.com>
Co-authored-by: Nils Homer <nh13@users.noreply.github.com>
Description
Common extensions for m68k assembly are `.asm`, `.s`, `.i`, `.inc`. The only currently registered extension `x68` is only used within one specific development environment AFAIK.

Checklist:
- I am associating a language with a new file extension.
  - The new extension is used in hundreds of repositories on GitHub.com
    - `.asm`
    - `.i`
    - `.inc`
    - `.s`
  - I have included real-world usage samples for all extensions added in this PR:
    - Motorola 68K Assembly
      - `bls_routines.inc`: MIT
      - `cpu.s`: Public Domain / BSD 2-Clause
      - `iff_ilbm.i`: Apache 2.0
      - `rom_testbench.asm`: ISC
      - `system.s`: BSD 2-Clause
    - “General” Assembly
      - `3D_PRG.I`: Public Domain
      - `A8514.I`: BSD 3-Clause
      - `audio.i`: BSD 2-Clause
  - I have included a change to the heuristics to distinguish my language from others using the same extension.
- I am adding a new language.
  - `.i`
    - `CGAL_AABB_tree.i`: Boost License
    - `dictionary.i`: MIT
    - `gauss.i`: MIT