Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use markdown parser to identify code blocks in literate coffeescript #3924

Closed
wants to merge 3 commits into from
Closed

Conversation

billymoon
Copy link
Contributor

There are many false positive matches for code blocks made by the current coffee-script parser which are not really code blocks in the markdown. This fixes it by using a markdown parser to identify code blocks.

this should help to handle ligitimate markdown with indentation correctly
@carlsmith
Copy link
Contributor

+1 if this stops indented code being classed as CoffeeScript when it's inside html tags. That's a pain point for CoffeeShop.

@billymoon
Copy link
Contributor Author

@carlsmith this most definitely stops indented code within html from being treated as coffee script, but I had a look at how it would fit in your library, and realised I have missed something important. I used a require statement to include the marked library, assuming coffee script running in nodejs. There needs to be a way for this dependency to be included in the browser build. I don't know what is the best approach for this, and coffee script currently has no dependencies, but it used to, so there might be a clue as how best to do it in the source code history.

@lydell
Copy link
Collaborator

lydell commented Apr 7, 2015

It used to have mkdirp as a dependency, but it was only used in the CLI, not in the browser version of the compiler.

@carlsmith
Copy link
Contributor

It'd be nice if you could pass a function to the compiler, so people could pass adapters for Marked or whatever library has the features they need, and let it fall back to the current behaviour by default.

Nice work by the way.

@GeoffreyBooth
Copy link
Collaborator

@billymoon I don’t know if you’re still interested in this PR, but I figured out how to make it work in browsers too:

diff --git a/Cakefile b/Cakefile
index cf538b8..1e822ce 100644
--- a/Cakefile
+++ b/Cakefile
@@ -140,6 +140,14 @@ task 'build:parser', 'rebuild the Jison parser (run build first)', ->

 task 'build:browser', 'rebuild the merged script for inclusion in the browser', ->
   code = ''
+  for {name, src} in [{name: 'marked', src: 'lib/marked.js'}]
+    code += """
+      require['#{name}'] = (function() {
+        var exports = {}, module = {exports: exports};
+        #{fs.readFileSync "node_modules/#{name}/#{src}"}
+        return module.exports;
+      })();
+    """
   for name in ['helpers', 'rewriter', 'lexer', 'parser', 'scope', 'nodes', 'sourcemap', 'coffee-script', 'browser']
     code += """
       require['./#{name}'] = (function() {

@GeoffreyBooth
Copy link
Collaborator

Also I improved invertLiterate so that the token it uses as a placeholder for tabs is ensured to not appear in code, rather than a magic string that we’re gambling won’t appear in code:

diff --git a/src/helpers.coffee b/src/helpers.coffee
index 0ac49d7..0f9f4e8 100644
--- a/src/helpers.coffee
+++ b/src/helpers.coffee
@@ -82,16 +82,19 @@ exports.some = Array::some ? (fn) ->
 # out all non-code blocks,  producing a string of CoffeeScript code that can
 # be compiled "normally".
 exports.invertLiterate = (code) ->
-  # don't know how to avoid this hack using token as placeholder for tabs, then
-  # re-inserting tabs after code extraction. The token has been split in two
-  # so that it does not end up getting parsed in this src code
-  token = '9ddb1d26184bdaf8'+'d4de55835d82eb56'
+  # Create a placeholder for tabs, that isn’t used anywhere in `code`, and then
+  # re-insert the tabs after code extraction.
+  generateRandomToken = ->
+    "#{Math.random() * Date.now()}"
+  while token is undefined or code.indexOf(token) isnt -1
+    token = generateRandomToken()
+
   code = code.replace "\t", token

@GeoffreyBooth
Copy link
Collaborator

Since we merged this code via #4345 into 2, are we deciding this won’t be implemented in the 1.x branch? And therefore we should close this PR?

@lydell
Copy link
Collaborator

lydell commented Oct 26, 2016

Yes.

@GeoffreyBooth
Copy link
Collaborator

@billymoon do you mind providing examples of the false positives you mentioned when you started this thread? So that I can verify that the current implementation does, in fact, fix them. Or if we want to try yet another implementation that only looks at indentation and doesn’t try to parse Markdown, it would be good to have test cases that need to pass.

@GeoffreyBooth GeoffreyBooth mentioned this pull request Apr 14, 2017
6 tasks
@billymoon
Copy link
Contributor Author

@GeoffreyBooth I think that was done in this commit: bf1d9d3

@akfish
Copy link
Contributor

akfish commented Apr 14, 2017

Looking through CommonMark spec, I found one false positive and one false negative in Coffee 1.12.5.

Case 1

If there is any ambiguity between an interpretation of indentation as a code block and as indicating that material belongs to a list item, the list item interpretation takes precedence:

Example 77:

  - foo

    should_not_be_code

Example 78:

1.  foo

    - should_not_be_Code_too

Compiles to:

// Generated by CoffeeScript 1.12.5
(function() {
  should_not_be_code;
  -should_not_be_code_too;

}).call(this);

Parsed by CommonMark Reference Parser:

<p>If there is any ambiguity between an interpretation of indentation as a code block and as indicating that material belongs to a list item, the list item interpretation takes precedence:</p>
<p>Example 77:</p>
<ul>
<li>
<p>foo</p>
<p>should_not_be_code</p>
</li>
</ul>
<p>Example 78:</p>
<ol>
<li>
<p>foo</p>
<ul>
<li>should_not_be_code_too</li>
</ul>
</li>
</ol>

GitHub Render Result:


If there is any ambiguity between an interpretation of indentation as a code block and as indicating that material belongs to a list item, the list item interpretation takes precedence:

Example 77:

  • foo

    should_not_be_code

Example 78:

  1. foo

    • should_not_be_Code_too

Case 2

And indented code can occur immediately before and after other kinds of blocks:

Example 84:

# Heading
    should_be_code
Heading
------
    should_be_code_too
----

Compiles to:

// Generated by CoffeeScript 1.12.5
(function() {


}).call(this);

CommonMark generates:

<p>And indented code can occur immediately before and after other kinds of blocks:</p>
<p>Example 84:</p>
<h1>Heading</h1>
<pre><code>should_be_code
</code></pre>
<h2>Heading</h2>
<pre><code>should_be_code_too
</code></pre>
<hr />

GitHub Render Result:


And indented code can occur immediately before and after other kinds of blocks:

Example 84:

Heading

should_be_code

Heading

should_be_code_too


edit: format

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants