allow processing of some tags (e.g. `<br>`, `<span>`) inside `<pre>`/`<code>` blocks #245

asdfzdfj · 2023-12-12T13:11:26Z

Version(s) affected

5.1.1

Description

Basically, when converting an HTML `<pre>` or `<code>` block (possibly one produced by a renderer) that contains other tags, those tags are copied verbatim into the converted Markdown. The result can look quite different from the original Markdown source, or from what the input HTML would render to. As far as I can tell from the `<pre>` and `<code>` docs, both elements accept phrasing content, so tags like `<br>` and `<span>` are valid inside them.

Some cases where this can happen:

- converting the output of an HTML renderer that really, really likes using `<br>` for newlines, even inside these blocks
- converting a code block that uses `<span>` inside the block for syntax highlighting

How to reproduce

Note: all of these examples were run with the `strip_tags` option enabled, but otherwise with the default converters and options.

first case: `<br>` usage for newlines inside `<pre><code>`

input

(yes, it's all on one line)

<pre><code># When I first wrote this regex I thought it was slick.<br># I still think that, but 2y after doing it the first time<br># it just hurt to look at.  So, /x modifier we go!<br><br>my @Set = map { [ split( m/\s*:\s*/, $_, 2 ) ] } $args =~ m/<br>    \s*         # ignore preceeding whitespace<br>    (           # begin capturing<br>     (?:        # grab characters we want<br>         \\.    # skip over escapes<br>         | <br>         [^;]   # or anything but a ; <br>     )+?        # ? greedyness hack lets the \s* actually match<br>    )           # end capturing<br>    \s*         # ignore whitespace between value and ; or end of line<br>    (?:         # stop anchor at ...<br>      ;         # semicolon<br>      |         # or<br>      $         # end of line<br>    ) <br>    \s*/gx;<br></code></pre>

expected

(there's no actual Markdown source to compare against, so this is an approximation of what I expect to see)

```
# When I first wrote this regex I thought it was slick.
# I still think that, but 2y after doing it the first time
# it just hurt to look at.  So, /x modifier we go!

my @Set = map { [ split( m/\s*:\s*/, $_, 2 ) ] } $args =~ m/
    \s*         # ignore preceeding whitespace
    (           # begin capturing
     (?:        # grab characters we want
         \\.    # skip over escapes
         | 
         [^;]   # or anything but a ; 
     )+?        # ? greedyness hack lets the \s* actually match
    )           # end capturing
    \s*         # ignore whitespace between value and ; or end of line
    (?:         # stop anchor at ...
      ;         # semicolon
      |         # or
      $         # end of line
    ) 
    \s*/gx;
```

`html-to-markdown` output

```
# When I first wrote this regex I thought it was slick.<br></br># I still think that, but 2y after doing it the first time<br></br># it just hurt to look at.  So, /x modifier we go!<br></br><br></br>my @Set = map { [ split( m/\s*:\s*/, $_, 2 ) ] } $args =~ m/<br></br>    \s*         # ignore preceeding whitespace<br></br>    (           # begin capturing<br></br>     (?:        # grab characters we want<br></br>         \.    # skip over escapes<br></br>         | <br></br>         [^;]   # or anything but a ; <br></br>     )+?        # ? greedyness hack lets the \s* actually match<br></br>    )           # end capturing<br></br>    \s*         # ignore whitespace between value and ; or end of line<br></br>    (?:         # stop anchor at ...<br></br>      ;         # semicolon<br></br>      |         # or<br></br>      $         # end of line<br></br>    ) <br></br>    \s*/gx;<br></br>
```
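Until the converter handles this itself, one possible stopgap is to turn `<br>` tags inside `<pre>` blocks into literal newlines before the HTML is converted. Below is a minimal Python sketch of that preprocessing step, assuming the HTML is available as a string and `<pre>` elements are not nested. `br_to_newlines_in_pre` is a hypothetical helper, not part of `html-to-markdown`, and regex-based rewriting is only a rough approximation of real HTML parsing.

```python
import re

# A whole <pre ...>...</pre> element; non-greedy so consecutive blocks stay separate.
PRE_BLOCK = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre\s*>)", re.IGNORECASE | re.DOTALL)
# <br>, <br/>, <br />, <br class="..."> and stray </br> closing tags.
BR_TAG = re.compile(r"<\s*/?\s*br\b[^>]*>", re.IGNORECASE)


def br_to_newlines_in_pre(html: str) -> str:
    """Hypothetical helper: replace <br> tags with newlines, but only inside <pre> blocks."""

    def replace_in_block(match: re.Match) -> str:
        start_tag, body, end_tag = match.groups()
        # Only the body of the <pre> element is rewritten; the tags themselves are kept.
        return start_tag + BR_TAG.sub("\n", body) + end_tag

    # <br> tags outside of <pre> blocks are left untouched.
    return PRE_BLOCK.sub(replace_in_block, html)
```

The preprocessed HTML would then be passed to the converter as usual. This only covers the `<br>` case; highlighting `<span>`s like the ones in the second case would still come through verbatim.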

second case: mixing `<span>` inside `<pre>`/`<code>`

input

<pre style="background-color:#ffffff;">
<span style="color:#323232;">while (number &gt; 1) {
</span><span style="color:#323232;">  number -= 2;
</span><span style="color:#323232;">}
</span><span style="color:#323232;">return number;
</span></pre>

expected

```
while (number > 1) {
  number -= 2;
}
return number;
```

`html-to-markdown` output

```
                              
<span style="color:#323232;">while (number > 1) {
</span><span style="color:#323232;">  number -= 2;
</span><span style="color:#323232;">}
</span><span style="color:#323232;">return number;
</span>
```
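For the `<span>` case, the behavior I'd expect amounts to keeping only the text of the block: drop the markup tags but keep their contents, treat `<br>` as a newline, and decode entities such as `&gt;`. Below is a Python illustration using the standard library's `html.parser`. `PreTextExtractor` and `pre_to_text` are hypothetical names (nothing like this exists in `html-to-markdown` as far as I know), and the sketch assumes the fragment is a single `<pre>` element. It also follows the HTML rule that a newline immediately after the `<pre>` start tag is ignored.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Hypothetical helper: collect the plain text of a <pre> fragment.

    Every tag (e.g. syntax-highlighting <span>s, <code>) is dropped while its text
    is kept, <br> becomes a newline, and entities like &gt; are decoded.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # deliver "&gt;" to handle_data as ">"
        self.parts = []  # collected text chunks
        self._just_opened_pre = False

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; <br/> also lands here via handle_startendtag.
        self._just_opened_pre = tag == "pre"
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        self._just_opened_pre = False

    def handle_data(self, data):
        if self._just_opened_pre and data.startswith("\n"):
            data = data[1:]  # HTML ignores one newline right after the <pre> start tag
        self._just_opened_pre = False
        self.parts.append(data)


def pre_to_text(fragment: str) -> str:
    parser = PreTextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

Fed the input from this case, `pre_to_text` returns the four lines of the expected output (with a trailing newline), so wrapping the result in a ``` fence would give the expected block above.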


olegme mentioned this issue Mar 26, 2024

 not properly converted to markdown when inside <pre></pre> tags smarinier/importer#2

Open
