Code blocks from doxygen are not highlighted correctly #123

Closed
rweickelt opened this issue Sep 1, 2014 · 25 comments · Fixed by #760
Labels
bug Problem in existing code code Source code

Comments

@rweickelt
Contributor

Code blocks from Doxygen are rendered as literal blocks without syntax highlighting. The reason is that the Doxygen XML output does not contain any information about the code domain used.

The example code

bool myFunction(int parameter)
{
    if (parameter > 0)
    {
        return false;
    }

    return true;
}

produces the following XML output

<para><programlisting><codeline><highlight class="keywordtype">bool</highlight><highlight class="normal"><sp/>myFunction(</highlight><highlight class="keywordtype">int</highlight><highlight class="normal"><sp/>parameter)</highlight></codeline>
<codeline><highlight class="normal">{</highlight></codeline>
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/></highlight><highlight class="keywordflow">if</highlight><highlight class="normal"><sp/>(parameter<sp/>&gt;<sp/>0)</highlight></codeline>
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/>{</highlight></codeline>
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/><sp/><sp/><sp/><sp/></highlight><highlight class="keywordflow">return</highlight><highlight class="normal"><sp/></highlight><highlight class="keyword">false</highlight><highlight class="normal">;</highlight></codeline>
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/>}</highlight></codeline>
<codeline><highlight class="normal"></highlight></codeline>
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/></highlight><highlight class="keywordflow">return</highlight><highlight class="normal"><sp/></highlight><highlight class="keyword">true</highlight><highlight class="normal">;</highlight></codeline>
<codeline><highlight class="normal">}</highlight></codeline>
</programlisting></para>

The problem could probably be solved by using embedded reST, but at the price of losing links to documented items.
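
For reference, the XML above can be turned back into plain source text with a few lines of Python. This is only a sketch using the standard library's xml.etree.ElementTree; the helper names `flatten` and `listing_to_text` are made up for illustration and are not part of Breathe.

```python
import xml.etree.ElementTree as ET


def flatten(node: ET.Element) -> str:
    """Concatenate text in document order, turning each <sp/> into a space."""
    if node.tag == "sp":
        return " "
    parts = [node.text or ""]
    for child in node:
        parts.append(flatten(child))
        parts.append(child.tail or "")
    return "".join(parts)


def listing_to_text(programlisting_xml: str) -> str:
    """Return the code of a <programlisting>, one line per <codeline>."""
    root = ET.fromstring(programlisting_xml)
    return "\n".join(flatten(line) for line in root.iter("codeline"))


# Two lines taken from the XML output shown above.
sample = (
    "<programlisting>"
    '<codeline><highlight class="normal"><sp/><sp/><sp/><sp/></highlight>'
    '<highlight class="keywordflow">if</highlight>'
    '<highlight class="normal"><sp/>(parameter<sp/>&gt;<sp/>0)</highlight></codeline>'
    '<codeline><highlight class="normal"><sp/><sp/><sp/><sp/>{</highlight></codeline>'
    "</programlisting>"
)
print(listing_to_text(sample))
```

The highlight classes are simply dropped here, which is what makes the missing language information a problem: the plain text alone does not say which lexer to use.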

@michaeljones
Collaborator

Hey,

Thanks for raising this. Maybe Doxygen always assumes that code blocks are in the same code domain as the source files themselves. Maybe we can assume the same thing? I'm not sure how to set up syntax highlighted blocks in Sphinx/RST but we can give it a go.

Cheers,
Michael

@rweickelt
Contributor Author

Hi,

Maybe Doxygen always assumes that code blocks are in the same code domain
as the source files themselves.

exactly.

Maybe we can assume the same thing? I'm not sure how to set up syntax
highlighted blocks in Sphinx/RST but we can give it a go.

The problem is that Doxygen embeds semantic syntax highlighting into the
XML but leaves out any information about the original language. The proposed
solution would fail if the code block is in a different code domain from the
source file.

But maybe we could just map the highlight information from Doxygen to
Sphinx? The Sphinx highlight classes are listed in pygments.css.

.highlight .hll { background-color: #ffffcc }
.highlight  { background: #eeffcc; }
.highlight .c { color: #408090; font-style: italic } /* Comment */
.highlight .err { border: 1px solid #FF0000 } /* Error */
.highlight .k { color: #007020; font-weight: bold } /* Keyword */
.highlight .o { color: #666666 } /* Operator */
.highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */
.highlight .cp { color: #007020 } /* Comment.Preproc */
.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */
.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */
.highlight .gd { color: #A00000 } /* Generic.Deleted */
.highlight .ge { font-style: italic } /* Generic.Emph */
.highlight .gr { color: #FF0000 } /* Generic.Error */
.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */
.highlight .gi { color: #00A000 } /* Generic.Inserted */
.highlight .go { color: #303030 } /* Generic.Output */
.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */
.highlight .gs { font-weight: bold } /* Generic.Strong */
.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
.highlight .gt { color: #0040D0 } /* Generic.Traceback */
.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */
.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */
.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */
.highlight .kp { color: #007020 } /* Keyword.Pseudo */
.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */
.highlight .kt { color: #902000 } /* Keyword.Type */
.highlight .m { color: #208050 } /* Literal.Number */
.highlight .s { color: #4070a0 } /* Literal.String */
.highlight .na { color: #4070a0 } /* Name.Attribute */
.highlight .nb { color: #007020 } /* Name.Builtin */
.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */
.highlight .no { color: #60add5 } /* Name.Constant */
.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */
.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */
.highlight .ne { color: #007020 } /* Name.Exception */
.highlight .nf { color: #06287e } /* Name.Function */
.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */
.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */
.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */
.highlight .nv { color: #bb60d5 } /* Name.Variable */
.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */
.highlight .w { color: #bbbbbb } /* Text.Whitespace */
.highlight .mf { color: #208050 } /* Literal.Number.Float */
.highlight .mh { color: #208050 } /* Literal.Number.Hex */
.highlight .mi { color: #208050 } /* Literal.Number.Integer */
.highlight .mo { color: #208050 } /* Literal.Number.Oct */
.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */
.highlight .sc { color: #4070a0 } /* Literal.String.Char */
.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */
.highlight .s2 { color: #4070a0 } /* Literal.String.Double */
.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */
.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */
.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */
.highlight .sx { color: #c65d09 } /* Literal.String.Other */
.highlight .sr { color: #235388 } /* Literal.String.Regex */
.highlight .s1 { color: #4070a0 } /* Literal.String.Single */
.highlight .ss { color: #517918 } /* Literal.String.Symbol */
.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */
.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */
.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */
.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */
.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */

Doxygen uses far fewer classes. I found this list in doxygen.sty:

\definecolor{comment}{rgb}{0.5,0.0,0.0}
\definecolor{keyword}{rgb}{0.0,0.5,0.0}
\definecolor{keywordtype}{rgb}{0.38,0.25,0.125}
\definecolor{keywordflow}{rgb}{0.88,0.5,0.0}
\definecolor{preprocessor}{rgb}{0.5,0.38,0.125}
\definecolor{stringliteral}{rgb}{0.0,0.125,0.25}
\definecolor{charliteral}{rgb}{0.0,0.5,0.5}
\definecolor{vhdldigit}{rgb}{1.0,0.0,1.0}
\definecolor{vhdlkeyword}{rgb}{0.43,0.0,0.43}
\definecolor{vhdllogic}{rgb}{1.0,0.0,0.0}
\definecolor{vhdlchar}{rgb}{0.0,0.0,0.0}

The colors themselves are irrelevant; only the class names matter.
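
One possible mapping from the Doxygen classes to the short Pygments class names could look like the sketch below. The pairing is an assumption made for illustration (neither tool defines it), the VHDL classes are left out, and the helper name `to_pygments_class` is hypothetical.

```python
# Assumed mapping from Doxygen <highlight class="..."> values to the short
# Pygments CSS class names listed above. Neither tool defines this pairing.
DOXYGEN_TO_PYGMENTS = {
    "comment": "c",         # Comment
    "keyword": "k",         # Keyword
    "keywordtype": "kt",    # Keyword.Type
    "keywordflow": "k",     # Keyword (Pygments has no flow-control class)
    "preprocessor": "cp",   # Comment.Preproc
    "stringliteral": "s",   # Literal.String
    "charliteral": "sc",    # Literal.String.Char
}


def to_pygments_class(doxygen_class: str) -> str:
    """Return the Pygments class for a Doxygen class, or "" if unmapped."""
    return DOXYGEN_TO_PYGMENTS.get(doxygen_class, "")


print(to_pygments_class("keywordtype"))    # kt
print(repr(to_pygments_class("normal")))   # ''
```

Because Doxygen has so few classes, several Pygments distinctions (function names, operators, number literals) cannot be recovered this way.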

@rweickelt
Contributor Author

I'm still looking for a solution, but it may take a while because my
free time has been cut back recently ;-)

@michaeljones
Collaborator

Hey,

Thanks for the bump. I guess we're all busy :)

I think it is an interesting idea and your proposal of converting from doxygen to sphinx styles might well work. I'm afraid I've not rushed into it as I don't know where to begin so I need a little more time or prompting. Messages like yours do help. I'll try to find time to think about it some more and start investigations.

Though if you find time yourself, I'll welcome any additional discoveries :)

All the best,
Michael

@rweickelt
Contributor Author

I figured out docutils' raw node and hacked up something based on the above style map. But it's only a hack and there are many things that I do not understand.

Commit: rweickelt/breathe@ae2a87e

@michaeljones
Collaborator

Thank you for sharing. It is interesting to see an approach that works, and maybe we can figure out a way to use something like this. Unfortunately, as I'm sure you're aware, we can't have raw HTML in the main code because of the other output formats (though the LaTeX output is currently broken!)

Good to see though.

I can't promise much movement on this soon but I'll try to find some time to think about it.

@rweickelt
Contributor Author

Yes, LaTeX should be considered. Perhaps it would work very similarly. We could also just implement it for HTML and fall back to the current behavior for other formats.

@mosra

mosra commented Nov 19, 2017

Hi all, hope it's okay that I revived this old issue :)

I just submitted a patch to Doxygen that propagates language information to all <programlisting> elements in XML files and I think that might help you here: doxygen/doxygen#621

In my own Doxygen theme I'm just removing all Doxygen-made highlighting in those blocks and passing the raw code + language info through Pygments to get proper highlighting. For anyone who's interested, here's a screenshot and link to live snapshot: https://twitter.com/czmosra/status/932068062693642242

@SylvainCorlay
Contributor

@mosra thanks for the update. I think that it is especially important when documenting things such as language bindings etc...

I don't know how much time it will take for doxygen to be released with your patch.

Although there might be some hope:

  • readthedocs supports conda (a general-purpose package manager)
  • doxygen is packaged for conda (on the conda-forge channel)

We use conda-forge + breathe for the documentation of e.g. xtensor.

So I can probably make a new build of the doxygen conda-package including your patch, and we could use it with breathe.

@vermeeren
Collaborator

I think it is safe to say that @mosra's patch is in current Doxygen releases. Based on the above, is it also safe to say that Breathe still needs to implement reading the additional XML and using it for rendering?

@vermeeren vermeeren self-assigned this Jun 2, 2018
@vermeeren vermeeren removed their assignment Aug 27, 2019
@2bndy5
Contributor

2bndy5 commented Nov 8, 2021

+1 to this. It would be very helpful to feed the code-block's domain into sphinx.

For example, I have a project that documents the C++ implementation, but it is littered with relative python wrapper "gotcha" example code-blocks to note slight differences in buffer object usages. Currently, I get a warning (related to this issue) for every python snippet in the C++ docs.

@jakobandersen
Collaborator

@2bndy5, would it be possible for you (or someone else) to make a small example project to start the discussion from, so we are aligned on what this feature request is about?

@2bndy5
Contributor

2bndy5 commented Nov 8, 2021

After posting that comment, I tried to reproduce the warnings in a simple example project, but I was unable to do so. It seems that pygments is raising warnings for highly terse code-blocks.

I also looked at the XML output for a few programlisting tags, and I didn't see any obvious indication of the domain related to them. So, I'm not sure what help the mentioned & merged PR to doxygen can bring to this issue.

@jakobandersen
Collaborator

With a bit of poking around in Breathe and Sphinx I think what is needed is:

  1. Get the filename attribute from the Doxygen XML parsed.
  2. In visit_listing https://github.com/michaeljones/breathe/blob/aec47028e05ad8d77e960db9d25140e9fd1c0127/breathe/renderer/sphinxrenderer.py#L1595, convert that attribute into a language name that the Sphinx code-block directive would accept and set it as the language attribute on the literal_block node, a la Sphinx: https://github.com/sphinx-doc/sphinx/blob/6c6cc8a6f50b18331cb818160d168d7bb9c03e55/sphinx/directives/code.py#L152
  3. Instead of setting child nodes of the literal_block node, stringify the content and pass it as the first two arguments, like in the Sphinx code-block directive implementation (https://github.com/sphinx-doc/sphinx/blob/6c6cc8a6f50b18331cb818160d168d7bb9c03e55/sphinx/directives/code.py#L145)
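
The three steps can be sketched without Breathe or docutils installed. In the sketch below, `Listing` is a hypothetical stand-in for the literal_block node, `EXTENSION_TO_LANGUAGE` is an assumed lookup table (a real implementation would more likely ask Pygments), and `render_listing` is a made-up name; only the standard library is used.

```python
import os.path
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Assumed, deliberately tiny lookup table; not taken from Breathe or Sphinx.
EXTENSION_TO_LANGUAGE = {"py": "python", "cpp": "cpp", "hpp": "cpp", "c": "c"}


@dataclass
class Listing:
    """Hypothetical stand-in for a docutils literal_block node."""
    code: str
    language: Optional[str]


def _text_of(node: ET.Element) -> str:
    """Concatenate text in document order, turning each <sp/> into a space."""
    if node.tag == "sp":
        return " "
    parts = [node.text or ""]
    for child in node:
        parts.append(_text_of(child))
        parts.append(child.tail or "")
    return "".join(parts)


def render_listing(xml_text: str) -> Listing:
    element = ET.fromstring(xml_text)
    # Step 1: read the optional filename attribute.
    filename = element.get("filename", "")
    # Step 2: turn it into a language name; unknown extensions pass through.
    root, ext = os.path.splitext(filename)
    extension = (ext or root).lstrip(".")
    language = EXTENSION_TO_LANGUAGE.get(extension, extension) or None
    # Step 3: stringify the code lines instead of keeping child nodes.
    code = "\n".join(_text_of(line) for line in element.iter("codeline"))
    return Listing(code, language)


listing = render_listing(
    '<programlisting filename=".py"><codeline>'
    '<highlight class="normal">print(1)</highlight>'
    "</codeline></programlisting>"
)
print(listing.language, repr(listing.code))
```

When the filename attribute is absent the language stays None, which would leave the block to whatever default highlighting the project has configured.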

@2bndy5
Contributor

2bndy5 commented Nov 8, 2021

I would be happier if doxygen would output the domain of a code-block as the programlisting tag attribute. Now, the link you mentioned also deals with line numbers. It would also be a lot easier if that flag was also an attribute. My intention here is to more easily translate the doxygen @code cmd's args into sphinx directive args (including doxygen default arg values) as they are almost identical (sphinx is better - naturally).

My use case wouldn't be solved by your proposed step 1 & 2 because I'm putting python code-blocks in a document created from a C++ header file.

@jakobandersen
Collaborator

I would be happier if doxygen would output the domain of a code-block as the programlisting tag attribute.

For me it does that. E.g., with test.hpp having

/// 
/// Something
/// \code{blah}
/// int main() {}
/// \endcode
/// something else
void f();

I get the Doxygen XML

        <detaileddescription>
<para>Something <programlisting filename=".blah"><codeline><highlight class="normal">int<sp/>main()<sp/>{}</highlight></codeline>
</programlisting> something else </para>
        </detaileddescription>

It seems that whatever is in {...} after the \code becomes a file extension-like "filename".

Now, the link you mentioned also deals with line numbers. It would also be a lot easier if that flag was also an attribute.

All the line number stuff for code-block is optional, so we can safely ignore it.

My use case wouldn't be solved by your proposed step 1 & 2 because I'm putting python code-blocks in a document created from a C++ header file.

Hmm, it seems to me that individual \code commands can have whichever language you like. Or am I misunderstanding your situation?

@2bndy5
Contributor

2bndy5 commented Nov 8, 2021

Hmm, it seems to me that individual \code commands can have whichever language you like. Or am I misunderstanding your situation?

exactly correct.

I never noticed the filename attribute. That could very well be used to specify the highlighting language used for Pygments! I'll look into that more.

@2bndy5
Contributor

2bndy5 commented Nov 8, 2021

I see that the filename attribute is only shown when it is specified by the docs' author. Sorry if I'm repeating what you already know; I'm just catching up...

Same is applied to MD-fenced code-blocks.

@2bndy5
Contributor

2bndy5 commented Nov 8, 2021

What I've tried so far:

  1. # in compoundsuper.py
     # in listingType class
     def buildAttributes(self, attrs: minidom.NamedNodeMap):
         if "filename" in attrs.keys():
             # extract the domain for this programlisting tag.
             ext_tuple = os.path.splitext(attrs["filename"].value)
             self.domain = ext_tuple[0 if not ext_tuple[1] else 1].lstrip(".")
  2. # in sphinxrenderer.py
     def visit_listing(self, node) -> List[Node]:
         nodelist = []  # type: List[Node]
         for i, item in enumerate(node.codeline):
             # Put new lines between the lines. There must be a more pythonic way of doing this
             if i:
                 nodelist.append(nodes.Text("\n"))
             nodelist.extend(self.render(item))
         code = "".join([x.astext() for x in nodelist])
    
         block = nodes.literal_block(code, code)
         if node.domain:
             block["langauge"] = node.domain
         return [block]

Unfortunately, these changes don't seem to render any differently. All code-blocks still use whatever language the highlight directive specifies (or the highlight_language config option). I'll keep digging, but I feel like I'm missing something not-so-obvious...
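
As a side note on the `ext_tuple[0 if not ext_tuple[1] else 1]` expression in the first snippet: `os.path.splitext` treats a leading dot as part of the root, so a filename such as ".py" has no extension at all and the root has to be used instead. A small standalone sketch of that behavior (`domain_from_filename` is a hypothetical name, not a Breathe function):

```python
import os.path


def domain_from_filename(filename: str) -> str:
    """Prefer the extension; fall back to the root for names like ".py"."""
    root, ext = os.path.splitext(filename)
    return (ext if ext else root).lstrip(".")


print(os.path.splitext(".py"))              # ('.py', '')
print(domain_from_filename(".py"))          # py
print(domain_from_filename("example.py"))   # py
```

The same helper therefore handles both a bare extension and a full filename.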

@2bndy5
Contributor

2bndy5 commented Nov 8, 2021

🤣

fixing this typo solved it

      block["langauge"] = node.domain

Ironically, "language" was spelled wrong 🤣

@2bndy5
Contributor

2bndy5 commented Nov 8, 2021

@jakobandersen I'm ready to submit your proposed implementation as a PR. Should I make any other considerations? Would breathe's ProjectInfo.domain_for_file() apply to this scenario?

Looking at the Pygments lexer aliases, it seems that option may not be needed.

@2bndy5
Contributor

2bndy5 commented Nov 9, 2021

I've added a bit that passes the compounddef tag's language attribute to the modified listingType c'tor domain parameter. This should only affect programlisting tags that are discovered as direct children of a compounddef. Ultimately, this little bit won't ever get used until someone tries to hack together a viewsource-type feature in breathe (yes, I have been looking into that).

@mosra

mosra commented Nov 11, 2021

I see that the filename attribute is only shown when it is specified by the docs' author

Just FYI, it also gets specified when you use Doxygen's @include or @snippet commands (which I use quite often to have documentation snippets checked by the compiler), and that was the main reason behind the Doxygen PR I linked above.

@2bndy5
Contributor

2bndy5 commented Nov 11, 2021

@mosra Interesting advice! I re-read the PR you submitted to doxygen, with my newly enlightened understanding, and I took your advice about using pygments to guess the lexer name from the full filename. Thanks for submitting that PR!

@vermeeren vermeeren added bug Problem in existing code code Source code labels Feb 13, 2022