
Improve comments using LLMs. #4013

Closed

Conversation

dcompoze
Contributor

@dcompoze dcompoze commented Apr 5, 2024

Hello, I recently submitted changes to fix spelling mistakes in this repository (#3808).

Based on that discussion, I also wanted to improve the general quality of comments across the repository (e.g. grammatical errors, missing periods, capitalization, Markdown formatting).

Simple spelling mistakes are relatively easy to detect and fix with regular spell checkers and some manual intervention, but the formatting and grammar of comments are harder to fix with those kinds of tools.

The approach I took here is to create a Rust parser (using nom) to extract comments from Rust files and feed them to the gpt-4-turbo large language model to suggest improvements.
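To illustrate the extraction step, here is a minimal, hypothetical stdlib-only sketch. The actual tool uses a nom-based parser; this simplified version only handles plain `//` line comments and does not account for `//` appearing inside string literals, which a real parser must handle.

```rust
/// Extracts `//` line comments from Rust source text, paired with their
/// 1-based line numbers (the format later fed to the model).
/// Simplified sketch: a real parser must skip `//` inside string literals.
fn extract_comments(source: &str) -> Vec<(usize, String)> {
    source
        .lines()
        .enumerate()
        .filter_map(|(i, line)| {
            line.find("//")
                .map(|pos| (i + 1, line[pos..].trim_end().to_string()))
        })
        .collect()
}

fn main() {
    let src = "fn main() {\n    // prints a greeting\n    println!(\"hi\");\n}";
    for (line, comment) in extract_comments(src) {
        println!("{line}: {comment}"); // prints: 2: // prints a greeting
    }
}
```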

The current prompt for the model is:

```
You are provided a list of comments from a Rust file in the Polkadot repository.
Each comment starts with a line number followed by a colon and then the contents of the comment.
Your task is to improve the spelling and grammar of comments and markdown formatting of doc comments.
You should format each comment according to the following rules:
- Comments should be full sentences, begin with a capital letter and end with a period.
- Doc comments should follow markdown formatting rules (e.g. code items should always use backticks).
- Improve comment grammar if it can be improved.
- If you encounter a British or American spelling of a given word, you should keep the one that is generally more common.
- If a comment is already well formatted, you should not modify it.
- Prefer capital letters for abbreviations such as 'ID' vs 'id' unless the text is part of code, such as 'UserId'.
- A single comment cannot exceed a 100 character width limit.
```
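The "line number, colon, contents" convention the prompt describes can be sketched as follows. This is a hypothetical helper, not the tool's actual code:

```rust
/// Serializes extracted comments into the prompt body format described
/// above: one comment per line, as "<line number>: <contents>".
fn build_prompt_body(comments: &[(usize, &str)]) -> String {
    comments
        .iter()
        .map(|(line, text)| format!("{line}: {text}"))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let comments = [(3, "// handles the edge case"), (7, "/// Returns the user ID.")];
    println!("{}", build_prompt_body(&comments));
    // prints:
    // 3: // handles the edge case
    // 7: /// Returns the user ID.
}
```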

The suggested changes are then manually reviewed by me one by one.

This draft PR includes changes to 12 files in the bridges directory just to showcase what the resulting changes look like.

The cost of the OpenAI API requests is about 0.45 USD per 10 files of average size.

Given that the whole repository has about 3570 files, the estimate for the full repository would be somewhere around 160 USD in API credits.
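As a back-of-the-envelope check of that figure, using the numbers quoted above:

```rust
fn main() {
    // ~0.45 USD per 10 files, ~3570 files in the repository.
    let cost_per_file = 0.45 / 10.0;
    let total = 3570.0 * cost_per_file;
    println!("estimated total: {total:.2} USD"); // estimated total: 160.65 USD
}
```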

Note: I also tried this with local Ollama models, but the results were not as good as GPT-4.

Also, it takes me about 30-60 seconds to review the changes to each file, so doing this for all the files would be somewhere in the ballpark of 30-50 hours.

The verification process looks something like this from my end:

(Screenshot: selection-1712353756)

I wanted to gauge interest in these changes and, if there is interest, ask whether some funding could be provided to cover the API and time costs.

@joepetrowski @bkchr What do you think?

@dcompoze dcompoze closed this Jul 22, 2024