Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: improve for source providing huge list of items #1980

Merged
merged 4 commits into from
Oct 20, 2024

Conversation

yioneko
Copy link
Contributor

@yioneko yioneko commented Jul 9, 2024

Problem

nvim-cmp is still not ideally responsive when dealing with huge list of completion items, especially with tailwindcss which sends thousands of candidates on each completion response: #1009 #1828 #1949. Though some improvements have been done before but cmp is still apparently slower than other completion engine like vscode and coc.nvim in extreme cases.

Attempt to improve

This PR extracts two commits from my experimental perf branch fork which I think could be better upstreamed. The details and rationales are explained in commit descriptions.

The commits introduce massive changes to the codebase, and I'm not sure if you'd like to accept them. Since the changes are made just for improving performance in extreme cases, it's totally fine for me if you consider that it is not worth it.

Comparison

Recorded at a110e12:
cmp2

PR:
cmp1

@epheien
Copy link
Contributor

epheien commented Jul 9, 2024

Good jobs.
This plugin is still very useful.
Unfortunately, the author doesn't seem to have much time to maintain this plugin.

@yioneko yioneko force-pushed the perf-up branch 3 times, most recently from e3d9729 to 9033adc Compare July 15, 2024 01:05
@idelice
Copy link

idelice commented Aug 4, 2024

@hrsh7th please let's get this performance boost in asap

@Shougo
Copy link

Shougo commented Aug 4, 2024

But it seems draft state. It is ready to merge?

@idelice
Copy link

idelice commented Aug 4, 2024

I tested this by forking @yioneko and there is a huge difference in the autocompletion. Everything works like the same - just better.

I also tested it by forking the original nvim-cmp and copying this PR changes over, and things seems to break. Could be because it's few commits behind which are kinda deal-breaking!

@yioneko
Copy link
Contributor Author

yioneko commented Aug 4, 2024

@idelice Could you expand on what is breaking? I just rebased the branch onto the main locally and it seems to work fine as before. I've been using this for about one month personally without issues and I think this is generally ready for review if no more problems found.

@idelice
Copy link

idelice commented Aug 4, 2024

@idelice Could you expand on what is breaking? I just rebased the branch onto the main locally and it seems to work fine as before. I've been using this for about one month personally without issues and I think this is generally ready for review if no more problems found.

I don't know. I'm not familiar with lua enough to pinpoint what caused my user-testing to fail

@Shougo
Copy link

Shougo commented Aug 4, 2024

Hm... I think more testers are needed.

@xzbdmw
Copy link

xzbdmw commented Aug 5, 2024

Works well so far 😄

@lopi-py
Copy link

lopi-py commented Aug 26, 2024

It works fine for me, no issues. I'll be using this fork for now, @yioneko thanks for the improvement!

@tronikelis
Copy link

This is awesome, but why is it still draft? @yioneko you feel like its not ready yet?

…ters

This mainly addresses the perf issue on large amount of calls to
`entry.new`. Previously every `cache.ensure` calls in the code path of
it creates an anonymous function, and it seems that luajit just could
not inline it. Function creation is not expensive in luajit, but that
overhead is noticeable if every `cache.ensure` call creates a function.
The first improvemnt is to solidate the cache callback and attach it to
the metatable of `entry`. This ensures that every created entry instance
share the same cache callback and no new functions will be frequently created,
reduces the ram usage and GC overhead.

To improve it further, some frequently accessed fields of entry like
`completion_item` and `offset` is refactored to use simple table access
instead of getter pattern. The current cached getter is implemented
using `cache.ensure`, which introduces two more levels of function calls
on each access: `cache.key` and `cache.get`. The overhead is okay if but
noticeable if entries amount is quite large: you need to call 4 functions on
a simple `completion_item` field access for each item.

All of the changes done in the commit is just constant time
optimization. But the different is huge if tested with LS providing
large amount of entries like tailwindcss.
`entry.get_vim_item` is a very heavy call, especially when user do
complex stuff on item formatting. Delay its call to window displaying to
let `performance.max_view_entries` applied to it.
@yioneko
Copy link
Contributor Author

yioneko commented Aug 28, 2024

I think this is generally ready for review. It might do not matter whether the PR status is draft or open as it seems that it would take a while to receive response from the owner, so I'll just change it to indicate that the branch here is open to user testing.

About the CI failing: leafo/gh-actions-lua#49 and it passed before the rebase and all the tests are passed on my local environment.

@yioneko yioneko marked this pull request as ready for review August 28, 2024 09:54
@pidgeon777
Copy link

Hopefully this will be merged, can't wait to test it.

@xzbdmw
Copy link

xzbdmw commented Sep 2, 2024

Tested with cmp-dictionary, and with 1000 items this branch is 6x faster, 300ms vs 2s 🚀

@rockyzhang24
Copy link

I know I could use this PR personally but I still desire it gets merged and I believe it must benefit every user. Apologize sincerely for the ping @hrsh7th .

@pidgeon777
Copy link

I tested the PR and I also noticed a big improvement in performance.

@EuCaue
Copy link

EuCaue commented Sep 18, 2024

I tested this PR and noticed a significant improvement in performance. I haven't encountered any bugs. (yet?)

WilliamHsieh added a commit to WilliamHsieh/dotfiles that referenced this pull request Sep 26, 2024
@pidgeon777
Copy link

Hello! I would like to kindly ask if there is any news about the testing outcome

@hrsh7th
Copy link
Owner

hrsh7th commented Oct 9, 2024

Sorry for the late reply. I'll do review.

}
-- https://www.lua.org/pil/11.6.html
-- do not use '..' to allocate multiple strings
local cache_key = string.format('%s:%d:%d:%d:%d:%d:%d', input, self.resolved_completion_item and 1 or 0, matching_config.disallow_fuzzy_matching and 1 or 0, matching_config.disallow_partial_matching and 1 or 0, matching_config.disallow_prefix_unmatching and 1 or 0, matching_config.disallow_partial_fuzzy_matching and 1 or 0, matching_config.disallow_symbol_nonprefix_matching and 1 or 0)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion this does not affect performance.
The issue of allocating multiple strings only arises when concatenating strings in a loop.

(But I can accept this.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that is true, especially for nested expressions, as every expression separately has to be evaluated and each evaluation will allocate a string.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My benchmark is here.

local ITERATION = 1000000000

local function consume(_)
end

local function run(name, f)
  collectgarbage('collect')
  collectgarbage('setpause')
  local s = os.clock()
  for _ = 0, ITERATION, 1 do
    consume(f())
  end
  local e = os.clock()
  print(('%s'):format(name))
  print(('    %s seconds'):format(e - s))
  print(('    %s kb'):format(collectgarbage('count')))
end

local a = string.rep('a', 100)
local b = string.rep('b', 100)
local c = string.rep('c', 100)
local d = string.rep('d', 100)
local e = string.rep('e', 100)
local f = string.rep('f', 100)
local g = string.rep('g', 100)

vim.print(('Lua version %s (%s)'):format(_VERSION, type(jit) == 'table' and 'JIT' or 'PUC'))

for i = 1, 10 do
  run(('[%s] format'):format(i), function()
    return string.format('%s%s%s%s%s%s%s', a, b, c, d, e, f, g)
  end)
  run(('[%s] syntax'):format(i), function()
    return a .. b .. c .. d .. e .. f .. g
  end)
end
Lua version Lua 5.1 (JIT)
[1] format
    0.309662 seconds
    7442.2548828125 kb
[1] syntax
    0.287758 seconds
    7442.9619140625 kb
[2] format
    0.288337 seconds
    7443.9189453125 kb
[2] syntax
    0.289305 seconds
    7444.6806640625 kb
[3] format
    0.28824099999999 seconds
    7445.8564453125 kb
[3] syntax
    0.288509 seconds
    7447.3330078125 kb
[4] format
    0.28775099999999 seconds
    7446.8486328125 kb
[4] syntax
    0.29039900000001 seconds
    7446.5986328125 kb
[5] format
    0.288085 seconds
    7446.4580078125 kb
[5] syntax
    0.290503 seconds
    7446.3955078125 kb
[6] format
    0.28749599999998 seconds
    7446.4111328125 kb
[6] syntax
    0.287892 seconds
    7446.3955078125 kb
[7] format
    0.28724800000001 seconds
    7446.4111328125 kb
[7] syntax
    0.287656 seconds
    7446.3955078125 kb
[8] format
    0.28767599999998 seconds
    7446.4111328125 kb
[8] syntax
    0.28742099999999 seconds
    7446.4111328125 kb
[9] format
    0.28775399999998 seconds
    7446.4111328125 kb
[9] syntax
    0.28724700000001 seconds
    7446.4111328125 kb
[10] format
    0.28726999999998 seconds
    7446.4111328125 kb
[10] syntax
    0.287116 seconds
    7446.3955078125 kb

Copy link
Contributor

@lewis6991 lewis6991 Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try with nested expressions (using ())?

Also the benchmark uses the same strings so you'll get the benefit of string interning.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark updated.

However, it was necessary to use math.random.
This may have become a benchmark for math.random.

local ITERATION = 10000000

local function consume(_)
end

local function run(name, f)
  collectgarbage('collect')
  collectgarbage('stop')
  local s = os.clock()
  for _ = 0, ITERATION, 1 do
    consume(f())
  end
  local e = os.clock()
  print(('%s'):format(name))
  print(('    %s seconds'):format(e - s))
  print(('    %s kb'):format(collectgarbage('count')))
  collectgarbage('restart')
end


vim.print(('Lua version %s (%s)'):format(_VERSION, type(jit) == 'table' and 'JIT' or 'PUC'))

for i = 1, 10 do
  run(('[%s] format'):format(i), function()
    local a = tostring(math.random())
    local b = tostring(math.random())
    local c = tostring(math.random())
    return string.format('%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s', a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c)
  end)
  run(('[%s] syntax'):format(i), function()
    local a = tostring(math.random())
    local b = tostring(math.random())
    local c = tostring(math.random())
    return (a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c)
  end)
end
Lua version Lua 5.1 (JIT)
[1] format
    0.173224 seconds
    7593.55078125 kb
[1] syntax
    0.161859 seconds
    6650.484375 kb
[2] format
    0.158104 seconds
    6652.5 kb
[2] syntax
    0.159953 seconds
    6654.12890625 kb
[3] format
    0.158455 seconds
    6656.18359375 kb
[3] syntax
    0.157741 seconds
    6656.9921875 kb
[4] format
    0.160288 seconds
    6645.15234375 kb
[4] syntax
    0.160823 seconds
    6644.81640625 kb
[5] format
    0.161516 seconds
    6644.77734375 kb
[5] syntax
    0.159043 seconds
    6644.62890625 kb
[6] format
    0.159526 seconds
    6644.71484375 kb
[6] syntax
    0.159805 seconds
    6644.62890625 kb
[7] format
    0.160698 seconds
    6644.71484375 kb
[7] syntax
    0.159825 seconds
    6644.62890625 kb
[8] format
    0.160317 seconds
    6644.71484375 kb
[8] syntax
    0.161488 seconds
    6644.62890625 kb
[9] format
    0.159925 seconds
    6644.71484375 kb
[9] syntax
    0.159897 seconds
    6644.62890625 kb
[10] format
    0.15959 seconds
    6644.70703125 kb
[10] syntax
    0.159551 seconds
    6644.62890625 kb```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. Thanks again for doing that. I'll reference this for future discussions.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. my benchmark is wrong.

It seems to affect to performance to use string.format instead.

local ITERATION = 1000000

local function consume(data)
  _G.data = data
end

local function run(name, f)
  collectgarbage('collect')
  collectgarbage('stop')
  local s = os.clock()
  for _ = 0, ITERATION, 1 do
    consume(f())
  end
  local e = os.clock()
  print(('%s'):format(name))
  print(('    %s seconds'):format(e - s))
  print(('    %s kb'):format(collectgarbage('count')))
  collectgarbage('restart')
end


vim.print(('Lua version %s (%s)'):format(_VERSION, type(jit) == 'table' and 'JIT' or 'PUC'))

for i = 1, 10 do
  run(('[%s] format'):format(i), function()
    local a = tostring(math.random())
    local b = tostring(math.random())
    local c = tostring(math.random())
    return string.format('%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s', a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c)
  end)
  run(('[%s] syntax'):format(i), function()
    local a = tostring(math.random())
    local b = tostring(math.random())
    local c = tostring(math.random())
    return (a .. b .. c .. a .. b .. c .. a .. b) .. (c .. (a .. b) .. (c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c) .. a .. b .. c .. (a .. b .. c .. a .. b .. c) .. a .. b .. (c .. a .. b .. c) .. a .. b .. c)
  end)
end
Lua version Lua 5.1 (JIT)
[1] format
    2.233753 seconds
    1088391.6679688 kb
[1] syntax
    3.371306 seconds
    2050993.4609375 kb
[2] format
    1.949485 seconds
    1088394.8554688 kb
[2] syntax
    3.23479 seconds
    2051043.8085938 kb
[3] format
    2.216916 seconds
    1088433.4296875 kb
[3] syntax
    3.398898 seconds
    2051061.4921875 kb
[4] format
    1.848707 seconds
    1088388.6757813 kb
[4] syntax
    3.150517 seconds
    2051042.03125 kb
[5] format
    1.838001 seconds
    1088439.6914063 kb
[5] syntax
    3.136715 seconds
    2051044.0859375 kb
[6] format
    1.865727 seconds
    1088417.75 kb
[6] syntax
    3.187182 seconds
    2051002.7890625 kb
[7] format
    1.915407 seconds
    1088402.2890625 kb
[7] syntax
    3.131942 seconds
    2051061.3242188 kb
[8] format
    1.882645 seconds
    1088402.6679688 kb
[8] syntax
    3.171707 seconds
    2051076.8320313 kb
[9] format
    1.829586 seconds
    1088384.7734375 kb
[9] syntax
    3.060516 seconds
    2050994.234375 kb
[10] format
    1.831612 seconds
    1088394.1328125 kb
[10] syntax
    3.126076 seconds
    2051067.203125 kb

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, do you mean string.format was always faster? What specifically is different in the last run?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is

local function consume(data)
  _G.data = data -- ← ADD THIS
end

Probably, this disables JIT compilation, which makes it slower for syntax case.

The results show that string.format is consistently faster.
And there's no speed difference in a JIT environment.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had prepared a consume function to properly control JIT compilation, but it didn't seem to work.
I fixed this, so I think the benchmark is probably working properly now.

@hrsh7th
Copy link
Owner

hrsh7th commented Oct 17, 2024

My last review subject is entry.lua, which is a bit complicated. It will take some time.

@hrsh7th hrsh7th merged commit 1a1d7ec into hrsh7th:main Oct 20, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.