Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match on module aliases for auto import suggestions #730

Merged
merged 14 commits into from
Jan 30, 2024
6 changes: 6 additions & 0 deletions rope/contrib/autoimport/defs.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,12 @@ class Package(NamedTuple):
type: PackageType


class Alias(NamedTuple):
"""A module alias to be added to the database."""
alias: str
modname: str


class Name(NamedTuple):
"""A Name to be added to the database."""

Expand Down
21 changes: 21 additions & 0 deletions rope/contrib/autoimport/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,27 @@ class Metadata(Model):
objects = Query(table_name, columns)


class Alias(Model):
table_name = "aliases"
schema = {
"alias": "TEXT",
"module": "TEXT",
}
columns = list(schema.keys())
objects = Query(table_name, columns)

@classmethod
def create_table(cls, connection):
super().create_table(connection)
connection.execute("CREATE INDEX IF NOT EXISTS alias ON aliases(alias)")

modules = Query(
"(SELECT DISTINCT aliases.*, package, source, type FROM aliases INNER JOIN names on aliases.module = names.module)",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a DB expert, but the names table can comprise 10,000 - 100,000 rows, so I am wondering if we should run this inner join on every autoimport request (which can happen with every keystroke when rope is run inside of a language server).
Can we quickly test how much adding alias support slows down search?
Alternatively, I'd make sure aliases only contains the aliases to modules that exist in names

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The joins are pretty fast, I made a notebook to test it out.

The join time should be dominated by the Alias table, not the Names table, because the Names table has an index on the module column. Also, here we're including a where clause which makes the left side of the join even smaller. Most DB engines are pretty good about pushing down the filter past the join and sqlite3 seems to handle it well.

I thought about this a little bit before testing out this implementations I see 3 main paths forward:

  • The join approach
  • Materialize the availability information in the Aliases table as a column, we'd need to be careful to always update the Aliases table whenever updating the cache. This would probably be the fastest approach, but more work.
  • Keep the aliases in memory as a list or dict. We'd basically be implementing the join logic manually, but it might be really fast if the # of Aliases is very small. Then again if the # of Aliases is very slow the join should also be very fast.

@tkrabel what do you think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current approach has the benefit that we never have to do any updates on the aliases tables. The names table is the source of truth of that is available to the user.
If you're happy with the performance, then let's go with the current approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MrBago thanks for doing testing the performance notebook, the notebook brings up something that is interesting/surprising to me, in that the module search_by_name_like query is much slower than what I was expecting. A prefix search using an index should not have been that slow.

That is an unrelated issue from this PR though, so I've created another ticket for that #736, but with the fixed index the Alias query should hopefully become faster as well. 883ms for an inner join between a large table and a very small table doesn't smell right to me that seems to indicate a full table scan as well.

I'll see if I can fix this tomorrow, but in the meantime, apologies but I'll be holding off on merging this PR yet until that is fixed and then we can see the new performance impact.

columns + ["package", "source", "type"],
)
search_modules_with_alias = modules.where("alias LIKE (?)")


class Name(Model):
table_name = "names"
schema = {
Expand Down
13 changes: 13 additions & 0 deletions rope/contrib/autoimport/sqlite.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
from rope.contrib.autoimport import models
from rope.contrib.autoimport.defs import (
ModuleFile,
Alias,
Name,
NameType,
Package,
Expand Down Expand Up @@ -293,6 +294,12 @@ def _search_module(
yield SearchResult(
f"import {module}", module, source, NameType.Module.value
)
for alias, module, source in self._execute(
models.Alias.search_modules_with_alias.select("alias", "module", "source"), (name,)
):
yield SearchResult(
f"import {module} as {alias}", alias, source, NameType.Module.value
)

def get_modules(self, name) -> List[str]:
"""Get the list of modules that have global `name`."""
Expand Down Expand Up @@ -434,9 +441,11 @@ def clear_cache(self):
"""
with self.connection:
self._execute(models.Name.objects.drop_table())
self._execute(models.Alias.objects.drop_table())
self._execute(models.Package.objects.drop_table())
self._execute(models.Metadata.objects.drop_table())
models.Name.create_table(self.connection)
models.Alias.create_table(self.connection)
models.Package.create_table(self.connection)
models.Metadata.create_table(self.connection)
data = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand this correctly, this will add the aliases into the database only when the database is created. IIUC, this would need to depend on the database being re-created when preference changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I didn't look at the different ways that prefs can change, we could add a method to clear the aliases table and reset it and invoke that when the prefs are updated.

Expand Down Expand Up @@ -548,6 +557,10 @@ def _convert_name(name: Name) -> tuple:
name.source.value,
name.name_type.value,
)

def add_aliases(self, aliases: Iterable[Alias]):
if aliases is not None:
self._executemany(models.Alias.objects.insert_into(), aliases)

def _add_names(self, names: Iterable[Name]):
if names is not None:
Expand Down
16 changes: 16 additions & 0 deletions ropetest/contrib/autoimporttest.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import unittest

from rope.contrib.autoimport import sqlite as autoimport
from rope.contrib.autoimport.defs import Alias
from ropetest import testutils


Expand Down Expand Up @@ -124,6 +125,21 @@ def test_search_module(self):
self.assertIn(import_statement, self.importer.search("os"))
self.assertIn(import_statement, self.importer.search("o"))

def test_search_alias(self):
self.mod2.write("myvar = None\n")
self.importer.update_resource(self.mod2)
self.importer.add_aliases([
("noMatch", "does_not_exists_this"),
("hasMatch", "pkg.mod2"),
])

self.assertEqual([], self.importer.search("noMatch", exact_match=True))

import_statement = ("import pkg.mod2 as hasMatch", "hasMatch")
self.assertIn(import_statement, self.importer.search("hasMatch", exact_match=True))
self.assertIn(import_statement, self.importer.search("hasM"))
self.assertIn(import_statement, self.importer.search("h"))

def test_search(self):
self.importer.update_module("typing")
import_statement = ("from typing import Dict", "Dict")
Expand Down