Import with folders as tags #199

jwezel · 2022-02-16T13:37:00Z

Provide an option to have folders in bookmark imports serve as tags.

sissbruecker · 2022-02-27T08:27:35Z

bookmarks/services/importer.py

@@ -20,11 +21,11 @@ class ImportResult:
    failed: int = 0


-def import_netscape_html(html: str, user: User):
+def import_netscape_html(html: str, user: User, tags_from_folders: str):


tags_from_folders can just be a boolean. It would also make sense to give it a default value, IMO.

The form data for checkboxes is "on" (true) / "" (false). Does Django convert it into True/False if it's declared boolean? I'm not that proficient in Django.

No, my concern is more that this part of the application should be free of frontend semantics. It would be enough to convert the value into a boolean when calling this function: request.POST['tags_from_folders'] == 'on'
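
A minimal sketch of that conversion at the view boundary, assuming the checkbox has no explicit value attribute (so browsers submit "on" when it is checked and omit the field entirely when it is not). The helper checkbox_to_bool and the stand-in importer body are hypothetical; plain dicts stand in for request.POST, which also supports .get(). Using .get() with a default avoids the KeyError that indexing would raise for an unchecked box.

```python
from typing import Mapping


def checkbox_to_bool(form_data: Mapping[str, str], name: str) -> bool:
    """Hypothetical helper: True only if the checkbox was submitted as checked.

    Unchecked checkboxes are left out of the form data, so a default is needed.
    """
    return form_data.get(name, "") == "on"


def import_netscape_html(html: str, user: object, tags_from_folders: bool = False) -> bool:
    """Stand-in for the real importer; it only echoes the flag it received."""
    return tags_from_folders


# In a view, form_data would be request.POST; plain dicts are used here.
checked = import_netscape_html(
    "", None, checkbox_to_bool({"tags_from_folders": "on"}, "tags_from_folders")
)
unchecked = import_netscape_html(
    "", None, checkbox_to_bool({}, "tags_from_folders")
)
```

With the default in place, existing callers that pass only html and user keep working unchanged.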

sissbruecker · 2022-02-27T09:03:47Z

bookmarks/services/parser.py

@@ -12,18 +14,81 @@ class NetscapeBookmark:
    tag_string: str


+class BookmarkParser(HTMLParser):


I'm not OK with adding a second parser implementation to the codebase. This should either extend the current parser or replace it. I kind of like the HTMLParser approach here: the implementation might be easier to understand than the current one, and it handles the missing closing tags pretty well.

Also, this needs tests. Unfortunately there are no existing tests for the parser itself, but test_importer.py could be used to add test cases for converting folders to tags.

OK, I'm going to reimplement the other parser as a subclass of HTMLParser and add a test for it.

Cool - just to make sure, the goal is to have a single parser that can handle both cases. I think your existing parser could be parameterized to either create tags from folders, or not.

sissbruecker · 2022-02-27T09:03:58Z

bookmarks/services/parser.py

+            getattr(self, name)(data)
+
+    def handle_start_dl(self, attrs: Dict[str, str]):
+        print('<DL>')


Remove the debug code, or use debug logging instead.
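
For illustration, the print call could be swapped for a module-level logger along these lines. This is only a sketch: the handler is written as a free function rather than a parser method, and the message format is an assumption.

```python
import logging
from typing import Dict

logger = logging.getLogger(__name__)


def handle_start_dl(attrs: Dict[str, str]) -> None:
    # Emitted only when logging is configured at DEBUG level; silent otherwise.
    logger.debug("<DL> opened, attrs=%s", attrs)
```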

sissbruecker · 2022-02-27T09:05:25Z

bookmarks/services/parser.py

@@ -53,7 +118,7 @@ def extract_description(tag):
 bookmark_tag.addParseAction(extract_bookmark)


-def parse(html: str) -> [NetscapeBookmark]:
+def parse(html: str) -> List[NetscapeBookmark]:


Out of curiosity, why use List[...] rather than [...]?

To my understanding, [...] is incorrect here. The brackets are not a type annotation as in C/C++ (where int vec[] declares an array of ints); they act as a sort of parenthesis for specifying generic type parameters, much like the angle brackets in a C++ template (for example MyType<T>).
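
A small sketch of the difference, using a trimmed-down stand-in for NetscapeBookmark and empty function bodies. The name parse_old is hypothetical and only labels the previous signature. Both definitions run, but the bare-bracket annotation evaluates to an ordinary list object rather than a generic type, which is why static type checkers reject it. On Python 3.9 and later the built-in list[NetscapeBookmark] can be used in place of typing.List.

```python
from dataclasses import dataclass
from typing import List, get_type_hints


@dataclass
class NetscapeBookmark:
    # Trimmed-down stand-in; the real class has more fields.
    title: str
    tag_string: str


def parse(html: str) -> List[NetscapeBookmark]:
    """Stand-in body; only the annotation matters here."""
    return []


def parse_old(html: str) -> [NetscapeBookmark]:
    """Runs, but the annotation is a list instance, not a type."""
    return []


good = get_type_hints(parse)["return"]      # a generic alias for List[NetscapeBookmark]
bad = parse_old.__annotations__["return"]   # a plain list containing the class
```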

sissbruecker · 2022-02-28T07:22:26Z

bookmarks/services/parser.py

+            title=data,
+            description=self.description,
+            date_added=self.add_date,
+            tag_string=','.join(self.tag_stack[:-1]),


I think what's still missing here is applying the tags from the anchor tag itself. Currently the explicit tags on the anchor tag are ignored, and only the tags created from folders are added.
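
One possible way to combine the two sources, sketched under the assumption that the anchor's explicit tags arrive as a comma-separated string and the folder tags as a list. The helper merge_tag_strings is hypothetical and not part of the PR.

```python
from typing import List, Optional


def merge_tag_strings(anchor_tags: Optional[str], folder_tags: List[Optional[str]]) -> str:
    """Join explicit anchor tags with folder-derived tags.

    Blank or missing entries are dropped and duplicates are kept only once,
    preserving first-seen order.
    """
    explicit = [tag.strip() for tag in (anchor_tags or "").split(",")]
    merged: List[str] = []
    for tag in explicit + list(folder_tags):
        if tag and tag not in merged:
            merged.append(tag)
    return ",".join(merged)
```

Listing the explicit tags first keeps the user's own tagging ahead of anything inferred from the folder structure.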

sissbruecker · 2022-02-28T07:32:24Z

bookmarks/services/parser.py

+
+    def handle_start_dl(self, attrs: Dict[str, str]):
+        print('<DL>')
+        self.tag_stack.append(None)


This part, and the related logic in handle_h3_data and handle_a_data, seems a bit convoluted. I think it might be easier to just add a tag to the stack when encountering an <h3>, and then to remove a tag from the stack when encountering a </dl>, unless the stack is already empty.
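
The suggested stack handling could look roughly like the sketch below. It is not the PR's parser: the class name FolderTagParser, the tags_from_folders constructor flag, and the (href, tag_string) tuples are assumptions made for illustration. A folder name is pushed when an <h3> is seen and popped on </dl>, with the outermost </dl> hitting an empty stack and being ignored.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class FolderTagParser(HTMLParser):
    """Hypothetical sketch: collects (href, tag_string) pairs from bookmark HTML."""

    def __init__(self, tags_from_folders: bool = True):
        super().__init__()
        self.tags_from_folders = tags_from_folders
        self.tag_stack: List[str] = []
        self.in_h3 = False
        self.bookmarks: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag == "h3":
            self.in_h3 = True
            self.tag_stack.append("")  # folder name is filled in by handle_data
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            names = self.tag_stack if self.tags_from_folders else []
            tag_string = ",".join(n.strip() for n in names if n.strip())
            self.bookmarks.append((href, tag_string))

    def handle_data(self, data):
        if self.in_h3:
            self.tag_stack[-1] += data

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False
        elif tag == "dl" and self.tag_stack:
            self.tag_stack.pop()  # leaving a folder, unless already at top level


SAMPLE = """
<DL><p>
  <DT><H3>Dev</H3>
  <DL><p>
    <DT><A HREF="https://a.example">A</A>
    <DT><H3>Python</H3>
    <DL><p>
      <DT><A HREF="https://b.example">B</A>
    </DL><p>
    <DT><A HREF="https://c.example">C</A>
  </DL><p>
  <DT><A HREF="https://d.example">D</A>
</DL>
"""

parser = FolderTagParser(tags_from_folders=True)
parser.feed(SAMPLE)
```

Passing tags_from_folders=False leaves every tag_string empty, which is one way a single parser could serve both import modes.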

sissbruecker · 2022-07-03T04:45:30Z

This has been requested before, and I can see this being useful, so I might still work on this at some point in the future. The parser proposed in this PR has already been adopted in #261.

Import with folders as tags

94ccb08

sissbruecker requested changes Feb 27, 2022

sissbruecker reviewed Feb 28, 2022

sissbruecker mentioned this pull request May 14, 2022

Faster importing of "bookmarks.html" using regex? #213

Closed

sissbruecker mentioned this pull request May 21, 2022

Improve import performance #261

Merged

jwezel closed this by deleting the head repository Apr 23, 2023
