Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import with folders as tags #199

Closed
wants to merge 1 commit into from
Closed

Import with folders as tags #199

wants to merge 1 commit into from

Conversation

jwezel
Copy link

@jwezel jwezel commented Feb 16, 2022

Provide an option to have folders in bookmark imports serve as tags.

@@ -20,11 +21,11 @@ class ImportResult:
failed: int = 0


def import_netscape_html(html: str, user: User):
def import_netscape_html(html: str, user: User, tags_from_folders: str):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tags_from_folders can just be a boolean, also it would make sense to set a default value IMO.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The form data for checkboxes is "on" (true) / "" (false). Does Django convert it into True/False if it's declared boolean? I'm not that proficient in Django.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, my concern is more that this part of the application should be free from frontend semantics. It would be enough to convert into a boolean when calling this function: request.POST['tags_from_folders'] == 'on'

@@ -12,18 +14,81 @@ class NetscapeBookmark:
tag_string: str


class BookmarkParser(HTMLParser):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not OK with adding a second parser implementation to the codebase. This should either extend the current parser, or replace the existing one. Kind of like the approach here with HTMLParser, the implementation might be easier to understand than the current one, and it handles the missing closing tags pretty well.

Also this needs tests. Unfortunately there are no existing tests for the parser itself, but test_importer.py could be used to add test cases for converting folders to tags.

Copy link
Author

@jwezel jwezel Feb 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, going to re-implement the other parser with a subclass of HTMLParser and add a test for it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool - just to make sure, the goal is to have a single parser that can handle both cases. I think your existing parser could be parameterized to either create tags from folders, or not.

getattr(self, name)(data)

def handle_start_dl(self, attrs: Dict[str, str]):
print('<DL>')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove debug code, or use debug logging

@@ -53,7 +118,7 @@ def extract_description(tag):
bookmark_tag.addParseAction(extract_bookmark)


def parse(html: str) -> [NetscapeBookmark]:
def parse(html: str) -> List[NetscapeBookmark]:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiousity, why use List[...] rather than [...]?

Copy link
Author

@jwezel jwezel Feb 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding [...] is incorrect. The brackets are not type annotations like in C/C++ (int vec[] being an array of ints) but sort of a parenthesis for specifying generic types, like in a C++ template specifier (template MyType<class T>) where angle brackets serve the same purpose.

@@ -20,11 +21,11 @@ class ImportResult:
failed: int = 0


def import_netscape_html(html: str, user: User):
def import_netscape_html(html: str, user: User, tags_from_folders: str):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, my concern is more that this part of the application should be free from frontend semantics. It would be enough to convert into a boolean when calling this function: request.POST['tags_from_folders'] == 'on'

@@ -12,18 +14,81 @@ class NetscapeBookmark:
tag_string: str


class BookmarkParser(HTMLParser):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool - just to make sure, the goal is to have a single parser that can handle both cases. I think your existing parser could be parameterized to either create tags from folders, or not.

title=data,
description=self.description,
date_added=self.add_date,
tag_string=','.join(self.tag_stack[:-1]),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what's missing here is also applying the tags from the anchor tag itself, currently the explicit tags on the anchor tag are ignored and only the tags created from folders are added.


def handle_start_dl(self, attrs: Dict[str, str]):
print('<DL>')
self.tag_stack.append(None)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part, and the related logic in handle_h3_data and handle_a_data seems a bit convoluted. I think it might be easier to just add a tag to the stack when encountering an <h3>, and then to remove a tag from the stack when encountering and </dl> - unless the stack is already empty.

@sissbruecker
Copy link
Owner

This has been requested before, and I can see this being useful, so I might still work on this at some point in the future. The parser proposed in this PR has already been adopted in #261.

@jwezel jwezel closed this by deleting the head repository Apr 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants