
UnicodeEncodeError when Downloading Novels from Arabic Sources with Non-ASCII Characters in URLs #2308

Closed
LSXAxeller opened this issue Mar 20, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@LSXAxeller

Describe the bug

When attempting to download a novel from an Arabic source with Arabic characters in the URL, a UnicodeEncodeError is raised with the message 'ascii' codec can't encode characters in position XX-XX: ordinal not in range(128). The error is caused by the non-ASCII characters in the URL.

Example novel links causing the issue:

Log:

 ! Error: 'ascii' codec can't encode characters in position 28-32: ordinal not in range(128)
<class 'UnicodeEncodeError'>
  File "lncrawl\core\scraper.py", line 306, in get_soup
    response = self.get_response(url, **kwargs)
  File "lncrawl\core\scraper.py", line 201, in get_response
    return self.__process_request(
  File "lncrawl\core\scraper.py", line 107, in __process_request
    kwargs["headers"] = {
  File "lncrawl\core\scraper.py", line 108, in <dictcomp>
    str(k).encode("ascii"): str(v).encode("ascii")

This error originates from attempting to encode non-ASCII characters into ASCII during the scraping process.
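The failure can be reproduced outside the app. The dict comprehension below mirrors the shape shown in the traceback (the actual surrounding code in `scraper.py` may differ); byte-encoding a header value that contains Arabic characters with the "ascii" codec raises exactly this error:

```python
# Minimal reproduction, modeled on the dict comprehension in the traceback.
# The "Referer" header name and the example URL are illustrative assumptions.
url = "https://kolnovel.com/series/القوس-المحنون-كول/"
headers = {"Referer": url}

try:
    encoded = {
        str(k).encode("ascii"): str(v).encode("ascii")  # fails on non-ASCII
        for k, v in headers.items()
    }
except UnicodeEncodeError as e:
    print(e)  # 'ascii' codec can't encode characters ... not in range(128)
```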


Let us know

App source: EXE
App version: 3.5.0
Your OS: Windows 11 23H2 22631.2506

@LSXAxeller LSXAxeller added the bug Something isn't working label Mar 20, 2024
@LSXAxeller LSXAxeller changed the title Fix this bug UnicodeEncodeError when Downloading Novels from Arabic Sources with Non-ASCII Characters in URLs Mar 20, 2024
@LSXAxeller
Author

I've resolved the issue by updating the code in lncrawl\core\scraper.py. Specifically, I modified line 108 from:

str(k).encode("ascii"): str(v).encode("ascii")

to:

str(k).encode("utf-8"): str(v).encode("utf-8")

This change encodes headers as UTF-8 instead of ASCII. I'm keeping this issue open for anyone who encounters a similar error or wants to investigate further.
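A quick sketch of why the one-line change works: UTF-8 can represent any Unicode character, so the same dict comprehension no longer raises, and the original URL survives an encode/decode round trip (header name and URL below are illustrative, as above):

```python
# Same comprehension shape as the fix, with "utf-8" instead of "ascii".
url = "https://kolnovel.com/series/القوس-المحنون-كول/"
headers = {"Referer": url}

encoded = {
    str(k).encode("utf-8"): str(v).encode("utf-8")
    for k, v in headers.items()
}
print(encoded[b"Referer"].decode("utf-8") == url)  # True
```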

@zGadli
Contributor

zGadli commented Mar 20, 2024

I'll make a new PR for this; no need to keep this issue open.

@zGadli
Contributor

zGadli commented Mar 20, 2024

Can you test the change with the links? It doesn't work for me with either link.

@LSXAxeller
Author

> Can you test the change with the links? It doesn't work for me with either link.

Working fine for me. I forgot to mention that I made the change in the pip version, not the EXE, since the EXE version extracts a fresh scraper.py on each launch.

C:\Users\RI>python -m lncrawl -s https://kolnovel.com/series/القوس-المحنون-كول/ --format epub
================================================================================
                          [#] Lightnovel Crawler v3.5.0
                  https://github.com/dipu-bd/lightnovel-crawler
--------------------------------------------------------------------------------

-> Press  Ctrl + C  to exit

Retrieving novel info...

[#] القس&المجنون&Kol
24 volumes and 2372 chapters found.
- https://kolnovel.com/series/القوس-المحنون-كول/

? Enter output directory: C:\Users\RI\Lightnovels\Master of Gu - Reverted Insanity
? Which chapters to download? Everything! (2372 chapters)
? 2372 chapters selected Continue
? How many files to generate? Pack everything into a single file
Chapters:   2%|█                                                                   | 37/2372 [00:20<21:06,  1.84item/s]

If the links don't work, you may need a VPN. Also, copy the links directly from the issue rather than after opening them in a new browser tab, since the tab will redirect to the new domains kolnovel.org instead of kolnovel.com and ar-novel.com instead of arnovel.me.
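As background on why such URLs trip ASCII-only code paths at all: RFC 3986 URLs are nominally ASCII, and a non-ASCII URL can always be made ASCII-safe by percent-encoding. This is not the fix applied in this thread, just an illustration using the standard library:

```python
# Illustrative only: percent-encoding makes a non-ASCII URL pure ASCII.
from urllib.parse import quote, unquote

url = "https://kolnovel.com/series/القوس-المحنون-كول/"
safe = quote(url, safe=":/")  # keep the scheme separator and slashes

print(safe.isascii())         # True
print(unquote(safe) == url)   # True: round trip recovers the original
```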
