1ycx/m.wuxiaworld.co-Scraper

Note :

This project is being discontinued because wuxiaworld.co has released an Android app, so the scraper no longer serves a purpose.

Note 1:

Because I don't know which module versions I used during development, and the latest versions of the downloaded modules may behave differently, you may run into unexpected problems. In the meantime, try my other scraper for the official WuxiaWorld.

About:

Python Script To Copy m.wuxiaworld.co Chapters Into EPUB File.

Ask Me, Why This Website? Well, It Has Novels From Webnovel (Qidian) & WuxiaWorld With All The Latest Chapters Unlocked.

No Spirit Stones, No Patreon, No Subscription Or Any Of Those Things Required To Read The Latest Chapters! Don't Take My Word For It? Check It Out.

How Does The Script Work? Just Enter The Novel URL Inside The Script And You're Done!

I'll Try To Add Any Necessary Updates.


Problem(s) :

  • None Yet (Report If Any).

Sample :

kogam22@home:~/code$ python3 code.py

Novel URL Set

Name :  The Magus Era

Total No. Of Chapters =  1792

---------------------------------------------------
Enter 1 - To Download All Chapters
Enter 2 - To Download A Part, Like 1-100 Or 400-650
Enter 3 - To View Chapter Titles Before Download

Enter Your Choice : 2
---------------------------------------------------

===================================================
**Note : "First Chapter" Starts From "1"
         "Last Chapter" Ends At "1792"

Enter First Chapter : 1
Enter Last Chapter  : 10
===================================================
Parsed Chapter : Prologue
Parsed Chapter : Chapter 1 - Hunter
Parsed Chapter : Chapter 2 - Malice
Parsed Chapter : Chapter 3 - Challenge
Parsed Chapter : Chapter 4 - Deal
Parsed Chapter : Chapter 5 - Gain
Parsed Chapter : Chapter 6 - Parents
Parsed Chapter : Chapter 7 - Defiance
Parsed Chapter : Chapter 8 - Different Races
Parsed Chapter : Chapter 9 - Calculation
Created "About Novel" Page
Saving . . .
Saved at /home/kogam22/code as "The Magus Era_0_9.epub"
kogam22@home:~/code$
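
In the sample above, chapters 1 to 10 are saved with the suffix _0_9, which suggests the chapter numbers entered by the user are shifted to zero-based positions before the file is named. Below is a minimal sketch of that mapping, assuming this is how the name is built. The function epub_filename is a hypothetical helper for illustration and is not taken from code.py.

```python
def epub_filename(novel_name, first, last, total):
    """Build an output file name from a 1-based chapter range.

    Hypothetical helper: the naming rule is inferred from the sample
    run, where chapters 1-10 were saved as "..._0_9.epub".
    """
    # Reject ranges outside 1..total or with first > last.
    if not 1 <= first <= last <= total:
        raise ValueError(
            f"chapter range must satisfy 1 <= first <= last <= {total}"
        )
    # Assumption: the user's 1-based numbers become zero-based positions.
    return f"{novel_name}_{first - 1}_{last - 1}.epub"


if __name__ == "__main__":
    print(epub_filename("The Magus Era", 1, 10, 1792))
```

Validating the range before building the name mirrors the prompt in the sample, which states that the first chapter starts from 1 and the last chapter ends at the total chapter count.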

Documentation :

  1. Download The Python Script And Unzip It.

  2. For Beginners: After Setting Up A Working Python (>=3.6) Environment (Along With The Latest pip), You Need To Install Some Packages. To Install, Open CMD/Terminal, Navigate To The Folder Where You Unzipped This Script And Run This Command :

    • pip install -r requirements.txt OR pip3 install -r requirements.txt
  3. Optional : Open The Script With A Text Editor And Read The Details Inside(To Understand What Actually Happens).

  4. In Case The Script Has Not Been Updated To Match Changes In The Website, You Might Refer To The BeautifulSoup Docs To Make Changes Accordingly.

  5. To Run, Open CMD/Terminal, Navigate To The Unzip Location And Type :

    • Linux - python3 code.py
    • Windows - python code.py or py code.py
  6. EPUB File Will Be Saved At The Location Of Script.
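
Before running the script, it can help to confirm that the environment from step 2 is actually in place. The sketch below checks the Python version and reports any modules that cannot be found. The helper check_environment is hypothetical, and the module names passed to it are only examples, since the exact contents of requirements.txt are not listed here.

```python
import importlib.util
import sys


def check_environment(modules, version_info=sys.version_info):
    """Return a list of problems; an empty list means the setup looks fine.

    Hypothetical helper. `modules` should contain top-level import names.
    """
    problems = []
    # Step 2 asks for Python 3.6 or newer.
    if tuple(version_info[:2]) < (3, 6):
        problems.append("Python 3.6 or newer is required")
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


if __name__ == "__main__":
    # Assumed import names; adjust them to match requirements.txt.
    for problem in check_environment(["bs4", "html5lib"]):
        print(problem)
```

If any module is reported missing, rerunning the pip command from step 2 should install it.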

Working :

Parsing :

html5lib Is Used Because, Although It Is A Tiny Bit Slow, It Generates Valid HTML. You May Compare The Other Parsers In The "Differences Between Parsers" Section Of The BeautifulSoup Docs. I've Copied The Table From The BS4 Website Below To Give A Brief Overview.

| Parser | Typical usage | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Python’s html.parser | BeautifulSoup(markup, "html.parser") | Batteries included; decent speed; lenient (as of Python 2.7.3 and 3.2) | Not very lenient (before Python 2.7.3 or 3.2.2) |
| lxml’s HTML parser | BeautifulSoup(markup, "lxml") | Very fast; lenient | External C dependency |
| lxml’s XML parser | BeautifulSoup(markup, "lxml-xml") or BeautifulSoup(markup, "xml") | Very fast; the only currently supported XML parser | External C dependency |
| html5lib | BeautifulSoup(markup, "html5lib") | Extremely lenient; parses pages the same way a web browser does; creates valid HTML5 | Very slow; external Python dependency |

If Any Problem Occurs With html5lib :

  • In Case You Update It Accidentally, You Can Reinstall The Specific Version By Following The Beginner Steps In The Documentation Section (pip install -r requirements.txt).
  • Alternatively, Change html5lib To lxml If It Is Installed, Otherwise To Python's Built-In html.parser.
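
The fallback order in the bullets above (html5lib, then lxml, then html.parser) can be automated rather than edited by hand. The following is a minimal sketch using only the standard library to decide which parser name to pass to BeautifulSoup. The function pick_parser and its is_installed parameter are hypothetical and are not part of code.py.

```python
import importlib.util


def _module_available(name):
    # True when the top-level module can be located without importing it.
    return importlib.util.find_spec(name) is not None


def pick_parser(is_installed=_module_available):
    """Return the first usable parser name for BeautifulSoup.

    Hypothetical helper: prefers html5lib, then lxml, and finally falls
    back to html.parser, which ships with Python itself.
    """
    for candidate in ("html5lib", "lxml"):
        if is_installed(candidate):
            return candidate
    return "html.parser"


if __name__ == "__main__":
    print("Using parser:", pick_parser())
```

The returned string is meant to be used as the second argument, as in BeautifulSoup(markup, pick_parser()). Keep in mind that the parsers can build different trees from invalid markup, so the scraping logic may need small adjustments after a switch.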

License

Copyright © 2018 Kogam22. Released under the terms of the Apache 2.0 license.
