🎬 A simple scraper that captures text from posts on the Hong Kong forum LIHKG, and a good starting point for anyone interested in web scraping

papatekken/simple-LIHKG-scraper-with-python

simple LIHKG scraper with python

About

A simple scraper that captures text from posts on the Hong Kong forum LIHKG. It is a good starting point for anyone interested in web scraping. It is written in Python and uses the Selenium library.

Installation

  1. Set up Python and Git in your runtime environment

  2. Install the [selenium](https://pypi.org/project/selenium/) library

  3. Clone the repository

    git clone https://github.com/papatekken/simple-LIHKG-scraper-with-python LIHKG-scraper
    
  4. In the root directory of 'LIHKG-scraper', run the following command to start the application. When the run finishes, a new text file containing the captured data is created.

    The program expects the post ID as its only argument.

    For example, for post ID 1996060:

    python hkg.py 1996060
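
For readers new to web scraping, the flow described above (take a post ID, open the post with Selenium, write the captured text to a file) can be sketched roughly as follows. This is a minimal illustration and not the contents of `hkg.py`: the function names, the thread URL pattern, the output file name, and the use of Chrome are all assumptions made for the example.

```python
from pathlib import Path


def parse_post_id(argv):
    """Return the post ID given as the single command-line argument."""
    if len(argv) != 1 or not argv[0].isdigit():
        raise ValueError("usage: python hkg.py <post ID>, e.g. python hkg.py 1996060")
    return argv[0]


def build_thread_url(post_id, page=1):
    # Assumed URL pattern for a LIHKG post; check it against the live site.
    return f"https://lihkg.com/thread/{post_id}/page/{page}"


def output_path(post_id, directory="."):
    # Hypothetical naming scheme: one text file per post ID.
    return Path(directory) / f"{post_id}.txt"


def capture_page_text(url, timeout=15):
    """Open the URL in a browser and return the visible text of the page."""
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes a local Chrome installation
    try:
        driver.get(url)
        # The content may be filled in by JavaScript after the initial load,
        # so wait until the body contains some text before reading it.
        return WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(By.TAG_NAME, "body").text.strip()
        )
    finally:
        driver.quit()


def save_thread_text(post_id, directory="."):
    """Capture the first page of a post and write it to a text file."""
    text = capture_page_text(build_thread_url(post_id))
    path = output_path(post_id, directory)
    path.write_text(text, encoding="utf-8")
    return path
```

Under these assumptions, calling `save_thread_text(parse_post_id(sys.argv[1:]))` from a script (after `import sys`) would mirror the command shown above. Reading the whole `body` keeps the sketch free of site-specific selectors; a real scraper would more likely target the elements that hold each reply.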
    

License

MIT

Contact

Created by @papatekken - feel free to contact me!
