No login parse #885

martig7 · 2024-08-23T04:32:51Z

Issue

Closes #861. Scrapes SIS and catalog.rpi.edu using Beautiful Soup and Selenium. The full scrape should take about 15 minutes, depending on how many browser instances you give it. Prerequisites, corequisites, and descriptions should now be extremely accurate as well. It also pulls extra information from Professor Goldschmidt's website.
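
To illustrate why the runtime depends on the number of browser instances, here is a minimal sketch. It is not the code in this PR: it assumes the course codes are split evenly across workers, and `split_evenly`, `scrape_course`, `scrape_chunk`, and `scrape_all` are hypothetical names. The stand-in `scrape_course` only returns a placeholder record; in a real scraper each worker would presumably drive its own Selenium browser and parse the page with Beautiful Soup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, n):
    """Split items into n round-robin chunks (some may be empty)."""
    return [items[i::n] for i in range(n)]


def scrape_course(code):
    # Hypothetical stand-in: a real worker would load the course page
    # in its own browser instance and parse the fields out of it.
    return {"code": code, "prerequisites": [], "corequisites": [], "description": ""}


def scrape_chunk(codes):
    """Scrape every course code assigned to one worker."""
    return [scrape_course(code) for code in codes]


def scrape_all(codes, num_browsers=4):
    """Scrape all codes, one chunk per worker, and key results by course code."""
    chunks = split_evenly(codes, num_browsers)
    with ThreadPoolExecutor(max_workers=num_browsers) as pool:
        results = list(pool.map(scrape_chunk, chunks))
    return {rec["code"]: rec for chunk in results for rec in chunk}
```

With this layout, doubling `num_browsers` roughly halves the number of pages each worker has to load, up to whatever the machine and the remote sites tolerate.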

Test Procedure

Run no_login.py. To scrape a different term, change the parameters in the if __name__ == "__main__": block.
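
For readers unfamiliar with the pattern, an entry-point block of this kind might look like the sketch below. The function name `main`, its parameters `term` and `num_browsers`, and the term value are all placeholders for illustration; the actual parameter names and term format in no_login.py may differ.

```python
def main(term, num_browsers=4):
    # Hypothetical entry point: the real script would start the scrape here.
    print(f"Scraping term {term} with {num_browsers} browser instance(s)")
    return {"term": term, "num_browsers": num_browsers}


if __name__ == "__main__":
    # Edit these values to change the term being scraped.
    # "202409" is a placeholder, not a confirmed term format.
    main(term="202409", num_browsers=4)
```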

becausej · 2024-11-01T20:23:21Z

Reviewed.

dorian451

seems to work

martig7 and others added 15 commits March 1, 2024 17:49

Wrote functions to find all course codes and to generate links based …

97388f2

…off those codes.

Made a lot of progress but forgot to commit.

82ff8eb

end of session commit

89d2564

Wrote some formatting for the scraped data.

0780f3b

First draft of no login scraper works (mostly)

e3a4ede

Started work on a script to scrape Professor Goldschmidt's course inf…

0a89816

…ormation files.

Saving the work for today. Started work on a sctipt to scrape profess…

8b6f52a

…or data to our json format.

Got all of the professor links.

190524b

Committing my work from Friday. Finished scraping all professor data,…

61ef4c0

… just have to output as json.

Finished up the faculty scrape, finished up Goldschmidt parse.

73fe3cd

Commit to work off on desktop

a92e7bd

Changed file locations and started catalog parser.

b6a40f0

Pulling over some work I did on HASSPathways over to YACS, with a lot…

4a4ca4e

… of large edits and additions to make it work with my previous code and to maximize performance.

Commented my scraper functions.

15d0ae1

oops.

80d27cd

martig7 requested review from dorian451 and bnavac August 23, 2024 04:33

martig7 self-assigned this Sep 16, 2024

martig7 added the Review Ready! (Inform team that PR is ready for review), python (Pull requests that update Python code), Priority 2 (Important Issue), and Priority 1 (Critical Issue) labels and removed the Priority 2 (Important Issue) label Sep 16, 2024

dorian451 approved these changes Nov 5, 2024

dorian451 merged commit 5153051 into YACS-RCOS:master Nov 5, 2024
1 of 2 checks passed
