Important dates scraper #52
Comments
I'd love to give the UTM scraper a try. Is there anything I should know or read about before I start?
@anderson202 yes please! Give it a go and if you have any questions, we can answer them. I have a very basic wiki here with information: https://github.com/cobalt-uoft/uoft-scrapers/wiki but it really isn't a lot. Have a look around at other scrapers to see what's up. For this one, we can also discuss the schema format we want to go with. Any ideas?
@qasim I'm definitely a newbie to this so I'm not too sure what the format should look like. The basic info we need would be the date and the detailed information regarding the day. Maybe we can list which academic session the date falls in as well. A quick question: how should the scraper function? Should it scrape everything it can for upcoming dates, scrape only a specific session, or scrape a specific date?
+1 on including the session, I'm thinking something like:
{
  "date":String,
  "session":String,
  "events":[String]
}
It looks like the UTM mobile site has links to two years' worth of data. I think the scraper can take a For example (
Edit: Looks like they actually have data since the 2010-11 school year - http://m.utm.utoronto.ca/importantDates.php?mode=full&session=20105
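As a rough illustration of the idea above, here is a minimal Python sketch of building the mobile-site URL for one session and packaging a scraped date in the proposed shape. The base URL and the `mode`/`session` query parameters come from the link quoted above; the function names `build_url` and `make_document` are hypothetical, and no fetching or HTML parsing is shown.

```python
from urllib.parse import urlencode

# Base URL taken from the link quoted in the thread.
BASE_URL = "http://m.utm.utoronto.ca/importantDates.php"


def build_url(session):
    """Return the full-listing URL for one session code, e.g. "20105".

    Hypothetical helper: the meaning of the session code is not
    confirmed in the thread, so it is passed through unchanged.
    """
    return BASE_URL + "?" + urlencode({"mode": "full", "session": session})


def make_document(date, session, events):
    """Package one scraped date in the proposed {date, session, events} shape."""
    return {"date": date, "session": session, "events": list(events)}
```

Calling `build_url("20105")` reproduces the link above, so a scraper could presumably loop over session codes and call it once per session.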
Wow I didn't even think of using the mobile site. It's so much cleaner. I'll start working on it and see if I can contribute to this. Thanks. Edit: For example, would this work?
I'll take the
@anderson202 That's actually what we want! Take a look at the athletics and shuttle scrapers, they work the same way. I got started on the UTSG scraper and I found it might be better to use the following format instead:
{
  "date":String,
  "session":String,
  "events":[{
    "end_date":String, // some go on for more than a single day (e.g. winter break)
    "campus":String,
    "description":String
  }]
}
This will allow us to merge events across campuses for each date, like we do with the athletics scraper (take a look at this). The API ends up being a lot cleaner this way.
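To make the merging idea concrete, here is a minimal Python sketch of combining per-campus documents that share a date. It is not the athletics scraper's actual code; `merge_by_date` is a hypothetical name, and it assumes each input document already follows the format above.

```python
from collections import OrderedDict


def merge_by_date(docs):
    """Combine documents that share a "date" into one document per date.

    Assumes each doc has "date", "session" and "events" keys, where every
    event carries its own "campus" field. The session of the first doc
    seen for a date is kept; first-seen date order is preserved.
    """
    merged = OrderedDict()
    for doc in docs:
        entry = merged.setdefault(doc["date"], {
            "date": doc["date"],
            "session": doc["session"],
            "events": [],
        })
        # Events stay distinguishable after merging via their campus field.
        entry["events"].extend(doc["events"])
    return list(merged.values())
```

Because each event records its campus, the merged document for a date can hold UTM and UTSG events side by side without losing where each came from.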
I think I have the UTM scraper done, but I'm not sure how the JSON files should be named. The ones I have currently are simply named after the date (or period) of the event as shown on the mobile site. Should I change them to a specific format before making a pull request?
We should scrape the important dates info off of places like the Faculty of Arts & Science or UTM websites.
EDIT: This is a better list