Scraper for MP office expenses #40
base: master
mptracker/scraper/__init__.py
Outdated
def infoecon():
    from mptracker.scraper.infoecon import EconScraper
    econ=EconScraper()
    return econ.fetch()
It says here that there is no newline at end of file. Please add one.
Commands have nothing to return. You can use print here and save to the DB later on.
Sorry for not telling you from the start, but the code has to conform to PEP-8 (the Python style guide).
mptracker/scraper/infoecon.py
Outdated
key=(item.text().encode('utf-8'))

for sub in table_data:
  print (sub)
Note that there should be no space before the parenthesis holding the function arguments. Also, the function should return its results, not print them :)
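A minimal sketch of this suggestion, assuming `table_data` is some iterable of rows. The `collect_rows` name and the sample rows are hypothetical and not taken from the PR:

```python
def collect_rows(table_data):
    """Return the scraped rows instead of printing them."""
    rows = []
    for sub in table_data:
        rows.append(sub)
    return rows


# The caller decides what to do with the results, for example print them.
# Note there is no space between `print` and its opening parenthesis.
for row in collect_rows([("rent", 100), ("phone", 25)]):
    print(row)
```

Returning the rows keeps the scraper reusable: the command can print them for now and save them to the database later without touching the scraper.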
mptracker/scraper/__init__.py
Outdated
from mptracker.scraper.infoecon import EconScraper
econ=EconScraper()
print(econ.fetch())
''' # circ_elect doar numele , id coleg uninom , chelt pers chelt bun_serv
Is this commented-out code? It would be better to delete it.
Ah, actually it makes sense: this is how you will save things to the database. OK. But I can't merge your branch with this commented-out code. Either take it out temporarily or make it work end to end.
mptracker/scraper/infoecon.py
Outdated
class EconScraper(Scraper):
  index_url = 'http://www.cdep.ro/pls/parlam/informatii_economice.home'
In this file, all the code is indented with 2 spaces, which is not OK. PEP-8 says 4 spaces.
I find the scraper code rather hard to follow. Could you rename some of the variables? For example, I would guess that
mptracker/scraper/expenses.py
Outdated
#Tested with return, when finished @ yield self.fetch_section(url)

def fetch_month(self, section_url):
    page_name = (section_url.split('?'))[1].split('&')
There is no need to parse the URL manually. Use url_args from mptracker.scraper.common; it gives you a MultiDict with the arguments from the query string.
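The exact signature of url_args is not shown in this thread, so the sketch below does not call it. As a stand-in, it uses only the standard library to illustrate the same idea of letting a parser handle the query string. The `query_args` name and the example URL are hypothetical:

```python
from urllib.parse import parse_qs, urlparse


def query_args(url):
    """Parse the query string of `url` into a dict mapping names to lists of values."""
    return parse_qs(urlparse(url).query)


# Hypothetical URL, used only for illustration.
args = query_args('http://example.com/page?year=2013&month=7')
year = args['year'][0]
month = args['month'][0]
```

Compared with chained `split('?')` and `split('&')` calls, a real parser also copes with repeated keys, URL-encoded values, and URLs that have no query string at all.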
mptracker/scraper/expenses.py
Outdated
for link in tables_months.items('td > a'):
    url_set.add(link.attr('href'))
for url in url_set:
Here, too, it is not clear to me why you used url_set. Couldn't you call fetch_month in the for loop above?
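A rough sketch of that single-loop shape, under the assumption that the only purpose of the set is to skip duplicate links (a guess, since the PR does not say). The `fetch_all_months` name and its parameters are hypothetical, and `fetch_month` is passed in as a plain callable so the sketch stands on its own:

```python
def fetch_all_months(hrefs, fetch_month):
    """Call fetch_month once per distinct href, in the order encountered."""
    seen = set()
    for href in hrefs:
        if href in seen:
            continue  # skip a link that was already fetched
        seen.add(href)
        yield fetch_month(href)


# Usage with a stand-in fetcher that just upper-cases the href.
results = list(fetch_all_months(['a', 'b', 'a'], str.upper))
```

If the page never repeats a link, the `seen` bookkeeping can be dropped and the loop reduces to calling `fetch_month` directly on each href.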
#15