Fix HDSpace pubdate parsing #5111

Merged
merged 13 commits into from
Sep 4, 2018
15 changes: 14 additions & 1 deletion medusa/providers/generic_provider.py
@@ -574,12 +574,25 @@ def parse_pubdate(pubdate, human_time=False, timezone=None, **kwargs):
matched_time = int(round(float(matched_time.strip())))

seconds = parse('{0} {1}'.format(matched_time, matched_granularity))
if seconds is None:
    log.warning('Failed parsing human time: {0} {1}', matched_time, matched_granularity)
Contributor

I don't think the warning is needed since it's already going to cause an exception.

Contributor Author

Yeah, but up until now we often had to guess why parsing failed.

Contributor

@sharkykh Sep 4, 2018

Yes but now the exception includes the details you added to the warning log, so you don't really need the log?

Contributor Author

The exception details are not logged down the stack.

Contributor

Oh, I see now. Sorry I missed that.
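
The point settled in this thread is that a caller may catch the `ValueError` without recording it, so the warning is the only trace left in the log. A minimal sketch of that log-then-raise pattern follows. It assumes a hypothetical `_SECONDS_PER_UNIT` table in place of the real `parse` helper and a hypothetical `safe_offset` caller; neither is Medusa code, and the standard-library logger here uses `%s` placeholders rather than the brace style seen in the diff.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical lookup table standing in for the real granularity parser.
_SECONDS_PER_UNIT = {'second': 1, 'minute': 60, 'hour': 3600, 'day': 86400}


def human_time_to_seconds(amount, granularity):
    """Return the offset in seconds, or log and raise ValueError."""
    unit = granularity.strip().lower().rstrip('s')
    seconds_per_unit = _SECONDS_PER_UNIT.get(unit)
    if seconds_per_unit is None:
        # Log first: a caller may catch the exception without recording it.
        log.warning('Failed parsing human time: %s %s', amount, granularity)
        raise ValueError('Failed parsing human time: {0} {1}'.format(amount, granularity))
    return amount * seconds_per_unit


def safe_offset(amount, granularity):
    """Hypothetical caller that swallows the error, as a provider loop might."""
    try:
        return human_time_to_seconds(amount, granularity)
    except ValueError:
        return None
```

With only the `raise`, `safe_offset(3, 'fortnights')` would return `None` and leave nothing in the log; the warning keeps the failing input visible.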

    raise ValueError('Failed parsing human time: {0} {1}'.format(matched_time, matched_granularity))
return datetime.now(tz.tzlocal()) - timedelta(seconds=seconds)

if fromtimestamp:
    dt = datetime.fromtimestamp(int(pubdate), tz=tz.gettz('UTC'))
else:
    dt = parser.parse(pubdate, dayfirst=df, yearfirst=yf, fuzzy=True)
    from tzlocal import get_localzone
Contributor

Forgot to remove the import?

Contributor Author

Yeah

Contributor

@sharkykh Sep 4, 2018

@p0psicles You marked as resolved but the import is still there
Edit: Marked as unresolved

    day_offset = 0
    if 'yesterday at' in pubdate.lower() or 'today at' in pubdate.lower():
        # Extract a time
        time = re.search(r'(?P<time>[0-9:]+)', pubdate)
        if time:
            if 'yesterday' in pubdate:
                day_offset = 1
            pubdate = time.group('time')

    dt = parser.parse(pubdate.strip(), dayfirst=df, yearfirst=yf, fuzzy=True) - timedelta(days=day_offset)

# Always make UTC aware if naive
if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
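
To make the "yesterday at" / "today at" handling in this hunk easier to follow, here is a standalone sketch of the same idea using only the standard library. It is not the merged code: the function name `parse_relative_day` and the injectable `now` argument are assumptions for illustration, and the sketch builds the datetime directly instead of delegating to `dateutil`. It also lowercases the input once, so the "yesterday" check is case-insensitive, whereas the hunk checks `'yesterday' in pubdate` without lowercasing.

```python
import re
from datetime import datetime, timedelta


def parse_relative_day(pubdate, now=None):
    """Parse 'today at HH:MM[:SS]' or 'yesterday at HH:MM[:SS]'.

    Returns a naive datetime, or None if the string has another shape.
    """
    now = now or datetime.now()
    lowered = pubdate.strip().lower()
    match = re.match(r'(today|yesterday) at (\d{1,2}):(\d{2})(?::(\d{2}))?$', lowered)
    if not match:
        return None
    # 'yesterday' shifts the result back by one day; 'today' leaves it alone.
    day_offset = 1 if match.group(1) == 'yesterday' else 0
    hour, minute = int(match.group(2)), int(match.group(3))
    second = int(match.group(4) or 0)
    moment = now.replace(hour=hour, minute=minute, second=second, microsecond=0)
    return moment - timedelta(days=day_offset)
```

Passing a fixed `now` keeps checks deterministic, for example `parse_relative_day('Yesterday at 23:15:10', datetime(2018, 9, 4, 12, 0, 0))`.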
9 changes: 8 additions & 1 deletion medusa/providers/torrent/html/hdspace.py
@@ -155,7 +155,14 @@ def parse(self, data, mode):
torrent_size = row.find('td', class_='lista222', attrs={'width': '100%'}).get_text()
size = convert_size(torrent_size) or -1

pubdate_raw = row.find_all('td', class_='lista', attrs={'align': 'center'})[3].get_text()
pubdate_td = row.find_all('td', class_='lista', attrs={'align': 'center'})[3]
pubdate_human_offset = pubdate_td.find('b')
if pubdate_human_offset:
    time_search = re.search('([0-9:]+)', pubdate_td.get_text())
    pubdate_raw = pubdate_human_offset.get_text() + ' at ' + time_search.group(1)
else:
    pubdate_raw = pubdate_td.get_text()

pubdate = self.parse_pubdate(pubdate_raw)

item = {
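
The string assembly in this hunk can be sketched as a small pure function. This is a sketch under assumptions, not the merged code: `build_pubdate_raw` is a hypothetical helper that takes plain strings (the cell text and the text of the `<b>` label) so it runs without BeautifulSoup, and it uses a slightly tighter time pattern than the hunk. In the hunk, `time_search.group(1)` raises `AttributeError` when `re.search` finds nothing; the sketch falls back to the raw cell text instead.

```python
import re


def build_pubdate_raw(cell_text, day_label=None):
    """Combine an optional day label with the time found in the cell text."""
    if not day_label:
        # No human-readable label: pass the cell text through unchanged.
        return cell_text.strip()
    # Assumed pattern: digits separated by colons, e.g. 21:07 or 21:07:45.
    time_search = re.search(r'([0-9]+(?::[0-9]+)+)', cell_text)
    if time_search is None:
        # No time found: fall back to the raw text rather than raising.
        return cell_text.strip()
    return '{0} at {1}'.format(day_label.strip(), time_search.group(1))
```

For a cell whose text is `'Yesterday 21:07:45'` and whose label is `'Yesterday'`, this produces the `'<label> at <time>'` shape that `parse_pubdate` looks for.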
12 changes: 11 additions & 1 deletion tests/providers/test_generic_provider.py
@@ -2,7 +2,7 @@
"""Provider test code for Generic Provider."""
from __future__ import unicode_literals

from datetime import date, datetime
from datetime import date, datetime, timedelta

from dateutil import tz

@@ -127,6 +127,16 @@
    'timezone': 'US/Eastern',
    'fromtimestamp': True
},
{ # p22: hd-space test human date like yesterdat at 12:00:00
Contributor

Typo: yesterday

    'pubdate': 'yesterday at {0}'.format((datetime.now() - timedelta(minutes=10, seconds=25)).strftime('%H:%M:%S')),
    'expected': datetime.now().replace(microsecond=0, tzinfo=tz.gettz('UTC')) - timedelta(days=1, minutes=10, seconds=25),
    'human_time': False
},
{ # p22: hd-space test human date like today at 12:00:00
Contributor

Should be # p23

    'pubdate': 'today at {0}'.format((datetime.now() - timedelta(minutes=10, seconds=25)).strftime('%H:%M:%S')),
    'expected': datetime.now().replace(microsecond=0, tzinfo=tz.gettz('UTC')) - timedelta(days=0, minutes=10, seconds=25),
    'human_time': False
},
])
def test_parse_pubdate(p):
    # Given