[WhoScored] Ignore cached events file if empty #420
The problem only exists for the WhoScored scraper, but your solution affects all scrapers. For some scrapers, an empty reply might actually be a valid answer. For example, if a new team is promoted, an empty result is expected in the ClubElo scraper. Moreover, the bash script checks the file size simply because that was easy to write as a bash command. What it really should check is whether the file contains an empty JSON object, something that could easily be done in Python.
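A content-based check along the lines suggested above might look like this. It is only a sketch: the function name `contains_empty_json` is hypothetical, and treating `null`, `{}`, `[]`, a zero-byte file, or a missing file as "empty" is an assumption, not the project's definition.

```python
import json
from pathlib import Path


def contains_empty_json(path: Path) -> bool:
    """Return True if the file holds no real data (hypothetical helper).

    "Empty" is assumed to mean: the file is missing, is blank, or contains
    the JSON values null, {} or [].
    """
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return True
    if not text.strip():
        return True
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Malformed content is not reported as "empty"; the caller decides.
        return False
    return data in (None, {}, [])
```

Because the check inspects the parsed value rather than the byte count, it could be called only from the WhoScored scraper, leaving scrapers such as ClubElo free to cache legitimately empty replies.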
Yes, you're right. I'll think about how it can be implemented in a different way.
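@probberechts

```python
reader = self.get(
    url,
    filepath,
    var="requirejs.s.contexts._.config.config.params.args.matchCentreData",
    no_cache=live,
)
# If the cached file holds only the JSON literal `null` (no match data),
# bypass the cache and fetch the page again.
if reader.read(4) == b"null":
    reader = self.get(
        url,
        filepath,
        var="requirejs.s.contexts._.config.config.params.args.matchCentreData",
        no_cache=True,
    )
# Rewind after peeking at the first four bytes.
reader.seek(0)
json_data = json.load(reader)
```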
Nice solution! Thanks.
Hello, @probberechts.
I propose a solution to the problem of empty files in the WhoScored cache.
In issue #98 you suggested deleting empty files with a bash command based on file size.
I added a method, `_size_file`, which does the same using `Path.stat().st_size`. If the file is smaller than a threshold, it is treated as not cached.
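A size-threshold check of this kind can be sketched as follows. The PR does not show the signature of `_size_file` or the threshold it uses, so the function name `is_cached` and the 5-byte default below are hypothetical; 5 bytes is chosen only so that a file containing just `null` (4 bytes) counts as uncached.

```python
from pathlib import Path

# Hypothetical threshold in bytes; not the value used in the PR.
MIN_CACHE_SIZE = 5


def is_cached(path: Path, threshold: int = MIN_CACHE_SIZE) -> bool:
    """Treat a cache file as valid only if it exists and reaches `threshold` bytes."""
    return path.is_file() and path.stat().st_size >= threshold
```

As the review points out, file size is only a proxy: any legitimate response shorter than the threshold would also be discarded, which is why a content-based check is preferable.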