Feed sensor return empty state for valid rss #112

Closed
juliomatcom opened this issue Dec 10, 2023 · 4 comments · Fixed by #115

juliomatcom commented Dec 10, 2023

Hi all, I'm using the following configuration, but no entries are retrieved:

sensor:
  - platform: feedparser
    name: elcomercio
    feed_url: 'https://www.elcomercio.es/rss/2.0/?section=gijon'
    show_topn: 20
    scan_interval:
      hours: 1

[Screenshot: screenshot-hass lan-2023 12 10-11_59_29]

It does work with the example feed provided, so I guess the problem is related to the XML format or to the response.

@ogajduse
Collaborator

The feed that you are using seems to be valid.

However, fetching it with the upstream feedparser library fails with HTTP 403 (Access Denied). The error page echoes the URL without the extra HTTP GET parameters that follow `?`:

$ python
Python 3.11.5 (main, Aug 28 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> parsed_feed = feedparser.parse('https://www.elcomercio.es/rss/2.0/?section=gijon')
>>> parsed_feed
{'bozo': 1, 'entries': [], 'feed': {'summary': '<h1>Access Denied</h1>\n \nYou don\'t have permission to access "http://www.elcomercio.es/rss/2.0/?" on this server.<p>\nReference #18.bf361060.1702380172.cd132963'}, 'headers': {'server': 'AkamaiGHost', 'content-length': '292', 'content-type': 'text/html', 'mime-version': '1.0', 'vary': 'User-Agent,Cookie,Accept-Encoding', 'alt-svc': 'h3=":443"; ma=93600', 'expires': 'Tue, 12 Dec 2023 11:22:52 GMT', 'cache-control': 'max-age=0, no-cache', 'pragma': 'no-cache', 'date': 'Tue, 12 Dec 2023 11:22:52 GMT', 'connection': 'close'}, 'href': 'https://www.elcomercio.es/rss/2.0/?section=gijon', 'status': 403, 'encoding': 'us-ascii', 'bozo_exception': SAXParseException('mismatched tag'), 'version': '', 'namespaces': {}}
>>> parsed_feed.status
403
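The result above carries three useful signals: `status`, `bozo`, and `entries`. A minimal sketch of how they could be turned into a readable diagnosis follows. The helper name `diagnose_feed_result` is hypothetical and not part of feedparser or this integration; it only assumes a dict-like result with the keys visible in the output above.

```python
def diagnose_feed_result(parsed):
    """Return a short reason describing the state of a parsed feed.

    `parsed` is any dict-like object exposing the keys seen in the
    feedparser output above: "status", "bozo", "entries", "bozo_exception".
    """
    status = parsed.get("status")
    # An HTTP error status means the server never returned the feed itself.
    if status is not None and status >= 400:
        return f"HTTP error {status}: the server rejected the request"

    entries = parsed.get("entries") or []
    if entries:
        # A feed flagged as bozo may still yield usable entries.
        suffix = " (feed flagged as malformed)" if parsed.get("bozo") else ""
        return f"ok: {len(entries)} entries{suffix}"

    if parsed.get("bozo"):
        return f"malformed feed: {parsed.get('bozo_exception')!r}"
    return "feed parsed cleanly but contains no entries"
```

Calling it on the result shown above would report the 403 before looking at `bozo`, which is the more useful hint here: the `SAXParseException` is only a side effect of parsing the HTML error page as a feed.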

kurtmckee/feedparser#385 describes the same issue. We could use the requests library to send the extra headers that the elcomercio RSS feed requires with the HTTP request.

>>> import feedparser
>>> import requests
>>> response = requests.get("https://www.elcomercio.es/rss/2.0/?section=gijon", headers={"User-Agent": "someagent"})
>>> response.ok
True
>>> response.text
'<?xml version="1.0" encoding="UTF-8"?>\n<rss xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">\n  <channel>\n    <atom:link href="https://www.elcomercio.es/rss/2.0/?section=gijon"  ... output omitted ...'
>>> parsed_feed = feedparser.parse(response.text)
>>> len(parsed_feed.entries)
100
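For a quick check without installing requests, the same idea can be sketched with the standard library only. This is a swapped-in technique, not what the session above or #115 uses: the function names `build_request` and `fetch_feed_text` are hypothetical, and the `"someagent"` default is just the placeholder value from the session above.

```python
import urllib.request

DEFAULT_USER_AGENT = "someagent"  # placeholder taken from the session above


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_feed_text(url, user_agent=DEFAULT_USER_AGENT, timeout=10.0):
    """Download the feed body as text.

    Raises urllib.error.HTTPError if the server answers with 4xx/5xx,
    for example the 403 seen earlier in this thread.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned text can then be handed to `feedparser.parse(...)` in the same way as `response.text` in the session above. Whether a given server accepts a particular User-Agent value is not something this sketch can guarantee.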

I can take it and fix it. That should be a simple fix.

@ogajduse
Collaborator

@juliomatcom #115 should fix the issue you are seeing. I would be glad if you could test it and confirm that it fixes the issue for you.

However, this specific feed does not provide the full URL to the image, so you will not be able to render an image in your Lovelace.


ogajduse self-assigned this on Dec 12, 2023
ogajduse added the "enhancement" (New feature or request) label on Dec 12, 2023
ogajduse added this to the Release 0.2.0 milestone on Dec 12, 2023

juliomatcom commented Dec 12, 2023

Hi @ogajduse, thank you for taking a look. I cloned your repo and switched to the feat/add-http-headers-to-request branch, but there is still no data in the state from https://www.elcomercio.es/rss/2.0/?section=gijon. Is this how I should test this? I don't have any experience debugging HA or Python.

@ogajduse
Collaborator

@juliomatcom I have merged #115 into master and released https://github.com/custom-components/feedparser/releases/tag/0.2.0b6. That should allow you to install the beta release directly from HACS. Check the HACS docs for how to install it. I am still interested in your feedback.
