Newlines lead to ugly output #36

mirabilos · 2022-11-06T09:50:13Z

See https://toot.mirbsd.org/@mirabilos/statuses/01GH65F7V9ZK7KQ6YRG3PF0AR2

- newlines are converted to hard linebreaks
- some spaces are lost (e.g. “jschauma!Consequently”)

Input feed: http://www.mirbsd.org/wlog.rss


mirabilos · 2023-01-07T15:59:32Z

Possible fix follows, although this is my complete diff for Feediverse, i.e. with…
- KeyError: 'link' #34
- gotosocial compatibility #35
- Posts not public and cut off badly #37

… all applied as well.

- https://github.com/matthewwithanm/python-markdownify cloned next to Feediverse (as it’s not packaged for Debian yet)
  - and patched with matthewwithanm/python-markdownify#82 (“autolink when Goodreads breaks the URL”) to fix links in Goodreads’ RSS feed
  - symlinked so I can import it into Feediverse
- quite some cleanup(text) going on still
  - newlines/whitespace fixup
  - input is HTML-ish, so drop input newlines, as GotoSocial’s Markdown-ish keeps them even without the double-space-before-linefeed thing (because it’s supposed to be plaintext-compatible-ish)
    - though superseriousbusiness/gotosocial#1267 (“Switch markdown from blackfriday to goldmark”, not yet released) changed their parser, so this needs reëvaluation then
  - markdownify output also needs postprocessing sigh…
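
The postprocessing mentioned in the last item can be tried in isolation. This is only a minimal sketch: it applies the two regular expressions that follow the markdownify call in the diff below, without calling markdownify itself. The function name `tidy_markdown` and the sample string are made up for illustration; the sample assumes the converter emits hard breaks as two trailing spaces plus a newline.

```python
import re

def tidy_markdown(text):
    # Hypothetical helper, not part of Feediverse.
    # Two consecutive hard breaks ("  \n  \n") become one paragraph break.
    text = re.sub('  \n  \n', '\n\n', text)
    # Trailing spaces before a blank line go away, and any run of
    # blank lines shrinks to exactly one.
    text = re.sub(' *\n\n+', '\n\n', text)
    return text

sample = "para one  \n  \npara two   \n\n\n\npara three"
print(repr(tidy_markdown(sample)))
# prints 'para one\n\npara two\n\npara three'
```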

diff --git a/feediverse.py b/feediverse.py
index cee0078..7161182 100755
--- a/feediverse.py
+++ b/feediverse.py
@@ -11,6 +11,8 @@ import feedparser
 from bs4 import BeautifulSoup
 from mastodon import Mastodon
 from datetime import datetime, timezone, MINYEAR
+# with https://github.com/matthewwithanm/python-markdownify/issues/82 applied
+from markdownify.markdownify import markdownify
 
 DEFAULT_CONFIG_FILE = os.path.join("~", ".feediverse")
 
@@ -37,6 +39,7 @@ def main():
     config = read_config(config_file)
 
     masto = Mastodon(
+        version_check_mode="none",
         api_base_url=config['url'],
         client_id=config['client_id'],
         client_secret=config['client_secret'],
@@ -50,11 +53,15 @@ def main():
         for entry in get_feed(feed['url'], config['updated']):
             newest_post = max(newest_post, entry['updated'])
             if args.verbose:
-                print(entry)
+                print("‣‣‣ entry {{{", entry, "}}}")
+            postbody = feed['template'].format(**entry)
             if args.dry_run:
-                print("trial run, not tooting ", entry["title"][:50])
+                print("trial run, not tooting {{{", postbody, "}}}")
                 continue
-            masto.status_post(feed['template'].format(**entry)[:499])
+            if len(postbody) > 500:
+                postfix = "…\n\n(more…)"
+                postbody = postbody[:(500 - len(postfix))] + postfix
+            masto.status_post(postbody, visibility='public')
 
     if not args.dry_run:
         config['updated'] = newest_post.isoformat()
@@ -83,7 +90,7 @@ def get_entry(entry):
     url = entry.id
     return {
         'url': url,
-        'link': entry.link,
+        'link': entry.get('link', ''),
         'title': cleanup(entry.title),
         'summary': cleanup(summary),
         'content': content,
@@ -92,6 +99,15 @@ def get_entry(entry):
     }
 
 def cleanup(text):
+    text = re.sub('\r+\n?', '\n', text)
+    text = re.sub(' *\n *', '\n', text)
+    text = re.sub('\n\n\n+', '\n\n', text, flags=re.M)
+    text = re.sub('\n+ *<', ' <', text)
+    text = markdownify(text)
+    text = re.sub('  \n  \n', '\n\n', text)
+    text = re.sub(' *\n\n+', '\n\n', text)
+    return text
+    # old HTML to plaintext output:
     html = BeautifulSoup(text, 'html.parser')
     text = html.get_text()
     text = re.sub('\xa0+', ' ', text)
diff --git a/markdownify b/markdownify
new file mode 120000
index 0000000..deec112
--- /dev/null
+++ b/markdownify
@@ -0,0 +1 @@
+../python-markdownify
\ No newline at end of file
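
The truncation step from the hunk around `masto.status_post` can be tried on its own. A minimal sketch, assuming the 500-character limit used in the diff; `truncate_status` and its parameters are hypothetical names, not part of Feediverse.

```python
def truncate_status(postbody, limit=500, postfix="…\n\n(more…)"):
    # Hypothetical helper mirroring the logic in the diff above:
    # only bodies longer than the limit are cut, and the cut leaves
    # room for the postfix so the result is exactly `limit` long.
    if len(postbody) > limit:
        postbody = postbody[:limit - len(postfix)] + postfix
    return postbody

long_body = "x" * 600
short_body = "hello"
print(len(truncate_status(long_body)))   # prints 500
print(truncate_status(short_body))       # prints hello
```

Unlike the old `[:499]` slice, this marks the cut visibly instead of ending mid-sentence.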

mirabilos · 2023-01-28T16:44:37Z

Autumn! dixit:

> these fix notes don't seem to relate to GtS - is this issue still relevant here?

It’s relevant for all instances that can post Markdown.

bye,
//mirabilos
--
"Using Lynx is like wearing a really good pair of shades: cuts out the glare and harmful UV (ultra-vanity), and you feel so-o-o COOL." -- Henry Nelson, March 1999

mirabilos · 2023-02-01T17:57:14Z

followup fix:

- handle embedded single newlines better (translate to space everywhere, not just before tag)
- drop inline images, they won’t work in a fediverse status anyway

--- a/feediverse.py
+++ b/feediverse.py
@@ -101,9 +101,10 @@ def get_entry(entry):
 def cleanup(text):
     text = re.sub('\r+\n?', '\n', text)
     text = re.sub(' *\n *', '\n', text)
-    text = re.sub('\n\n\n+', '\n\n', text, flags=re.M)
-    text = re.sub('\n+ *<', ' <', text)
-    text = markdownify(text)
+    text = text.replace('\n', '\1')
+    text = re.sub('\1\1\1+', '\n\n', text)
+    text = re.sub('\1+ *', ' ', text).strip()
+    text = markdownify(text, strip=['img']).strip()
     text = re.sub('  \n  \n', '\n\n', text)
     text = re.sub(' *\n\n+', '\n\n', text)
     return text

on top of the previous large diff
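
The newline handling in this followup can be checked without markdownify installed. The sketch below copies only the steps that run before the markdownify call; `normalise_newlines` is a made-up name. In a non-raw Python string, `'\1'` is the control character U+0001, which the diff uses as a temporary stand-in for a newline; the sketch spells it `'\x01'` to make that explicit.

```python
import re

def normalise_newlines(text):
    # Hypothetical helper: the pre-markdownify part of cleanup() only.
    text = re.sub('\r+\n?', '\n', text)      # CR and CRLF become LF
    text = re.sub(' *\n *', '\n', text)      # drop spaces around newlines
    text = text.replace('\n', '\x01')        # mark every newline
    # Runs of three or more newlines survive as one paragraph break...
    text = re.sub('\x01\x01\x01+', '\n\n', text)
    # ...while shorter runs (one or two newlines) become a single space.
    text = re.sub('\x01+ *', ' ', text).strip()
    return text

print(repr(normalise_newlines("Hello\r\nworld\n\n\n\nnext  \n  para")))
# prints 'Hello world\n\nnext para'
```

Note that a plain blank line (two newlines) in the input is folded into a space as well; since the input is HTML-ish, paragraph structure is expected to come from the tags rather than from the source whitespace.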
