Skip to content
This repository has been archived by the owner on Sep 27, 2024. It is now read-only.

Fix unwanted extra paragraphs after HTML parsing #820

Merged
merged 1 commit into from
Sep 20, 2023

Conversation

jonnyandrew
Copy link
Contributor

@jonnyandrew jonnyandrew commented Sep 20, 2023

Problem

The editor may be initialised with arbitrary HTML content however, it currently misbehaves when supplied HTML that contains new lines and indentation.

For example, consider the following HTML snippet:

<p>
  Hello, how are you doing?
</p>

<p>
  I'm fine thanks.
  And you?
</p>

This snippet contains valid HTML but the editor would not understand that the new lines and and spacing are for formatting and indenting the markup, rather than to be displayed as text characters.

The resulting AST contains lots of stray paragraphs and new line characters in the text, which is not correct:

├>p
│ └>"
│   "
├>p
│ └>"
│     Hello, how are you doing?
│   "
├>p
│ └>"
│
│   "
├>p
│ └>"
│     I'm fine thanks.
│     And you?
│   "
└>p
  └>"
            "

Solution

Instead of treating new lines as text, treat new lines (and adjacent spacing) within HTML as indentation:

  • Trim surrounding indentation from text
  • Replace internal indentation with a single space

After this change, the AST now looks like:

└>p
  └>"Hello, how are you doing?"
└>p
  └>"I'm fine thanks. And you?"

See this JSFiddle example where the same behaviour is shown.

@codecov-commenter
Copy link

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.03% 🎉

Comparison is base (0ea1686) 89.85% compared to head (3eaef66) 89.88%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #820      +/-   ##
==========================================
+ Coverage   89.85%   89.88%   +0.03%     
==========================================
  Files         113       82      -31     
  Lines       16231    14725    -1506     
  Branches      604        0     -604     
==========================================
- Hits        14584    13236    -1348     
+ Misses       1623     1489     -134     
+ Partials       24        0      -24     
Flag Coverage Δ
uitests ?
uitests-ios ?
unittests 89.88% <100.00%> (+1.32%) ⬆️
unittests-ios ?
unittests-rust 89.88% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
crates/wysiwyg/src/dom/parser/parse.rs 97.93% <100.00%> (+0.03%) ⬆️

... and 31 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sonarcloud
Copy link

sonarcloud bot commented Sep 20, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@jonnyandrew jonnyandrew requested a review from a team September 20, 2023 10:36
@jonnyandrew jonnyandrew marked this pull request as ready for review September 20, 2023 10:36
@jonnyandrew jonnyandrew merged commit 68ad9e2 into main Sep 20, 2023
10 checks passed
@jonnyandrew jonnyandrew deleted the jonny/fix-html-parsing-newlines branch September 20, 2023 13:22
jonnyandrew added a commit to element-hq/element-x-android that referenced this pull request Sep 29, 2023
- Add full screen mode for the rich text editor (RTE). When text formatting options are enabled, the editor can be dragged to full screen.
- Remove `ConstraintLayout` from `textcomposer` module, now made much simpler now the RTE supports being called in multiple layouts matrix-org/matrix-rich-text-editor#822

- Part of element-hq/element-meta#1973
- Includes design from #1315
- Fixes #1293 (through new layout)
- Fixes #1394 (through inclusion of matrix-org/matrix-rich-text-editor#824)
- Fixes #1259 (through inclusion of matrix-org/matrix-rich-text-editor#820)

---------

Co-authored-by: ElementBot <benoitm+elementbot@element.io>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Setting content from HTML results in unwanted paragraphs
3 participants