Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

genericcontenthandler is brutal #348

Closed
deliminator opened this issue Nov 10, 2011 · 13 comments
Closed

genericcontenthandler is brutal #348

deliminator opened this issue Nov 10, 2011 · 13 comments

Comments

@deliminator
Copy link
Contributor

The genericcontenthandler does too much. It rougly does the following:

  • removes table attributes: border, cellspacing, cellpadding
  • removes td attributes: width, height, valign
  • removes the paragraph in a td if it is the only one
  • replaces <p><br></p> with &nbsp in a td
  • removes all colgroups (including contents)
  • transforms strong => b, em => i, s => del
  • removes u tags
  • removes comments
  • removes span, font, div tags
  • removes style tags and attributes
  • removes elements that are not in the HTML namespace (if set)

This is too much, since it is supposed to be generic. Only those clean-ups that are always a good idea should be done (I suppose removing comments is always a good idea).

Pasting from word (and probably other sources) do need the functionality above. So, the things the genriccontenthandler does are probably always a good idea when content is pasted. However, the genericcontenthandler can also be configured for cleaning up the content when the editable is initialized.

It may be possible to make the sanitize content handler handler do some of the things the genericcontenthandler does.

What would be interesting to have is a generic content handler, that can be activated by default and used for initializing editables without causing any undesired side-effects.

@evo42
Copy link
Member

evo42 commented Nov 10, 2011

+1 for moving some functions from generic to word and sanitize (and/or new ones) is a good idea

@scribu
Copy link

scribu commented Nov 10, 2011

Removing comments is not always a good idea. For example, in WordPress, <!--more--> is used to split the content.

@petrosalema
Copy link
Member

The "brutality" of the content handlers is well noted, @deliminator.

@evo42 and I have discussed how the content handling approach needs to be
further considered, and significantly improved. It is a tricky business and
needs to be done right.

It will be a good idea to put content handlers back into the roadmap.

Here are some of things that need to be addressed:

  • Content handlers must stop removing classes or attributes that belong to
    Aloha Editor or its plugins. Such classes and attributes are prefixed with
    "aloha-" to distinguish them, and they should only be removed by which ever
    component added them. The fact that they are currently being cleaned out of
    incoming paste content is a source of some erratic behaviour in a number of
    plugins.
  • We need to formalize how content handlers are categorized so that content
    handlers are limited in what content they are to mutate. A content handler
    that is responsible for santizing content pasted from MS Word should (to the
    extent possible) not worry itself with merging adjacent <span> tags and
    other such things, since that ought to be handled by something like the
    generic content handler later in the execution order.
  • We also need to establish an ideal order for how content handlers are to be
    stacked so that as the content handlers operate over the content, they do not
    accidentally clobber each other's mutations, and handlers that remove
    specific information do not get executed before handlers that need this
    information in order to work correctly.
  • Code in the contenthandler plugin needs to be refactored and brought to
    conformance with Aloha Editor coding guidelines.
  • Perhaps the most critical and distressing issue: Script timeouts when
    runnning content handlers in IE (tested in IE8). I see 2 issues here: one
    is that we need a strategy to improve the peformance of content handler
    algorithms, and to developer/discover techniques to reduce the work being
    done; the other issue is about developing a strategy for preventing the epic
    failure and humiliation of a browser hang. I can hardly think of anything
    that would kill the user experience quite like an timely hang! Aloha Editor
    should never hang, it should fail gracefully. So I think we need to
    consider about how to break up potentially long executions into smaller
    ones, which will release execution often enough that timeouts can be
    effected if a clean up operation is taking too long to complete.

@petrosalema
Copy link
Member

Is there anyone who would like to take this on?

@deliminator
Copy link
Contributor Author

What is the purpose of content handlers? To clean up pasted content. Anything else?

@evo42
Copy link
Member

evo42 commented Apr 19, 2012

moved to backlog for further discussion

@armellarcier
Copy link

Hi!
This problem is huge!
I'm trying to work with the align plugin and the only thing stopping me is the contentHandler... All styles are removed and I can't see any workaround...

@evo42
Copy link
Member

evo42 commented Jul 30, 2012

@Benew you can define your own settings for the (sanitize) contentHandler (Aloha.settings.contentHandler.allows):

Aloha.settings.contentHandler: {
    insertHtml: [ 'word', 'generic', 'oembed', 'sanitize' ],
    initEditable: [ 'sanitize' ],
    getContents: [ 'blockelement', 'sanitize', 'basic' ],
    sanitize: 'relaxed' // relaxed, restricted, basic,
    allows: {
        elements: [
            'strong', 'em', 'i', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'div', 'dl', 'dt', 'em',
            'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'sub',
            'sup', 'u', 'ul'],

        attributes: {
            'a'         : ['href'],
            'blockquote': ['cite'],
            'q'         : ['cite']
         },

        protocols: {
            'a'         : {'href': ['ftp', 'http', 'https', 'mailto', '__relative__']}, // Sanitize.RELATIVE
            'blockquote': {'cite': ['http', 'https', '__relative__']},
            'q'         : {'cite': ['http', 'https', '__relative__']}
        }
    },
   handler: {
        generic: {
            transformFormattings: false // this will disable the transformFormattings method in the generic content handler
        }
    }
}

source: http://aloha-editor.org/guides/plugin_contenthandler.html

You can use the default ("relaxed") config from the sanitize handler and add the attributes (eg style) for the elements where you want to use/allow the align plugin:

default settings: https://github.com/alohaeditor/Aloha-Editor/blob/dev/src/plugins/common/contenthandler/lib/sanitizecontenthandler.js#L75

@wimleers
Copy link
Contributor

I notice above:

transforms strong => b, em => i, s => del

What if you want the inverse? I.e. b -> strong, i -> em.

@evo42
Copy link
Member

evo42 commented Jul 30, 2012

@wimleers at the moment you'll need to deactivate the genericcontenthandler and use your own handler (eg. copy it and rename) -- it's not configurable (yet)

here's the part which is doing the transformation:
https://github.com/alohaeditor/Aloha-Editor/blob/dev/src/plugins/common/contenthandler/lib/genericcontenthandler.js#L156

@wimleers
Copy link
Contributor

Thanks. Won't cause disabling genericcontenthandler unwanted side-effects?

(Your reply partially answers https://getsatisfaction.com/aloha_editor/topics/content_handler_to_convert_to_when_pasting_must_run_before_all_others, by the way. I've linked to your reply from there.)

@evo42
Copy link
Member

evo42 commented Jul 30, 2012

That's right, the genericcontenthandler is quite important. Not only disable it, but copy it, modify it to your needs, rename it and put it into your own plugin -- then define the order of / which contenthandler you'll want to use in what situation: Aloha.settings.contentHandler.insertHtml {used at paste}, Aloha.settings.contentHandler.initEditable and the undocumented one: Aloha.settings.contentHandler.getContents

@evo42
Copy link
Member

evo42 commented Jul 30, 2012

@wimleers you can also just deactivate the transformFormattings method in the generic CH with Aloha.settings.contentHandler.handler.generic.transformFormattings = false (and add just the adjusted transformFormattings method to your drupalcontenthandler)

@evo42 evo42 closed this as completed Oct 17, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants