Use emoji from commonmarker #373

Merged
merged 5 commits into from
Jan 26, 2023
4 changes: 3 additions & 1 deletion Gemfile
@@ -27,14 +27,16 @@ group :development do
end

group :test do
gem "commonmarker", "~> 1.0.0.pre4", require: false
gem "commonmarker", "~> 1.0.0.pre7", require: false
gem "gemoji", "~> 3.0", require: false
gem "gemojione", "~> 4.3", require: false

gem "minitest"

gem "minitest-bisect", "~> 1.6"

gem "nokogiri", "~> 1.13"

gem "minitest-focus", "~> 1.1"
gem "rouge", "~> 3.1", require: false
end
29 changes: 19 additions & 10 deletions README.md
@@ -230,19 +230,28 @@ end

For more information on how to write effective `NodeFilter`s, refer to the provided filters, and see the underlying lib, [Selma](https://www.github.com/gjtorikian/selma) for more information.

- `AbsoluteSourceFilter` - replace relative image urls with fully qualified versions
- `EmojiFilter` - converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)!
- `HttpsFilter` - Replacing http urls with https versions
- `ImageMaxWidthFilter` - link to full size image for large images
- `MentionFilter` - replace `@user` mentions with links
- `SanitizationFilter` - allow sanitize user markup
- `TableOfContentsFilter` - anchor headings with name attributes and generate Table of Contents html unordered list linking headings
- `TeamMentionFilter` - replace `@org/team` mentions with links
- `AbsoluteSourceFilter`: replace relative image urls with fully qualified versions
- `EmojiFilter`: converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)
- (Note: the included `MarkdownFilter` will already convert emoji)
- `HttpsFilter`: Replacing http urls with https versions
- `ImageMaxWidthFilter`: link to full size image for large images
- `MentionFilter`: replace `@user` mentions with links
- `SanitizationFilter`: allow sanitize user markup
- `SyntaxHighlightFilter`: applies syntax highlighting to `pre` blocks
- (Note: the included `MarkdownFilter` will already apply highlighting)
- `TableOfContentsFilter`: anchor headings with name attributes and generate Table of Contents html unordered list linking headings
- `TeamMentionFilter`: replace `@org/team` mentions with links

## Dependencies

Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem
dependencies yourself.
Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem dependencies yourself.

For example, `SyntaxHighlightFilter` uses [rouge](https://github.com/jneen/rouge)
to detect and highlight languages; to use the `SyntaxHighlightFilter`, you must add the following to your Gemfile:

```ruby
gem "rouge"
```
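One way a filter could surface a missing gem at load time is sketched below. This PR calls `HTMLPipeline.require_dependency("rouge", "SyntaxHighlightFilter")`, but the body of that method is not part of the diff, so `require_optional` and `MissingDependencyError` are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-in for an optional-dependency loader; the real
# HTMLPipeline.require_dependency body is not shown in this diff.
class MissingDependencyError < StandardError; end

# Tries to load a library and raises a descriptive error naming the
# filter that needs it when the library is not installed.
def require_optional(gem_name, requirer)
  require gem_name
  true
rescue LoadError
  raise MissingDependencyError,
    "Missing dependency '#{gem_name}' for #{requirer}. " \
    "Add `gem \"#{gem_name}\"` to your Gemfile."
end

# A stdlib library loads without any Gemfile entry.
require_optional("set", "ExampleFilter")
```

Failing early with the filter's name in the message tells the user which Gemfile entry is missing, instead of a bare `LoadError` from deep inside the pipeline.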

> **Note**
> See the [Gemfile](/Gemfile) `:test` group for any version requirements.
1 change: 0 additions & 1 deletion UPGRADING.md
@@ -13,7 +13,6 @@ This project is now under a module called `HTMLPipeline`, not `HTML::Pipeline`.
The following filters were removed:

- `AutolinkFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
- `SyntaxHighlightFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
- `SanitizationFilter`: this is handled by [Selma](https://www.github.com/gjtorikian/selma); configuration can be done through the `sanitization_config` hash

- `EmailReplyFilter`
21 changes: 15 additions & 6 deletions lib/html_pipeline.rb
@@ -145,8 +145,11 @@ def call(text, context: {}, result: {})
context = context.freeze
result ||= {}

payload = default_payload({ text_filters: @text_filters.map(&:name),
context: context, result: result, })
payload = default_payload({
text_filters: @text_filters.map(&:name),
context: context,
result: result,
})
instrument("call_text_filters.html_pipeline", payload) do
result[:output] =
@text_filters.inject(text) do |doc, filter|
@@ -159,8 +162,11 @@ def call(text, context: {}, result: {})
html = @convert_filter.call(text) unless @convert_filter.nil?

unless @node_filters.empty?
payload = default_payload({ node_filters: @node_filters.map { |f| f.class.name },
context: context, result: result, })
payload = default_payload({
node_filters: @node_filters.map { |f| f.class.name },
context: context,
result: result,
})
instrument("call_node_filters.html_pipeline", payload) do
result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters).rewrite(html)
end
@@ -178,8 +184,11 @@ def call(text, context: {}, result: {})
#
# Returns the result of the filter.
def perform_filter(filter, doc, context: {}, result: {})
payload = default_payload({ filter: filter.name,
context: context, result: result, })
payload = default_payload({
filter: filter.name,
context: context,
result: result,
})
instrument("call_filter.html_pipeline", payload) do
filter.call(doc, context: context, result: result)
end
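The payload-plus-`instrument` pattern in the hunks above can be tried in isolation with a small recorder. This is a minimal sketch: `RecordingInstrumenter` is a hypothetical class standing in for whatever instrumentation service the pipeline is configured with, and `"ExampleFilter"` is a placeholder name.

```ruby
# Hypothetical recorder standing in for an instrumentation service.
class RecordingInstrumenter
  attr_reader :events

  def initialize
    @events = []
  end

  # Runs the block, records the event name and payload, and returns
  # the block's value to the caller.
  def instrument(event, payload)
    value = yield(payload)
    @events << [event, payload]
    value
  end
end

instrumenter = RecordingInstrumenter.new

# Same shape as the payload hashes built in the diff above.
payload = {
  filter: "ExampleFilter",
  context: {}.freeze,
  result: {},
}

output = instrumenter.instrument("call_filter.html_pipeline", payload) do |p|
  p[:result][:output] = "HELLO"
end
```

Because the block's return value is passed through, wrapping a filter call in `instrument` does not change what the caller receives.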
62 changes: 62 additions & 0 deletions lib/html_pipeline/node_filter/syntax_highlight_filter.rb
@@ -0,0 +1,62 @@
# frozen_string_literal: true

HTMLPipeline.require_dependency("rouge", "SyntaxHighlightFilter")

class HTMLPipeline
class NodeFilter
# HTML Filter that syntax highlights text inside code blocks.
#
# Context options:
#
# :highlight => String naming the fallback language used to pick a lexer when
# the pre element has no lang attribute. No default.
# :scope => String used to build the class attribute set on the pre element.
# Defaults to "highlight", so a css code block gets "highlight highlight-css".
# :formatter => Rouge formatter used to render the highlighted output.
# Defaults to Rouge::Formatters::HTML.new.
#
# This filter does not write any additional information to the context hash.
class SyntaxHighlightFilter < NodeFilter
def initialize(context: {}, result: {})
super(context: context, result: result)
# TODO: test the optionality of this
@formatter = context[:formatter] || Rouge::Formatters::HTML.new
end

SELECTOR = Selma::Selector.new(match_element: "pre", match_text_within: "pre")

def selector
SELECTOR
end

def handle_element(element)
default = context[:highlight]&.to_s
@lang = element["lang"] || default

scope = context.fetch(:scope, "highlight")

element["class"] = "#{scope} #{scope}-#{@lang}" if include_lang?
end

def handle_text_chunk(text)
return if @lang.nil?
return if (lexer = lexer_for(@lang)).nil?

content = text.to_s

text.replace(highlight_with_timeout_handling(content, lexer), as: :html)
end

def highlight_with_timeout_handling(text, lexer)
Rouge.highlight(text, lexer, @formatter)
rescue Timeout::Error => _e
text
end

def lexer_for(lang)
Rouge::Lexer.find(lang)
end

def include_lang?
!@lang.nil? && !@lang.empty?
end
end
end
end
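The language and class selection in `handle_element` can be exercised without Selma or Rouge. The sketch below assumes a plain Hash in place of the Selma element, and `highlight_class` is a hypothetical helper name, not part of the filter.

```ruby
# Sketch of the language/class selection done in handle_element, using a
# plain Hash of attributes instead of a Selma element (an assumption made
# so the example runs without any gems).
def highlight_class(element_attrs, context)
  # The element's own lang attribute wins; context[:highlight] is the fallback.
  lang = element_attrs["lang"] || context[:highlight]&.to_s
  return nil if lang.nil? || lang.empty?

  scope = context.fetch(:scope, "highlight")
  "#{scope} #{scope}-#{lang}"
end

css_class  = highlight_class({ "lang" => "css" }, {})
ruby_class = highlight_class({}, { highlight: :ruby, scope: "code" })
no_class   = highlight_class({}, {})
```

Here `css_class` is `"highlight highlight-css"`, `ruby_class` is `"code code-ruby"`, and `no_class` is `nil`, mirroring the `include_lang?` guard in the filter.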
6 changes: 4 additions & 2 deletions lib/html_pipeline/node_filter/table_of_contents_filter.rb
@@ -24,8 +24,10 @@ class NodeFilter
# result[:output].to_s
# # => "<h1>\n<a id=\"ice-cube\" class=\"anchor\" href=\"#ice-cube\">..."
class TableOfContentsFilter < NodeFilter
SELECTOR = Selma::Selector.new(match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
match_text_within: "h1, h2, h3, h4, h5, h6")
SELECTOR = Selma::Selector.new(
match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
match_text_within: "h1, h2, h3, h4, h5, h6",
)

def selector
SELECTOR
147 changes: 135 additions & 12 deletions lib/html_pipeline/sanitization_filter.rb
@@ -16,11 +16,70 @@ class SanitizationFilter
# The main sanitization allowlist. Only these elements and attributes are
# allowed through by default.
DEFAULT_CONFIG = Selma::Sanitizer::Config.freeze_config({
elements: ["h1", "h2", "h3", "h4", "h5", "h6", "br", "b", "i", "strong", "em", "a", "pre", "code",
"img", "tt", "div", "ins", "del", "sup", "sub", "p", "picture", "ol", "ul", "table", "thead", "tbody", "tfoot",
"blockquote", "dl", "dt", "dd", "kbd", "q", "samp", "var", "hr", "ruby", "rt", "rp", "li", "tr", "td", "th",
"s", "strike", "summary", "details", "caption", "figure", "figcaption", "abbr", "bdo", "cite",
"dfn", "mark", "small", "source", "span", "time", "wbr",],
elements: [
"h1",
"h2",
"h3",
"h4",
"h5",
"h6",
"br",
"b",
"i",
"strong",
"em",
"a",
"pre",
"code",
"img",
"tt",
"div",
"ins",
"del",
"sup",
"sub",
"p",
"picture",
"ol",
"ul",
"table",
"thead",
"tbody",
"tfoot",
"blockquote",
"dl",
"dt",
"dd",
"kbd",
"q",
"samp",
"var",
"hr",
"ruby",
"rt",
"rp",
"li",
"tr",
"td",
"th",
"s",
"strike",
"summary",
"details",
"caption",
"figure",
"figcaption",
"abbr",
"bdo",
"cite",
"dfn",
"mark",
"small",
"source",
"span",
"time",
"wbr",
],

attributes: {
"a" => ["href"],
@@ -31,13 +90,77 @@ class SanitizationFilter
"ins" => ["cite"],
"q" => ["cite"],
"source" => ["srcset"],
all: ["abbr", "accept", "accept-charset", "accesskey", "action", "align", "alt", "aria-describedby",
"aria-hidden", "aria-label", "aria-labelledby", "axis", "border", "char",
"charoff", "charset", "checked", "clear", "cols", "colspan", "compact", "coords", "datetime", "dir",
"disabled", "enctype", "for", "frame", "headers", "height", "hreflang", "hspace", "id", "ismap", "label", "lang",
"maxlength", "media", "method", "multiple", "name", "nohref", "noshade", "nowrap", "open", "progress",
"prompt", "readonly", "rel", "rev", "role", "rows", "rowspan", "rules", "scope", "selected", "shape",
"size", "span", "start", "summary", "tabindex", "title", "type", "usemap", "valign", "value", "width", "itemprop",],
all: [
"abbr",
"accept",
"accept-charset",
"accesskey",
"action",
"align",
"alt",
"aria-describedby",
"aria-hidden",
"aria-label",
"aria-labelledby",
"axis",
"border",
"char",
"charoff",
"charset",
"checked",
"clear",
"cols",
"colspan",
"compact",
"coords",
"datetime",
"dir",
"disabled",
"enctype",
"for",
"frame",
"headers",
"height",
"hreflang",
"hspace",
"id",
"ismap",
"label",
"lang",
"maxlength",
"media",
"method",
"multiple",
"name",
"nohref",
"noshade",
"nowrap",
"open",
"progress",
"prompt",
"readonly",
"rel",
"rev",
"role",
"rows",
"rowspan",
"rules",
"scope",
"selected",
"shape",
"size",
"span",
"start",
"summary",
"tabindex",
"title",
"type",
"usemap",
"valign",
"value",
"width",
"itemprop",
],
},
protocols: {
"a" => { "href" => Selma::Sanitizer::Config::VALID_PROTOCOLS }.freeze,
Expand Down
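To show how an allowlist shaped like `DEFAULT_CONFIG` is meant to be read, here is a minimal sketch of element and attribute checks. It is an illustration only: Selma's sanitizer is implemented differently, and `EXAMPLE_CONFIG`, `element_allowed?`, and `attribute_allowed?` are hypothetical names with a deliberately tiny config.

```ruby
# Illustration only: a tiny allowlist check over a config shaped like
# DEFAULT_CONFIG. This is not how Selma implements sanitization.
EXAMPLE_CONFIG = {
  elements: ["a", "img", "p"],
  attributes: {
    "a" => ["href"],
    "img" => ["src"], # example entry, not taken from the diff
    all: ["id", "title"],
  },
}.freeze

# An element passes only if it is listed under :elements.
def element_allowed?(config, tag)
  config[:elements].include?(tag)
end

# An attribute passes if its element is allowed and the attribute is
# listed either for that element or under the global :all key.
def attribute_allowed?(config, tag, attr)
  return false unless element_allowed?(config, tag)

  per_tag = config[:attributes].fetch(tag, [])
  global = config[:attributes].fetch(:all, [])
  per_tag.include?(attr) || global.include?(attr)
end
```

With this config, `attribute_allowed?(EXAMPLE_CONFIG, "a", "href")` and `attribute_allowed?(EXAMPLE_CONFIG, "p", "id")` are true, while `"onclick"` on `a` and any attribute on `script` are rejected.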
2 changes: 1 addition & 1 deletion lib/html_pipeline/version.rb
@@ -1,5 +1,5 @@
# frozen_string_literal: true

class HTMLPipeline
VERSION = "3.0.0.pre1"
VERSION = "3.0.0.pre2"
end