Prefer emoji.raw without VARIATION SELECTOR 16 #164

Merged (4 commits, Jul 3, 2019)

mislav
Contributor

@mislav mislav commented Jul 3, 2019

The emoji.raw value should preferably be a shorter sequence without the \u{FE0F} suffix.

The only characters whose .raw value remains with the VARIATION SELECTOR 16 suffix are ones that would commonly render as text glyphs otherwise, or those that have variants with \u{FE0F} appearing in multiple places.
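A minimal JavaScript sketch of the selection rule described above follows. It assumes a hypothetical `preferredRaw` helper and a one-entry `TEXT_GLYPHS` set; gemoji's real list is Ruby and longer, and U+263A (smiling face) is the only member confirmed in this thread. The "fe0f in multiple places" case is illustrated with the woman detective sequence (U+1F575 U+FE0F U+200D U+2640 U+FE0F).

```javascript
"use strict";

const VS16 = "\uFE0F"; // VARIATION SELECTOR-16 (request emoji presentation)

// Hypothetical stand-in for gemoji's TEXT_GLYPHS: characters that would
// commonly render as text glyphs without VS16. Only U+263A comes from
// this discussion; the real list is longer.
const TEXT_GLYPHS = new Set(["\u263A"]); // smiling face

// Return a shorter raw value by dropping a trailing VS16, unless the base
// character is a known text glyph or VS16 occurs more than once.
function preferredRaw(raw) {
  if (!raw.endsWith(VS16)) return raw;

  const vs16Count = raw.split(VS16).length - 1;
  if (vs16Count > 1) return raw; // e.g. woman detective keeps both VS16s

  const base = raw.slice(0, -VS16.length);
  if (TEXT_GLYPHS.has(base)) return raw; // would render as text without VS16

  return base;
}

console.log(preferredRaw("\u{1F441}\uFE0F") === "\u{1F441}"); // → true (eye: VS16 dropped)
```

Dropping the suffix this way is what later caused trouble in this thread: characters such as eye (U+1F441) stopped being recognized as emoji without it.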

Fixes #163

@mislav mislav merged commit d48b0d9 into master Jul 3, 2019
@mislav mislav deleted the mislav/text-glyphs branch July 3, 2019 18:11
@xPaw

xPaw commented Oct 3, 2019

Did this PR miss a lot of the emojis that have text representations by default?

For example I don't see this face in TEXT_GLYPHS:
☺️ with fe0f
☺ without

I'm using the emoji.json file to map emoji.aliases to its emoji.emoji symbol, and this PR kind of breaks that, even when I manually add fe0f for the characters in TEXT_GLYPHS.

These emojis aren't being found by https://github.com/mathiasbynens/emoji-regex anymore, and even GitHub doesn't appear to render them as emoji.

@mislav
Contributor Author

mislav commented Oct 3, 2019

@xPaw Different applications, OSs, and even fonts differ in how they decide to render characters that have no fe0f. Some are rendered as color emoji, some as text glyph. I could not find an official recommendation about how these should be handled. Instead, I've based the TEXT_GLYPHS list on how macOS chooses to treat these characters.

If you find a certain character that should be rendered as a text glyph if it has no fe0f, but it's not present in TEXT_GLYPHS, please send us a PR to add the character, and also please share your methodology of how you discovered that this should be a text glyph. Thank you!

@xPaw

xPaw commented Oct 3, 2019

Well, my comment above has an emoji that renders as text on Windows 10. It's a bit weird to base the TEXT_GLYPHS list on macOS alone.

[screenshot]

If I want to use emoji.json to generate an alias->raw mapping for emoji autocompletion, what logic should I use to append fe0f? TEXT_GLYPHS is certainly not enough, at least on Windows.

@mislav
Contributor Author

mislav commented Oct 3, 2019

I don't understand the example in your comment above. You mention “frowning face”, and yet you use 263a (“smiling face”) in your example. This exact character is explicitly listed in TEXT_GLYPHS:

"\u{263a}", # smiling face

It's a bit weird to base the TEXT_GLYPHS list on macOS alone.

It may appear weird now, but when this project started out, Windows and various popular Linux distributions had little to no native emoji support. Apple was and is still a leader in emoji support, meaning that their decisions influence emoji adoption and support on other platforms too.

what logic should I use to append fe0f?

If you want a character to be rendered as a color emoji, then just always append fe0f unless it's already there.
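Taken literally, that advice is a one-liner. A sketch in JavaScript, with `withEmojiPresentation` as a hypothetical name; it is deliberately naive and does not special-case keycaps, skin-tone modifiers, flags, or ZWJ sequences, where the position of fe0f matters.

```javascript
"use strict";

const VS16 = "\uFE0F"; // VARIATION SELECTOR-16

// Append VS16 unless the sequence already contains one.
// Naive by design: no special handling for keycaps, modifiers, flags, or ZWJ.
function withEmojiPresentation(str) {
  return str.includes(VS16) ? str : str + VS16;
}

console.log(withEmojiPresentation("\u263A") === "\u263A\uFE0F"); // → true
```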

@xPaw

xPaw commented Oct 3, 2019

You're right, I used an incorrect example. As it turns out, it also depends on the font used (e.g. it renders as emoji in a GitHub comment, but not in the diff), and that doesn't help.

In diffs:
[screenshot]

In the textarea:
[screenshot]

If you want a character to be rendered as a color emoji, then just always append fe0f unless it's already there.

Does GitHub's :abcd: autocompletion (when typing emojis after a colon) do this?

@mislav
Contributor Author

mislav commented Oct 3, 2019

Does GitHub's :abcd: autocompletion (when typing emojis after a colon) do this?

It does ensure that all inserted emoji are always color emoji. For instance, typing :relaxed and pressing Enter inserts 263a-fe0f. Do note: in Chrome's editing textarea, this emoji will sometimes be shown as a text glyph (this is incorrect behavior and has been a known bug in Chrome for many years). Once the comment is submitted, the character is rendered correctly as a color emoji, even in Chrome.

GitHub's emoji completer gets all its information from gemoji. In this case, gemoji's TEXT_GLYPHS dictates that 263a should never be inserted without fe0f.

@xPaw

xPaw commented Nov 26, 2019

So I looked at this again.

Do I understand correctly that the dump.json contains emojis with TEXT_GLYPHS? If so, there are certainly some emojis that need to be added.

I made a diff of the map we generate: https://gist.github.com/xPaw/d3b3e1236b22f0fc19494dad279278dc

Quickly looking at the diff, I see plenty of emojis that should have the variation selector (these emojis don't get detected by emoji-regex).

There are also some emojis with the male/female sign that seem to require the variation selector to be parsed/rendered correctly.

From what I can tell, only a few emojis don't need the variation selector (like raised_hand_with_fingers_splayed), but most of the emojis affected by this PR still need it.

@mislav
Contributor Author

mislav commented Nov 26, 2019

Hmm I'm not sure what your diff represents or what the overarching problem is. I might be missing something.

Just to backtrack a little bit and make sure we're on the same page:

  • db/emoji.json is not an exhaustive list of every unicode variation that renders as the same emoji character; it only contains one "minimal" unicode sequence that hopefully renders as a color emoji. If that sequence renders as a text glyph in some common environment, we can consider that a bug in our emoji.json, and we should correct it.

  • Emoji.find_by_unicode(str) should map a unicode sequence representing a color emoji character to an Emoji::Character instance. If a certain sequence is a known color emoji but is not recognized by find_by_unicode, that would be a bug with this library.

  • Emoji::Character#unicode_aliases is a set of different unicode sequences representing the same color emoji character. Each of these sequences should on its own render as a color emoji, even if it doesn't contain fe0f ("variation selector 16"). None should render as a text glyph. That is the sole purpose of the TEXT_GLYPHS internal list; to keep text glyphs out of unicode_aliases.
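To make the relationship between find_by_unicode and unicode_aliases concrete, here is a rough JavaScript analogue. It is not gemoji's Ruby implementation: `unicodeAliases`, `buildIndex`, `findByUnicode`, and the one-entry `TEXT_GLYPHS` set are hypothetical, and the only alias generated is the raw sequence with a trailing fe0f toggled. A bare text glyph is never indexed, mirroring the purpose of TEXT_GLYPHS described above.

```javascript
"use strict";

const VS16 = "\uFE0F";
// Hypothetical text-glyph set; U+263A is the example from this thread.
const TEXT_GLYPHS = new Set(["\u263A"]);

// Candidate aliases: the raw value plus the form with a trailing VS16 toggled.
// A bare text glyph is dropped because it would not render as color emoji.
function unicodeAliases(raw) {
  const other = raw.endsWith(VS16) ? raw.slice(0, -VS16.length) : raw + VS16;
  return [raw, other].filter((seq) => !TEXT_GLYPHS.has(seq));
}

// Index every alias so lookups succeed with or without VS16 where allowed.
function buildIndex(entries) {
  const index = new Map();
  for (const entry of entries) {
    for (const seq of unicodeAliases(entry.emoji)) index.set(seq, entry);
  }
  return index;
}

// Sample entries (illustrative data, emoji.json-like shape).
const index = buildIndex([
  { emoji: "\u263A\uFE0F", description: "smiling face" },
  { emoji: "\u{1F600}", description: "grinning face" },
]);

// Returns the matching entry, or null for unknown sequences and text glyphs.
const findByUnicode = (str) => index.get(str) || null;
```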

I'm curious: what is the problem that you're experiencing with this library? Let's scope the problem around a single example character, to keep it simple. Where is this character and its unicode sequence present in gemoji? Where is it missing?

@xPaw

xPaw commented Nov 26, 2019

db/emoji.json is not an exhaustive list of every unicode variation that renders as the same emoji character; it only contains one "minimal" unicode sequence that hopefully renders as a color emoji

That's what I assumed, and how I've been using it.

The diff shows emoji map from before this PR, and after. There are plenty of emojis that no longer render as colored emoji, at least on Windows.

Emojis like eye and speech bubble would be such examples (but pretty much most of the emojis in the diff don't render correctly).

It worked fine before, but now it looks like TEXT_GLYPHS needs to be a way longer list.

We use the JSON file, not the library itself. Sadly, there's no other good source to get this data from, and your map is good enough.

@mislav
Copy link
Contributor Author

mislav commented Nov 26, 2019

There are plenty of emojis that no longer render as colored emoji, at least on Windows.

Emojis like eye and speech bubble would be such examples (but pretty much most of the emojis in the diff don't render correctly).

Ah so this is on Windows! Thanks for the leads. I will look into it. I don't often test on Windows.

As a temporary workaround, when you're generating simplemap.json you can append fe0f to every emoji that doesn't have one. I realize that this is not an ideal experience for you, though.
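That workaround could look roughly like the following when building an alias-to-emoji map. The sketch assumes entries shaped like db/emoji.json records with `emoji` and `aliases` fields; `buildAliasMap` and the two sample entries are illustrative, not taken from the script that generates simplemap.json.

```javascript
"use strict";

const VS16 = "\uFE0F";

// Build { alias: emoji } from emoji.json-style entries, appending VS16 to any
// emoji that has none (the blunt workaround suggested above).
function buildAliasMap(entries) {
  const map = {};
  for (const entry of entries) {
    const emoji = entry.emoji.includes(VS16) ? entry.emoji : entry.emoji + VS16;
    for (const alias of entry.aliases) {
      map[alias] = emoji;
    }
  }
  return map;
}

// Illustrative sample entries in the emoji.json shape.
const sample = [
  { emoji: "\u263A\uFE0F", aliases: ["relaxed"] },
  { emoji: "\u{1F441}", aliases: ["eye"] },
];
console.log(buildAliasMap(sample));
```

Appending fe0f blindly may add it where it is unnecessary, for example to characters that already default to emoji presentation, or at the end of ZWJ sequences.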

@xPaw

xPaw commented Nov 26, 2019

As a temporary workaround, when you're generating simplemap.json you can append fe0f to every emoji that doesn't have one.

That might be worse than before this PR. Right now I simply haven't updated the map, as there's no reason to until there are new emojis.

Here's the generator script we use, it simply re-maps it into an object: https://github.com/thelounge/thelounge/blob/master/scripts/generate-emoji.js

Ah so this is on Windows!

Not just Windows: \p{Emoji_Presentation}|\p{Emoji}\uFE0F also no longer catches these as emojis.

https://github.com/mathiasbynens/emoji-regex/blob/c75480c94abae44aaaa5a4e2038f05fe3550f38f/src/index.js#L3
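The two alternatives quoted above can be tried without emoji-regex, using JavaScript's built-in Unicode property escapes (the `u` flag, available since Node.js 10). This is a minimal sketch, not emoji-regex's full pattern; the `^...$` anchors are an addition so that a string must be exactly one match. ☹ (U+2639) and ☠ (U+2620) have the Emoji property but not Emoji_Presentation, so without fe0f neither alternative applies.

```javascript
"use strict";

// The two alternatives quoted from emoji-regex, anchored to the whole string.
const EMOJI_ONLY = /^(?:\p{Emoji_Presentation}|\p{Emoji}\uFE0F)$/u;

const samples = {
  "frowning face, no fe0f": "\u2639",
  "frowning face + fe0f": "\u2639\uFE0F",
  "skull and crossbones, no fe0f": "\u2620",
  "grinning face": "\u{1F600}",
};

// Non-global RegExp, so test() keeps no state between calls.
for (const [label, str] of Object.entries(samples)) {
  console.log(label, EMOJI_ONLY.test(str));
}
```

With current Unicode data this should print false, true, false, true for the four samples.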

Thank you for looking.

EDIT: GitHub also doesn't detect/wrap some of these as g-emoji. Try taking frowning_face or skull_and_crossbones from the JSON and pasting them into a comment.

☹️ vs ☹
☠️ vs ☠

[screenshot]

@mislav
Contributor Author

mislav commented Nov 26, 2019

Not just Windows: \p{Emoji_Presentation}|\p{Emoji}\uFE0F also no longer catches these as emojis.

I'm not familiar with emoji-regex. How is it related or affected by changes to gemoji? Also, which example characters does emoji-regex now fail to catch?

@xPaw

xPaw commented Nov 26, 2019

How is it related or affected by changes to gemoji?

The "emoji" unicode sequence is not detected as an actual emoji; even GitHub comments don't detect them as emojis and don't wrap them as <g-emoji>.

Also, which example characters does emoji-regex now fail to catch?

Most of the emojis that were affected by this PR (I think there's also a separate issue with the male/female sign not having the variation selector in ZWJ emojis).
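When comparing sequences like these, it helps to print the code points explicitly. A small hypothetical helper in JavaScript, using the woman running sequence (U+1F3C3 U+200D U+2640 U+FE0F) as an example of a female sign that carries its own fe0f:

```javascript
"use strict";

// Split a string into code points (not UTF-16 units) and format them as
// lowercase hex joined with "-", matching notation like "263a-fe0f".
function codePoints(str) {
  return Array.from(str, (ch) => ch.codePointAt(0).toString(16)).join("-");
}

console.log(codePoints("\u{1F3C3}\u200D\u2640\uFE0F")); // 1f3c3-200d-2640-fe0f
console.log(codePoints("\u{1F3C3}\u200D\u2640"));       // 1f3c3-200d-2640
```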

@xPaw

xPaw commented Nov 26, 2019

Here's a simple script to test against emoji-regex (which, really, just follows the emoji/Unicode specification).

Before this PR, it detected every single sequence as an emoji. After this PR, the following emojis no longer get detected:

frowning face
skull and crossbones
hole
left speech bubble
right anger bubble
eye
skier
speaking head
chipmunk
dove
spider
spider web
rosette
shamrock
hot pepper
fork and knife with plate
world map
snow-capped mountain
mountain
camping
beach with umbrella
desert
desert island
national park
stadium
classical building
building construction
houses
derelict house
shinto shrine
cityscape
racing car
motorcycle
motorway
railway track
oil drum
passenger ship
ferry
motor boat
small airplane
satellite
bellhop bell
stopwatch
timer clock
mantelpiece clock
thermometer
cloud with lightning and rain
sun behind small cloud
sun behind large cloud
sun behind rain cloud
cloud with rain
cloud with snow
cloud with lightning
tornado
fog
wind face
umbrella on ground
comet
reminder ribbon
admission tickets
military medal
ice skate
joystick
chess pawn
framed picture
sunglasses
shopping bags
rescue worker’s helmet
studio microphone
level slider
control knobs
desktop computer
printer
keyboard
computer mouse
trackball
film frames
film projector
candle
rolled-up newspaper
label
ballot box with ballot
fountain pen
pen
paintbrush
crayon
card index dividers
spiral notepad
spiral calendar
linked paperclips
card file box
file cabinet
wastebasket
old key
pick
hammer and pick
hammer and wrench
dagger
crossed swords
shield
gear
clamp
balance scale
chains
alembic
bed
couch and lamp
coffin
funeral urn
radioactive
biohazard
atom symbol
om
wheel of dharma
orthodox cross
star and crescent
peace symbol
next track button
play or pause button
last track button
pause button
stop button
record button
medical symbol
infinity
fleur-de-lis
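white flag

```javascript
"use strict";

// These require() calls assume CommonJS builds of the packages
// (got v11 or earlier; later got releases are ESM-only).
const got = require("got");
// emoji-regex exports a function that returns a global (/g) RegExp.
const emojiRegExp = require("emoji-regex")();

(async () => {
	const response = await got(
		"https://raw.githubusercontent.com/github/gemoji/master/db/emoji.json"
		// Swap in an older pinned revision to compare against:
		// "https://raw.githubusercontent.com/github/gemoji/22b920f8bd6c2e453832955fe9e13971b95772c5/db/emoji.json"
	);

	const emojiStrategy = JSON.parse(response.body);

	for (const emoji of emojiStrategy) {
		// String#match with a global RegExp returns null when nothing matches
		// and resets lastIndex, so the shared RegExp can be reused safely.
		const found = emoji.emoji.match(emojiRegExp) !== null;

		// Report entries whose raw sequence emoji-regex does not recognize.
		if (!found) {
			console.log(emoji.description);
		}
	}
})();
```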

mislav added a commit that referenced this pull request Nov 29, 2019
This reverts commit d48b0d9, reversing changes made to 03dea3b.
@mislav
Contributor Author

mislav commented Nov 29, 2019

@xPaw Thanks for arguing your case; I've reverted it https://github.com/github/gemoji/releases/tag/v4.0.0.rc1

I saw your point re: ever-expanding TEXT_GLYPHS and I didn't want to go down that route.

@xPaw

xPaw commented Nov 29, 2019

Thank you!

Development

Successfully merging this pull request may close these issues.

wrong codepoints for 🏎, 🏍 and maybe more
2 participants