Home
- Data source
- Data consolidation
- Translations
- Fitzpatrick scale
- Code points conversion
- Flag emojis conversion
- Code points composition
Component metadata (skin tones & hair styles) :
- code point
- group
- subgroup
- version
And emoji metadata :
- code points
- group
- subgroup
- version
- relationship between a base emoji and its variations
- components used by a variation (skin tone & hair style)
Are all derived from the fully-qualified emojis and components listed in the emoji-test.txt file, which is available on Unicode's website (https://unicode.org/Public/emoji/).
Version 14.0 was used. Because the file's structure has changed over time, versions prior to 12.1 cannot produce the complete data set.
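To make the extraction described above more concrete, here is a minimal sketch of how lines from emoji-test.txt could be parsed. It is not the generator's actual code: the parseEmojiTest function, the sampleEmojiTest excerpt and the returned field names are assumptions made for illustration, and the sample lines only mirror the layout used by the 14.0 file.

```javascript
// Illustrative excerpt in the layout used by emoji-test.txt (version 14.0).
const sampleEmojiTest = [
    '# group: People & Body',
    '# subgroup: hand-fingers-open',
    '1F44B                                                  ; fully-qualified     # 👋 E0.6 waving hand',
    '1F44B 1F3FB                                            ; fully-qualified     # 👋🏻 E1.0 waving hand: light skin tone',
    '# group: Component',
    '# subgroup: skin-tone',
    '1F3FB                                                  ; component           # 🏻 E1.0 light skin tone',
].join('\n');

// Hypothetical parser: keeps only fully-qualified emojis and components.
function parseEmojiTest(text) {
    const entries = [];
    let group = null;
    let subgroup = null;
    // code points ; status # emoji E<version> name
    const entryPattern = /^([0-9A-F ]+?)\s*;\s*([a-z-]+)\s*# (\S+) E(\d+\.\d+) (.+)$/;

    for (const line of text.split('\n')) {
        if (line.startsWith('# group: ')) {
            group = line.slice('# group: '.length).trim();
            continue;
        }
        if (line.startsWith('# subgroup: ')) {
            subgroup = line.slice('# subgroup: '.length).trim();
            continue;
        }
        const match = entryPattern.exec(line);
        if (!match) {
            continue; // Blank line, comment, or unexpected layout
        }
        const [, codePoints, status, emoji, version, name] = match;
        if (status !== 'fully-qualified' && status !== 'component') {
            continue;
        }
        entries.push({ codePoints: codePoints.trim().split(' '), status, emoji, version, name, group, subgroup });
    }
    return entries;
}

console.log(parseEmojiTest(sampleEmojiTest));
```

The group and subgroup come from the comment lines that precede each block of entries, which is why the parser has to carry them along while iterating.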
Components & emojis translations :
- text-to-speech (description)
- keywords
Are retrieved from the common/annotations/en.xml and common/annotationsDerived/en.xml files, which are available in Unicode's CLDR (Common Locale Data Repository) (https://github.com/unicode-org/cldr).
The release-40 tag was used so that the file structure stays fixed over time.
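As an illustration of what these files contain, the sketch below reads annotation elements from a small excerpt. In the CLDR files, an annotation with type="tts" holds the text-to-speech description, while the one without a type holds keywords separated by |. The parseAnnotations function and the sampleAnnotationsXml excerpt are hypothetical, and a regular expression is used only to keep the example dependency-free; a real XML parser would be more robust (this sketch does not decode XML entities, for instance).

```javascript
// Illustrative excerpt in the shape used by the CLDR annotation files.
const sampleAnnotationsXml = `
<annotations>
    <annotation cp="😀">face | grin | grinning face</annotation>
    <annotation cp="😀" type="tts">grinning face</annotation>
</annotations>`;

// Hypothetical helper: maps each emoji to its description and keywords.
function parseAnnotations(xml) {
    const translations = new Map();
    const annotationPattern = /<annotation cp="([^"]+)"( type="tts")?>([^<]*)<\/annotation>/g;

    for (const [, emoji, isTextToSpeech, text] of xml.matchAll(annotationPattern)) {
        const entry = translations.get(emoji) || { description: null, keywords: [] };
        if (isTextToSpeech) {
            entry.description = text.trim();
        } else {
            entry.keywords = text.split('|').map(keyword => keyword.trim());
        }
        translations.set(emoji, entry);
    }
    return translations;
}

console.log(parseAnnotations(sampleAnnotationsXml).get('😀'));
```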
The data has been consolidated to make it more useful.
An additional metadata field named category has been added to each emoji to reflect how emojis are conventionally grouped on mobile devices, which makes the groups more balanced. It differs slightly from Android's grouping but remains quite similar to it.
If you don't like it, you can build your own grouping logic from the original group and subgroup metadata.
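A custom grouping could look like the sketch below. Everything in it is an assumption made for illustration: the record shape (emoji, group, subgroup), the customCategoryByGroup mapping and the groupByCustomCategory function are hypothetical and do not describe the package's actual exports. Only the group and subgroup names come from emoji-test.txt.

```javascript
// Hypothetical emoji records; only the group and subgroup fields matter here.
const sampleEmojis = [
    { emoji: '😀', group: 'Smileys & Emotion', subgroup: 'face-smiling' },
    { emoji: '👋', group: 'People & Body', subgroup: 'hand-fingers-open' },
    { emoji: '🐶', group: 'Animals & Nature', subgroup: 'animal-mammal' },
];

// Hypothetical mapping from the original Unicode groups to custom categories.
const customCategoryByGroup = {
    'Smileys & Emotion': 'people',
    'People & Body': 'people',
    'Animals & Nature': 'nature',
};

function groupByCustomCategory(records) {
    const emojisByCategory = {};
    for (const record of records) {
        // Anything not listed in the mapping falls back to "other"
        const category = customCategoryByGroup[record.group] || 'other';
        if (!emojisByCategory[category]) {
            emojisByCategory[category] = [];
        }
        emojisByCategory[category].push(record.emoji);
    }
    return emojisByCategory;
}

console.log(groupByCustomCategory(sampleEmojis));
```

The same approach works at a finer grain by keying the mapping on subgroup instead of group.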
Only the en (American English) locale is currently provided, as translations are quite heavy (~500kB per locale).
Until there is a way to load only the required locales and keep project sizes down, I recommend generating translations yourself using this repository :
You just need to change the unicodeCldrLocale variable inside the generate-unicode-emoji.cjs file to any locale available in Unicode's CLDR, then run the npm run build command (or the node generate-unicode-emoji.cjs command).
Skin tones are based on the Fitzpatrick scale (https://en.wikipedia.org/wiki/Fitzpatrick_scale) :
Emoji | Description | Fitzpatrick scale |
---|---|---|
🏻 | Light skin tone | Type I and II |
🏼 | Medium-light skin tone | Type III |
🏽 | Medium skin tone | Type IV |
🏾 | Medium-dark skin tone | Type V |
🏿 | Dark skin tone | Type VI |
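The table above can also be expressed as a lookup keyed by code point, which is handy for detecting the skin tone used by an emoji. This is a minimal sketch: the skinTones map and the findSkinTone function are hypothetical names, and the function simply returns the first skin tone component it encounters.

```javascript
// Skin tone components (U+1F3FB to U+1F3FF) and their Fitzpatrick types.
const skinTones = new Map([
    [0x1F3FB, { description: 'Light skin tone', fitzpatrick: 'Type I and II' }],
    [0x1F3FC, { description: 'Medium-light skin tone', fitzpatrick: 'Type III' }],
    [0x1F3FD, { description: 'Medium skin tone', fitzpatrick: 'Type IV' }],
    [0x1F3FE, { description: 'Medium-dark skin tone', fitzpatrick: 'Type V' }],
    [0x1F3FF, { description: 'Dark skin tone', fitzpatrick: 'Type VI' }],
]);

// Hypothetical helper: returns the first skin tone found in an emoji, or null.
function findSkinTone(emoji) {
    // Iterating a string with for...of yields whole code points, not UTF-16 code units
    for (const character of emoji) {
        const skinTone = skinTones.get(character.codePointAt(0));
        if (skinTone) {
            return skinTone;
        }
    }
    return null;
}

console.log(findSkinTone('\u{1F44D}\u{1F3FB}')); // { description: 'Light skin tone', fitzpatrick: 'Type I and II' }
console.log(findSkinTone('\u{1F44D}')); // null
```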
JavaScript natively supports code points in strings through Unicode escapes (\u).
Code points between U+0000 and U+FFFF don't need to be surrounded by {} :
const heartEmoji = '\u2764\uFE0F';
console.log(heartEmoji); // ❤️
Code points greater than U+FFFF (called astral code points) are internally represented as surrogate pairs. They must either be broken down into two UTF-16 code units (the surrogate pair) or be surrounded by {} :
const grinningEmojiWithSurrogatePair = '\uD83D\uDE00';
console.log(grinningEmojiWithSurrogatePair); // 😀
const grinningEmojiWithAstralCodePoint = '\u{1F600}';
console.log(grinningEmojiWithAstralCodePoint); // 😀
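For reference, the surrogate pair of an astral code point follows from the UTF-16 encoding rules: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves added to 0xD800 and 0xDC00. The toSurrogatePair function below is a hypothetical helper written for illustration; in practice String.fromCodePoint does this for you.

```javascript
// Hypothetical helper: splits an astral code point into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
    if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
        throw new RangeError('Only code points from U+10000 to U+10FFFF have a surrogate pair');
    }
    const offset = codePoint - 0x10000;
    const highSurrogate = 0xD800 + (offset >> 10); // Upper 10 bits
    const lowSurrogate = 0xDC00 + (offset & 0x3FF); // Lower 10 bits
    return [highSurrogate, lowSurrogate];
}

const [highSurrogate, lowSurrogate] = toSurrogatePair(0x1F600);
console.log(highSurrogate.toString(16).toUpperCase()); // D83D
console.log(lowSurrogate.toString(16).toUpperCase()); // DE00
console.log(String.fromCharCode(highSurrogate, lowSurrogate) === '\u{1F600}'); // true
```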
I recommend always surrounding code points with {} to avoid errors and improve readability.
const pirateFlagEmoji = '\u{1F3F4}\u{200D}\u{2620}\u{FE0F}';
console.log(pirateFlagEmoji); // 🏴‍☠️
If you prefer, you can also programmatically retrieve an emoji using an array of code points like this :
const pirateFlagCodePoints = ['1F3F4', '200D', '2620', 'FE0F'];
const pirateFlagEmoji = String.fromCodePoint(
    ...pirateFlagCodePoints.map(codePoint => parseInt(codePoint, 16))
);
console.log(pirateFlagEmoji); // 🏴‍☠️
And retrieve the code points of an emoji like this :
const pirateFlagEmoji = '🏴‍☠️';
// Array.from splits the string into code points rather than UTF-16 code units
const pirateFlagCodePoints = Array.from(pirateFlagEmoji).map(character => {
    return character.codePointAt(0).toString(16).toUpperCase();
});
console.log(pirateFlagCodePoints); // ['1F3F4', '200D', '2620', 'FE0F']
You can convert ISO 3166-1 country codes to flag emojis using the technique described in Code points conversion.
Fortunately, it's super easy: the folks at Unicode encoded each flag emoji as a pair of regional indicator symbols, which are simply the letters of the ISO 3166-1 (alpha-2) country code shifted by a fixed offset (127397).
So, if you have a country code like US (United States of America), you just have to convert each character to its code point, add the offset, and convert the result back to a character.
Here's how to convert a country code to its flag emoji equivalent :
const flagEmojiOffset = 127397;
const franceCountryCode = 'FR';
const franceFlagEmoji = String.fromCodePoint(
    ...Array.from(franceCountryCode).map(character => character.codePointAt(0) + flagEmojiOffset)
);
console.log(franceFlagEmoji); // '🇫🇷'
And here's how to convert a flag emoji back to its country code equivalent :
const flagEmojiOffset = 127397;
const franceFlagEmoji = '🇫🇷';
const franceCountryCode = String.fromCodePoint(
    ...Array.from(franceFlagEmoji).map(character => character.codePointAt(0) - flagEmojiOffset)
);
console.log(franceCountryCode); // 'FR'
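The offset is not arbitrary: it is the distance between the letter A (U+0041) and the regional indicator symbol letter A (U+1F1E6). The sketch below derives it and wraps the conversion in a function that validates its input. The countryCodeToFlagEmoji name and the validation rule are assumptions made for illustration; note that the function only checks the shape of the code, not whether it is an assigned ISO 3166-1 code.

```javascript
const regionalIndicatorSymbolLetterA = 0x1F1E6;
const flagEmojiOffset = regionalIndicatorSymbolLetterA - 'A'.codePointAt(0);
console.log(flagEmojiOffset); // 127397

// Hypothetical helper: converts a two-letter country code to its flag emoji.
function countryCodeToFlagEmoji(countryCode) {
    const normalizedCountryCode = countryCode.toUpperCase();
    if (!/^[A-Z]{2}$/.test(normalizedCountryCode)) {
        throw new RangeError(`Expected a two-letter country code, got "${countryCode}"`);
    }
    return String.fromCodePoint(
        ...Array.from(normalizedCountryCode, character => character.codePointAt(0) + flagEmojiOffset)
    );
}

console.log(countryCodeToFlagEmoji('us') === '\u{1F1FA}\u{1F1F8}'); // true
```

Whether the result renders as a flag depends on the platform's font, since a pair of regional indicator symbols is only displayed as a flag when the font supports that region.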
Complex emojis and emoji variations are often composed of several base emojis.
Unicode uses the 200D code point (zero-width joiner) as a ligature code point between two base emojis to combine them :
const blackFlagEmoji = '\u{1F3F4}';
console.log(blackFlagEmoji); // 🏴
const skullAndCrossbonesEmoji = '\u{2620}\u{FE0F}';
console.log(skullAndCrossbonesEmoji); // ☠️
const ligatureCodePoint = '\u{200D}';
const pirateFlagEmoji =
    blackFlagEmoji +
    ligatureCodePoint +
    skullAndCrossbonesEmoji;
console.log(pirateFlagEmoji); // 🏴‍☠️
This even works for more complex compositions :
const womanEmoji = '\u{1F469}';
console.log(womanEmoji); // 👩
const heartEmoji = '\u{2764}\u{FE0F}';
console.log(heartEmoji); // ❤️
const kissEmoji = '\u{1F48B}';
console.log(kissEmoji); // 💋
const manEmoji = '\u{1F468}';
console.log(manEmoji); // 👨
const ligatureCodePoint = '\u{200D}';
const womanAndManKissingEmoji =
    womanEmoji +
    ligatureCodePoint +
    heartEmoji +
    ligatureCodePoint +
    kissEmoji +
    ligatureCodePoint +
    manEmoji;
console.log(womanAndManKissingEmoji); // 👩‍❤️‍💋‍👨
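Since the ligature code point always sits between the parts, the concatenation above can be shortened with Array.prototype.join. The composeEmoji function is a hypothetical helper written for illustration; it does not check that the resulting sequence is one Unicode actually defines, so an unsupported combination is simply displayed as separate emojis.

```javascript
const ligatureCodePoint = '\u{200D}';

// Hypothetical helper: joins base emojis with the zero-width joiner.
function composeEmoji(...baseEmojis) {
    return baseEmojis.join(ligatureCodePoint);
}

const womanAndManKissingEmoji = composeEmoji(
    '\u{1F469}', // Woman
    '\u{2764}\u{FE0F}', // Heart
    '\u{1F48B}', // Kiss mark
    '\u{1F468}' // Man
);
console.log(womanAndManKissingEmoji === '\u{1F469}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F48B}\u{200D}\u{1F468}'); // true
```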
Skin tone components must not be joined with the ligature code point; they are placed directly after a base emoji that supports skin tone variations. However, if the base emoji ends with the FE0F code point (which serves as a presentation selector), you need to remove it first :
// Emoji without presentation selector
const thumbsUpBaseEmoji = '\u{1F44D}';
console.log(thumbsUpBaseEmoji); // 👍
const lightSkinToneComponent = '\u{1F3FB}';
console.log(lightSkinToneComponent); // 🏻
const thumbsUpWithLightSkinToneEmoji =
    thumbsUpBaseEmoji +
    lightSkinToneComponent;
console.log(thumbsUpWithLightSkinToneEmoji); // 👍🏻

// Emoji with presentation selector
const victoryHandBaseEmoji = '\u{270C}\u{FE0F}';
console.log(victoryHandBaseEmoji); // ✌️
const darkSkinToneComponent = '\u{1F3FF}';
console.log(darkSkinToneComponent); // 🏿
const presentationSelectorCodePoint = '\u{FE0F}';
const victoryHandWithDarkSkinToneEmoji =
    victoryHandBaseEmoji.replace(presentationSelectorCodePoint, '') +
    darkSkinToneComponent;
console.log(victoryHandWithDarkSkinToneEmoji); // ✌🏿
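Both cases can be handled by a single function that strips a trailing presentation selector before appending the skin tone component. This is a minimal sketch: applySkinTone is a hypothetical helper, and it assumes the base emoji is a single emoji that supports skin tone variations, which it does not verify.

```javascript
const presentationSelectorCodePoint = '\u{FE0F}';

// Hypothetical helper: appends a skin tone component to a base emoji.
function applySkinTone(baseEmoji, skinToneComponent) {
    // Remove the presentation selector only when it is the last code point
    const bareEmoji = baseEmoji.endsWith(presentationSelectorCodePoint)
        ? baseEmoji.slice(0, -presentationSelectorCodePoint.length)
        : baseEmoji;
    return bareEmoji + skinToneComponent;
}

console.log(applySkinTone('\u{1F44D}', '\u{1F3FB}') === '\u{1F44D}\u{1F3FB}'); // true
console.log(applySkinTone('\u{270C}\u{FE0F}', '\u{1F3FF}') === '\u{270C}\u{1F3FF}'); // true
```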
You can now combine both skin tone variations and ligature code points to create even more complex emojis :
const personFacepalmingEmoji = '\u{1F926}'; // Note this is a genderless emoji
console.log(personFacepalmingEmoji); // 🤦
const mediumSkinToneComponent = '\u{1F3FD}';
console.log(mediumSkinToneComponent); // 🏽
const femaleSignEmoji = '\u{2640}\u{FE0F}';
console.log(femaleSignEmoji); // ♀️
const ligatureCodePoint = '\u{200D}';
const womanFacepalmingWithMediumSkinToneEmoji =
    personFacepalmingEmoji +
    mediumSkinToneComponent +
    ligatureCodePoint +
    femaleSignEmoji;
console.log(womanFacepalmingWithMediumSkinToneEmoji); // 🤦🏽‍♀️

const womanEmoji = '\u{1F469}';
console.log(womanEmoji); // 👩
const mediumLightSkinToneComponent = '\u{1F3FC}';
console.log(mediumLightSkinToneComponent); // 🏼
const handshakeEmoji = '\u{1F91D}';
console.log(handshakeEmoji); // 🤝
const manEmoji = '\u{1F468}';
console.log(manEmoji); // 👨
const mediumDarkSkinToneComponent = '\u{1F3FE}';
console.log(mediumDarkSkinToneComponent); // 🏾
const ligatureCodePoint = '\u{200D}';
const womanWithMediumLightSkinToneAndManWithMediumDarkSkinToneHoldingHandsEmoji =
    womanEmoji +
    mediumLightSkinToneComponent +
    ligatureCodePoint +
    handshakeEmoji +
    ligatureCodePoint +
    manEmoji +
    mediumDarkSkinToneComponent;
console.log(womanWithMediumLightSkinToneAndManWithMediumDarkSkinToneHoldingHandsEmoji); // 👩🏼‍🤝‍👨🏾