auto convert certain Unicode Latin characters to the ASCII equivalents #78

SuperStormer · 2023-07-04T04:49:06Z

For example, all of the bold, italic, etc. letters at https://www.compart.com/en/unicode/block/U+1D400, which are sometimes used instead of ASCII letters in YT titles to make them look fancy. I have personally never seen any of them used for any legitimate purpose in a title.

Current list of applicable characters:

ajayyy · 2023-07-04T05:33:24Z

I have a feeling ChatGPT could be useful to get a list of character maps

mchangrh · 2023-07-20T16:55:06Z

Should this also be used for replacing 🅱️, α Α (Alpha) and similar glyphs that aren't transformations of a letter but are commonly used as replacements?

ajayyy · 2023-07-20T16:57:17Z

ooh, interesting. The new emoji cleaning probably broke a lot of 🅱️ titles

ajayyy · 2023-07-20T16:59:04Z

Yea...

ajayyy · 2023-07-20T17:00:26Z

I think we should exclude letter emojis until this issue is complete

mchangrh · 2023-07-20T17:07:33Z

We might have to add exclusions for B, A and AB since they are in the pictograph range iirc
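
One way such exclusions could work, as a rough Python sketch rather than the extension's actual code: swap the letter emojis for their letters before the emoji-stripping pass runs. `LETTER_EMOJI`, `EMOJI_RE` and `clean_title` are hypothetical names, and the regex covers only a few pictograph blocks for illustration.

```python
import re

# Hypothetical mapping from letter emojis to the letters they stand in for.
LETTER_EMOJI = {
    "\U0001F170": "A",   # NEGATIVE SQUARED LATIN CAPITAL LETTER A
    "\U0001F171": "B",   # NEGATIVE SQUARED LATIN CAPITAL LETTER B
    "\U0001F18E": "AB",  # NEGATIVE SQUARED AB
}
VS16 = "\uFE0F"  # variation selector that requests emoji presentation

# Illustrative stand-in for an emoji-cleaning pass; it covers only a few blocks.
EMOJI_RE = re.compile("[\U0001F100-\U0001F1FF\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean_title(title: str) -> str:
    # Swap letter emojis for plain letters first so the cleaning pass keeps them.
    for emoji, letter in LETTER_EMOJI.items():
        title = title.replace(emoji, letter)
    # Then drop the remaining emojis and any leftover variation selectors.
    title = EMOJI_RE.sub("", title).replace(VS16, "")
    # Collapse the whitespace left behind by removed emojis.
    return " ".join(title.split())


print(clean_title("\U0001F171\uFE0Fruh moment \U0001F602"))  # Bruh moment
```

Doing the replacement before the cleaning step means the order of the two passes matters; running them the other way round would delete the letter along with the emoji.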


ajayyy · 2023-07-20T18:28:56Z

Added

SuperStormer · 2024-01-17T04:31:56Z

A partial solution would be NFKC normalization (String#normalize in JS).

list of character replacements: out.txt

import sys
import unicodedata
from pprint import pprint

# (character, NFKC form, character name, name of the first normalized character)
replacements = []
for c in map(chr, range(sys.maxunicode + 1)):
    normalized = unicodedata.normalize("NFKC", c)
    if normalized != c:
        replacements.append((c, normalized, unicodedata.name(c, None), unicodedata.name(normalized[0], None)))
pprint(replacements)

Note that this doesn't fix letter emojis, nor small caps.
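
A quick way to check both halves of that claim, as a small Python sketch; the sample characters are picked for illustration and `nfkc` is just a local helper name.

```python
import unicodedata


def nfkc(text: str) -> str:
    # Apply Unicode compatibility normalization (composed form).
    return unicodedata.normalize("NFKC", text)


bold_abc = "\U0001D400\U0001D41B\U0001D41C"  # MATHEMATICAL BOLD A, b, c
small_cap_a = "\u1D00"                       # LATIN LETTER SMALL CAPITAL A
letter_emoji_b = "\U0001F171"                # NEGATIVE SQUARED LATIN CAPITAL LETTER B

print(nfkc(bold_abc))                          # Abc
print(nfkc(small_cap_a) == small_cap_a)        # True: small caps are left alone
print(nfkc(letter_emoji_b) == letter_emoji_b)  # True: letter emojis are left alone
```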

mchangrh · 2024-01-17T05:37:52Z

I've used NFKD and NFKC normalization in other projects before, and it makes a mess of emojis. I prefer https://github.com/gc/confusables for normalization using a map
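
To illustrate both the concern and the map-based alternative, here is a small Python sketch. The table is a tiny hypothetical one written for this example; it is not the data or the API of the confusables package linked above.

```python
import unicodedata

# NFKC rewrites the base character of some emojis, e.g. the trade mark sign.
trade_mark_emoji = "\u2122\uFE0F"  # TRADE MARK SIGN + emoji variation selector
print(unicodedata.normalize("NFKC", trade_mark_emoji) == "TM\uFE0F")  # True

# Hypothetical hand-written table: only the listed characters are ever touched.
CONFUSABLES = str.maketrans({
    "\U0001D400": "A",  # MATHEMATICAL BOLD CAPITAL A
    "\u1D00": "A",      # LATIN LETTER SMALL CAPITAL A
    "\u03B1": "a",      # GREEK SMALL LETTER ALPHA
    "\U0001F171": "B",  # NEGATIVE SQUARED LATIN CAPITAL LETTER B
})


def normalize_with_map(text: str) -> str:
    # Replace each listed character; everything else passes through unchanged.
    return text.translate(CONFUSABLES)


# The letter emoji and the alphas are replaced, the trade mark emoji survives.
print(normalize_with_map("\U0001F171ig \u03B1lph\u03B1 \u2122\uFE0F"))
```

The trade-off is maintenance: a map only covers what has been listed, whereas normalization covers every character with a compatibility decomposition, wanted or not.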

ajayyy · 2024-08-11T19:24:01Z

There's a JS function to normalize strings and one of the modes should work well for this ("𝗠𝘆 𝗖𝗼𝗼𝗹 𝗧𝗶𝘁𝗹𝗲".normalize("NFKD") results in "My Cool Title")
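
The same operation exists in Python as `unicodedata.normalize`, which makes the difference between the modes easy to check. A small sketch with sample strings chosen for this example: both NFKD and NFKC turn stylized letters into ASCII, but NFKD leaves accented letters decomposed into a base letter plus a combining mark, while NFKC recomposes them.

```python
import unicodedata

stylized = "\U0001D5E0\U0001D606"  # "My" in MATHEMATICAL SANS-SERIF BOLD letters
accented = "caf\u00E9"             # "café" with a precomposed e-acute

nfkd_stylized = unicodedata.normalize("NFKD", stylized)
nfkc_stylized = unicodedata.normalize("NFKC", stylized)
print(nfkd_stylized, nfkc_stylized)  # My My

nfkd_accented = unicodedata.normalize("NFKD", accented)
nfkc_accented = unicodedata.normalize("NFKC", accented)
print(len(nfkd_accented))  # 5: "e" is followed by U+0301 COMBINING ACUTE ACCENT
print(len(nfkc_accented))  # 4: the e-acute stays a single code point
```

If titles are compared or measured after normalization, the composed form may be the less surprising choice for accented text.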

#283

ajayyy added a commit that referenced this issue Jul 20, 2023

Add handling for emoji letters

64996f0

Resolves discussion in #78

ajayyy mentioned this issue Jan 8, 2024

Title formatting not being applied to custom fonts #212

Closed

SuperStormer mentioned this issue Aug 11, 2024

Normalize stylized text in video titles #283

Closed

ajayyy closed this as completed in 0712c85 Sep 9, 2024
