Removing Thai from profiles.sm makes Thai text get detected as Japanese with high probability #78

GoogleCodeExporter · 2015-05-28T05:40:20Z

Apologies for not reading the room and writing in Japanese.

What steps will reproduce the problem?
1. Remove th (Thai) from profiles.sm.
2. Using the profiles.sm from step 1, run language detection on Thai text.

What is the expected output? What do you see instead?
I expected a "no features in text" exception, but instead I got the probability
   [ja:0.999999...]
If I also remove ja from profiles.sm, ko gets a similarly high value, and only after removing ko as well does
"no features in text" finally appear.

What version of the product are you using? On what operating system?
I used master HEAD (Rev. a1b65d981fc4) for both the library and profiles.sm.

Please provide any additional information below.
The attached file contains code that reproduces the problem. Run it with
    ./gradle run

It runs detection on the title and description of a Thai YouTube video (http://youtu.be/FwyND40c3pw),
with these results:
    Title:       [ja:0.9999965012903201]
    Description: [ja:0.9999983886531987]

Sorry if this is intended behavior.

Original issue reported on code.google.com by mshiban...@gmail.com on 12 May 2015 at 11:27

Attachments:

language-detection-bug.zip


dennis97519 · 2015-07-22T12:59:20Z

Probably because the Japanese and Korean short-message profiles contain stray Thai characters used in kaomoji? If you open the profiles as plain text, you can also see stray Arabic characters and the like.
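The suspected mechanism can be illustrated with a toy model. The sketch below is not the library's Detector code. It assumes a simplified naive Bayes scorer with these properties:

- Every n-gram multiplies each loaded language's score by (smoothing + P(n-gram | language)).
- N-grams unknown to every loaded profile are skipped.
- Scores are normalized over the loaded languages only.

The profile numbers, the smoothing constant, and the helper names (`profiles`, `thaiText`, `detect`) are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy naive Bayes scorer (simplified; NOT the library's Detector). All numbers are hypothetical. */
public class ThaiKaomojiDemo {
    // Assumed additive smoothing constant (illustrative value only).
    static final double SMOOTHING = 0.5 / 10000;

    static final String THAI_KHO = "\u0E04";    // Thai letter KHO KHWAI
    static final String THAI_SARA_I = "\u0E34"; // Thai vowel sign SARA I

    /** Hypothetical P(n-gram | language) tables for the loaded languages. */
    static Map<String, Map<String, Double>> profiles(boolean th, boolean ja, boolean ko) {
        Map<String, Map<String, Double>> p = new LinkedHashMap<>();
        p.put("en", Map.of("e", 0.1, "t", 0.08));
        if (th) p.put("th", Map.of(THAI_KHO, 0.05, THAI_SARA_I, 0.05));
        // Rare Thai characters in ja/ko short messages, e.g. inside kaomoji.
        if (ja) p.put("ja", Map.of(THAI_KHO, 1e-4, THAI_SARA_I, 1e-4));
        if (ko) p.put("ko", Map.of(THAI_KHO, 5e-5, THAI_SARA_I, 5e-5));
        return p;
    }

    /** The n-grams of a short all-Thai text (20 single characters). */
    static List<String> thaiText() {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            grams.add(THAI_KHO);
            grams.add(THAI_SARA_I);
        }
        return grams;
    }

    static Map<String, Double> detect(Map<String, Map<String, Double>> profiles, List<String> grams) {
        List<String> langs = new ArrayList<>(profiles.keySet());
        double[] score = new double[langs.size()];
        Arrays.fill(score, 1.0 / langs.size()); // uniform prior
        boolean anyFeature = false;
        for (String g : grams) {
            // Skip n-grams that no loaded profile has ever seen.
            boolean known = false;
            for (String lang : langs) {
                if (profiles.get(lang).containsKey(g)) {
                    known = true;
                    break;
                }
            }
            if (!known) continue;
            anyFeature = true;
            double sum = 0;
            for (int i = 0; i < langs.size(); i++) {
                double p = profiles.get(langs.get(i)).getOrDefault(g, 0.0);
                score[i] *= SMOOTHING + p;
                sum += score[i];
            }
            // Normalize over the loaded languages only (this also avoids underflow).
            for (int i = 0; i < score.length; i++) score[i] /= sum;
        }
        if (!anyFeature) throw new IllegalStateException("no features in text");
        Map<String, Double> result = new TreeMap<>();
        for (int i = 0; i < langs.size(); i++) result.put(langs.get(i), score[i]);
        return result;
    }

    public static void main(String[] args) {
        System.out.println("all loaded:     " + detect(profiles(true, true, true), thaiText()));
        System.out.println("th removed:     " + detect(profiles(false, true, true), thaiText()));
        System.out.println("th, ja removed: " + detect(profiles(false, false, true), thaiText()));
        try {
            detect(profiles(false, false, false), thaiText());
        } catch (IllegalStateException e) {
            System.out.println("th, ja, ko removed: " + e.getMessage());
        }
    }
}
```

With these invented numbers, each Thai n-gram is only slightly more likely under ja than under the other remaining profiles. Normalizing over the loaded languages compounds that small edge across 20 n-grams into a probability close to 1. Removing ja hands the lead to ko in the same way. The "no features in text" error appears only once no loaded profile knows any of the Thai n-grams.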

Maybe try optimaize's language-detector, which is derived from this one, since Shuyo doesn't check in here often enough to update it.

Also, maybe look at the probability for Thai as well. I remember there's a function that lists the probabilities?

Sorry, I'm not good at Japanese, so I answered in English.

GoogleCodeExporter added Type-Other auto-migrated Priority-Medium labels May 28, 2015
