Skip to content

Commit

Permalink
Add list of valid encodings with short description (elastic#11285)
Browse files Browse the repository at this point in the history
  • Loading branch information
kvch authored Mar 20, 2019
1 parent 5b6cedb commit daefdb5
Showing 1 changed file with 53 additions and 5 deletions.
58 changes: 53 additions & 5 deletions filebeat/docs/inputs/input-common-harvester-options.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,59 @@ The file encoding to use for reading data that contains international
characters. See the encoding names http://www.w3.org/TR/encoding/[recommended by
the W3C for use in HTML5].

Here are some sample encodings from W3C recommendation:

* plain, latin1, utf-8, utf-16be-bom, utf-16be, utf-16le, big5, gb18030,
gbk, hz-gb-2312,
* euc-kr, euc-jp, iso-2022-jp, shift-jis, and so on
Valid encodings:

* `plain`: plain ASCII encoding
* `utf-8` or `utf8`: UTF-8 encoding
* `gbk`: simplified Chinese charaters
* `iso8859-6e`: ISO8859-6E, Latin/Arabic
* `iso8859-6i`: ISO8859-6I, Latin/Arabic
* `iso8859-8e`: ISO8859-8E, Latin/Hebrew
* `iso8859-8i`: ISO8859-8I, Latin/Hebrew
* `iso8859-1`: ISO8859-1, Latin-1
* `iso8859-2`: ISO8859-2, Latin-2
* `iso8859-3`: ISO8859-3, Latin-3
* `iso8859-4`: ISO8859-4, Latin-4
* `iso8859-5`: ISO8859-5, Latin/Cyrillic
* `iso8859-6`: ISO8859-6, Latin/Arabic
* `iso8859-7`: ISO8859-7, Latin/Greek
* `iso8859-8`: ISO8859-8, Latin/Hebrew
* `iso8859-9`: ISO8859-9, Latin-5
* `iso8859-10`: ISO8859-10, Latin-6
* `iso8859-13`: ISO8859-13, Latin-7
* `iso8859-14`: ISO8859-14, Latin-8
* `iso8859-15`: ISO8859-15, Latin-9
* `iso8859-16`: ISO8859-16, Latin-10
* `cp437`: IBM CodePage 437
* `cp850`: IBM CodePage 850
* `cp852`: IBM CodePage 852
* `cp855`: IBM CodePage 855
* `cp858`: IBM CodePage 858
* `cp860`: IBM CodePage 860
* `cp862`: IBM CodePage 862
* `cp863`: IBM CodePage 863
* `cp865`: IBM CodePage 865
* `cp866`: IBM CodePage 866
* `ebcdic-037`: IBM CodePage 037
* `ebcdic-1040`: IBM CodePage 1140
* `ebcdic-1047`: IBM CodePage 1047
* `koi8r`: KOI8-R, Russian (Cyrillic)
* `koi8u`: KOI8-U, Ukranian (Cyrillic)
* `macintosh`: Macintosh encoding
* `macintosh-cyrillic`: Macintosh Cyrillic encoding
* `windows1250`: Windows1250, Central and Eastern European
* `windows1251`: Windows1251, Russian, Serbian (Cyrillic)
* `windows1252`: Windows1252, Legacy
* `windows1253`: Windows1253, Modern Greek
* `windows1254`: Windows1254, Turkish
* `windows1255`: Windows1255, Hebrew
* `windows1256`: Windows1256, Arabic
* `windows1257`: Windows1257, Estonian, Latvian, Lithuanian
* `windows1258`: Windows1258, Vietnamese
* `windows874`: Windows874, ISO/IEC 8859-11, Latin/Thai
* `utf-16-bom`: UTF-16 with required BOM
* `utf-16be-bom`: big endian UTF-16 with required BOM
* `utf-16le-bom`: little endian UTF-16 with required BOM

The `plain` encoding is special, because it does not validate or transform any input.

Expand Down

0 comments on commit daefdb5

Please sign in to comment.