English Wikipedia: ARIB STD B24 character set
Volume 1 of the Association of Radio Industries and Businesses (ARIB) STD-B24 standard for Broadcast Markup Language[1] specifies, amongst other details, a character encoding for use in Japanese-language broadcasting. It was introduced on 1999-10-26.[1] The latest revision is version 6.3 as of 2016-07-06.
It includes a number of extended characters not found in the base standards (JIS X 0208 and JIS X 0201). It was the source standard for many symbol characters which were added to Unicode, including portions of the Miscellaneous Symbols, Enclosed Alphanumeric Supplement and Enclosed Ideographic Supplement blocks.[2] Its contributions partially overlap the Unicode emoji, but were added a year earlier, in Unicode 5.2.[3]
Fascicle 1 of the ARIB STD-B62 standard, published in 2014, defines Unicode mappings for a selection of the B24 extended characters (excluding, for example, those duplicated by JIS X 0213), as well as a few extended Kanji.[4] It also includes a mapping of utilised characters outside the Basic Multilingual Plane to the BMP's private use area.
Sets and codes
The ARIB STD B24 standard defines multiple character sets and a method of switching between them. These include a Kanji set (an extension of JIS X 0208), an Alphanumeric set, a Hiragana set, Katakana sets of two distinct layouts and four mosaic sets.[5] The sets are selected using ISO 2022 mechanisms for 94-sets, using the following codes (proportional sets use the same layout as the corresponding non-proportional ones):[6]
Set | Type | Code (column/line) | Code (hexadecimal) | Code (ASCII character) | Comments |
---|---|---|---|---|---|
Kanji | 2-byte | 4/2 | 42 | B | The escape code B used for the ARIB Kanji set[6] is used for the 1983 version of JIS C 6226 (JIS X 0208, of which the ARIB Kanji set is an extension) in ISO-2022-JP.[7][8] |
Alphanumeric | 1-byte | 4/10 | 4A | J | JIS_C6220-ro (ISO646-JP, JIS X 0201 Roman set). Similar to ASCII, with two assignments differing. Escape code J matches usage in ISO-2022-JP.[8] |
Proportional alphanumeric | 1-byte | 3/6 | 36 | 6 | |
Hiragana | 1-byte | 3/0 | 30 | 0 | Hiragana themselves follow the same layout as row 4 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation. |
Proportional Hiragana | 1-byte | 3/7 | 37 | 7 | |
Katakana | 1-byte | 3/1 | 31 | 1 | Katakana themselves follow the same layout as row 5 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation. |
Proportional Katakana | 1-byte | 3/8 | 38 | 8 | |
JIS X 0201 Katakana | 1-byte | 4/9 | 49 | I | JIS_C6220-jp (JIS X 0201 Kana set). Escape code matches usage in ISO-2022-JP-3. |
Mosaic A | 1-byte | 3/2 | 32 | 2 | Pseudographics |
Mosaic B | 1-byte | 3/3 | 33 | 3 | |
Mosaic C | 1-byte | 3/4 | 34 | 4 | Non-spacing pseudographics |
Mosaic D | 1-byte | 3/5 | 35 | 5 | |
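The relationship between the three code columns of the table can be sketched in Python: a column/line value such as 4/10 is the byte column × 16 + line, which is 0x4A, the ASCII character J. The names `ARIB_SETS`, `column_line_to_byte` and `designate_g0` are illustrative and not taken from the standard. The escape sequences built here use the generic ISO 2022 forms for designating a set as G0; that B24 uses exactly these forms is an assumption of this sketch.

```python
# Final bytes from the table of sets: name -> (bytes per character, final byte).
ARIB_SETS = {
    "Kanji": (2, 0x42),
    "Alphanumeric": (1, 0x4A),
    "Proportional alphanumeric": (1, 0x36),
    "Hiragana": (1, 0x30),
    "Proportional Hiragana": (1, 0x37),
    "Katakana": (1, 0x31),
    "Proportional Katakana": (1, 0x38),
    "JIS X 0201 Katakana": (1, 0x49),
    "Mosaic A": (1, 0x32),
    "Mosaic B": (1, 0x33),
    "Mosaic C": (1, 0x34),
    "Mosaic D": (1, 0x35),
}

ESC = 0x1B


def column_line_to_byte(notation: str) -> int:
    """Convert column/line notation such as "4/10" to a byte value."""
    column, line = (int(part) for part in notation.split("/"))
    if not (0 <= column <= 15 and 0 <= line <= 15):
        raise ValueError(f"invalid column/line notation: {notation!r}")
    return column * 16 + line


def designate_g0(set_name: str) -> bytes:
    """Build an escape sequence designating the named set as G0.

    Assumption: generic ISO 2022 forms, ESC 2/8 F for a single-byte
    94-set and the short form ESC 2/4 F for a multi-byte set.
    """
    width, final = ARIB_SETS[set_name]
    if width == 1:
        return bytes([ESC, 0x28, final])
    return bytes([ESC, 0x24, final])
```

Under these assumptions, `designate_g0("Kanji")` yields the bytes 1B 24 42 and `designate_g0("Alphanumeric")` yields 1B 28 4A, the same sequences ISO-2022-JP uses for JIS X 0208-1983 and the JIS X 0201 Roman set.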
Code charts
Kanji (double-byte) set
This is a double-byte character set extending JIS X 0208.
Lead byte
Each encoded byte is the row or cell number plus 0x20 (32 in decimal). Hence, the row with lead byte 0x21 is row number 1, and its cell 1 has a continuation byte of 0x21 (33), and so forth. Most of the code chart corresponds to JIS X 0208.
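The offset rule can be illustrated with a small Python sketch. The function names `kuten_to_bytes` and `bytes_to_kuten` are hypothetical helpers, and the 1 to 94 range check reflects the size of a 94-set rather than anything specific to B24.

```python
from typing import Tuple


def kuten_to_bytes(row: int, cell: int) -> bytes:
    """Encode a row/cell pair as lead and continuation bytes (each + 0x20)."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1..94")
    return bytes([row + 0x20, cell + 0x20])


def bytes_to_kuten(code: bytes) -> Tuple[int, int]:
    """Recover the row/cell pair from a two-byte code (each - 0x20)."""
    if len(code) != 2 or not all(0x21 <= b <= 0x7E for b in code):
        raise ValueError("expected two bytes in the range 0x21..0x7E")
    return code[0] - 0x20, code[1] - 0x20
```

For example, `kuten_to_bytes(90, 45)` gives the bytes 7A 4D, and `bytes_to_kuten(b"\x21\x21")` gives row 1, cell 1.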
Character sets 0x21-0x74 (row numbers 1-84: punctuation, alphabets, numbers, Kana, Kanji)
Character set 0x7A (row number 90, traffic symbols)
Characters 90-45 through 90-63 and 90-66 through 90-84 (shown below shaded) are listed in the B24 standard only in table 7-10 (the list of extension characters), and are also the only characters in rows 90 through 91 which are not transport-related symbols; this is noted in the B24 standard in an endnote to table 7-10.[9] The remainder of the extensions are listed in both table 7-4 (the double-byte code chart) and table 7-10.[9]
Character set 0x7B (row number 91, map symbols)
Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.
Character set 0x7C (row number 92, units, enclosed forms, list markers, arrows)
Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.
Character set 0x7D (row number 93, game and weather symbols, fractions, units, enclosed forms)
Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.
Character set 0x7E (row number 94, list markers)
Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.
Single-byte sets
Alphanumeric set
Hiragana set
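The table of sets describes the Hiragana set as following row 4 of JIS X 0208 without a lead byte, and the Katakana set as doing the same with row 5. A minimal Python sketch of that relationship is given here. It covers only the kana cells that JIS X 0208 itself defines (83 in row 4, 86 in row 5); the additional punctuation assignments are not listed in this article and are deliberately left out. The helper name `kana_byte_to_char` is hypothetical, and Python's `euc_jp` codec is used only as a convenient way to look up JIS X 0208 characters.

```python
# Number of kana cells that JIS X 0208 defines in row 4 (hiragana)
# and row 5 (katakana).
KANA_CELLS = {4: 83, 5: 86}


def kana_byte_to_char(byte: int, row: int = 4) -> str:
    """Map a single-byte kana code to the JIS X 0208 character in the same cell.

    row=4 selects the hiragana layout, row=5 the katakana layout.
    """
    if row not in KANA_CELLS:
        raise ValueError("row must be 4 (hiragana) or 5 (katakana)")
    cell = byte - 0x20
    if not (1 <= cell <= KANA_CELLS[row]):
        # Outside the JIS X 0208 kana range, e.g. the added punctuation.
        raise ValueError(f"byte 0x{byte:02X} is not covered by this sketch")
    # EUC-JP stores a JIS X 0208 character as (row + 0xA0, cell + 0xA0).
    return bytes([row + 0xA0, cell + 0xA0]).decode("euc_jp")
```

With this sketch, byte 0x22 maps to あ in the hiragana layout and to ア when `row=5` is passed.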
Katakana set
JIS X 0201 Katakana set
Mosaic sets
Shift_JIS variant
In addition to the modified ISO 2022 encoding, the B24 standard also specifies a Shift JIS encoding following JIS X 0208:1997, but with the addition of the extended characters in the kanji set.[10]
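For orientation, the ordinary Shift JIS transformation from a JIS X 0208 row/cell pair to two bytes can be sketched in Python as follows. The function name `kuten_to_shift_jis` is hypothetical. That the B24 extended rows (90 through 94) are placed by this same formula is an assumption of the sketch and is not confirmed by the text above.

```python
def kuten_to_shift_jis(row: int, cell: int) -> bytes:
    """Convert a row/cell pair (1..94 each) to Shift JIS bytes.

    Two consecutive rows share one lead byte; odd rows use the lower half
    of the trail-byte range and even rows the upper half.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1..94")
    # Rows 1..62 use lead bytes 0x81..0x9F, rows 63..94 use 0xE0..0xEF.
    lead = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
    if row % 2 == 1:
        trail = cell + 0x3F
        if trail >= 0x7F:
            trail += 1  # skip 0x7F, which is not a valid trail byte
    else:
        trail = cell + 0x9E
    return bytes([lead, trail])
```

For rows defined by JIS X 0208 the result agrees with Python's built-in codec, for example `kuten_to_shift_jis(4, 2) == "あ".encode("shift_jis")`. Under the stated assumption, row 90 cell 1 would fall at bytes ED 9F.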
See also
Footnotes
References
Further reading
External links
- Official changelog for ARIB STD-B24
- STD-B24 and others, List of ARIB Standards in the Field of Broadcasting (ARIB)