English Wikipedia: C0 and C1 control codes

The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received.

C0 codes are the range 0x00–0x1F, and the default C0 set was originally defined in ISO 646 (ASCII). C1 codes are the range 0x80–0x9F, and the default C1 set was originally defined in ECMA-48 (harmonized later with ISO 6429). The ISO/IEC 2022 system of specifying control and graphic characters allows other C0 and C1 sets to be available for specialized applications, but they are rarely used.
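
The two ranges can be checked mechanically. The following is a minimal sketch; the function name `control_set` is hypothetical and chosen only for illustration.

```python
from typing import Optional


def control_set(byte: int) -> Optional[str]:
    """Return "C0", "C1", or None for a byte value in 0..255."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a byte value in 0..255")
    if byte <= 0x1F:
        return "C0"
    if 0x80 <= byte <= 0x9F:
        return "C1"
    # Graphic characters, SP (0x20) and DEL (0x7F) fall outside both ranges.
    return None


print(control_set(0x0A))  # LF lies in the C0 range
print(control_set(0x9B))  # CSI lies in the C1 range
```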

C0 controls

ASCII defined 32 control characters, plus one more, DEL, at 0x7F (binary 01111111), which was needed to punch out all the holes on a paper tape and so erase a character.

This large number of codes was desirable at the time, as multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals.

Only a few codes have maintained their use: BEL, ESC, and the "Format Effector" (FEn) characters BS, TAB, LF, VT, FF, and CR. Others are unused or have acquired different meanings, such as NUL being the C string terminator. Some data transfer protocols such as ANPA-1312, Kermit, and XMODEM do make extensive use of SOH, STX, ETX, EOT, ACK, NAK and SYN for purposes approximating their original definitions, and some file formats and libraries use the "Information Separators" (ISn), such as the Unix info format[1] and Python's splitlines string method.[2]
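
As a small illustration of the information separators still being honoured in practice, Python's built-in `str.splitlines` treats FS, GS and RS as line boundaries, but not US. The sample string below is made up for the demonstration.

```python
# The four C0 information separators.
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"

text = f"a{FS}b{GS}c{RS}d{US}e"

# splitlines() breaks at FS, GS and RS; US is left inside the last piece.
parts = text.splitlines()
print(parts)
```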

The names of some codes were changed in ISO 6429:1992 (or ECMA-48:1991) to be neutral with respect to writing direction. The abbreviations used were not changed, as the standard had already specified that those would remain unchanged when the standard is translated to other languages. In this table both new and old names are shown for the renamed controls (the old name is the one matching the abbreviation).

ASCII control codes, originally defined in ANSI X3.4.[3]

| Decimal | Hex | Abbreviations | Name | Description |
|---|---|---|---|---|
| 0 | 00 | NUL | Null | Does nothing. The code of blank paper tape, and also used for padding to slow transmission. |
| 1 | 01 | TC1, SOH | Start of Heading | First character of the heading of a message.[4] |
| 2 | 02 | TC2, STX | Start of Text | Terminates the header and starts the message text. |
| 3 | 03 | TC3, ETX | End of Text | Ends the message text, starts a footer (up to the next TC character).[4][5] |
| 4 | 04 | TC4, EOT | End of Transmission | Ends the transmission of one or more messages.[4][5] May place terminals on standby.[5] |
| 5 | 05 | TC5, ENQ | Enquiry | Triggers a response at the receiving end, to see if it is still present. |
| 6 | 06 | TC6, ACK | Acknowledge | Indication of successful receipt of a message. |
| 7 | 07 | BEL | Bell, Alert | Call for attention from an operator. |
| 8 | 08 | FE0, BS | Backspace | Move one position leftwards. Next character may overprint or replace the character that was there. |
| 9 | 09 | FE1, HT | Character Tabulation, Horizontal Tabulation | Move right to the next tab stop. |
| 10 | 0A | FE2, LF | Line Feed | Move down to the same position on the next line (some devices also moved to the left column). |
| 11 | 0B | FE3, VT | Line Tabulation, Vertical Tabulation | Move down to the next vertical tab stop. |
| 12 | 0C | FE4, FF | Form Feed | Move down to the top of the next page. |
| 13 | 0D | FE5, CR | Carriage Return | Move to column zero while staying on the same line. |
| 14 | 0E | SO, LS1 | Shift Out | Switch to an alternative character set. |
| 15 | 0F | SI, LS0 | Shift In | Return to regular character set after SO. |
| 16 | 10 | TC7, DLE | Data Link Escape | Cause a limited number of contiguously following characters to be interpreted in some different way.[6][7] |
| 17 | 11 | DC1, XON | Device Control One | Turn on (DC1 and DC2) or off (DC3 and DC4) devices. Teletype[8] used these for the paper tape reader and the paper tape punch. The first use became the de facto standard for software flow control.[9] |
| 18 | 12 | DC2, TAPE | Device Control Two | See DC1. |
| 19 | 13 | DC3, XOFF | Device Control Three | See DC1. |
| 20 | 14 | DC4, TAPE | Device Control Four | See DC1. |
| 21 | 15 | TC8, NAK | Negative Acknowledge | Negative response to a sender, such as a detected error. |
| 22 | 16 | TC9, SYN | Synchronous Idle | Sent in synchronous transmission systems when no other character is being transmitted. |
| 23 | 17 | TC10, ETB | End of Transmission Block | End of a transmission block of data when data are divided into such blocks for transmission purposes. |
| 24 | 18 | CAN | Cancel | Indicates that the data preceding it are in error or are to be disregarded. |
| 25 | 19 | EM | End of Medium | Indicates on paper or magnetic tapes that the end of the usable portion of the tape had been reached.[3] |
| 26 | 1A | SUB | Substitute | Replaces a character that was found to be invalid or in error. Should be ignored. |
| 27 | 1B | ESC | Escape | Alters the meaning of a limited number of following bytes. Nowadays this is almost always used to introduce an ANSI escape sequence. |
| 28 | 1C | IS4, FS | File Separator | Can be used as delimiters to mark fields of data structures. US is the lowest level, while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it. SP (space) could be considered an even lower level. |
| 29 | 1D | IS3, GS | Group Separator | See FS. |
| 30 | 1E | IS2, RS | Record Separator | See FS. |
| 31 | 1F | IS1, US | Unit Separator | See FS. |

While not technically part of the C0 control character range, the following two characters can be thought of as having some characteristics of control characters.

| Decimal | Hex | Abbreviations | Name | Description |
|---|---|---|---|---|
| 32 | 20 | SP | Space | Move right one character position. |
| 127 | 7F | DEL | Delete | Should be ignored. Used to delete characters on punched tape by punching out all the holes. |
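
The hierarchy of information separators described in the table can be sketched as a tiny serializer. This is only an illustration under assumed conventions: the helper names `encode_records` and `decode_records` are hypothetical, only the two lowest levels (US and RS) are used, and the field values are assumed not to contain separator characters themselves.

```python
RS, US = "\x1e", "\x1f"  # Record Separator and Unit Separator


def encode_records(records):
    """Join units with US inside each record, and records with RS."""
    return RS.join(US.join(units) for units in records)


def decode_records(data):
    """Inverse of encode_records for data without embedded separators."""
    return [record.split(US) for record in data.split(RS)]


table = [["alice", "42"], ["bob", "7"]]
wire = encode_records(table)
print(repr(wire))
print(decode_records(wire))
```

GS and FS could wrap further levels (groups of records, files of groups) in the same way.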

C1 controls

In 1973, ECMA-35 and ISO 2022[10] attempted to define a method so an 8-bit "extended ASCII" code could be converted to a corresponding 7-bit code, and vice versa.[11] In a 7-bit environment, the Shift Out (SO) control would change the meaning of the 96 bytes 0x20 through 0x7F[12] (i.e. all but the C0 control codes) to be the characters that an 8-bit environment would print if it used the same code with the high bit set. This meant that the range 0x80 through 0x9F could not be printed in a 7-bit environment,[11] so it was decided that no alternative character set could use them, and that these codes should be additional control codes, which became known as the C1 control codes. To allow a 7-bit environment to use these new controls, the sequences ESC @ through ESC _ were to be considered equivalent.[11] The later ISO 8859 standards abandoned support for 7-bit codes, but preserved this range of control characters.
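
The equivalence between the 8-bit C1 bytes and the 7-bit sequences ESC @ through ESC _ amounts to an offset of 0x40. A minimal sketch follows; the function names are hypothetical.

```python
ESC = 0x1B


def c1_to_escape(byte: int) -> bytes:
    """Map an 8-bit C1 control (0x80..0x9F) to its 7-bit ESC form."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError("not a C1 control byte")
    # 0x80 maps to ESC @ (0x40), and 0x9F maps to ESC _ (0x5F).
    return bytes([ESC, byte - 0x40])


def escape_to_c1(seq: bytes) -> int:
    """Map a two-byte ESC @ .. ESC _ sequence back to its C1 byte."""
    if len(seq) != 2 or seq[0] != ESC or not 0x40 <= seq[1] <= 0x5F:
        raise ValueError("not a 7-bit C1 escape sequence")
    return seq[1] + 0x40


print(c1_to_escape(0x9B))          # CSI in its 7-bit form, ESC [
print(hex(escape_to_c1(b"\x1b]")))  # ESC ] is OSC
```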

The first C1 control code set to be registered for use with ISO 2022 was DIN 31626,[13] a specialised set for bibliographic use which was registered in 1979.[14]

The more common general-use ISO/IEC 6429 set was registered in 1983,[15] although the ECMA-48 specification upon which it was based had been first published in 1976.[16] The set is also specified by JIS X 0211 (formerly JIS C 6323).[17] Symbolic names defined by RFC 1345 and early drafts of ISO 10646, but not in ISO/IEC 6429 (PAD, HOP and SGC), are also used.[18][19]

Except for SS2 and SS3 in EUC-JP text, and NEL in text transcoded from EBCDIC, the 8-bit forms of these codes were almost never used. CSI, DCS and OSC are used to control text terminals and terminal emulators, but almost always by using their 7-bit escape code representations. Nowadays, if these codes are encountered, it is far more likely that they are intended to be printing characters from that position of Windows-1252 or Mac OS Roman.
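
The ambiguity can be seen by decoding the same bytes two ways. In this sketch the sample bytes are made up; ISO 8859-1 ("latin-1") keeps 0x93 and 0x94 as C1 controls, while Windows-1252 ("cp1252") turns them into curly quotation marks.

```python
raw = b"\x93quoted\x94"

as_latin1 = raw.decode("latin-1")  # bytes stay C1 controls U+0093, U+0094
as_cp1252 = raw.decode("cp1252")   # bytes become U+201C and U+201D

print(repr(as_latin1))
print(repr(as_cp1252))
```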

ISO/IEC 6429 and RFC 1345 C1 control codes

| ESC+ | Decimal | Hex | Abbr | Name | Description[20] |
|---|---|---|---|---|---|
| @ | 128 | 80 | PAD[21] | Padding Character | Proposed as a "padding" or "high byte" for single-byte characters to make them two bytes long for easier interoperability with multiple byte characters. Extended Unix Code (EUC) occasionally uses this.[22] |
| A | 129 | 81 | HOP[21] | High Octet Preset | Proposed to set the high byte of a sequence of multiple byte characters so they only need one byte each, as a simple form of data compression. |
| B | 130 | 82 | BPH | Break Permitted Here | Follows a graphic character where a line break is permitted. Roughly equivalent to a soft hyphen or zero-width space except it does not define what is printed at the line break. |
| C | 131 | 83 | NBH | No Break Here | Follows the graphic character that is not to be broken. See also word joiner. |
| D | 132 | 84 | IND | Index | Move down one line without moving horizontally, to eliminate ambiguity about the meaning of LF. |
| E | 133 | 85 | NEL | Next Line | Equivalent to CR+LF, to match the EBCDIC control character. |
| F | 134 | 86 | SSA | Start of Selected Area | Used by block-oriented terminals. In xterm, ESC F moves to the lower-left corner of the screen, since certain software assumes this behaviour.[23] |
| G | 135 | 87 | ESA | End of Selected Area | See SSA. |
| H | 136 | 88 | HTS | Character Tabulation Set, Horizontal Tabulation Set | Set a tab stop at the current position. |
| I | 137 | 89 | HTJ | Character Tabulation With Justification, Horizontal Tabulation With Justification | Right-justify the text since the last tab against the next tab stop. |
| J | 138 | 8A | VTS | Line Tabulation Set, Vertical Tabulation Set | Set a vertical tab stop. |
| K | 139 | 8B | PLD | Partial Line Forward, Partial Line Down | Used to produce subscripts and superscripts in ISO/IEC 6429. Subscripts use PLD text PLU, while superscripts use PLU text PLD. |
| L | 140 | 8C | PLU | Partial Line Backward, Partial Line Up | See PLD. |
| M | 141 | 8D | RI | Reverse Line Feed, Reverse Index | Move up one line. |
| N | 142 | 8E | SS2 | Single-Shift 2 | Next character is from the G2 set. |
| O | 143 | 8F | SS3 | Single-Shift 3 | Next character is from the G3 set. |
| P | 144 | 90 | DCS | Device Control String | Followed by a string of printable characters (0x20 through 0x7E) and format effectors (0x08 through 0x0D), terminated by ST (0x9C). Xterm defined a number of these.[24] |
| Q | 145 | 91 | PU1 | Private Use 1 | Reserved for private function agreed on between the sender and the recipient of the data. |
| R | 146 | 92 | PU2 | Private Use 2 | See PU1. |
| S | 147 | 93 | STS | Set Transmit State | |
| T | 148 | 94 | CCH | Cancel Character | Destructive backspace, to eliminate ambiguity about the meaning of BS. |
| U | 149 | 95 | MW | Message Waiting | |
| V | 150 | 96 | SPA | Start of Protected Area | Used by block-oriented terminals. |
| W | 151 | 97 | EPA | End of Protected Area | See SPA. |
| X | 152 | 98 | SOS | Start of String | Followed by a control string terminated by ST (0x9C) which (unlike DCS, OSC, PM or APC) may contain any character except SOS or ST. |
| Y | 153 | 99 | SGC,[21] SGCI[25] | Single Graphic Character Introducer | Intended to allow an arbitrary Unicode character to be printed; it would be followed by that character, most likely encoded in UTF-1.[25] |
| Z | 154 | 9A | SCI | Single Character Introducer | To be followed by a single printable character (0x20 through 0x7E) or format effector (0x08 through 0x0D), and to print it as ASCII no matter what graphic or control sets were in use. |
| [ | 155 | 9B | CSI | Control Sequence Introducer | Used to introduce control sequences that take parameters. Used for ANSI escape sequences. |
| \ | 156 | 9C | ST | String Terminator | Terminates a string started by DCS, SOS, OSC, PM or APC. |
| ] | 157 | 9D | OSC | Operating System Command | Followed by a string of printable characters (0x20 through 0x7E) and format effectors (0x08 through 0x0D), terminated by ST (0x9C), intended for use to allow in-band signaling of protocol information, but rarely used for that purpose. Some terminal emulators, including xterm, use OSC sequences for setting the window title and changing the colour palette. They may also support terminating an OSC sequence with BEL instead of ST.[26] Kermit used APC to transmit commands.[27] |
| ^ | 158 | 9E | PM | Privacy Message | See OSC. |
| _ | 159 | 9F | APC | Application Program Command | See OSC. |
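
As an illustration of OSC in its common 7-bit form, the sketch below builds a window-title sequence. It assumes the xterm convention that parameter 2 selects the window title; the function name `osc_set_title` is hypothetical, and whether a given terminal honours the sequence is not guaranteed.

```python
ESC = "\x1b"
BEL = "\x07"
ST_7BIT = ESC + "\\"  # 7-bit form of ST (ESC \)


def osc_set_title(title: str, use_bel: bool = False) -> str:
    """Build ESC ] 2 ; title terminated by ST, or by BEL if requested."""
    for ch in title:
        code = ord(ch)
        if code < 0x20 or code == 0x7F or 0x80 <= code <= 0x9F:
            raise ValueError("title must not contain control characters")
    terminator = BEL if use_bel else ST_7BIT
    return f"{ESC}]2;{title}{terminator}"


# Writing the result to a terminal that supports it would change the title.
print(repr(osc_set_title("demo")))
print(repr(osc_set_title("demo", use_bel=True)))
```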

Other control code sets

The ISO/IEC 2022 (ECMA-35) extension mechanism allowed escape sequences to change the C0 and C1 sets. The standard C0 control character set shown above is chosen with the sequence ESC ! @ and the above C1 set with the sequence ESC " C.[15]

Several official and unofficial alternatives have been defined, but they are now largely obsolete. Most were forced to retain a good deal of compatibility with the ASCII controls for interoperability. The standard makes ESC,[28][29] SP and DEL "fixed" coded characters, which are available in their ASCII locations in all encodings that conform to the standard.[30] It also specifies that if a C0 set includes transmission control (TCn) codes, they must be encoded at their ASCII locations[28] and cannot be put in a C1 set,[31] and that any new transmission controls must be in a C1 set.[28]

Other C0 control code sets

  • ANPA-1312, a text markup language used for news transmission, replaces several C0 control characters.
  • IPTC 7901, the newer international version of the above, has its own variations.
  • Videotex has a completely different set.
  • Teletext also defines a set similar to Videotex.
  • T.61/T.51,[32] and others[33] replaced EM and GS with SS2 and SS3 so these functions could be used in a 7-bit environment.
  • Some sets replaced FS with SS2,[34] (same as ANPA-1312).
  • The now-withdrawn JIS C 6225, designated JIS X 0207 in later sources,[35] replaced FS with CEX or "Control Extension",[36] which introduces control sequences for vertical text behaviour, superscripts and subscripts,[37] and for transmitting custom character graphics.[35]

Replacement C1 character sets

  • A specialized C1 control code set is registered for bibliographic use (including string collation), such as by MARC-8.[14][38][39]
  • Various specialised C1 control code sets are registered for use by Videotex formats.[13]
  • EBCDIC defines up to 29 additional control codes besides those present in ASCII. When translating EBCDIC to Unicode (or to ISO 8859), these codes are mapped to C1 control characters in a manner specified by IBM's Character Data Representation Architecture (CDRA).[40][41] The New Line (NL) does translate to the ISO/IEC 6429 NEL (though it is often swapped with LF, following the UNIX line-ending convention),[40] but the remainder of the control codes do not correspond. For example, the EBCDIC control SPS and the ECMA-48 control PLU are both used to begin a superscript or end a subscript, but are not mapped to one another. Extended-ASCII-mapped EBCDIC can therefore be regarded as having its own C1 set, although it is not registered with the ISO-IR registry for ISO/IEC 2022.[13]
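
The NL-to-NEL mapping can be observed with Python's standard `cp037` codec, used here as one example of an EBCDIC code page; other code pages and converters may swap NL and LF as noted above. The sample bytes are made up.

```python
# 'A', EBCDIC New Line (0x15), 'B' in code page 037.
ebcdic = bytes([0xC1, 0x15, 0xC2])

decoded = ebcdic.decode("cp037")

# The NL byte arrives as U+0085 (NEL), a C1 control character.
print(repr(decoded))
# Python treats U+0085 as a line boundary when splitting lines.
print(decoded.splitlines())
```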

Unicode

Unicode inherits its first 256 code points from ISO 8859-1, hence also the 65 code points described above, giving them the general category Cc (control). These are U+0000–U+001F (the C0 controls), U+007F (DEL) and U+0080–U+009F (the C1 controls).

Unicode only specifies semantics for the C0 format controls HT, LF, VT, FF, and CR (note that BS is missing); the C0 information separators FS, GS, RS, US (and SP); and the C1 control NEL.[42] The rest of the codes are transparent to Unicode and their meanings are left to higher-level protocols, with ISO/IEC 6429 suggested as a default.[42]

Unicode includes many additional format effector characters besides these, such as marks, embeds, isolates and pops for explicit bidirectional formatting, and the zero-width joiner and non-joiner for controlling ligature use. However, these are given the general category Cf (format) rather than Cc.
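
The general categories can be inspected with Python's standard `unicodedata` module. The sketch below collects every code point in category Cc and compares one of the newer format characters; the variable names are illustrative only.

```python
import unicodedata

# Every code point whose general category is Cc (control).
cc = [cp for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Cc"]

# C0 controls, DEL, and C1 controls.
expected = list(range(0x00, 0x20)) + [0x7F] + list(range(0x80, 0xA0))

print(len(cc), cc == expected)
# The zero-width joiner is a format character, not a control character.
print(unicodedata.category("\u200d"))
```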

References

  1. Template:Cite web
  2. Template:Cite web
  3. Template:Cite iso-ir
  4. Template:Citation
  5. Template:Cite web
  6. Template:Cite web
  7. Template:Cite web
  8. Template:Cite web
  9. Template:Cite web
  10. Template:Citation
  11. Template:Citation
  12. Template:Citation
  13. Template:Citation
  14. Template:Cite iso-ir
  15. Citation error: invalid <ref> tag; no text was provided for the reference named 1stE
  16. Template:Citation
  17. Template:Cite web
  18. Citation error: invalid <ref> tag; no text was provided for the reference named Whistler2011
  19. Citation error: invalid <ref> tag; no text was provided for the reference named Whistler2015
  20. Template:Citation
  21. Template:Cite web
  22. Template:Cite book
  23. Template:Cite web
  24. Template:Cite web
  25. Template:Cite web
  26. Template:Cite web
  27. Template:Cite book
  28. Template:Cite book
  29. Template:Cite iso-ir
  30. Template:Cite book
  31. Template:Cite book
  32. Template:Cite iso-ir
  33. Template:Cite iso-ir
  34. Template:Cite iso-ir
  35. Template:Cite web
  36. Template:Cite iso-ir
  37. Template:Citation
  38. Template:Cite iso-ir
  39. Template:Cite iso-ir
  40. Template:Cite web
  41. Template:Citation
  42. Template:Cite book