Basic Latin (Unicode block): Difference between revisions
Latest revision as of 10:51, 15 September 2024
Basic Latin or C0 Controls and Basic Latin | |
---|---|
Range | U+0000..U+007F (128 code points) |
Plane | BMP |
Scripts | Latin (52 characters), Common (76 characters) |
Major alphabets | English, French, German, Spanish, Vietnamese |
Symbol sets | Arabic numerals, Punctuation |
Assigned | 128 code points (33 Control or Format) |
Unused | 0 reserved code points |
Source standards | ISO/IEC 8859, ISO 646 |
Unicode version history | |
1.0.0 (1991) | 128 (+128) |
Unicode documentation | |
Code chart ∣ Web page | |
Note: [1][2] |
The Basic Latin Unicode block,[3] sometimes informally called C0 Controls and Basic Latin,[4] is the first block of the Unicode standard and the only block whose characters are each encoded in a single byte in UTF-8. It contains all 128 characters of the ASCII encoding, at the same code values. The block ranges from U+0000 to U+007F and includes the C0 controls, ASCII punctuation and symbols, the ASCII digits, the uppercase and lowercase letters of the English alphabet, and one further control character (DELETE).
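The single-byte property can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper `utf8_length` is a hypothetical name introduced for illustration. It encodes every code point from U+0000 to U+007F and confirms that each yields exactly one byte, equal to the code point value and identical to the ASCII encoding, while U+0080, the first code point after the block, needs two bytes.

```python
# Minimal sketch: every Basic Latin code point (U+0000..U+007F) is encoded
# as exactly one byte in UTF-8, and that byte equals the code point value.

def utf8_length(code_point: int) -> int:
    """Hypothetical helper: number of bytes UTF-8 uses for a code point."""
    return len(chr(code_point).encode("utf-8"))

BASIC_LATIN = range(0x0000, 0x0080)  # 128 code points

for cp in BASIC_LATIN:
    encoded = chr(cp).encode("utf-8")
    assert len(encoded) == 1                   # one byte per character
    assert encoded[0] == cp                    # byte value == code point
    assert encoded == chr(cp).encode("ascii")  # same bytes as ASCII

# U+0080, the first code point of the next block, already needs two bytes.
print(utf8_length(0x007F), utf8_length(0x0080))  # prints "1 2"
```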
The Basic Latin block has been in its present form since version 1.0.0 of the Unicode Standard, with no characters added or changed.[5] In Unicode 1.0 its block name was ASCII.[6]
Table of characters
- A The character U+005C (\) may be displayed as a yen sign (¥) or a won sign (₩) in Japanese or Korean fonts. Such fonts treat Unicode text (especially UTF-8) as if it were in a legacy character set that placed these signs at the position of the backslash.[7]
Subheadings
The code chart groups the characters of the C0 Controls and Basic Latin block under six subheadings.[8]
C0 controls
The C0 controls, referred to as "C0 ASCII control codes" in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. Their alias names are taken from the ISO/IEC 6429:1992 standard.[8]
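Python's standard `unicodedata` module can be used to inspect the C0 controls. The sketch below (the list name `C0_CONTROLS` is illustrative) checks that all 32 code points from U+0000 to U+001F have the General Category Cc, and that they have no formal character name: familiar names such as LINE FEED are name aliases, which `unicodedata.name()` does not return. It assumes Python 3.

```python
import unicodedata

# C0 controls occupy U+0000..U+001F (32 code points).
C0_CONTROLS = [chr(cp) for cp in range(0x00, 0x20)]

# Every C0 control has the General Category "Cc" (Other, control).
assert all(unicodedata.category(ch) == "Cc" for ch in C0_CONTROLS)

# Control characters have no character name in the UCD; names such as
# "LINE FEED" are aliases, so name() falls back to the given default.
assert unicodedata.name("\n", None) is None

# A graphic character such as U+0041, by contrast, has a name.
print(unicodedata.name("A"))  # prints "LATIN CAPITAL LETTER A"
print(len(C0_CONTROLS))       # prints "32"
```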
ASCII punctuation and symbols
This subheading covers the space character, standard punctuation marks, simple mathematical operators, and symbols such as the dollar sign, percent sign, ampersand, underscore, and vertical line (pipe).[8]
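The 33 characters of this subheading lie in the ranges U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060 and U+007B to U+007E. As a sketch in Python (standard library only; `RANGES` is an illustrative name), they can be compared with `string.punctuation`, which holds the 32 ASCII punctuation and symbol characters; U+0020 SPACE accounts for the remaining one.

```python
import string

# Ranges of the "ASCII punctuation and symbols" subheading (inclusive).
RANGES = [(0x20, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E)]

punct_and_symbols = "".join(
    chr(cp) for start, end in RANGES for cp in range(start, end + 1)
)

# string.punctuation holds the 32 ASCII punctuation/symbol characters;
# adding U+0020 SPACE gives exactly the 33 characters of the subheading.
assert len(punct_and_symbols) == 33
assert set(punct_and_symbols) == set(string.punctuation) | {" "}
print(len(string.punctuation))  # prints "32"
```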
ASCII digits
The ASCII digits subheading contains the ten standard European digits, 0 through 9.[8]
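Because the digits are encoded in ascending order starting at U+0030, a digit's numeric value is its code point minus 0x30. A minimal Python sketch, with `digit_value` as a hypothetical helper name:

```python
# The ten ASCII digits occupy U+0030..U+0039 in ascending order, so a
# digit's numeric value is its code point minus 0x30.
DIGITS = [chr(cp) for cp in range(0x30, 0x3A)]

def digit_value(ch: str) -> int:
    """Hypothetical helper: numeric value of a single ASCII digit."""
    value = ord(ch) - 0x30
    if not 0 <= value <= 9:
        raise ValueError(f"{ch!r} is not an ASCII digit")
    return value

assert "".join(DIGITS) == "0123456789"
assert [digit_value(d) for d in DIGITS] == list(range(10))
print(digit_value("7"))  # prints "7"
```

The explicit range check is the point of the helper: Python's built-in `int()` also accepts decimal digits from other scripts, whereas this sketch accepts only U+0030 to U+0039.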
Uppercase Latin alphabet
The Uppercase Latin alphabet subheading contains the 26 unaccented letters of the Latin alphabet in uppercase (majuscule) form.[8]
Lowercase Latin alphabet
The Lowercase Latin alphabet subheading contains the same 26 unaccented letters in lowercase (minuscule) form.[8]
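The two alphabets are laid out in parallel: uppercase A to Z at U+0041 to U+005A and lowercase a to z at U+0061 to U+007A, so each lowercase letter's code point is exactly 0x20 greater than its uppercase counterpart. The Python sketch below checks this and uses it for a lowercase conversion limited to Basic Latin letters; `ascii_to_lower` is a hypothetical helper, not a standard function.

```python
# Uppercase A..Z occupy U+0041..U+005A and lowercase a..z occupy
# U+0061..U+007A, so corresponding letters are exactly 0x20 apart.
UPPER = [chr(cp) for cp in range(0x41, 0x5B)]
LOWER = [chr(cp) for cp in range(0x61, 0x7B)]

assert len(UPPER) == len(LOWER) == 26
for up, low in zip(UPPER, LOWER):
    assert ord(low) - ord(up) == 0x20
    assert up.lower() == low and low.upper() == up

def ascii_to_lower(text: str) -> str:
    """Hypothetical helper: lowercase only the 26 Basic Latin capitals."""
    # Setting bit 0x20 maps U+0041..U+005A onto U+0061..U+007A.
    return "".join(chr(ord(c) | 0x20) if "A" <= c <= "Z" else c for c in text)

print(ascii_to_lower("Basic LATIN, U+0041"))  # prints "basic latin, u+0041"
```

Unlike `str.lower()`, which applies full Unicode case mapping, this helper leaves every character outside U+0041 to U+005A unchanged.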
Control character
The Control character subheading contains a single character, U+007F DELETE (DEL).[8]
Number of symbols, letters and control codes
The table below shows the number of letters, symbols and control codes in each of the six subheadings of the C0 Controls and Basic Latin block.
Subheading | Number of characters | Range of characters |
---|---|---|
C0 controls | 32 control codes | U+0000 to U+001F |
ASCII punctuation and symbols | 33 (the space plus 32 punctuation marks and symbols) | U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060 and U+007B to U+007E |
ASCII digits | 10 digits | U+0030 to U+0039 |
Uppercase Latin alphabet | 26 unaccented Latin letters (majuscule) | U+0041 to U+005A |
Lowercase Latin alphabet | 26 unaccented Latin letters (minuscule) | U+0061 to U+007A |
Control character | 1 control code (DELETE) | U+007F |
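The counts in the table can be reproduced by classifying each of the 128 code points by range. In this Python sketch, `subheading` is a hypothetical helper whose labels simply mirror the table's first column:

```python
from collections import Counter

def subheading(cp: int) -> str:
    """Hypothetical helper: map a Basic Latin code point to its subheading."""
    if not 0x00 <= cp <= 0x7F:
        raise ValueError("outside the Basic Latin block")
    if cp <= 0x1F:
        return "C0 controls"
    if 0x30 <= cp <= 0x39:
        return "ASCII digits"
    if 0x41 <= cp <= 0x5A:
        return "Uppercase Latin alphabet"
    if 0x61 <= cp <= 0x7A:
        return "Lowercase Latin alphabet"
    if cp == 0x7F:
        return "Control character"
    # Remaining ranges: U+0020..U+002F, U+003A..U+0040,
    # U+005B..U+0060 and U+007B..U+007E.
    return "ASCII punctuation and symbols"

counts = Counter(subheading(cp) for cp in range(0x80))
assert sum(counts.values()) == 128
for name, n in counts.items():  # insertion order = first code point seen
    print(f"{name}: {n}")
# Expected output:
# C0 controls: 32
# ASCII punctuation and symbols: 33
# ASCII digits: 10
# Uppercase Latin alphabet: 26
# Lowercase Latin alphabet: 26
# Control character: 1
```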
Chart
C0 Controls and Basic Latin[a]: Official Unicode Consortium code chart (PDF)
U+ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+000x | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
U+001x | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
U+002x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
U+003x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
U+004x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
U+005x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
U+006x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
U+007x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
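Each cell in the chart sits at the intersection of a row, labelled with all but the last hexadecimal digit of the code point, and a column, labelled with that last digit. For example, A (U+0041) is in row U+004x, column 1. A small Python sketch of this lookup (`chart_cell` is a hypothetical name):

```python
def chart_cell(ch: str) -> tuple[str, str]:
    """Hypothetical helper: (row label, column label) of a character in the chart."""
    cp = ord(ch)
    if cp > 0x7F:
        raise ValueError("not in the Basic Latin block")
    # Row: the code point with its last hex digit replaced by "x".
    # Column: that last hex digit.
    return f"U+{cp >> 4:03X}x", f"{cp & 0xF:X}"

print(chart_cell("A"))  # prints "('U+004x', '1')"
print(chart_cell("~"))  # prints "('U+007x', 'E')"
```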
Variants
Several characters in the block have standardized variants, which are selected by following the base character with a variation selector.
A variant is defined for a zero with a short diagonal stroke: U+0030 DIGIT ZERO, U+FE00 VS1 (0︀).[9][10]
Twelve characters (#, *, and the ten digits) can be followed by U+FE0E VS15 or U+FE0F VS16 to select a text or emoji variant.[11][12][13][14] They are also keycap base characters: for example, #️⃣ is the sequence U+0023 NUMBER SIGN, U+FE0F VS16, U+20E3 COMBINING ENCLOSING KEYCAP. The VS15 form requests "text presentation", while the VS16 form requests "emoji-style" presentation.[10]
U+ | 0023 | 002A | 0030 | 0031 | 0032 | 0033 | 0034 | 0035 | 0036 | 0037 | 0038 | 0039 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
base | # | * | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
base+VS15+keycap | #︎⃣ | *︎⃣ | 0︎⃣ | 1︎⃣ | 2︎⃣ | 3︎⃣ | 4︎⃣ | 5︎⃣ | 6︎⃣ | 7︎⃣ | 8︎⃣ | 9︎⃣ |
base+VS16+keycap | #️⃣ | *️⃣ | 0️⃣ | 1️⃣ | 2️⃣ | 3️⃣ | 4️⃣ | 5️⃣ | 6️⃣ | 7️⃣ | 8️⃣ | 9️⃣ |
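The variant sequences described above are ordinary strings of code points. This Python sketch (the constant and function names are illustrative, not from any library) builds the slashed-zero variant and the keycap sequences and prints their code points; how they are actually displayed still depends on font and platform support.

```python
VS1 = "\uFE00"     # VARIATION SELECTOR-1
VS15 = "\uFE0E"    # VARIATION SELECTOR-15: text presentation
VS16 = "\uFE0F"    # VARIATION SELECTOR-16: emoji presentation
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP

KEYCAP_BASES = "#*0123456789"  # the twelve keycap base characters

def keycap(base: str, emoji_style: bool = True) -> str:
    """Hypothetical helper: base + VS16 (or VS15) + U+20E3."""
    if len(base) != 1 or base not in KEYCAP_BASES:
        raise ValueError(f"{base!r} is not a keycap base character")
    return base + (VS16 if emoji_style else VS15) + KEYCAP

def code_points(seq: str) -> str:
    """Format a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(c):04X}" for c in seq)

slashed_zero = "0" + VS1
print(code_points(slashed_zero))               # prints "U+0030 U+FE00"
print(code_points(keycap("#")))                # prints "U+0023 U+FE0F U+20E3"
print(len([keycap(b) for b in KEYCAP_BASES]))  # prints "12"
```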
History
The following Unicode-related documents record the purpose and process of defining specific characters in the Basic Latin block:
| Version | Final code points[a] | Count | UTC ID | L2 ID | WG2 ID | Document |
|---|---|---|---|---|---|---|
| 1.0.0 | U+0000..007F | 128 | (to be determined) | | | |
| | | | UTC/1999-013 | | | Karlsson, Kent (1999-05-27), Tildes and micro sign decompositions |
| | | | | L2/99-176R | | Moore, Lisa (1999-11-04), "Micro Sign Case Mappings", Minutes from the joint UTC/L2 meeting in Seattle, June 8–10, 1999 |
| | | | | L2/04-145 | | Starner, David (2004-04-30), C with stroke character examples from BAE report 1884 (Dorsey) |
| | | | | L2/04-202 | | Anderson, Deborah (2004-06-07), Slashed C Feedback |
| | | | | | N3046 | Suignard, Michel (2006-02-22), Improving formal definition for control characters |
| | | | | | N3103 (pdf, doc) | Umamaheswaran, V. S. (2006-08-25), "M48.33", Unconfirmed minutes of WG 2 meeting 48, Mountain View, CA, USA; 2006-04-24/27 |
| | | | | L2/11-043 | | Freytag, Asmus; Karlsson, Kent (2011-02-02), Proposal to correct mistakes and inconsistencies in certain property assignments for super and subscripted letters |
| | | | | L2/11-160 | | PRI #181 Changing General Category of Twelve Characters, 2011-05-02 |
| | | | | L2/11-261R2 | | Moore, Lisa (2011-08-16), "Consensus 128-C3", UTC #128 / L2 #225 Minutes, Accept Ken Whistler's recommendations in L2/11-281 on name aliases for control characters with the addition of the abbreviations BEL and NUL. |
| | | | | L2/11-438[b][c] | N4182 | Edberg, Peter (2011-12-22), Emoji Variation Sequences (Revision of L2/11-429) |
| | | | | L2/15-107 | | Moore, Lisa (2015-05-12), "Consensus 143-C5", UTC #143 Minutes, Add the 12 keycap sequences in emoji-data.txt as provisional named sequences in Unicode 8.0. |
| | | | | L2/15-268 | | Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30), Proposal to Represent the Slashed Zero Variant of Empty Set |
| | | | | L2/15-301[d][c] | | Pournader, Roozbeh (2015-11-01), A proposal for 278 standardized variation sequences for emoji |
| | | | | L2/15-254 | | Moore, Lisa (2015-11-16), "B.12.1.2 Proposal to Represent the Slashed Zero Variant of Empty Set", UTC #145 Minutes |
| | | | | L2/17-294 | N4914 | Lunde, Ken (2017-08-14), Proposal to add standardized variation sequence for U+FF10 FULLWIDTH DIGIT ZERO |
| | | | | L2/22-019 | | Scherer, Markus; et al. (2022-01-19), "F.2 F4: U+0019 in ISO vs. NameAliases.txt vs. chart/NamesList.txt", UTC #170 properties feedback & recommendations |
| | | | | L2/22-016 | | Constable, Peter (2022-04-21), "Consensus 170-C24", UTC #170 Minutes, For U+0019, add a Name alias "EM" of type abbreviation, for Unicode version 15.0. |
See also
- Latin script in Unicode
- Latin-1 Supplement
- Character encoding
- ISO/IEC 8859-1
- Latin script
- ISO basic Latin alphabet
References
- ^ "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
- ^ "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
- ^ "block.txt". The Unicode Consortium. Retrieved 2023-03-23.
- ^ "C0 Controls and Basic Latin" (PDF). The Unicode Standard, Version 15.0. Unicode, Inc. 2022. Retrieved 2023-03-22.
- ^ The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
- ^ "3.8: Block-by-Block Charts" (PDF). The Unicode Standard. version 1.0. Unicode Consortium.
- ^ Kaplan, Michael S. (2005-09-17). "When is a backslash not a backslash?". Sorting it all Out. Microsoft. Archived from the original on 2010-06-12. Also available at: http://archives.miloush.net/michkap/archive/2005/09/17/469941.html
- ^ a b c d e f g "Unicode 6.2 code charts" (PDF). The Unicode Standard. Retrieved 2013-04-01.
- ^ Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30). "L2/15-268: Proposal to Represent the Slashed Zero Variant of Empty Set" (PDF).
- ^ a b "UTS #51 Emoji Variation Sequences". The Unicode Consortium.
- ^ Edberg, Peter (2011-12-22). "L2/11-438: Emoji Variation Sequences (Revision of L2/11-429)" (PDF).
- ^ Pournader, Roozbeh (2015-11-01). "L2/15-301: A proposal for 278 standardized variation sequences for emoji" (PDF).
- ^ "UTR #51: Unicode Emoji". Unicode Consortium. 2023-09-05.
- ^ "UCD: Emoji Data for UTR #51". Unicode Consortium. 2023-02-01.