Character data


A computer character is not the same thing as a cartoon character.

A computer character is a single letter, digit, etc., that is used to represent legible (i.e., readable) information. There are many different character sets.

7-bit standard ASCII character set (128 values)
8-bit extended ASCII character set (256 values)
Unicode character set (65536 values in its original 16-bit form; 1,114,112 code points today)
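
The value counts in the list above follow directly from the bit widths: n bits give 2 to the power n distinct patterns. A minimal sketch (the helper name value_count is made up for illustration):

```python
def value_count(bits: int) -> int:
    """Number of distinct bit patterns available with the given number of bits."""
    return 2 ** bits

# The three widths named in the list above.
for bits in (7, 8, 16):
    print(f"{bits:2d} bits -> {value_count(bits)} values")
```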

For a character set to be useful, everyone involved must agree on what the bit pattern for each character of a character set means.

A lot of data is stored or sent as character data (text), or can be thought of as character data.

Standard email is transmitted as text.
Standard HTML web pages are text.

A coding scheme assigns each character to a bit pattern.

EBCDIC (Extended Binary Coded Decimal Interchange Code) is an older 8-bit character code, traditionally used on IBM mainframe systems.
ASCII (American Standard Code for Information Interchange) is the character code on most personal computers.

standard ASCII - 7 bits (128 characters)
extended ASCII - 8 bits (256 characters)
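
To see that two coding schemes assign different bit patterns to the same character, the letter "A" can be encoded both ways. This is only a sketch: it assumes Python's "cp037" codec, which is one of several EBCDIC code pages, as a stand-in for EBCDIC in general.

```python
# Encode the same letter under two coding schemes and compare the bit patterns.
# "cp037" is one EBCDIC code page that ships with Python; picking it is an assumption.
letter = "A"
ascii_byte = letter.encode("ascii")[0]
ebcdic_byte = letter.encode("cp037")[0]

print(f"ASCII : {ascii_byte:08b} ({ascii_byte:#04x})")
print(f"EBCDIC: {ebcdic_byte:08b} ({ebcdic_byte:#04x})")
```

The two bytes differ, which is why both sides of an exchange must agree on the coding scheme in use.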

The standard ASCII character set (7-bit encoding) has 128 characters; the extended ASCII character set (8-bit encoding) has 256 characters.

The standard ASCII code for "A" is 65d or 41h. Everyone agrees on that.
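
The decimal and hexadecimal values for "A" can be checked with Python's built-in ord, hex, and chr functions, as a quick illustration:

```python
code = ord("A")      # character -> numeric code
print(code)          # decimal form
print(hex(code))     # hexadecimal form
print(chr(code))     # numeric code -> character
```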

How does the character code for "A" appear? It depends on the context (font, size, color, etc.).

In a bit-mapped font, bits that are 0 or 1 can be made to look like letters by appropriate display techniques.

Note: A vector font is handled differently than a bit-mapped font.
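
For the bit-mapped case, a rough sketch of the idea follows. The 8x8 glyph below is a made-up pattern for "A", not taken from any real font, and the render function is a hypothetical helper that draws 1 bits as "#" and 0 bits as ".".

```python
# A hypothetical 8x8 glyph for "A"; each integer is one row of pixels.
GLYPH_A = [
    0b00011000,
    0b00100100,
    0b01000010,
    0b01000010,
    0b01111110,
    0b01000010,
    0b01000010,
    0b00000000,
]

def render(glyph, width=8, on="#", off="."):
    """Return the glyph as text, one line per row, most significant bit first."""
    lines = []
    for row in glyph:
        bits = format(row, f"0{width}b")
        lines.append("".join(on if b == "1" else off for b in bits))
    return "\n".join(lines)

print(render(GLYPH_A))
```

The character code (65) only selects which glyph to draw; the glyph's bits determine what appears on screen.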

Unicode was originally designed as a 16-bit character code (65536 characters) that handles additional characters as found, for example, in Chinese and Japanese. It has since grown beyond 16 bits: code points now range from U+0000 to U+10FFFF, for 1,114,112 possible values.

The first 128 characters (codes 0 to 127) are the same as the standard ASCII character set.
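
The overlap between ASCII and Unicode can be checked directly. The sketch below uses UTF-8 as the Unicode encoding, which is a choice made here for illustration; the function name is hypothetical.

```python
def ascii_matches_unicode() -> bool:
    """True if codes 0..127 encode to the same single byte in ASCII and UTF-8."""
    return all(
        chr(code).encode("ascii") == chr(code).encode("utf-8") == bytes([code])
        for code in range(128)
    )

print(ascii_matches_unicode())

# A character outside the ASCII range needs more than one byte in UTF-8.
e_acute = "\u00e9"
print(len(e_acute.encode("utf-8")))
```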

More information is available at http://www.unicode.org [as of November 17, 2003].

Languages such as Arabic and Hebrew go from right to left.

By contrast, English goes from left to right.
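
Unicode records the writing direction as a property of each character. As a small illustration, Python's standard unicodedata module reports it: "L" means left to right, "R" means right to left, and "AL" means right-to-left Arabic letter. The direction helper is just a thin wrapper introduced for this example.

```python
import unicodedata

def direction(ch: str) -> str:
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)

# Latin capital A, Hebrew letter alef, Arabic letter alef.
for ch in ("A", "\u05d0", "\u0627"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {direction(ch)}")
```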

Here is the decimal form (for HTML usage).

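&#1488;&#1489;&#1490;&#1491;&#1492;&#1493;&#1494;&#1495;&#1496; &#1497;&#1498;&#1499;&#1500;&#1501;&#1502;&#1503;&#1504;&#1505; &#1506;&#1507;&#1508;&#1509;&#1510;&#1511;&#1512;&#1513;&#1514;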

Here is how it appears when formatted (as HTML).

אבגדהוזחט יךכלםמןנס עףפץצקרשת

Note that each language uses part of the Unicode character space. The standard ASCII character set uses the first 128 character slots, and the first 256 slots match the ISO 8859-1 (Latin-1) extended character set.

Here is the hexadecimal form (for HTML usage).

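&#x5D0;&#x5D1;&#x5D2;&#x5D3;&#x5D4;&#x5D5;&#x5D6;&#x5D7;&#x5D8; &#x5D9;&#x5DA;&#x5DB;&#x5DC;&#x5DD;&#x5DE;&#x5DF;&#x5E0;&#x5E1; &#x5E2;&#x5E3;&#x5E4;&#x5E5;&#x5E6;&#x5E7;&#x5E8;&#x5E9;&#x5EA;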

Here is how it appears when formatted (as HTML).

אבגדהוזחט יךכלםמןנס עףפץצקרשת
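
The decimal and hexadecimal forms for HTML are numeric character references built from each character's code. A minimal sketch of generating them, using only the first three Hebrew letters; the two helper function names are made up for this example:

```python
import html

text = "\u05d0\u05d1\u05d2"  # first three Hebrew letters: alef, bet, gimel

def to_decimal_refs(s: str) -> str:
    """Build decimal numeric character references, e.g. &#1488;"""
    return "".join(f"&#{ord(ch)};" for ch in s)

def to_hex_refs(s: str) -> str:
    """Build hexadecimal numeric character references, e.g. &#x5D0;"""
    return "".join(f"&#x{ord(ch):X};" for ch in s)

decimal_form = to_decimal_refs(text)
hex_form = to_hex_refs(text)
print(decimal_form)
print(hex_form)

# html.unescape turns either form back into the original characters.
print(html.unescape(decimal_form) == text)
print(html.unescape(hex_form) == text)
```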

Here is how it appears in ASCII when typed from a console window.

Notice the strange characters. That is because text in a 16-bit Unicode encoding is being typed in a window that is expecting 8-bit extended ASCII characters.
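
The effect can be reproduced by encoding text one way and decoding the bytes another way. The sketch below assumes UTF-16 (little endian) as the 16-bit encoding and code page 437 as the console's 8-bit character set; the actual encodings involved on a given machine may differ.

```python
text = "\u05d0\u05d1\u05d2"  # Hebrew letters alef, bet, gimel

# Each letter becomes two bytes in the 16-bit encoding.
raw = text.encode("utf-16-le")

# A console expecting 8-bit characters treats every byte as its own character.
# Code page 437 is an assumed console character set, chosen for illustration.
garbled = raw.decode("cp437")

print(len(text), len(garbled))   # the garbled text has twice as many characters
print(repr(garbled))             # repr() keeps control characters visible

# The bytes themselves are undamaged: reversing the wrong step restores the text.
restored = garbled.encode("cp437").decode("utf-16-le")
print(restored == text)
```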

Here is how the above text looks in Notepad, which supports a 16-bit Unicode encoding.

[Image: Hebrew text]

BabelMap is a Windows character map utility that allows you to find and copy any Unicode character. ... It is free and fully functional, and there are no disabled features or time restrictions. ... Whilst BabelMap is free, if you do use it for commercial purposes, please consider making a small donation by means of PayPal to help support its continued development.

http://www.babelstone.co.uk/software/babelmap.html