character encoding

I know that character encoding is the translation of a character (for example, A) into binary 0s and 1s. However, I do not know which part of my computer does this translation (is it the CPU, or something else?). So who does this translation, or encoding, from characters to binary? This is my first question.

My second question: in which file on my computer are the characters that will be encoded, or translated to binary, stored?
Everything on a computer is binary data. The character encoding tells you how to interpret the binary data into characters.

ASCII is one character encoding in which A is represented by a byte with value 65. The computer works with numbers so it stores the value 65. The A is just a visual representation of 65 that is only relevant when it is shown on screen or printed to an image.
And when you need to classify a character: byte 65 is an uppercase, alphabetic, non-digit, non-space, non-punctuation character.
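A minimal sketch of both points, assuming an ASCII-compatible execution character set and the default "C" locale. The helper names code_of and classify are made up for illustration.

```cpp
#include <cctype>
#include <string>

// Returns the number that is actually stored for a character.
int code_of(char c) {
    return static_cast<unsigned char>(c);
}

// Classifies a byte with the standard <cctype> functions.
// The byte itself carries no "letter-ness"; these functions
// just answer questions about its numeric value.
std::string classify(unsigned char byte) {
    std::string out;
    if (std::isupper(byte)) out += "upper ";
    if (std::islower(byte)) out += "lower ";
    if (std::isalpha(byte)) out += "alpha ";
    if (std::isdigit(byte)) out += "digit ";
    if (std::isspace(byte)) out += "space ";
    if (std::ispunct(byte)) out += "punct ";
    return out;
}
```

With these, code_of('A') gives 65 and classify(65) gives "upper alpha ".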
"Everything on a computer is binary data. The character encoding tells you how to interpret the binary data into characters.

ASCII is one character encoding in which A is represented by a byte with value 65. The computer works with numbers so it stores the value 65. The A is just a visual representation of 65 that is only relevant when it is shown on screen or printed to an image"

My question was not "what does the character encoding tell me?" My question was "who does this translation or encoding from characters to binaries ??". I did not ask about the encoding process, but about the part of the computer responsible for doing the encoding.

this :


"Everything on a computer is binary data. The character encoding tells you how to interpret the binary data into characters.

ASCII is one character encoding in which A is represented by a byte with value 65. The computer works with numbers so it stores the value 65. The A is just a visual representation of 65 that is only relevant when it is shown on screen or printed to an image"

can be an answer to my second question, but it is a vague answer. In other words, I did not understand exactly what you mean by it.
There is no such thing as a character in your computer. It is just a number.
When you see the character 'A' output, it is your program asking the terminal to print symbol number 65. The terminal takes the currently loaded font, takes out glyph number 65 and draws it. If you change the terminal font to, say, Wingdings, it will still output glyph number 65, but it will not look like 'A'.
When you enter a letter it works the same way: the OS sends a message that character 65 was typed, the terminal draws the corresponding glyph depending on the loaded font, 65 is sent to your program, and a char-type variable (which the standard considers an integral type) takes the numeric value 65.

So no conversion to char happens, as there are no chars anywhere.
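A toy model of the font point, not how a real terminal is implemented. The Font type, the draw function and both fonts are made up for illustration. The text is the same two numbers (65, 66) in both cases and only the glyph lookup differs.

```cpp
#include <map>
#include <string>

// A hypothetical "font": maps a character number to whatever gets drawn.
using Font = std::map<int, std::string>;

// "Draws" text by looking up each byte's number in the font.
// Numbers the font has no glyph for are shown as '?'.
std::string draw(const Font& font, const std::string& text) {
    std::string screen;
    for (unsigned char byte : text) {
        auto glyph = font.find(byte);
        if (glyph != font.end()) {
            screen += glyph->second;
        } else {
            screen += "?";
        }
    }
    return screen;
}

// Two made-up fonts that disagree about what number 65 looks like.
const Font latin_font  = {{65, "A"}, {66, "B"}};
const Font symbol_font = {{65, "[hand]"}, {66, "[pencil]"}};
```

draw(latin_font, "AB") gives "AB", while draw(symbol_font, "AB") gives "[hand][pencil]". The program's data never changed.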
I'm sorry if I wasn't clear. Your initial question made it sound like you thought of character encoding as a verb. I guess "encoding" can be a verb too but usually when we talk about character encoding we use it as a noun.

In my previous (vague) answer I tried to explain that a character encoding is a format in which characters are stored. Don't think of it as the process of turning characters into this format. That is not what we mean when we say character encoding.

who does this translation or encoding from characters to binaries ??

To me this question is not clear, because to me a character is binary when stored on the computer, so there is no translation going on. Of course, computer programs can use different character encodings, so sometimes there is a need to change from one encoding to another, but this is done by the program (or a library used by the program).
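As a sketch of what "done by the program" can look like, here is a small hand-written conversion from ISO-8859-1 (Latin-1) to UTF-8. The function name latin1_to_utf8 is made up for illustration. Real programs would more likely call a library for this.

```cpp
#include <string>

// Converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Latin-1 byte values are equal to Unicode code points 0..255, so:
//   0x00..0x7F -> one UTF-8 byte, unchanged
//   0x80..0xFF -> two UTF-8 bytes: 110000xx 10xxxxxx
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            utf8 += static_cast<char>(byte);
        } else {
            utf8 += static_cast<char>(0xC0 | (byte >> 6));   // top 2 bits
            utf8 += static_cast<char>(0x80 | (byte & 0x3F)); // low 6 bits
        }
    }
    return utf8;
}
```

For example, the Latin-1 byte 0xE9 (e with acute accent) becomes the two UTF-8 bytes 0xC3 0xA9, while plain ASCII text passes through unchanged.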
"In my previous (vague) answer I tried to explain that a character encoding is a format in which characters are stored. Don't think of it as the process of turning characters into this format. That is not what we mean when we say character encoding. "

"To me this question is not clear because to me a character is binary when stored on the computer so there is no translation going on"

So you mean that characters, taking English letters as an example, are stored in the computer in binary format, and the computer knows which letter is represented by these binaries? If we take the English letters as an example:

a = 01100001

b = 01100010

c = 01100011

So 01100001, 01100010 and 01100011 are stored in the computer, and the computer knows that a = 01100001, b = 01100010 and c = 01100011? Am I right or am I wrong? If I am right:

1) Where are the binaries that represent all the Unicode characters stored? In the RAM?

2) We know that the letter 'a' is represented by the binary number 01100001 and 'b' by 01100010 because it is written in the Unicode table and because we know about the encoding process, but how does the computer know that 01100001 is 'a' and 01100010 is 'b'?
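The bit patterns listed above are just the numbers 97, 98 and 99 written in base 2. A short sketch, assuming an ASCII-compatible character set and 8-bit bytes, with made-up helper names bits_of and from_bits:

```cpp
#include <bitset>
#include <string>

// Writes the number stored for a character as 8 binary digits.
std::string bits_of(char c) {
    return std::bitset<8>(static_cast<unsigned char>(c)).to_string();
}

// Goes the other way: reads 8 binary digits back into a char.
// Nothing is "decoded" here; the same number is simply reinterpreted.
char from_bits(const std::string& bits) {
    return static_cast<char>(std::bitset<8>(bits).to_ulong());
}
```

bits_of('a') gives "01100001" and from_bits("01100010") gives 'b'.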
where are the binaries that represents all the Unicode characters stored ? in the RAM???

Who said anything about Unicode? If an application chooses not to store Unicode characters, then Unicode characters are not stored anywhere.

we know that the letter 'a' is represented by this binary number 01100001 and b is represented by 01100010 because it is written in the Unicode table and because we know about the encoding process but how does the computer knows that 01100001 is 'a' and 01100010 is 'b'??

The computer doesn't care. The application cares, and at some point there is just a table look-up happening within that application.
the computer knows that a = 01100001 and b = 01100010 and c= 01100011??
On a lower level it does not know that. In modern terminals (encoding-aware terminals and complex font files, which are actually programs themselves) it works roughly like this: at startup the program checks the current encoding and loads the corresponding glyphs into a character table with additional info (like height and width for non-monospace fonts). For larger encodings (like UTF-32) it usually loads character groups on demand.
It goes like this: "OK, character number 65 is Latin Capital Letter A, so we will ask the current font to provide that letter at 48px size and store the result in characters[65]". After that there is no additional information about which character is which. Knowledge of a character and its numeric representation is needed only to fill the glyph table; after that, character values are only used as indexes into that table.
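A rough sketch of such an on-demand glyph table. Everything here is hypothetical: rasterize stands in for a real font engine and returns a text label instead of a bitmap, and GlyphTable is a made-up class, not any terminal's actual code.

```cpp
#include <string>
#include <unordered_map>

// Stand-in for a real font rasterizer (hypothetical): returns a label
// describing the glyph instead of actual pixel data.
std::string rasterize(char32_t code_point, int pixel_size) {
    return "glyph#" + std::to_string(static_cast<unsigned long>(code_point)) +
           "@" + std::to_string(pixel_size) + "px";
}

// Glyph table filled on demand. After a glyph is loaded, the character
// value is used only as an index into the table.
class GlyphTable {
public:
    explicit GlyphTable(int pixel_size) : pixel_size_(pixel_size) {}

    const std::string& glyph(char32_t code_point) {
        auto it = characters_.find(code_point);
        if (it == characters_.end()) {
            // First use: ask the "font" once and remember the result.
            ++font_requests_;
            it = characters_.emplace(code_point,
                                     rasterize(code_point, pixel_size_)).first;
        }
        return it->second;
    }

    // How many times the font had to be asked for a glyph.
    int font_requests() const { return font_requests_; }

private:
    int pixel_size_;
    int font_requests_ = 0;
    std::unordered_map<char32_t, std::string> characters_;
};
```

Asking a GlyphTable(48) for glyph 65 twice hits the font only once. The second call is a plain table lookup.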

As to where it is stored, the answer differs depending on the machine and the program in question. In text mode all the information is stored in video memory; in graphics mode it can be stored anywhere, depending on the program in question.

we know that the letter 'a' is represented by this binary number 01100001 and b is represented by 01100010
In KOI-7 H1 or JUS I.B1.003 there is no character 'b' anywhere. It does not exist in those encodings. We know that b is 01100010 only in some concrete encodings.
"Who said anything about Unicode? If an application chooses not to store Unicode characters, then Unicode characters are not stored anywhere."

"The computer doesn't care. The application cares, and at some point there is just a table look-up happening within that application. "

When we format the computer and install, for example, Windows XP from the CD, how are this OS and its applications stored in the computer? it is not stored in the form of 1 and 0 ??? I would like an answer to this question.
it is not stored in the form of 1 and 0
It is. There is no other way to store it.
I did not write this

"it is not stored in the form of 1 and 0 "

I wrote this

it is not stored in the form of 1 and 0 ??? // it was a question :)

So NiiNiPaa your answer to this question :

is the OS system stored in the computer in the form of 1 and 0???

is

yes the OS system is stored in the computer in the form of 1 and 0

??? // again a question



Yes, the OS, your pictures, movies, programs, and all other stuff are stored (logically) as sequences of binary digits, as the computer cannot comprehend information in any other way.

So starting from this :


"Yes, OS, your pictures, movies, programs, and all other stuff is stored (logically) as sequence of binary digits, as computer cannot comprehend information in any other way"


encoding is translating all the possible characters that I see in my computer files into a specific binary form that the computer hardware can understand. Am I wrong?

And decoding, the opposite of encoding, is the computer hardware displaying a character in its original form. Am I wrong?
An encoding (like ASCII, KOI, Shift_JIS) is an agreement on representing specific characters by specific binary sequences.
http://en.wikipedia.org/wiki/Character_encoding

You see characters on screen because somebody wrote a program which detects the encoding used by that text, makes sure that the correct glyphs are selected for display, and draws them for you. Asking how the computer knows how to display characters is like asking how it knows how to build tanks in some strategy game. It does not know that. Somebody wrote a program which does that.

I suggest looking into how text video mode works, because it is closest to the hardware and its workings are easy to understand. Patching the glyph table on the fly, to reuse some rarely used characters for drawing a GUI in pseudographics, was pretty common in the DOS era. The same trick was used to make a program display text in your native language.
closed account (SECMoG1T)
Hi all, wow, this topic is quite interesting and I agree with you guys. I also have something that might help the OP; I'm not very sure it fits well into this context, but I'll give it a try.

 I do not know which part in my computer does this translation ( is it the CPU or ....????)

Well, looking at basic input/output, there is in fact a section of the hardware available solely for this functionality. The BIOS houses a storage location (the BIOS data area), along with a ROM that stores a lot of info, including ASCII codes (plus others) and their corresponding scan codes from the keyboard and some extended keys (F1, Fn, ...). It also stores the addresses of available ports, the video mode currently in use, the list of installed hardware, etc.
This data area also houses some writable buffers, such as video buffers and the keyboard buffer (the typeahead buffer).
So AFAIK, when you press a key on the keyboard, a series of actions must occur:

1. The keyboard sends a message (a scan code) to its local address port.
2. This message activates a high-priority interrupt routine, to which the CPU responds with a BIOS function call that retrieves the scan code from the port.
3. The code is used in a lookup to retrieve its corresponding ASCII symbol, mostly stored in hex/bin format.
4. Both the code and the ASCII value are then stored in the keyboard buffer, waiting to be retrieved by the running application.
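Steps 3 and 4 can be sketched roughly as below. This is an illustration in ordinary C++, not real BIOS code. It assumes PC scan code set 1 values for three keys, ignores Shift and other modifiers, and assumes the buffer entry is a 16-bit word with the scan code in the high byte and the ASCII value in the low byte. The names scan_to_ascii, translate and buffer_entry are made up.

```cpp
#include <cstdint>
#include <map>

// Tiny excerpt of a scan-code-to-ASCII table (scan code set 1,
// unshifted keys only). A real table covers the whole keyboard.
const std::map<std::uint8_t, char> scan_to_ascii = {
    {0x1E, 'a'},
    {0x30, 'b'},
    {0x2E, 'c'},
};

// Step 3: look up the ASCII value for a scan code (0 if not in the table).
char translate(std::uint8_t scan_code) {
    auto entry = scan_to_ascii.find(scan_code);
    return entry != scan_to_ascii.end() ? entry->second : '\0';
}

// Step 4: pack both values into one 16-bit keyboard buffer entry,
// scan code in the high byte, ASCII value in the low byte.
std::uint16_t buffer_entry(std::uint8_t scan_code) {
    const auto ascii = static_cast<unsigned char>(translate(scan_code));
    return static_cast<std::uint16_t>((scan_code << 8) | ascii);
}
```

Pressing the 'a' key (scan code 0x1E) would then yield the entry 0x1E61, since 'a' is 0x61.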

Note that some hardware might also allow you to change the software controlling the BIOS through upgrades...

hope that makes sense <>:)