Computers

closed account (LN7oGNh0)
I was wondering whether computers use a specific language. Like, I use C++, but can a computer be reprogrammed using code that is only in C++? If not, do computers even have a language that was used to program them?

Thank you.
Short answer:
Take a course in machine architecture.

Not-as-short answer:
Computers operate using machine language:
http://en.wikipedia.org/wiki/Machine_code

The very first thing someone will typically do is program an "assembler", which takes a text file with a simplified set of human-friendly "opcodes" and converts them to machine code.
http://en.wikipedia.org/wiki/Assembly_language

Once that is done, it is easy to write compilers for higher level languages, like C++.
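
To make the idea concrete, here's a minimal sketch in C++ of the core of an assembler for a made-up instruction set (the mnemonics and opcode numbers are invented for illustration; a real assembler also handles operands, labels and addressing modes). The whole job boils down to a table lookup from human-friendly names to the numbers the CPU actually understands:

#include <cstdint>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical opcode table for a made-up CPU; a real table would come
// straight out of the processor's manual.
const std::map<std::string, std::uint8_t> opcodes = {
    {"NOP", 0x00}, {"LOAD", 0x01}, {"ADD", 0x02}, {"STORE", 0x03}, {"HALT", 0xFF}
};

int main()
{
    // The "source file": human-friendly mnemonics, one per line.
    std::istringstream source("LOAD\nADD\nSTORE\nHALT\n");

    std::vector<std::uint8_t> machine_code;
    std::string mnemonic;
    while (source >> mnemonic)
        machine_code.push_back(opcodes.at(mnemonic));

    // Print the resulting "machine code" in hex.
    for (std::uint8_t byte : machine_code)
        std::cout << std::hex << int(byte) << ' ';
    std::cout << '\n';    // prints: 1 2 3 ff
}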
closed account (LN7oGNh0)
Thanks.
can a computer be reprogrammed by using code that is only in C++?
I think Lisp Machines used to work like that, though I'm not sure.
The problem is that parsing text is a rather complex operation, and there's little point in designing a general purpose chip capable of doing that. After all, you're not going to hook up your keyboard directly to it.

If not, do computers even have a language that was used to program them?
Yes. Machine language. Instead of the above, it's much simpler to design a processor that understands a very simple language.
For example, this (pretend the text is actually encoded into a compact binary representation):
push 1
load
push 2
load
add
push 2
div
push 0
store
is simpler than this (pretend this is valid C++):
(*0) = ((*1) + (*2)) / 2;
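
To see why that simple language is easy to execute, here's a minimal sketch of a C++ interpreter for the pseudo-assembly above. The semantics are assumed from the example: push puts a number on a stack, load pops an address and pushes the value stored there, add pops two values and pushes their sum, div pops the divisor and then the dividend, and store pops an address and then the value to write there. With memory[1] = 10 and memory[2] = 20 it leaves 15 in memory[0], exactly what the C++ line above computes.

#include <iostream>
#include <string>
#include <vector>

struct Instruction { std::string op; int operand; };

int main()
{
    std::vector<int> memory(16, 0);
    memory[1] = 10;          // *1
    memory[2] = 20;          // *2

    // The program from the post, written as data.
    std::vector<Instruction> program = {
        {"push", 1}, {"load", 0}, {"push", 2}, {"load", 0},
        {"add", 0}, {"push", 2}, {"div", 0}, {"push", 0}, {"store", 0}
    };

    std::vector<int> stack;
    for (const Instruction& ins : program) {
        if (ins.op == "push") {
            stack.push_back(ins.operand);
        } else if (ins.op == "load") {           // pop address, push memory[address]
            int addr = stack.back(); stack.pop_back();
            stack.push_back(memory[addr]);
        } else if (ins.op == "add") {            // pop two values, push their sum
            int b = stack.back(); stack.pop_back();
            int a = stack.back(); stack.pop_back();
            stack.push_back(a + b);
        } else if (ins.op == "div") {            // pop divisor, then dividend
            int divisor = stack.back(); stack.pop_back();
            int dividend = stack.back(); stack.pop_back();
            stack.push_back(dividend / divisor);
        } else if (ins.op == "store") {          // pop address, then value to write
            int addr = stack.back(); stack.pop_back();
            int value = stack.back(); stack.pop_back();
            memory[addr] = value;
        }
    }

    std::cout << "memory[0] = " << memory[0] << '\n';   // prints 15
}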
Modern x86 CPUs go even further than machine language. When you compile C++, what you get after an initial step is Assembly code. An assembler then assembles this into actual machine language, which is what the CPU reads from memory. Because this is the lowest level of communication you can have with the CPU, this is its machine language. However, internally the CPU performs its own sort of compilation into what is known as microcode. This divides the individual instructions into even simpler operations that deal directly with the internal components of the CPU. This is done to take advantage of instruction-level parallelism and such.
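
If you want to watch those stages happen on your own machine, most toolchains will show you the intermediate output. For example, with a trivial source file (assuming you have g++ and objdump installed; the exact assembly and machine code you see depends on your compiler, flags and CPU):

// average.cpp: something small enough to read the output of.
int average(int a, int b)
{
    return (a + b) / 2;
}

// g++ -S average.cpp    stops after the initial step and writes the
//                       human-readable assembly to average.s
// g++ -c average.cpp    lets the assembler run and writes the machine
//                       code into the object file average.o
// objdump -d average.o  disassembles those bytes back into mnemonics
//                       so you can compare them side by side

(The microcode step, by contrast, happens inside the CPU and isn't something you can inspect from software.)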
It depends how deep into processor architecture you want to go. Since I love the subject, and explaining it will also help me cement the information in my own head, I'll explain it as thoroughly as I can. If I make any mistakes, hopefully someone can correct me.

High-level:
At the highest level of processor architecture (but below that of languages like C++) you have assembly languages. They are human-readable in that they use names instead of numbers to represent instructions to the processor. The names are referred to as mnemonics, and the numbers they represent as opcodes or operation codes. Assembly languages are almost unstructured: the processor starts at the first instruction and works its way "down" the program towards the last instruction (if you imagine a program as a series of instructions oriented vertically; alternatively you could imagine them laid out in memory horizontally from left to right, in which case the CPU reads them that way).

In a file, an assembly program presents itself as a series of mnemonics, each with zero or more parameters (like function parameters in C++) that give the instruction context. For example, MOV is short for "move" and tells the CPU to move (really, copy) data from one place to another; on Intel and related processors, "MOV EBX, EAX" is converted into an opcode that tells the CPU to copy the contents of the register EAX into the register EBX.

Registers are named areas of storage, either on the CPU die (the silicon chip itself) or in main memory (RAM). On modern CPUs the latter is uncommon because of the massive disparity between CPU and memory clock speeds (the clock will be explained later). Older CPUs (like the MOS 6502) used main memory instead of registers because it was cheaper and more abundant than die space. "Virtual" assembly languages like Java or Python bytecode and C#'s Common Intermediate Language (CIL, pronounced "sill") tend not to use registers, because their registers would be stored in memory anyway, so there's no speed advantage; these are referred to as stack-based architectures because they use the stack (explained later) to store data, as opposed to the register-based architectures that are more common in hardware. There are also a number of registers used internally on virtually all processors that you as a programmer can't access directly; I'll explain those later.

Assembly language is a 1:1 representation of a lower-level language called machine code. Machine code is virtually identical to assembly language (which is what "1:1 representation" means) except that the mnemonics (names for instructions) are replaced by the opcodes (the numbers the mnemonics stand for).
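
As a concrete illustration of that 1:1 mapping (using x86 as the example, and keeping in mind that x86 allows more than one legal encoding for many instructions, so your assembler's exact bytes may differ):

#include <cstdio>

// One common x86 encoding of "MOV EBX, EAX": the opcode byte 0x89 means
// "copy a 32-bit register into a register or memory operand", and the
// ModRM byte 0xC3 that follows names EAX as the source and EBX as the
// destination.
int main()
{
    const unsigned char mov_ebx_eax[] = { 0x89, 0xC3 };
    for (unsigned char byte : mov_ebx_eax)
        std::printf("%02X ", byte);   // prints: 89 C3
    std::printf("\n");
}

An assembler's main job is producing bytes like these from the text you write; a disassembler does the reverse.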

Low-level:
Assembly languages (and the instruction sets behind them) can be loosely grouped into two categories: CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer) architectures. Processors that follow a CISC architecture (like Intel's and AMD's x86 (16- and 32-bit) and x86-64 (64-bit) processors) have more complex instruction sets than those that follow a RISC architecture (like the MOS 6502 found in the NES and Commodore 64, or the ARM processors found in most smartphones). All other things being equal, RISC processors are demonstrably faster than CISC ones, and while modern Intel processors present a CISC architecture to the programmer, the instructions are actually converted into RISC-like microcode in hardware. That microprogram is then what actually controls the CPU's circuits.
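
To give a rough feel for the difference, here's a conceptual sketch written as C++ (real micro-ops are undocumented and chip-specific): a CISC-style instruction such as "add this register to that variable in memory" has to do three things, and a modern x86 core cracks it into roughly the same three simple steps a RISC program would spell out explicitly.

#include <iostream>

int memory_total = 10;   // a value sitting in main memory
int reg_eax      = 5;    // a value sitting in a CPU register

int main()
{
    // What a single CISC-style "ADD [total], EAX" has to do, written out
    // as the three RISC-like micro-ops it is roughly cracked into:
    int tmp = memory_total;   // micro-op 1: load the memory operand
    tmp = tmp + reg_eax;      // micro-op 2: add the two register values
    memory_total = tmp;       // micro-op 3: store the result back to memory

    std::cout << memory_total << '\n';   // prints 15
}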

Lower-level:
The two most important circuits involved in operation of the CPU are the ALU (Arithmetic/Logic Unit) and CU (Control Unit). The CU responds to clock cycles from the clock. The clock is generally a crystal (like quartz) which oscillates at an accurate and constant frequency when a current is passed through it. This frequency is the clock speed. Modern CPUs often have clock speeds upwards of 4 GHz, meaning there are 4,000,000,000 clock cycles per second. The CPU does some work in every clock cycle, although different functions require different amounts of time, and therefore different numbers of clock cycles. Every clock cycle, the CU will fetch an instruction from memory. The way it does this is dependent on the processor architecture, but generally it works as follows:
1. CU copies the value of the PC (Program Counter - the address in memory of the current instruction to read) into the MAR (Memory Address Register - tells the CPU where in memory to read), increments the PC (sets it to the next address) and then loads the data at the address specified by the MAR into the MDR (Memory Data Register - the last piece of data to be read from memory). The contents of the MDR are then copied into the IR (Instruction Register). The reason for this being so roundabout and using so many registers is to do with the different circuits interacting (the part of the CPU that reads and writes to and from memory is a different circuit to the CU). In practice some CPUs probably do it more directly for efficiency, but in theory this is how it works. This stage is called the fetch cycle and it starts the fetch-decode-execute instruction cycle that most processors use to execute instructions. The registers mentioned here are the registers I mentioned before that can't be accessed directly by the programmer.
2. The contents of the IR are decoded by the decoder (possibly a separate circuit from, or a sub-circuit of, the CU). This involves finding out what addressing mode is being used (for example, whether the instruction references registers, values in memory, "immediate" values (ones embedded in the program itself), or a mixture of the three; mnemonics actually assemble into different opcodes depending on the addressing mode) and, on some processors, what the address size is. On Intel processors, for example, the byte 0x67 is a prefix: if the CPU is operating in 16-bit mode, it tells the CPU that the following instruction should use 32-bit addressing (so assembly language isn't quite a 1:1 representation of machine code, but it's close enough). If the instruction is an I/O (Input/Output) or register instruction, the CU moves on to the last stage and skips the rest of this one. If the instruction is a memory instruction, then the execution happens in the next clock cycle (i.e., the next stage isn't executed immediately). If the addressing mode is indirect, then before the next clock cycle the CPU loads the effective address from memory and sets up the appropriate data registers in time for the last stage, where the instruction is actually executed. This stage is referred to as the decode cycle.
3. Finally, the CU invokes other circuits of the CPU, such as the ALU, using control signals to pass the data it has gathered and execute the instruction. The ALU then performs any mathematical or logical operations and sends a signal back to the CU telling it about what happened so that it can set the appropriate registers to the appropriate values, changing the state of the computer. You can therefore model a CPU as a finite state machine, transitioning from state to state to get work done. This last stage is the execute stage. Note that although the PC was incremented in the fetch stage, depending on what the ALU does, it can be changed to somewhere else in memory (so the programmer can control the flow of the program).
The cycle is then repeated until the CPU is powered off. The cycle starts when it is powered on; usually the CPU will start with a pre-determined value in the PC. On most computers this is the area in memory where ROM starts - on the IBM PC (personal computer, not program counter) it points to an address which invokes the BIOS and starts the boot process, which hopefully results in an OS being loaded and the user being able to do lots of useful stuff, like write about processor architecture.
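
To tie the three stages together, here is a minimal sketch of the fetch-decode-execute cycle as a C++ loop. The instruction set is made up for the purpose (a few one-byte opcodes, each followed by one address byte, and no real addressing modes), but the PC/MAR/MDR/IR shuffle follows the description above, and the "reset vector" is simply PC = 0. The program adds 30 and 12 from "RAM" and writes 42 back.

#include <cstdint>
#include <iostream>
#include <vector>

// Made-up one-byte instruction set for illustration only.
enum Opcode : std::uint8_t {
    LOAD_A  = 0x01,   // next byte is an address: A = memory[address]
    ADD_A   = 0x02,   // next byte is an address: A = A + memory[address]
    STORE_A = 0x03,   // next byte is an address: memory[address] = A
    HALT    = 0xFF
};

int main()
{
    // "RAM": a program at address 0 and some data at addresses 8 and 9.
    std::vector<std::uint8_t> memory = {
        LOAD_A, 8, ADD_A, 9, STORE_A, 10, HALT, 0,
        30, 12, 0
    };

    std::uint8_t pc  = 0;   // program counter: where the reset vector points
    std::uint8_t mar = 0;   // memory address register
    std::uint8_t mdr = 0;   // memory data register
    std::uint8_t ir  = 0;   // instruction register
    std::uint8_t a   = 0;   // a general-purpose accumulator register

    bool running = true;
    while (running) {
        // Fetch: PC -> MAR, read memory into MDR, MDR -> IR, bump the PC.
        mar = pc;
        mdr = memory[mar];
        ir  = mdr;
        ++pc;

        // Decode + execute: every instruction here takes one operand byte,
        // except HALT.
        switch (ir) {
        case LOAD_A:  a = memory[memory[pc++]];        break;
        case ADD_A:   a = a + memory[memory[pc++]];    break;
        case STORE_A: memory[memory[pc++]] = a;        break;
        case HALT:    running = false;                 break;
        }
    }

    std::cout << "memory[10] = " << int(memory[10]) << '\n';   // prints 42
}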

Lowest-level:
At the lowest level, processors can be reduced to a series of circuits operating on binary logic, where everything is represented as either 0 or 1, off or on, false or true. It's very complicated to explain processors purely as digital circuits, but that's what they really are - a series of separate digital circuits operating on Boolean logic and interacting with one another. The CPU, therefore, is the emergent result of the combination of the circuits.
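
To give a taste of what "circuits operating on binary logic" means, here's a sketch of a one-bit full adder (the basic building block of the ALU's adder circuit) written using nothing but Boolean operations. In hardware this is a handful of logic gates; chain 32 or 64 of them together and you have the circuit that executes an ADD instruction.

#include <iostream>

// One-bit full adder built only from AND, OR and XOR, the way the
// hardware does it with logic gates.
void full_adder(bool a, bool b, bool carry_in, bool& sum, bool& carry_out)
{
    sum       = a ^ b ^ carry_in;                       // XOR gates
    carry_out = (a && b) || (carry_in && (a ^ b));      // AND and OR gates
}

int main()
{
    // Add the 4-bit numbers 0110 (6) and 0011 (3), least significant bit first.
    bool x[4] = {false, true, true, false};    // 6
    bool y[4] = {true,  true, false, false};   // 3
    bool result[4];
    bool carry = false;

    for (int i = 0; i < 4; ++i)
        full_adder(x[i], y[i], carry, result[i], carry);

    // Print most significant bit first: 1001, i.e. 9.
    for (int i = 3; i >= 0; --i)
        std::cout << result[i];
    std::cout << '\n';
}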

Conclusion:
Every CPU has its own language, its assembly/machine code language, and some even convert this into an even lower-level language called microcode before actually executing it. Below that, though, every CPU operates on Boolean (true or false) logic, and below even that, on physical phenomena such as the flow of electrons from negative to positive charge (which is what we call electric current). Also, virtually all modern processors descend from two closely related classic designs: the von Neumann architecture, in which instructions and data share the same memory, and the Harvard architecture, which keeps them separate (modern CPUs are often described as "modified Harvard" because they split their instruction and data caches while still sharing main memory). The relevant Wikipedia articles are a good place to start for those.

Note: This isn't truly finished yet, I didn't get a chance to explain the stack or the concept of instruction pipelining (where instructions are executed almost in parallel and out-of-order) which are important concepts. It's just too much to write in one go. I basically wrote a 1600-word essay in about two hours. If only I could apply myself like this to actual school work...
closed account (3qX21hU5)
Well, that was a wall of text. Must be what machine language looks like ;p
Yeah, sorry about that, it's a lot more readable in a word processor.