Decompile and recompile for another architecture

I have a binary file which has been compiled for processors with 24-bit registers and 24-bit addressing. I'd like to "rehost" the application on an x86 or x64 environment.

Do you think it's easier to decompile the code, then recompile the code for the appropriate architecture, or to emulate the 24-bit architecture?
I think it would be easier to emulate it. With 24-bit addressing, the calculations to access specific words (and bytes) would be a pain to adjust for a different word size. Of course, it all depends on what tools are available to you; maybe there is software that makes one option or the other dramatically easier.
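To give a rough idea of what "adjusting the calculations" means in an emulator, here is a minimal sketch. It assumes byte-addressable memory and 3-byte big-endian words, neither of which is known for the OP's architecture, and every name in it (`MASK24`, `add24`, `to_signed24`, `read_word24`, `write_word24`) is made up for illustration.

```python
MASK24 = 0xFFFFFF  # 24 one-bits; every register value is kept in this range


def add24(a, b):
    """Add two register values with 24-bit wraparound."""
    return (a + b) & MASK24


def to_signed24(value):
    """Interpret a 24-bit pattern as a two's-complement signed integer."""
    value &= MASK24
    return value - (1 << 24) if value & 0x800000 else value


def read_word24(memory, addr):
    """Read a 3-byte word at byte address addr (big-endian is an assumption)."""
    addr &= MASK24
    return int.from_bytes(memory[addr:addr + 3], "big")


def write_word24(memory, addr, value):
    """Store the low 24 bits of value as 3 bytes at byte address addr."""
    addr &= MASK24
    memory[addr:addr + 3] = (value & MASK24).to_bytes(3, "big")


memory = bytearray(16)             # tiny stand-in for the emulated address space
write_word24(memory, 3, 0x123456)
print(hex(read_word24(memory, 3)))  # 0x123456
print(add24(0xFFFFFF, 1))           # 0, because the sum wraps at 24 bits
```

In an emulator the masking lives in one place. A recompiled version would need the same wraparound behavior reproduced at every arithmetic and address computation in the translated code.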
If it's a class file or a pyc file then it should be easy to decompile, but I've never had much luck decompiling executable binary files.
@xkcd reference: he said it was a binary file, aka machine code file, and that it was in 24-bit architecture, which is a detail that would not be present for Python or Java.

Executable binary files can be disassembled quite easily with the right tools - the question is running the 24-bit arch code in an x86 or x64 arch environment.
he said it was a binary file
binary != machine code. class/pyc are binary files. just in bytecode instead of machine code.

and that it was in 24-bit architecture, which is a detail that would not be present for Python or Java.
got me there

Executable binary files can be disassembled quite easily with the right tools - the question is running the 24-bit arch code in an x86 or x64 arch environment.
could you link to some? i believe you, i just havent ever found any beyond boomerang
xkcd reference wrote:
binary != machine code. class/pyc are binary files. just in bytecode instead of machine code.
When a programmer uses the word "binary", one of the things it can mean is a machine-code executable. A JAR file or class file is not a binary in that sense. From the context of his first post, I am sure he meant a machine-code executable file. The giveaway is the countable form, "a binary" or "binaries", which doesn't work for the other senses of the word.
xkcd reference wrote:
could you link to some? i believe you, i just havent ever found any beyond boomerang
Have you ever used gdb?
Sure, but gdb is only capable of correctly disassembling code that's in front of it, as it's running it. Disassembling a binary in general is non-trivial. It may even be similar in difficulty to the halting problem.
Am I missing something here? By your logic, how do computers even execute binaries?
Computers execute the code. They assume that the data the program counter points to is executable code and that the PC doesn't point to the middle of an instruction. Disassemblers don't run the program, and wouldn't be useful if they did. Without running the program, telling apart code from data is much harder, and that's not even taking into account self-modifying code and other such trickery.

The best you can do is what IDA does: start from the entry point and find all jumps from that location. The destinations are assumed to be valid code. Return instructions and jumps to the middle of another branch halt interpretation of the branch. The algorithm ends when all branches have been fully interpreted. Memory addresses that haven't been interpreted as part of a branch may be assumed to contain data or garbage.
This is already pretty complex, and it doesn't even help that much when dealing with code that's modified at run time.
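The traversal described above can be sketched in a few lines. This is a toy, not IDA's actual implementation: the four-opcode instruction set, `OPCODES`, and `recursive_traversal` are all hypothetical, and jump targets are assumed to be one-byte absolute addresses.

```python
# Hypothetical toy instruction set: opcode byte -> (mnemonic, length in bytes).
# JMP and JZ carry a one-byte absolute target address as their operand.
OPCODES = {0x00: ("NOP", 1), 0x01: ("JMP", 2), 0x02: ("JZ", 2), 0x03: ("RET", 1)}


def recursive_traversal(image, entry):
    """Return {address: mnemonic} for every instruction reachable from entry."""
    code = {}            # addresses already decoded as instructions
    worklist = [entry]   # branch starts that still have to be followed
    while worklist:
        pc = worklist.pop()
        # Stop when we run off the image or reach already-decoded code.
        while 0 <= pc < len(image) and pc not in code:
            op = image[pc]
            if op not in OPCODES:
                break                           # unknown byte: give up on this branch
            name, length = OPCODES[op]
            if pc + length > len(image):
                break                           # instruction is cut off by end of image
            code[pc] = name
            if name == "RET":
                break                           # a return ends the branch
            if name in ("JMP", "JZ"):
                worklist.append(image[pc + 1])  # the target is assumed to be code
                if name == "JMP":
                    break                       # unconditional: no fall-through
            pc += length
    return code


# 0: JZ 5   2: NOP   3: JMP 6   5: RET   6: RET   7-8: data bytes
image = bytes([0x02, 0x05, 0x00, 0x01, 0x06, 0x03, 0x03, 0xFF, 0x00])
found = recursive_traversal(image, 0)
print(sorted(found))  # [0, 2, 3, 5, 6]
```

Addresses 7 and 8 are never reached, so they are treated as data even though the byte at 8 would decode as a NOP. Note that this only works because every jump target here is a constant in the instruction; a jump through a register, or code written at run time, would defeat it.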
Don't some executable formats enforce strict separation of data and instructions? I thought all did...
Yes, but that doesn't mean you can't intertwine data and code if you really want to. At least, not for PE. I'm not sure about the default page permissions in the various Unices.
And even then, alignment and dynamic code are still problems.
Here's my two cents.

The OP hasn't really specified which 24-bit architecture he's talking about. If it uses fixed-size instructions (like PowerPC does), disassembling will be much easier. But even if the binary is successfully disassembled, there's still the problem of translating it to a different architecture, which has a different instruction set, different registers, etc.

If I had a choice between writing such a translator and writing an emulator, I would probably choose the latter, unless there were a high risk of performance problems, i.e. unless the original processor is not at least an order of magnitude slower than the target processor.
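To show how small the core of an emulator can be, here is a minimal fetch-decode-execute sketch. The machine is invented: a single accumulator, word-addressed memory, and instructions packed as an 8-bit opcode plus a 16-bit operand in one 24-bit word. None of this comes from the OP's architecture, and `run` and the opcode names are hypothetical.

```python
MASK24 = 0xFFFFFF

# Hypothetical opcodes for an invented one-accumulator machine.
HALT, LOADI, ADDI, JNZ = 0, 1, 2, 3


def run(program, max_steps=10_000):
    """Interpret a list of 24-bit words and return the final accumulator."""
    acc = 0
    pc = 0
    for _ in range(max_steps):
        word = program[pc] & MASK24                    # fetch
        opcode, operand = word >> 16, word & 0xFFFF    # decode
        pc = (pc + 1) & MASK24
        if opcode == HALT:                             # execute
            return acc
        elif opcode == LOADI:
            acc = operand
        elif opcode == ADDI:
            acc = (acc + operand) & MASK24             # wrap at 24 bits
        elif opcode == JNZ:
            if acc != 0:
                pc = operand
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
    raise RuntimeError("step limit reached")


program = [(LOADI << 16) | 5, (ADDI << 16) | 7, HALT << 16]
print(run(program))  # 12
```

A plain interpreter loop like this pays a decode cost on every emulated instruction, which is where the performance concern above comes from.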