If you disassemble the result of the first, doesn't it end up exactly like your second? I would bet the final machine instructions are almost identical.
If your assembler lets you do the first, there's no reason to write the longhand other than readability, but I'd guess readability isn't your first concern if you're writing any significant portion of your code in assembly.
And here's another one, keeping in mind that x86 is a family of processors that is quite 'old', and how 64-bit fits into the tables is not immediately clear. Intel and AMD don't appear to be all that forthcoming with the info.
other than code size and having to modify a register?
two registers: ecx and flags (imul overwrites cf/of, add overwrites all the status flags, sf included)
as for relevant execution speeds, go to http://www.agner.org/optimize/instruction_tables.pdf -
looking at the Intel Skylake table there, a memory mov has latency 2 for all addressing modes, while imul alone has latency 3 (and is restricted to a single execution port). Of course it only matters if your data are already in L1 cache (such as because you're using [ebx+ecx*4] in a loop!)
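For anyone following along, here is a rough C sketch of the two forms being compared. The exact longhand sequence (imul ecx, 4 / add ecx, ebx / mov eax, [ecx]) is my assumption pieced together from the comments above, and the function names are made up for illustration. The comments map each C step onto the instruction it stands in for.

```c
#include <stdint.h>

/* Shorthand: a single load with a scaled index,
   roughly what  mov eax, [ebx+ecx*4]  does. */
int32_t load_indexed(const int32_t *base, uint32_t index) {
    return base[index];
}

/* Longhand: scale the index and add the base by hand before loading.
   The instruction sequence in the comments is an assumed reconstruction. */
int32_t load_longhand(const int32_t *base, uint32_t index) {
    uintptr_t offset = index;                        /* working copy standing in for ecx */
    offset *= 4;                                     /* imul ecx, 4  (or shl ecx, 2)     */
    const char *addr = (const char *)base + offset;  /* add ecx, ebx                     */
    return *(const int32_t *)addr;                   /* mov eax, [ecx]                   */
}
```

Both functions read the same element. The longhand just spends extra instructions (and, in the assembly version, clobbers ecx and the flags) to compute an address the addressing mode gives you for free.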
So if SHL were used instead of IMUL, does that mean that, since each instruction needs the result of the previous one, doing SHL-ADD-MOV has a total latency of 3.5, while doing just MOV has a total latency of 2?
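A back-of-the-envelope way to check that kind of sum is to add up the latencies along the dependency chain. The sketch below is only a rough model: it uses the IMUL (3) and memory MOV (2) figures quoted from the Skylake table, and it assumes 1 cycle each for SHL and ADD, which should be checked against the table's latency column (not the throughput column). The names are hypothetical.

```c
#include <stddef.h>

/* Latencies in cycles. IMUL and the memory MOV are the Skylake figures
   quoted in the thread; SHL and ADD at 1 cycle each are assumptions. */
enum { LAT_IMUL = 3, LAT_SHL = 1, LAT_ADD = 1, LAT_MOV_LOAD = 2 };

/* Each instruction waits for the previous result, so latencies simply add. */
int chain_latency(const int *latencies, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += latencies[i];
    }
    return total;
}

const int IMUL_ADD_MOV[] = { LAT_IMUL, LAT_ADD, LAT_MOV_LOAD };
const int SHL_ADD_MOV[]  = { LAT_SHL,  LAT_ADD, LAT_MOV_LOAD };
const int MOV_ONLY[]     = { LAT_MOV_LOAD };
```

Under those assumptions, chain_latency(IMUL_ADD_MOV, 3) gives 6, chain_latency(SHL_ADD_MOV, 3) gives 4, and chain_latency(MOV_ONLY, 1) gives 2. Real hardware overlaps independent work, so treat these as an estimate for a strictly serial chain only.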