Not really a beginner's question, but...

So I'm working on a DLL that is supposed to be an add-on to a scripting language. The scripting language (AutoIt) can be unbearably slow during certain operations, and I'm finding out a lot of the reasons why during this journey... like, for some ungodly reason, a 4-byte int gets stored in a 40-48 byte block... lmao, do I need to say more?

To the point at hand: I'm using HeapWalk() to search my process's memory, and once the data is found I read what I need from it. The purpose is to sort arrays, which I'm accomplishing. It's done, and close to 100x faster than the scripting language. Mission accomplished, yippee.

What I would like to do is create my own heap and have changes in the actual heap mirror my changes, and vice versa. I can't quite wrap my mind around how to accomplish this. I've seen such a thing as a memory-mapped file, which might suit my needs. I'm looking for performance and efficiency, and creating a new heap probably isn't that. Can I get some guidance or ideas on how to go about this?

Something that would be amazing is if I could do the above AND simply pluck a pointer out of that heap and dereference it. That's another question. I'm using ReadProcessMemory and WriteProcessMemory to read and write the relevant data. Since I'm in the same process, why can't I set a pointer to a memory address and dereference it? I'm assuming it has something to do with memory protection. I could have sworn I tried VirtualProtect and it didn't solve the issue, but that was a very early test, so it may have been something else. IDK, let me know. Any ideas would be great, thanks.
There is a large gap between where you started and your actual question.
You went from sorting an array, which you said is done, to a heap (what is this for? Is it a data structure or your own memory pile?) which mirrors your changes (to what??).

A memory-mapped file is an efficient tool for dealing with disk files. Do you have a disk file in play?

Reading and writing the memory of another process is normally prevented by the operating system, for a number of reasons (mostly stability and security). WriteProcessMemory does some behind-the-scenes work to get past this; a raw pointer can't do that. You can bypass it yourself, but then you are just re-creating the WPM function...

There is probably a way to tell the OS that your program is 'special' and allowed to do whatever it wants. I think it is bad practice to do things this way unless your program really is a system program: those protections are in place for good reasons, and if everyone bypassed them we would be back to DOS-like behavior and all the problems it had. Your antivirus may go haywire over it too. I would not go down this path just to shave off a few nanoseconds.
ReadProcessMemory requires a handle with PROCESS_VM_READ access to the target process. You set lpBaseAddress to the address in that process where reading should start, and then specify the number of bytes to read. Opening a process you don't otherwise have rights to (for example, one running under another account) is a different ball game and may require SeDebugPrivilege. To find where the regions of a process's address space start, along with their size and protection, use VirtualQueryEx().
OK, just to clarify: the C++ code I'm working on is a DLL that I manually open a handle to, or attach my process to. The environment is my own process, but I'm viewing it through the lens of a DLL that my process has loaded into memory.

Trust me, I'm able to read and write with no problems; I'm trying to find better ways. The array I'm referring to is created by the scripting language, and I'm manipulating it outside of their sandbox.

I mean, what's the difference between making calls to a DLL and loading it into my address space? Do I need to use LoadLibrary()? I'm fairly decent at this by now, but all these intricacies get confusing.
So far so good, but what is the problem you are trying to solve?
Is it avoiding a copy of the array? You can use shared memory for that. Or you can write a getter that just passes back the pointer. If a DLL hands you a pointer to data it owns, you can use it as if it were yours, or should be able to, just from a standard function call (not by trying to find and poke at it in process memory).
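A minimal sketch of the getter idea, assuming the DLL owns the buffer and simply hands out its address instead of a copy. `g_data`, `get_data`, `data_size` and `sort_in_place` are hypothetical names; in a real DLL the getters would be exported functions. This only helps when the data lives in memory the DLL controls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical storage owned by the DLL; callers get a pointer, never a copy.
static int g_data[] = {5, 3, 9, 1, 7};

// Getter that hands back a pointer to the data the module owns.
int* get_data() { return g_data; }

// Element count, so the caller knows how far the pointer is valid.
std::size_t data_size() { return sizeof(g_data) / sizeof(g_data[0]); }

// Caller side: sort the module's data in place through the returned pointer.
void sort_in_place() {
    int* p = get_data();
    std::sort(p, p + data_size());
}
```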
That's the thing: in this scripting language, everything is a copy or a reference. There's absolutely no way to pass a pointer to the proprietary data type that this language uses. So even if I pass a reference to the DLL, there's a wall that isolates the reference from being changed. I call it the Great Wall of AutoIt Inefficiency.

I've figured out how everything is laid out in memory, so finding what I need is simple. I've basically done what I set out to do, which was to blow the doors off the built-in sort function. Even though my DLL still has some gristle left in it, I beat the built-in sort by roughly 100x in a speed comparison, maybe more depending on how you figure it. My sort uses qsort and takes about 300 ms for 200,000 ints, there and back, vs. about 30,000 ms for the built-in. I know it's not as fast as it could be, but there are other loops and checks that need to happen for it to work right. The people looking at my script think it's magic, because the script that calls my DLL doesn't appear to even touch the array, and boom, it's sorted.
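The poster's sorting code isn't shown, so this is only a sketch of the general technique under a stated assumption: the array is a contiguous run of pointers, each pointing at a separate block that holds the number (the layout is described further down the thread). `Cell`, `compare_cells` and `sort_cell_pointers` are hypothetical; `Cell` stands in for the 40-48 byte blocks. Sorting the pointers rather than the blocks means each swap moves a single pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for one element block; only the int payload
// matters for the comparison, the real block is much larger.
struct Cell {
    int value;
};

// qsort comparator over an array of Cell* (each array element is a pointer).
int compare_cells(const void* a, const void* b) {
    const Cell* ca = *static_cast<const Cell* const*>(a);
    const Cell* cb = *static_cast<const Cell* const*>(b);
    // Returns -1, 0 or 1; avoids the overflow of ca->value - cb->value.
    return (ca->value > cb->value) - (ca->value < cb->value);
}

// Reorders the pointers only; the cells themselves never move in memory.
void sort_cell_pointers(Cell** cells, std::size_t count) {
    std::qsort(cells, count, sizeof(Cell*), compare_cells);
}
```

Usage: build an array of `Cell*`, call `sort_cell_pointers(ptrs, n)`, and read the values back through `ptrs[i]->value` in ascending order.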

Keep in mind I don't have access to the source code of the interpreter that executes the scripts. So I only have outside-in access, but my DLL is running either in the same process or in a child process. That's why I don't understand why I don't own the memory. Maybe I'll go back and try changing the protection on it again. Can I only change protection on memory I own? Do I need to allocate the memory in order to change its protection? I'm guessing that's why this function exists. IDK. I'm stuck in a Windows system programming rabbit hole.

Yeah, this kind of thing can be a brick wall.

You can try to find an interpreter for the language that is written in C++ (with source available) and add some getter/setter functions to it that give you full access to the data mid-script...

You can see if the scripting language supports callouts to other programs or libraries. Maybe going in reverse (calling the program you wrote from mid-script) is more viable?

You can see if another scripting language, or getting rid of scripting in favor of real code, is an option. Most scripting languages are terribly slow. JS isn't bad, since modern engines JIT-compile it...

You can look at other alternatives. If you just want to sort something, maybe you can provide the data already sorted, so the script doesn't need to do that anymore?

There are probably other alternatives along these lines that go around the problem instead of facing it head-on.

I am not sure (anyone else?) whether a memory-mapped file can be shared across processes in the same RAM. If not, you can cheese this with a RAM disk and share a temporary 'disk file' that is really just a section of RAM... then both processes can open the 'file', and so on.
It does support callouts; that's how the DLL knows when to do what. Maybe I should have been clearer. The DLL I created gets opened in the script via their DllOpen(). Then, whenever the script makes a function call to my DLL, I'm assuming the script's running process (or a new child) executes whatever I've set my DLL to do. That's what's strange, because my DLL should have full access to everything. When I call HeapWalk I'm using GetProcessHeap() to grab the handle to my own process's default heap, which is where the relevant data is stored. It's completely intentional; it's not some backdoor injection or anything.

Let me rephrase: I call GetProcessHeap() (which takes no arguments and returns the calling process's default heap), and I use GetCurrentProcess() as the handle to the process.

Edit: I'm almost thinking that the memory belongs to another thread and that might be the issue. Maybe if I grab handles to the other threads or something. I might have to put a mutex on it and change the protections... IDK. I've got lots of playing around to do.
I just figured it out... it's so dumb. I can't even believe this wasn't the first thing I tried. I guess I was more interested in running down other details, but as this project gets more complicated, needing to simply dereference a pointer is really starting to be a thorn in my ass.

So the moral of the story: I knew I was in the same process, and I just checked whether I was on the same thread as the calling thread... and I was. So I'm like, duh... what could possibly stop me from accessing the memory? So I simply opened a handle using OpenProcess with PROCESS_ALL_ACCESS and handle inheritance enabled, and boom, it works. It's completely ridiculous that GetCurrentProcess() doesn't just return a duplicate handle or something; I have no idea why that makes a difference. I thought handles were like keys you put into a slot, but apparently you're registering yourself with the system itself for certain privileges, because I can't believe my DLL wouldn't automatically be granted all access, considering it's the same everything... I can't even wrap my mind around why it would matter at all. It will definitely make things way more efficient.

It's just really annoying to traverse memory paths without that ability. It will really simplify some things. It was super annoying, and I was having to hold onto too many variables because I had to save so much extra information... freaking unreal.
When something seems 'dumb', poorly designed, slow, or whatnot, you are usually doing yourself a favor to try to unravel WHY it was done that way and whether you are right or not. There's plenty of buggy trash out there, and a lot of great stuff too. It can go either way.

At that point, if you are right, you can code around it and/or lodge a complaint with the authors that details the problem and so on.
If you are wrong, you will gain insight into something you had overlooked and will realize they did it right and why.
Either conclusion benefits you, if you have time to check it out. If you lack the time, it's probably wise to hold off on judging it as well, though it does feel good to blow off steam once in a while :)

It's likely all these hoops are exactly what you noted: part of the OS/security infrastructure at the process level. Some stuff is just annoying because bad people ruined it for everyone.
I've been digging around in memory long enough to almost wonder if the ridiculous size of things was done intentionally. I've figured out the pattern for how things are stored. It's really not that miraculous, because it follows a similar pattern to everything else, except it's bigger. Stupidly big. For instance, the way I search for the arrays in the current virtual memory is by block size. The array's pp** is stored in a memory block that is 568 bytes for a plain old script; if the script is compiled, that number drops to 532 or something. The pattern is as follows: [4-byte int that is always 1, 4-byte pp**, 4-byte int size, 4-byte int size]. The rest is mainly zeros. I did see one number incrementing; maybe the block is also the stack frame for the function, but it stays there forever because the array is being used by reference, and other arrays that aren't being used still live solely in blocks that big.

*(pp**) is just a contiguous run of pointers that point to the data if it's some number type, but again each data block is 40-48 bytes. I'm guessing that's because it's a variant and they want it big in case a string gets pushed in there, but that's a big "if" that is the destroyer of all things efficient. The interpreter obviously has no foresight, because it could just read the script and determine whether a string will be pushed at some point, or whether that's even possible.
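The header pattern described above can be modeled as a 16-byte struct. This is a sketch based only on the poster's observations (32-bit process, four 4-byte fields); `ArrayHeader32` and `read_header` are hypothetical names, and nothing here is a documented AutoIt structure. Copying with `memcpy` avoids alignment and strict-aliasing problems when pulling fields out of a raw byte block.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical view of the observed header: [always 1][pp][size][size].
// Every field is 4 bytes, matching a 32-bit process.
struct ArrayHeader32 {
    std::uint32_t marker;        // observed to always be 1
    std::uint32_t elements_ptr;  // address of the element-pointer array (pp)
    std::uint32_t size_a;        // element count
    std::uint32_t size_b;        // element count, repeated
};

// Copies the header out of a raw byte block and checks that it matches
// the observed pattern (marker == 1, both size fields equal).
bool read_header(const unsigned char* block, ArrayHeader32& out) {
    std::memcpy(&out, block, sizeof(out));
    return out.marker == 1 && out.size_a == out.size_b;
}
```

On a 64-bit build the pointer field would presumably be 8 bytes, so the struct would need to change; that is untested here.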

If you ask me, it's the age-old tale of a partnership gone wrong, and the other guy was the low-level guy. Before he left he probably added a bunch of bloat to ensure terrible performance. The terrible performance isn't some well-guarded secret; everyone knows. They avoid the built-in _ArraySort like it's COVID. I was told to seek other alternatives, buhahahaha... k. Their devs are all scripters with little to no lower-level experience. I've been a forum member over there for 9+ years, I think, and I've been laughed at, ridiculed, and made to feel inferior.
...now who's laughing? I'm handing out humble pie like I'm having a bake sale.

The sweetest part of this whole deal is that one wannabe super-brain over there spit-roasted me about 2 years ago, because I was talking about pointers and wrote a script in pure AutoIt that would spit out a pointer and behave somewhat like a pointer. The mods came in, put a disclaimer on the post, and said it was a bunch of misinformation and that AutoIt variables didn't have pointers... The original roaster had the nerve to comment on my post showing off a very beta DLL sort, and even linked to the post where this original business happened. I replied, "Ya, the post where you acted like I was a crazy person who had no idea what I was talking about... here, you'll love this test..." The test returned a pointer to the AutoIt-type variable, and I demonstrated reading and writing through it in the script, where it was plain as day that I was writing over here -----> while the variable I was writing to was nowhere near what I was doing, yet it mirrored the changes I was making. They allow the use of DllStructCreate to get and use pointers, but there's no included way to grab a pointer to the AutoIt proprietary data type. Until now, anyway. Happy days.
I've been digging around in the memory long enough to almost wonder if the ridiculous size of things was done intentionally.

Clearly, it was intentional:
LarsJ wrote:
Data in AutoIt variables is internally stored in variants. Data in AutoIt arrays is internally stored in safearrays of variants.

https://www.autoitscript.com/forum/topic/201355-pointers-and-dereferencing/

This isn't particularly unreasonable.
Yep, that's me. I was incorrect in saying that AutoIt variables are pointers; they're references, which, as we all know, are equivalent to a dereferenced pointer. I was still wrapping my mind around a lot of these technical concepts at the time... but it was close enough, and there wasn't any reason for them to act like I was a babbling madman. Then that jerk had the nerve to post in here: https://www.autoitscript.com/forum/topic/207285-why-is-_arraysort-so-broken-updated-1922-630pm-g2g/page/2/#comments

Yes, the storage space is intentional, but explain why a memory block that just contains [int, int*, int, int] needs an additional 500+ bytes of space... all initialized to zeros. Trust me, a "variant" is not any different from std::variant: it basically just takes a pointer, a size, and some kind of type ID, which only needs 12-16 bytes, tops. But what I'm saying is that the memory block where the data lives for ints or most basic types is 40-48 bytes. Strings are stored in a completely different manner, so an allocation would be inevitable to set a variant variable equal to a string value.

Edit on the original topic: this is kind of weird to me.

I thought I had fixed my issue with dereferencing a heap pointer via OpenProcess, but that didn't actually fix it... If I grab an LPVOID out of the heap via HeapWalk and save that pointer, I'm able to dereference it. If I cast the LPVOID to an int**, dereference it, save that int* value, and attempt to dereference that pointer, it crashes. It's really weird. It results in code that looks like *pp[3], which works, but it's not great to start so far back in the chain. If I do int* p = *pp and attempt *(p+3), it crashes.
they're references, which as we all know are equivalent to a dereferenced pointer

Sometimes. The optimizer often just puts the referred-to thing directly where it is referred to, without any intermediary. As much as possible, a reference IS the thing, and it collapses down to a handy alias in the source code, at the programmer's read/write level.

There are probably some cases where the compiler has to use a pointer to implement a reference. I don't know what you would have to do to force that to happen, though.

Because when you have 2 things interacting, the pointer represents a unique ID for that reference. When you have a script, an editor window that shows information about the executing script, and an interpreter, any reference has to have a pointer; otherwise there would be no way for one to reference the other.

During this process I've been forced to think about memory in a slightly different way. A reference basically just represents the memory address where it lives. A pointer is actually a reference that contains the address of another reference, and so on. The fact that a pointer can't be passed by reference is kind of annoying, because instead of the parameter just being int*&, you use int** and pass the pointer via &int*... I'd rather use the first option; they're basically describing the same thing.
Trust me, a "variant" is not any different from std::variant: it basically just takes a pointer, a size, and some kind of type ID, which only needs 12-16 bytes, tops

A variant could be a union-like class
struct variant { int id; union { type1 t1; type2 t2; }; };
The size of this variant is
sizeof(int) + max(sizeof type1, sizeof type2) + padding.

std::variant is always implemented using essentially the same strategy, although real implementations are more complicated. This design choice avoids a memory allocation:
cppreference wrote:
As with unions, if a variant holds a value of some object type T, the object representation of T is allocated directly within the object representation of the variant itself. Variant is not allowed to allocate additional (dynamic) memory.

https://en.cppreference.com/w/cpp/utility/variant

If autoit uses a similar data structure, it could explain why the objects you're dealing with are so large.
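To make the size argument concrete, here is a minimal sketch comparing a hand-rolled tagged union with `std::variant`. The member types (`int`, `double`, `const char*`) and the names `TaggedValue` and `StdValue` are assumptions chosen for illustration, not AutoIt's actual payloads. Only the lower bound implied by the formula above is checked, because the exact size (padding, index width) depends on the compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>

// Hand-rolled tagged union in the spirit of the sketch above:
// the payload is stored inline, so no heap allocation is needed.
struct TaggedValue {
    int id;               // which union member is active
    union {
        int i;
        double d;
        const char* s;    // points at string storage kept elsewhere
    };
};

// Standard-library equivalent; it also stores its payload inline.
using StdValue = std::variant<int, double, const char*>;

// Tag plus the largest member is a lower bound; padding may add more.
static_assert(sizeof(TaggedValue) >= sizeof(int) + sizeof(double),
              "tag plus largest member");
```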
The fact that a pointer can't be passed by reference

They can. It's ugly...
void allocate(char*& cp)
{
    cp = new char[1000];
}
// If cp were passed by value, the function would change only its own copy:
// the caller's pointer would still be junk and the new buffer would leak.
// Passed by reference, the address returned by new is kept (the caller must delete[] it).


This is the sort of thing we try to avoid, though.
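Both forms are legal C++. A minimal side-by-side sketch (the function names are made up for illustration): `int*&` and `int**` both let a callee change which object the caller's pointer refers to; the difference is only the call-site syntax, `f(p)` versus `f(&p)`.

```cpp
#include <cassert>

// Reseat the caller's pointer through a reference to a pointer.
void point_at_ref(int*& p, int& target) {
    p = &target;
}

// Same effect through a pointer to a pointer; the caller passes &p.
void point_at_ptr(int** pp, int& target) {
    *pp = &target;
}
```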

---------------
Bc when you have 2 things interacting the pointer represents a unique id to that reference.

I have no idea what you are trying to say here.
...
there would be no way for one to reference the other.

At the assembly/compiler level, this is not true. It's all effectively global down there. This is a C++-level concept, and the reference is a token you can use to access the thing, yes. But it does not HAVE to become an entity in the binary code. It can be loaded directly into a register by value without 'existing' on the stack or in RAM (it's in the code segment as a hard value, though).

A pointer isn't a type of reference. Pointers are integer values that hold a number akin to an array index into RAM:
if your data is in ram[100042],
then a pointer would be
x = 100042;
ram[x] is akin to *x, and C++ supports that syntax too (array notation: x[0] is the same as *x).
But the key here is that x is an actual variable on the stack or heap somewhere. You can change its value.

A reference is something else.
Say variable z is at location 4400.
Then with thing &y = z,
y and z are the same thing. There is no pointer.
y and z both read or write location 4400, but 4400 isn't stored anywhere you can touch. It's a token in the assembly language, and y and z both get replaced with that token, unless you did something to prevent this. You can't change the 4400; you can't even change what y refers to later if you changed your mind. It has to be redefined (e.g., a loop can contain thing &tmp = container[i], and each iteration tmp is rebuilt).
See, this is what I'm talking about:

"
A pointer isn't a type of reference. Pointers are integer values that hold a number that is akin to an array index into ram:
if your data is in ram[100042];
then a pointer would be
x = 100042;
ram[x] is akin to *x and c++ supports that syntax (array notation)
but the key here is that x is an actual variable on the stack or heap somewhere. You can change its value."

This is where our points of view differ. Every variable is a reference to a memory address until it's destroyed. The difference between passing by reference and by value is that the by-value parameter is a copy with its own memory address, hence any changes to it aren't reflected in the original variable.

If the value 2 is stored at ram[10000] and pointer x = &2, i.e. 10000, how is 2 being a reference to ram[10000] different from x being a reference to ram[1999], where x itself lives and which holds 10000? When you write x = &2 you are just getting the address where 2 lives... same as sending &x to fill an int** parameter.
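The difference shows up at the level of addresses: `&2` is not valid C++ because a literal is not an object and has no address, while a pointer such as `x` is itself an object stored somewhere, so `&x` is a real `int**`. A minimal sketch (the function name is hypothetical):

```cpp
#include <cassert>

// int* bad = &2;   // does not compile: the literal 2 has no address.
// A value only gets an address once it is stored in an object.
bool two_levels_ok() {
    int value = 2;      // value lives at some address A
    int* x = &value;    // x stores A; x itself lives at another address B
    int** px = &x;      // px stores B; this is what an int** parameter receives
    **px = 3;           // follow B to x, then A to value
    return x == &value && *px == x && value == 3
        && static_cast<void*>(px) != static_cast<void*>(x);
}
```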

@mbozzi The union idea had actually crossed my plate; I kind of figured it out on my own for use with a variant, but I was thinking about explicitly allocating memory every time, and I know that is the least efficient way to go about it. During this project I've been working on my own variant class and iterators. I put the variant class aside because I'm not entirely sure I need it. Things have just been changing rapidly as I come across new developments.


I just did some tests on what exactly was allowing me to dereference from the heap, and OpenProcess was what did it. The reason I was having a problem is that I was attempting to iterate through the addresses stored in the array as if they were sequential, and they aren't. Duh, lol. I was just so excited to get some dereference action that I kind of forgot about that... lol
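A minimal sketch of why `*pp[3]` worked while `*(p+3)` crashed, assuming the layout described in the thread: `pp` points at a contiguous array of pointers, but each pointer targets a separately allocated value. The function names are hypothetical. `*pp[3]` parses as `*(pp[3])`, so it indexes the pointer array first; `*(*pp + 3)` steps three `int`s past the first value, which only works if the values themselves are contiguous.

```cpp
#include <cassert>

// Correct for this layout: index the pointer array, then dereference.
int fourth_value(int** pp) {
    return *pp[3];     // same as *(pp[3])
}

// Wrong for this layout (kept as a comment): *pp is only the address of the
// FIRST value, so *(*pp + 3) reads memory that is not part of the array.
// int fourth_value_wrong(int** pp) { return *(*pp + 3); }

// Walk the pointer array, not the values: pp + i is contiguous,
// while each pp[i] may point anywhere.
long long sum_values(int** pp, int count) {
    long long total = 0;
    for (int i = 0; i < count; ++i)
        total += *pp[i];
    return total;
}
```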
markyrocks wrote:
During this process I've been forced to think about memory in a slightly different way. A reference basically is just representative of the memory address where it lives. A pointer is actually a reference that contains the address of another reference and so on. The fact that a pointer can't be passed by reference in kinda annoying bc instead of param just being int*& you use int** and pass the pointer via &int*....id rather use the first option, they're basically describing the same thing.


A number of times in the past I have mentioned what I think a reference is; no one seemed to argue, so I am hoping that means I am not wrong about it:

I take a reference to be a const pointer wrapped in some TMP (template metaprogramming) code. As jonnin mentioned, they have the same address. The const-ness means it can't point somewhere else. The standard doesn't mandate how things are implemented, but I am not sure how else one would go about it.

I tried to look for some code to prove this, but haven't succeeded. The only thing I could find is things like std::add_lvalue_reference:

template< class T >
struct add_lvalue_reference;


This is TMP, but I guess what it actually does is buried somewhere in the compiler code.
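The const-pointer model is a common mental shortcut, and `std::add_lvalue_reference` is indeed just a type transformation (`T` becomes `T&`; non-referenceable types such as `void` come back unchanged) rather than something that shows how references are stored. A small sketch of where the analogy holds and where it breaks (the function name is hypothetical):

```cpp
#include <cassert>
#include <type_traits>

// add_lvalue_reference maps T to T&...
static_assert(std::is_same_v<std::add_lvalue_reference_t<int>, int&>);
// ...and leaves void alone, since there is no such type as void&.
static_assert(std::is_same_v<std::add_lvalue_reference_t<void>, void>);

// Alike: neither r nor cp can be made to refer elsewhere.
// Different: r needs no *, and sizeof(r) is the size of the referent.
bool reference_vs_const_pointer() {
    double d = 1.5;
    double& r = d;            // alias for d
    double* const cp = &d;    // const pointer: cannot be re-pointed
    r = 2.5;                  // implicit dereference
    *cp += 1.0;               // explicit dereference
    return &r == cp
        && sizeof(r) == sizeof(double)
        && d == 3.5;
}
```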

markyrocks wrote:

A pointer is actually a reference that contains the address of another reference and so on.


A pointer is simply a variable that holds a memory address. Obviously one can have multiple levels. They can be const or non-const.
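Reading declarations right to left shows where each `const` applies. A short sketch with type aliases checked at compile time (the alias names and `sum_readonly` are hypothetical):

```cpp
#include <cassert>
#include <type_traits>

// const applies to what is on its left (or to its right if nothing is on its left).
using PtrToConstInt      = const int*;        // pointee const, pointer can move
using ConstPtrToInt      = int* const;        // pointer const, pointee can change
using ConstPtrToConstInt = const int* const;  // both const
using PtrToPtrToInt      = int**;             // two levels of indirection

static_assert(std::is_const_v<std::remove_pointer_t<PtrToConstInt>>);
static_assert(!std::is_const_v<PtrToConstInt>);
static_assert(std::is_const_v<ConstPtrToInt>);
static_assert(!std::is_const_v<std::remove_pointer_t<ConstPtrToInt>>);
static_assert(std::is_const_v<ConstPtrToConstInt>);
static_assert(std::is_pointer_v<std::remove_pointer_t<PtrToPtrToInt>>);

// A pointer-to-const can still be advanced; it just cannot modify the data.
int sum_readonly(const int* p, int n) {
    int s = 0;
    for (const int* end = p + n; p != end; ++p)
        s += *p;
    return s;
}
```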