Arrest people for using "new" and "delete"

Some people on this forum say that programmers must never use "new" and "delete".
To me that sounds like saying "never drive a car yourself".
Sure, newbies should not use them on their first day of programming. But they are an official part of C++, so forbidding "new" and "delete" sounds like forbidding people to drive cars. If I used malloc() and free(), these people would have a heart attack and say I have no right to repair my own car.
What about allocating memory directly from the system with HeapAlloc() on Windows or brk() on Linux?
My computer is my own; I am free to use it as I want. "Avoid 'new' and 'delete'" is a recommendation for newbies, not a state law.

Example: I have to compare eight-byte strings.
The recommended way is to use the string class and its == operator.
I walked through the STL source and found that it calls something like __builtin_memcmp(). I stepped through the binary in a debugger and found many hundreds of CPU instructions. Why should my program execute those hundreds? Eight bytes can be compared by the CPU with a couple of instructions.
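Something like this is what I mean (just a sketch, assuming both buffers are known to be exactly eight bytes long; equal8 is a name I made up):

#include <cstdint>
#include <cstring>

// Sketch only: compare two eight-byte buffers as single 64-bit values.
// memcpy into locals keeps this well-defined even for unaligned data;
// a compiler typically turns it into two loads and one compare.
bool equal8(const char* a, const char* b)
{
    std::uint64_t x, y;
    std::memcpy(&x, a, sizeof x);
    std::memcpy(&y, b, sizeof y);
    return x == y;
}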

Questions:
1. Is this a forum for elementary-school programmers, or for researchers and forensics?
2. If you make everyone use the STL, why do you think my program in this example must execute hundreds of instructions instead of a couple?
3. If you think programmers on this forum should never use "new" and "delete", do you think people should be arrested for using them? And do you think people should be arrested for driving and repairing cars themselves?
Whoa!

If I may suggest that you calm down a little? I don't wish to antagonise you any further or anything, but I do think that's a huge reaction to what I said. Essentially I missed one word, IMO.

See my post in the other topic, where I suspect this all came from:

http://www.cplusplus.com/forum/general/157423/#msg809708
If you set up your GPS to a location but you want to use a different road, do you still follow your GPS' path?

We give you the advice; you do what you want with it (and, personally, I couldn't care less whether you use operator== or your own function, which will probably (not) speed up your program by 32x (64x on a 64-bit machine; kudos to whoever gets the reference)).
Why should my program execute those hundreds? Eight bytes can be compared by the CPU with a couple of instructions.
Why do you care? Is it a bottleneck? Are you sure? Isn't this a case of premature optimisation, which is the root of all evil? Won't a naive approach kill performance even more? 14 unaligned, uncached accesses in a row might kill performance far more than "many hundreds of CPU instructions" ever will.
The usual way is to minimise memory access time (as it is a common bottleneck) by reading aligned blocks of memory the same size as a register, or even coercing the CPU to cache them.
Do you think the writers of a library (a C library that has been polished for decades) are so dumb that they would create a subpar implementation that even a naive approach will outperform?
MiiNiPaa:
Why do you care? Is it a bottleneck?

Yes. In my database analyzer I sped up my program by 10% by using my own string comparison, even though I haven't optimized it yet.

Isn't this a case of premature optimisation, which is the root of all evil?

Why should I use slow implementations?
Something slower here, something slower there. As a result, modern computers run their programs much more slowly than old computers ran their old programs. Earlier programmers used hints and hacks to speed up their programs, and they told others how they did it.

Won't a naive approach kill performance even more? 14 unaligned, uncached accesses

Have you ever seen a string implementation where the data is not properly aligned?

Do you think the writers of a library (a C library that has been polished for decades) are so dumb that they would create a subpar implementation that even a naive approach will outperform?

Yes, I think so. Moreover, I'm sure of it.

#include <iostream>
#include <string>

using std::string;
using std::cout;
using std::endl;

long long rdtsc()
{
	long long x;
	asm(
		"rdtsc \n"
		"shl $32, %%rdx \n"
		"or %%rdx, %%rax \n"
		"mov %%rax, %0"
		: "=o"(x)::"%eax","%edx");
	return x;
}

bool same(const string& s1, const string& s2)
{
	const size_t l1 = s1.length();
	const size_t l2 = s2.length();
	if (l1 != l2) return false;

	const char* p1c = s1.data();
	const char* p2c = s2.data();
	const char* const endc = reinterpret_cast<const char*>(p1c + l1);

	const size_t*& p1 = reinterpret_cast<const size_t*&>(p1c);
	const size_t*& p2 = reinterpret_cast<const size_t*&>(p2c);
	const size_t* const end = reinterpret_cast<const size_t*>
		(endc - sizeof(size_t) + 1);
	while(p1 < end)
		if(*p1++ != *p2++) return false;

	while(p1c < endc)
		if(*p1c++ != *p2c++) return false;

	return true;
}

int main()
{
	string s1 = "text1";
	string s2 = "text2";
	string s3 = "texttexttexttexttexttexttexttexttexttexttexttexttexttexttext"
		"texttexttexttexttexttexttexttexttexttexttexttexttexttexttexttexttext"
		"texttexttexttexttexttexttexttexttexttexttexttexttexttexttexttexttext"
		"texttexttexttexttexttexttexttexttexttexttexttexttexttexttexttexttext"
		"texttexttexttexttexttexttexttexttexttexttexttexttexttexttexttexttext1";
	string s4 = "texttexttexttexttexttexttexttexttexttexttexttexttexttexttext"
		"texttexttexttexttexttexttexttexttexttexttexttexttexttexttexttexttext"
		"texttexttexttexttexttexttexttexttexttexttexttexttexttexttexttexttext"
		"texttexttexttexttexttexttexttexttexttexttexttexttexttexttexttexttext"
		"texttexttexttexttexttexttexttexttexttexttexttexttexttexttexttexttext2";

	long long t1 = rdtsc();
//	bool res = s3 == s4;
	bool res = same(s1, s3);
	t1 = rdtsc() - t1;
	cout << "res: " << res << "\nTime: " << t1 << endl;
}


s1==s2 => 3165
s1==s3 => 133
s3==s4 => 4691
same(s1, s2) => 562
same(s1, s3) => 248
same(s3, s4) => 1067

Yes. In my database analyzer I sped up my program by 10% by using my own string comparison
If you did the profiling and testing, and your implementation does speed up the whole program, then yes, it is a good candidate for optimisation.

Why should I use slow implementations?
Because portability and readability > speed. If you have ever maintained a large project, you will already have experienced that.
Your code is not portable: it contains undefined behavior, and the attempt to dereference end might easily result in a crash on a RISC CPU (or on ARM processors, depending on the compiler).

Have you ever seen a string implementation where the data is not properly aligned?
The 32-bit libstdc++ bundled with GCC 4.3, I believe, on a 64-bit platform. It takes the result of the allocator (which is aligned to max_align) and puts the first character 5*4 bytes ahead, which is out of alignment for 8-byte access.
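For what it's worth, the same word-at-a-time idea can be written without the aliasing and alignment problems by memcpy-ing each word into a local (a sketch only; same_portable is a made-up name):

#include <cstddef>
#include <cstring>
#include <string>

// Sketch: word-at-a-time comparison without casting char* to size_t*.
// memcpy into a local is well-defined even for unaligned data, and an
// optimising compiler usually lowers it to a single load.
bool same_portable(const std::string& a, const std::string& b)
{
    if (a.length() != b.length()) return false;

    const char* p1 = a.data();
    const char* p2 = b.data();
    std::size_t n = a.length();

    while (n >= sizeof(std::size_t))
    {
        std::size_t w1, w2;
        std::memcpy(&w1, p1, sizeof w1);
        std::memcpy(&w2, p2, sizeof w2);
        if (w1 != w2) return false;
        p1 += sizeof(std::size_t);
        p2 += sizeof(std::size_t);
        n -= sizeof(std::size_t);
    }

    while (n--)
        if (*p1++ != *p2++) return false;

    return true;
}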
If you did the profiling and testing, and your implementation does speed up the whole program

Yes. I ran it with rdtsc, I used a profiler, and I checked its real workload with a timer (on my mobile phone :-).

Because portability and readability > speed.

Readability is achieved by well-written documentation.
Portability: fast low-level blocks for every target are better than one slow universal implementation.

Your code is not portable: it contains undefined behavior...

Please point me to it. Tell me the line number and explain.

and the attempt to dereference end

Where?

The 32-bit libstdc++ bundled with GCC 4.3

Yes, the 32-bit version of my function will only compare four bytes at a time. Using something like SSE or AVX would be better. So should I have different implementations and check the speed of each on different targets? It's still better than the usual slow universal implementation.
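For example, something along these lines (a sketch only, assuming x86 with SSE2; same_sse2 is a made-up name):

#include <cstddef>
#include <cstring>
#include <emmintrin.h>   // SSE2 intrinsics

// Sketch: compare 16 bytes per iteration with SSE2.
// _mm_cmpeq_epi8 compares byte-wise; _mm_movemask_epi8 packs the results
// into 16 bits, so 0xFFFF means all 16 bytes matched.
bool same_sse2(const char* p1, const char* p2, std::size_t n)
{
    while (n >= 16)
    {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p1));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p2));
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) != 0xFFFF) return false;
        p1 += 16; p2 += 16; n -= 16;
    }
    return std::memcmp(p1, p2, n) == 0;   // remaining tail bytes
}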
Fun. I ran that piece of code on my old first-gen Core i7, and Intel is holding up (numbers are the best of 5):
             intel 15   clang 3.6   gcc 4.9
s1==s2            148         151      5855
s1==s3             18          77        18
s3==s4            274        8806      6240

same(s1,s2)       107         148       194
same(s1,s3)        35          77        40
same(s3,s4)       252         237       288

There was a lot of variance, though. In any case, as MiiNiPaa said: if you found a bottleneck, profiled, and improved it, good job. It's what I do for a living. But I still don't recommend new/delete in C++ code, thanks to make_unique.
Please point me to it. Tell me the line number and explain.
reinterpret_cast<const size_t*&>: the result of a reinterpret_cast to anything (aside from char* and void*) which is not its original type is unspecified (not undefined, actually), so the results are unknown.
I have encountered an implementation which automatically aligns the resulting pointer, which could make the comparison not work properly (however, I doubt that you will ever need to analyse databases on industrial machines).

Fast low-level blocks for every target are better than one slow universal implementation.
I doubt that anyone can cover all potential targets. There are hundreds of them.

I ran it with rdtsc.
I am not trying to discredit your results here, but rdtsc() is not a silver bullet: 800 cycles of actual work execute in less time than 100 cycles spent mostly stalled, waiting for some IO operation. Also, it is mostly useless in multithreaded applications, as another thread can interrupt your execution and you will get exaggerated results, or you can even get the start and end readings from different cores.

I will say it again: if it is a proven problem, fix it. Optimise, refactor, etc.
Just do not do that before it is a problem. Do not start juggling raw pointers to GUI elements from the very beginning, instead of packing them into a neat managing class, just because it will be more efficient. I have seen projects descend into support hell, where it turned out that rewriting a module from scratch would cost less than letting a new team member figure out how it works.
Umm can you explain how this code works?

What are you doing differently that is causing such a speedup? I'm guessing the numbers you have shown are the number of instructions executed to do the comparison?
What are you doing differently that is causing such a speedup?


The basic idea is that instead of the traditional comparison of one character at a time, he's accessing the character memory in size_t-sized blocks and comparing those (effectively dividing the number of comparisons done by 4, assuming sizeof(size_t)==4)

I'm guessing the numbers you have shown are the number of instructions executed to do the comparison?


It's the number of cycles. rdtsc returns the number of cycles elapsed since system reset.
Cubbi:
Thank you for those numbers. There are many implementations of the STL; some are quick enough, some are slow. I should file a bug report with the authors of my implementation.

make_unique is good. Btw, how can I use it with my own allocators?

I doubt that anyone can cover all potential targets. There are hundreds of them.

So let's have one universal implementation and many specific ones. If we have a specific, quicker implementation, let's use it.

rdtsc() is not a silver bullet

Average values over many runs give a general picture. OK, a slow implementation is not a problem I should discuss here; I should file a bug report with the authors.
So let's have one universal implementation and many specific ones.
Good idea. Did you try writing your own char_traits implementation and providing it as the string template parameter, instead of relying on an external function? This will let the whole standard library use your own implementation instead of the standard one everywhere.
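Roughly like this (a sketch only; fast_traits is a made-up name, and the loop body is where your own comparison routine would go):

#include <cstddef>
#include <iostream>
#include <string>

// Sketch: a traits class that overrides compare(); everything else is
// inherited from std::char_traits<char>. basic_string's compare() and
// operator== then go through it.
struct fast_traits : std::char_traits<char>
{
    static int compare(const char* a, const char* b, std::size_t n)
    {
        // plug your own word-at-a-time comparison in here
        for (std::size_t i = 0; i < n; ++i)
            if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
        return 0;
    }
};

using fast_string = std::basic_string<char, fast_traits>;

int main()
{
    fast_string a = "hello", b = "hello";
    std::cout << (a == b) << '\n';   // comparison goes through fast_traits
}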
MiiNiPaa:
Yes, it looks like I should do so. Thanks.
Disch wrote:
The basic idea is that instead of the traditional comparison of one character at a time, he's accessing the character memory in size_t-sized blocks and comparing those (effectively dividing the number of comparisons done by 4, assuming sizeof(size_t)==4)


Got it! I remember doing this before but not in C. Also it reminds me of something I saw earlier
http://qr.ae/Em78U
Konstantin2 wrote:
make_unique is good. Btw, how can I use it with my own allocators?

As the original proposal http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3588.txt says, "Expert users with custom allocators can easily write their own wrappers with corresponding custom deleters." It kind of annoys me that allocators are regarded as expert material, because it leads to libraries that go after the system heap when I don't want them to (my old job viciously gutted and rewrote all such libraries; the system heap was off-limits).
There is basic_string, and the allocator is its third template parameter. That's fine.
When I use make_unique, this function allocates memory. I want that memory allocated using my own allocator. Where should I put my allocator when using make_unique?
> When I use make_unique, this function allocates memory.

No. It does not.

Misread the question. Thanks, Disch.

If you want custom memory allocation for your string, write a custom allocator and specify it as the third template argument for std::basic_string<>.

With C++11, writing a stateless custom allocator has become quite trivial.
http://en.cppreference.com/w/cpp/memory/allocator_traits

With a conforming implementation of the standard C++ library:

#include <iostream>
#include <string>
#include <cstddef>
#include <algorithm>

#ifdef _MSC_VER // Microsoft
   #define noexcept
   #define constexpr  
#endif   

#ifdef __GNUG__ // LLVM or GNU
   #if ! defined __clang__ // not LLVM, must be GNU
       #error "this won't work with the non-conforming GNU library" 
   #endif 
#endif

template < typename T > struct my_allocator
{
    using value_type = T ;

    my_allocator() noexcept = default ;
    template < typename U > my_allocator( const my_allocator<U>& ) noexcept {}
    template < typename U > my_allocator<T>& operator= ( const my_allocator<U>& ) noexcept { return *this ; }
    template < typename U > my_allocator( my_allocator<U>&& ) noexcept {}
    template < typename U > my_allocator<T>& operator= ( my_allocator<U>&& ) noexcept { return *this ; }
    ~my_allocator() noexcept = default ;

    // replace these with your own allocate / deallocate functions
    T* allocate( std::size_t n ) const { return static_cast<T*>( ::operator new( sizeof(T) * n ) ) ; }
    void deallocate( T* ptr, std::size_t ) const { ::operator delete(ptr) ; }
};

template < typename T, typename U >
constexpr bool operator== ( const my_allocator<T>&, const my_allocator<U>& ) { return true ; }

template < typename T, typename U >
constexpr bool operator!= ( const my_allocator<T>&, const my_allocator<U>& ) { return false ; }

int main()
{
    using my_string = std::basic_string< char, std::char_traits<char>, my_allocator<char> > ;

    my_string str = "abracadabra" ;
    std::cout << str << '\n' ;
    
    std::sort( str.begin(), str.end() ) ;
    str.erase( std::unique( str.begin(), str.end() ), str.end() ) ;
    std::cout << str << '\n' ;
}

http://coliru.stacked-crooked.com/a/0f94e6704d4b13c6
http://rextester.com/TSL79039

With the GNU library, which does not support C++11 well, we have to write a C++98 allocator, which is described here: http://www.angelikalanger.com/Articles/C++Report/Allocators/Allocators.html
I think you guys are misunderstanding his question.

He knows that you can pass a custom allocator to basic_string to have the string data allocated off the heap.

He's asking how you pass a custom allocator to make_unique to have the object allocated off the heap.


make_unique uses new to allocate the new object, right? How do you direct it to use your own allocator?
If you want all objects of your own type to be allocated with said allocator, you can overload the member operator new.
Otherwise you will have to write your own wrapper, something like make_allocated_unique; such a function sees no love from the standard committee, as it is deemed trivial to write :(.
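Something along these lines, perhaps (a sketch only; neither alloc_deleter nor make_allocated_unique is standard):

#include <memory>
#include <utility>

// Sketch: allocate and construct through an allocator, then pair the
// resulting pointer with a deleter that destroys and deallocates
// through the same allocator.
template <typename Alloc>
struct alloc_deleter
{
    Alloc alloc;
    using pointer = typename std::allocator_traits<Alloc>::pointer;
    void operator()(pointer p)
    {
        std::allocator_traits<Alloc>::destroy(alloc, std::addressof(*p));
        std::allocator_traits<Alloc>::deallocate(alloc, p, 1);
    }
};

template <typename Alloc, typename... Args>
auto make_allocated_unique(Alloc alloc, Args&&... args)
    -> std::unique_ptr<typename std::allocator_traits<Alloc>::value_type,
                       alloc_deleter<Alloc>>
{
    using traits = std::allocator_traits<Alloc>;
    auto p = traits::allocate(alloc, 1);
    try {
        traits::construct(alloc, std::addressof(*p), std::forward<Args>(args)...);
    } catch (...) {
        traits::deallocate(alloc, p, 1);
        throw;
    }
    return std::unique_ptr<typename traits::value_type, alloc_deleter<Alloc>>(
        p, alloc_deleter<Alloc>{alloc});
}

// usage, e.g. with the my_allocator shown earlier:
//   auto q = make_allocated_unique(my_allocator<int>{}, 42);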