How to quicken a table look up?

Pages: 12
https://en.wikipedia.org/wiki/Locality_of_reference
1. In the absence of any jumps at all, any new label is going to be pretty close the previous lookup.
Caching the previous result gives you a good place to look next time.

2. Loops are visited frequently.
1
2
3
4
5
label:
  opc
  opc
  opc
  branch label

Labelled branch targets can be cached, as they will be hit multiple times.

3. If this machine also has any form of 'call subroutine', then it might be worth having a cache at each stack frame.
unordered map is faster, I believe, for this kind of work.

To confirm your believe: :cit.
unordered_map containers are faster than map containers to access individual elements by their key, although they are generally less efficient for range iteration through a subset of their elements.
:end_cit. found in: http://www.cplusplus.com/reference/unordered_map/unordered_map/

For this reason I now decided to go for unordered map. Problem: as a C++ novice I am able to modify examples and stich several examples together, alas I am still not capable to read the documentation and apply it to code something the complier accepts.

In addition compilers act differntly. For instance, I adopted the example from
http://www.cplusplus.com/reference/unordered_map/unordered_map/begin/
to my needs:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
// unordered_map::begin/end example
#include <iostream>
#include <unordered_map>

int main ()
{
  std::unordered_map<std::uint_fast16_t,std::string> mymap;
  mymap =
  {
 {0X0004,"ADRFCH"},
 {0X0018,"OVRSTK"},
 {0X0020,"FCHRTN"},
 {0X0026,"X<>ROW"},
 {0X002F,"XROW0"},
 {0X0031,"XROW10"},
 {0X0033,"XROW11"},
 {0X0035,"XROW12"},
 {0X0037,"XROW13"},
 {0X0039,"XROW14"},
 {0X003B,"XROW2"},
 {0X003B,"XROW3"},
 {0X0042,"XROW9"},
 {0X0045,"REGADR"},
 {0X004C,"ADRGSB"},
 {0X004F,"BIGBRC"},
 {0X0054,"TONSTF"},
 {0X0060,"ROWTBL"},
 {0X0071,"TONETC"},
 {0X0074,"XROW1"},
 {0X007F,"ROW7"},
 {0X0081,"XROW5"},
 {0X0081,"XROW6"},
 {0X0086,"XROW4"},
 {0X008B,"MATH"},
 {0X0090,"XCUTEB"},
 {0X0091,"XCUTB1"},
 {0X0098,"RSTKB"},
// ... nearly another 3000 entries
 {0X71C3,"FNSTSA"},
 {0X71DB,"FSP04"},
 {0X71DF,"FSP06"},
 {0X71E6,"FSP10"},
 {0X71ED,"FSP30"},
 {0X71F2,"FSP45"},
 {0X71F6,"FSP48"},
 {0X71F7,"FSP50"},
 {0X71FB,"FSPER"},
 {0X71FD,"FSPEX"},
 {0X71FF,"FSPEX4"},
 {0X720A,"FSPEX6"},
 {0X720F,"FSP60"},
 {0X721B,"FSP65"},
 {0X721E,"FSP68"},
 {0X7220,"FSP70"},
 {0X7225,"FSP75"},
 {0X7229,"FSP80"},
 {0X722B,"FSP85"},
 {0X722D,"OOPCHK"}};

  std::cout << "mymap's buckets contain:\n";
  for ( unsigned i = 0; i < mymap.bucket_count(); ++i) {
    std::cout << "bucket #" << i << " contains:";
    for ( auto local_it = mymap.begin(i); local_it!= mymap.end(i); ++local_it )
      std::cout << " " << local_it->first << ":" << local_it->second;
    std::cout << std::endl;
  }
  std::cout << "Test [55]:" << mymap[55]; // test the "direct" access by []
  return 0;
}


If I try the same on M$VS2010 I get error messages but no idea how to solve it.
1>HP41Trace.cpp(790): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int

If I set type int:
1>HP41Trace.cpp(789): error C2589: 'int' : illegal token on right side of '::'
-- but no clue what identifiers are possible. When I set it to string I am told:
1>HP41Trace.cpp(790): error C2371: 'GlobalName' : redefinition; different basic types
1>          HP41Trace.cpp(789) : see declaration of 'GlobalName'


Then I find the hint
This page describes a feature introduced by the latest revision of the C++ standard (2011). Older compilers may not support it.
and doubt if I shouldn't get rid of MSVS2010.
just DIY. It will outperform unordered map, its not that hard ... if the data is static, you don't need collision detection nonsense, and can make it really hot.

you just need an array, a simple and fast conversion of your data to an array index, and its done.
Last edited on
You could make your own hash table if you can't upgrade to C++11. Use a simple array/vector for the size of the table, and chain collisions with linked lists. Or find an implementation online (GitHub or somewhere) that already exists and supports pre-C++11 compilers, I'm sure they exist.
To confirm your believe: :cit.
What is meant by this statement is that insert/remove is faster. Searching for a certain element is faster within an ordered map. Since you have rather static map std::map is more what you want.
1. [...] Caching the previous result gives you a good place to look next time.

2. Loops are visited frequently. [...] Labelled branch targets can be cached, as they will be hit multiple times.

3. If this machine also has any form of 'call subroutine', then it might be worth having a cache at each stack frame.

Very sophisticated. And for sure worth to consider for systems where response time is an important aspect of the MMI. No question.

But what I do is just enlarge the list of labels from about 700 to nearly 3000. The current implementation is a linked list, which is scanned serial for each command. The effect of the four times larger list is remarkable only when writing the trace.log and running the emulated HP41 in turbo mode (= faster than real machine). It is quite rare (and not very useful) to trace lengthy sequences like iterating user programs, tpyically you trace a single function like SIN or LOG or a MCode routine of yours. Typically such functions end before you are able to switch to turbo mode.

Ad 1: With 3000 labels defined, what cache size will reasonable?
Ad 2: Short jumps (JC and JNC, Jump if Carry and Jump if No Carry) change the PC in a range -123..124. For those I removed the indicaton of the target label (for reason of the trace log layout). If the jump is executed the label is shown in next line, if it is not executed, well, hard cheese.
Ad 3: Yes it has, PC and a 4 level return stack. But again same question as for ad 1.

To give you an impression what I am talking about, here an excerpt of a trace log
0000  201*006		NCGO	0180	LSWKUP
0002  2B5*006		NCGO	01AD	DSWKUP
01AD  001*100	DSWKUP	NCXQ	4000
01AF  2E0		DISOFF
01B0  3D5*00C		NCXQ	03F5	PACH11
03F5  130*2FD	PACH11	LDI	2FD
03F7  270		RAMSLCT
03F8  3F0		PERSLCT
03F9  0B8		RDC12L
03FA  015*00A		NCGO	0205	MEMCHK
0205  3C8	MEMCHK	CLRKEY
0206  3CC		?KEY
0207  260		SETHEX
0208  046		C=0	X
0209  3F0		PERSLCT
020A  270		RAMSLCT
020B  130*169		LDI	169
020D  106		A=C	X
020E  378		C=REG	13
020F  17C		RCR	6
0210  366		?A#C	X
0211  10F		JC	+33	COLDST (0232)
0212  1BC		RCR	11
0213  266		C=C-1	X
0214  270		RAMSLCT
0215  038		RDATA
0216  10E		A=C	ALL
0217  2BA		C=-C-1	M
0218  2F0		WDATA
0219  038		RDATA
021A  2BA		C=-C-1	M
021B  36E		?A#C	ALL
021C  0B7		JC	+22	COLDST (0232)
021D  2F0		WDATA
021E  046		C=0	X
021F  270		RAMSLCT
0220  3B8		C=REG	14
0221  358		ST=C
0222  0CC	CHKRPC	?FS	10
0223  3A0		NCRTN
01B2  3CC		?KEY
01B3  03F		JC	+7	WKUP25 (01BA)
01BA  3B8	WKUP25	C=REG	14
01BB  3F1*00C		NCXQ	03FC	PACH12
03FC  046	PACH12	C=0	X
03FD  3A8		REG=C	E
03FE  309*0BA		NCGO	2EC2	DECMPL
2EC2  244	DECMPL	CF	9
2EC3  3A1*080	DCPL00	NCXQ	20E8	GTFEND
20E8  046	GTFEND	C=0	X
20E9  270		RAMSLCT
20EA  378		C=REG	13
20EB  01C	GTFEN1	PT=	3
20EC  110		LC	4
20ED  270		RAMSLCT
20EE  01C		PT=	3
20EF  10A		A=C	WPT
20F0  038		RDATA
20F1  3E0		RTN
2EC5  0AA		A<>C	WPT
2EC6  070		N=C
2EC7  004		CF	3
2EC8  10A	DCPL05	A=C	WPT
2EC9  34D*0A4		NCXQ	29D3	INCAD2
29D3  1A2	INCAD2	A=A-1	PT
29D4  037		JC	+6	INC21 (29DA)
29D5  1A2		A=A-1	PT
29D6  1A2	INCADA	A=A-1	PT
29D7  02F		JC	+5	INC1 (29DC)
29D8  1A2		A=A-1	PT
29D9  3E0		RTN
2ECB  2ED*0A4		NCXQ	29BB	GTBYTA
29BB  0AA	GTBYTA	A<>C	WPT
29BC  10A		A=C	WPT
29BD  270		RAMSLCT
29BE  07C		RCR	4
29BF  130*221		LDI	221
29C1  0FC		RCR	10
29C2  1E0		GTOC
2210  038	TBLGBA	RDATA
2211  3E0		RTN
2ECD  3D8		C<>ST
2ECE  30C		?FS	1
2ECF  027		JC	+4	DCPL07 (2ED3)
2ED0  104		CF	8
2ED1  3D8		C<>ST
2ED2  033		JNC	+6	DCPL11 (2ED8)
2ED8  0B0	DCPL11	C=N
2ED9  10E		A=C	ALL
2EDA  139*088		NCXQ	224E	GTLINK
224E  0AA	GTLINK	A<>C	WPT
224F  3C3		JNC	-8	GTLNKA (2247)
2247  10A	GTLNKA	A=C	WPT
2248  270		RAMSLCT
2249  07C		RCR	4
224A  130*220		LDI	220
224C  0FC		RCR	10
224D  1E0		GTOC
2204  038		RDATA
2205  103		JNC	+32	GBA1 (2225)
2225  23C	GBA1	RCR	2
2226  3E0		RTN
2EDC  2E6		?C#0	X
2EDD  303		JNC	-32	DCPL20 (2EBD)
2EDE  0D5*088	DCPL15	NCXQ	2235	UPLINK
2235  042	UPLINK	C=0	PT
2236  1EA		C=C+C	WPT
2237  1EA		C=C+C	WPT
2238  1EA		C=C+C	WPT
2239  222		C=C+1	PT
223A  1E2		C=C+C	PT
223B  1E6		C=C+C	X
223C  01F		JC	+3	ULINK1 (223F)
223D  3C6		CSR	X
223E  01B		JNC	+3	ULINK2 (2241)
2241  20A	ULINK2	C=A+C	WPT
2242  01B		JNC	+3	ULINK3 (2245)
2245  262	ULINK3	C=C-1	PT
2246  262		C=C-1	PT
2247  10A	GTLNKA	A=C	WPT
2248  270		RAMSLCT
2249  07C		RCR	4
224A  130*220		LDI	220
224C  0FC		RCR	10
224D  1E0		GTOC
220A  038		RDATA
220B  0A3		JNC	+20	GBA4 (221F)
221F  13C	GBA4	RCR	8
2220  3E0		RTN
2EE0  23E		C=C+1	S
2EE1  2D7		JC	-38	DCPL17 (2EBB)
2EE2  0AA		A<>C	WPT
2EE3  10A		A=C	WPT
2EE4  070		N=C
2EE5  34D*0A4		NCXQ	29D3	INCAD2
29D3  1A2	INCAD2	A=A-1	PT
29D4  037		JC	+6	INC21 (29DA)
29D5  1A2		A=A-1	PT
29D6  1A2	INCADA	A=A-1	PT
29D7  02F		JC	+5	INC1 (29DC)
29D8  1A2		A=A-1	PT
29D9  3E0		RTN
2EE7  10C	DCPL24	?FS	8
2EE8  0EB		JNC	+29	DCPL70 (2F05)
2F05  0B0	DCPL70	C=N
2F06  2E6		?C#0	X
2F07  20F		JC	-63	DCPL05 (2EC8)
2EC8  10A	DCPL05	A=C	WPT
2EC9  34D*0A4		NCXQ	29D3	INCAD2
29D3  1A2	INCAD2	A=A-1	PT
29D4  037		JC	+6	INC21 (29DA)
29D5  1A2		A=A-1	PT
29D6  1A2	INCADA	A=A-1	PT
29D7  02F		JC	+5	INC1 (29DC)
29D8  1A2		A=A-1	PT
29D9  3E0		RTN
2ECB  2ED*0A4		NCXQ	29BB	GTBYTA
29BB  0AA	GTBYTA	A<>C	WPT
29BC  10A		A=C	WPT
29BD  270		RAMSLCT
29BE  07C		RCR	4
29BF  130*221		LDI	221
29C1  0FC		RCR	10
29C2  1E0		GTOC
2216  038		RDATA
2217  053		JNC	+10	GBA3 (2221)
2221  17C	GBA3	RCR	6
2222  3E0		RTN
2ECD  3D8		C<>ST
2ECE  30C		?FS	1
2ECF  027		JC	+4	DCPL07 (2ED3)
2ED0  104		CF	8
2ED1  3D8		C<>ST
2ED2  033		JNC	+6	DCPL11 (2ED8)
2ED8  0B0	DCPL11	C=N
2ED9  10E		A=C	ALL
2EDA  139*088		NCXQ	224E	GTLINK
224E  0AA	GTLINK	A<>C	WPT
224F  3C3		JNC	-8	GTLNKA (2247)
2247  10A	GTLNKA	A=C	WPT
2248  270		RAMSLCT
2249  07C		RCR	4
224A  130*220		LDI	220
224C  0FC		RCR	10
224D  1E0		GTOC
220A  038		RDATA
220B  0A3		JNC	+20	GBA4 (221F)
221F  13C	GBA4	RCR	8
2220  3E0		RTN
2EDC  2E6		?C#0	X
2EDD  303		JNC	-32	DCPL20 (2EBD)
2EDE  0D5*088	DCPL15	NCXQ	2235	UPLINK
2235  042	UPLINK	C=0	PT
2236  1EA		C=C+C	WPT
2237  1EA		C=C+C	WPT
2238  1EA		C=C+C	WPT
2239  222		C=C+1	PT
223A  1E2		C=C+C	PT
223B  1E6		C=C+C	X
223C  01F		JC	+3	ULINK1 (223F)
223D  3C6		CSR	X
223E  01B		JNC	+3	ULINK2 (2241)
2241  20A	ULINK2	C=A+C	WPT
2242  01B		JNC	+3	ULINK3 (2245)
2245  262	ULINK3	C=C-1	PT
2246  262		C=C-1	PT
2247  10A	GTLNKA	A=C	WPT
2248  270		RAMSLCT
2249  07C		RCR	4
224A  130*220		LDI	220
224C  0FC		RCR	10
224D  1E0		GTOC
220A  038		RDATA
220B  0A3		JNC	+20	GBA4 (221F)
221F  13C	GBA4	RCR	8
2220  3E0		RTN
2EE0  23E		C=C+1	S
2EE1  2D7		JC	-38	DCPL17 (2EBB)
2EBB  2E6	DCPL17	?C#0	X
2EBC  117		JC	+34	DCPL15 (2EDE)
2EDE  0D5*088	DCPL15	NCXQ	2235	UPLINK
2235  042	UPLINK	C=0	PT
2236  1EA		C=C+C	WPT
2237  1EA		C=C+C	WPT
2238  1EA		C=C+C	WPT
2239  222		C=C+1	PT
223A  1E2		C=C+C	PT
223B  1E6		C=C+C	X
223C  01F		JC	+3	ULINK1 (223F)
223D  3C6		CSR	X
223E  01B		JNC	+3	ULINK2 (2241)
2241  20A	ULINK2	C=A+C	WPT
2242  01B		JNC	+3	ULINK3 (2245)
2243  226		C=C+1	X
2244  01B		JNC	+3	GTLNKA (2247)
2247  10A	GTLNKA	A=C	WPT
2248  270		RAMSLCT
2249  07C		RCR	4
224A  130*220		LDI	220
224C  0FC		RCR	10
224D  1E0		GTOC
2202  038		RDATA
2203  123		JNC	+36	UPLB1 (2227)
2227  086	UPLB1	B=A	X
2228  0EA		B<>C	WPT
2229  266		C=C-1	X
222A  270		RAMSLCT
222B  038		RDATA
222C  0CA		C=B	WPT
222D  3E0		RTN
2EE0  23E		C=C+1	S
2EE1  2D7		JC	-38	DCPL17 (2EBB)
2EBB  2E6	DCPL17	?C#0	X
2EBC  117		JC	+34	DCPL15 (2EDE)
2EDE  0D5*088	DCPL15	NCXQ	2235	UPLINK
...
009D  3C8	RST10	CLRKEY
009E  3CC		?KEY
009F  266		C=C-1	X
00A0  3EB		JNC	-3	RST10 (009D)
009D  3C8	RST10	CLRKEY
009E  3CC		?KEY
009F  266		C=C-1	X
00A0  3EB		JNC	-3	RST10 (009D)
00A1  3E0		RTN
11D1  241*006		NCGO	0190	DRSY50
0190  3D9*01C	DRSY50	NCXQ	07F6	ENLCD
07F6  130*010	ENLCD	LDI	010
07F8  270		RAMSLCT
07F9  130*0FD		LDI	0FD
07FB  3F0		PERSLCT
07FC  3E0		RTN
0192  060*000		POWOFF		deep sleep

So for the long distant jumps the chance is 50:50 the label is reused in next line.
Edit: just notice that this example does not correspond to the description of JC and JNC.
Edit 2: withdraw statement of 1st edit
Last edited on
Mike, this is Dave Hayden. You may have seen me on hpmuseum.org or at HHC if you've been there in the past few years. My 41C, 41CL, 29C, 48gii and 34s are all literally within arms reach right now. ;)

Just sort the table by address and use binary search to lookup by address. Since lookup by name is rare, I think you can leave that as-is (linear search).

Point me at the code and I'd be happy to help put this together.

Dave
Problem solved, Dave sent me a binary search solution.
Hmm, solved with a binary search?


How novel.
Last edited on
Hmm, solved with a binary search?


How novel.
What's your point?
What's your point?

Probably just that that was said four days ago... :0)
How novel.

LOL -- my intention of the append was to give a reason for the green checkmark. Besides this I'll try unordered map but to test this it seems I have to quit VS2010. And in case I am bored to death I will give boost::bimaps a try as I was adviced earlier.
Probably just that that was said four days ago... :0)
Ah. Right. I see that was the fist suggestion, made by Duthomhas himself.
A bit of benchmarking.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <cstring>
#include <string>
#include <unordered_map>
using namespace std;

typedef unsigned short word;
const int NameSize = 10;
const int Globals = 2916;

#ifdef USE_BSEARCH
struct Glob_type          // Global routines names
  {
  char name[NameSize];    //name
  word addr;              //address
  };

struct Glob_type GlobalName[Globals];
#endif
#ifdef USE_MAP
std::unordered_map<word,std::string> mymap;
#endif

void init() {
    for ( word i = 0 ; i < Globals ; i++ ) {
        char buff[NameSize];
        snprintf(buff,NameSize,"Lbl %d",i);
#ifdef USE_BSEARCH
        strcpy(GlobalName[i].name,buff);
        GlobalName[i].addr = i * 10;
#endif
#ifdef USE_MAP
        pair<word,std::string> v(i*10,buff);
        mymap.insert(v);
#endif
    }
}

#ifdef USE_BSEARCH
int cmp(const void *a, const void *b) {
    const Glob_type *pa = reinterpret_cast<const Glob_type*>(a);
    const Glob_type *pb = reinterpret_cast<const Glob_type*>(b);
    if ( pa->addr < pb->addr ) return -1;
    if ( pa->addr > pb->addr ) return +1;
    return 0;
}

const Glob_type *finder(word addr) {
    Glob_type key = { "", addr };
    void *result = bsearch(&key,GlobalName,Globals,sizeof(Glob_type),cmp);
    return reinterpret_cast<const Glob_type*>(result);
}
#endif

int main(int argc, char* argv[])
{
    init();
#ifdef USE_MAP
    {
        int hits = 0;
        auto c1 = clock();
        for ( int i = 0 ; i < 1000000 ; i++ ) {
            word input = rand()%32000;
            std::unordered_map<word,std::string>::const_iterator r = mymap.find(input);
            if ( r != mymap.end() ) {
                hits++;
            }
        }
        auto c2 = clock();
        std::cout << "Map: Total hits=" << hits << std::endl;
        double elapsed = double((c2-c1)) / CLOCKS_PER_SEC;
        std::cout << "Map: Elapsed=" << elapsed << std::endl;
    }
#endif
#ifdef USE_BSEARCH
    {
        int hits = 0;
        auto c1 = clock();
        for ( int i = 0 ; i < 1000000 ; i++ ) {
            const Glob_type *r = finder(rand()%32000);
            if ( r ) {
                hits++;
            }
        }
        auto c2 = clock();
        std::cout << "Bin: Total hits=" << hits << std::endl;
        double elapsed = double((c2-c1)) / CLOCKS_PER_SEC;
        std::cout << "Bin: Elapsed=" << elapsed << std::endl;
    }
#endif
    return 0;
}


A mixed bag of results.

$ g++ -std=c++11 -DUSE_MAP foo.cpp && size a.out && ./a.out
   text	   data	    bss	    dec	    hex	filename
  18855	    800	    352	  20007	   4e27	a.out
Map: Total hits=90848
Map: Elapsed=0.174506
$ g++ -std=c++11 -DUSE_BSEARCH foo.cpp && size a.out && ./a.out
   text	   data	    bss	    dec	    hex	filename
   3609	    680	  35288	  39577	   9a99	a.out
Bin: Total hits=90848
Bin: Elapsed=0.128281
$ 
$ g++ -std=c++11 -DUSE_MAP -O2 foo.cpp && size a.out && ./a.out
   text	   data	    bss	    dec	    hex	filename
   6159	    776	    352	   7287	   1c77	a.out
Map: Total hits=90848
Map: Elapsed=0.04482
$ g++ -std=c++11 -DUSE_BSEARCH -O2 foo.cpp && size a.out && ./a.out
   text	   data	    bss	    dec	    hex	filename
   3367	    664	  35288	  39319	   9997	a.out
Bin: Total hits=90848
Bin: Elapsed=0.102001

The binary search has the edge when it comes to code space.

Performance wise, optimisation has a BIG effect on the map. The compiler does a good job of reducing down the template code bloat into what's strictly necessary.

The bss size is large for binary search, because of the large static array.
However, valgrind reports that the map allocates ~200K of memory, approx 5x times that of the static array.
Just installed VS2015. Now I may play with unordered map. In case this has no noticable effect I'll stay with Dave's binary search -- or... maybe I'll knit something similar, using as "hash function" nothing but % 16 and a pre-selection bitset. But not tomorrow. And only if unordered map did not pan out.
Registered users can post here. Sign in or register to post.
Pages: 12