Binary Search issue

So my program is working how I want it to and all while using a linear search. But I'd like to use a binary search instead but it isn't working properly and I can't seem to find where the issue lies. I've already sorted and everything.


https://pastebin.com/3pXCp5iE
Last edited on
May I see a sample “formula.txt” file?
https://pastebin.com/AHnpfPGU Here it is. My linear search works fine with all of it is that is confusing me.
Last edited on
And element.txt? Don't wait until someone ask you ...
Names make a difference, and keeping like things together makes a difference.

For example, your files "element.txt" and "elements.txt": what clue is there to what difference there is between them? Unless you have read and understand the code or have opened the files there isn’t any. Be as descriptive as possible. Perhaps one of the following?

  element-abbreviations-and-weights.txt
  periodic-table-NameAbbreviation-AtomicWeight.txt
  periodic-table-extract.txt
  input-element-info.txt

In the same vein, you have very generic names for your input variables: inf and inf2. Hint: whenever you find yourself reusing names, you are making a mistake. Back to this in a second.

You have global fstream objects, which appear to have been moved globally because of the read() function's need to access inf.

At this point, all these generic names leave the reader wondering what any of it does.
Since read() and inf have the single purpose of reading your input “element.txt” file, why not move everything to that function?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
struct Element
{
  std::string abbreviation;
  double      weight;
};

vector <Element> read_elements()
{
  vector <Element> elements;
  std::ifstream f( "element-abbreviations-and-weights.txt" );
  Element element;
  while (f >> element.abbreviation >> element.weight)
    elements.emplace_back( element );
  return elements;
}

If you MUST stick to an array[], then make sure you cannot read more than there is available space to store them. A periodic table has 118 elements. You reserved space for 30. You did this, of course, because you already know your input contains 30 elements or fewer. But what happens when you send it off to your professor and he uses the entire 118-element periodic table as input? Your program fails, and he marks you with a low grade. Be pedantic about buffer overflow:

1
2
3
const int Maxumum_Number_Of_Elements = 118;
Element elements[ Maxumum_Number_Of_Elements ];
int number_of_elements = 0;
1
2
3
4
5
6
7
8
9
10
11
int read_elements( Element elements[], const int max )
{
  std::ifstream f( "element-abbreviations-and-weights.txt" );
  int n = 0;
  while (f >> elements[n].abbreviation >> elements[n].weight)
  {
    n += 1;
    if (n == max) break;  // IF I RUN OUT OF ROOM, I MUST BE DONE. (I could have thrown an error, if it mattered.)
  }
  return n;
}

Notice how f is never explicitly closed? It is closed when it goes out of scope, just like it should. And it does not exist anywhere else in the program — just in the one small function it needs to. This highlights one of the errors with using global variables in general, and in this case, a global file stream: you forgot to close inf2. It only closes when your program terminates. For this program, of course, it is not an error, but in larger projects it can become a serious resource leak.

Notice also how I have written the elements[] array and the related information together? In your code, not only do you have the array of elements and the number of elements in that array declared in totally different spots, you have named the number of elements “max”, which in most minds means the maximum size of the array, leading your readers to misunderstand the intent of the code.

NAMING MATTERS.

It is the reason you are posting here. Your sort() algorithm sorts the array of elements by weight. But your binary search (called “locate”) searches by name (“elem”). See what went wrong?

65
66
67
68
69
70
71
	sort(elements, max);
  
  // Let's look at how these were sorted (verifying they were sorted properly)
  std::cout << "-------\n";
  for (int n = 0; n < max; n++)
    std::cout << elements[n].elem << " " << elements[n].weight << "\n";
  std::cout << "-------\n";

By running that code, I see that your Bubble sort worked correctly, but that you sorted on the wrong thing. A simple fix:

180
181
//        if(a[x].weight > a[x + 1].weight){
        if (a[x].elem > a[x + 1].elem){  // sort on name, not weight 

Some simple name improvements might have obviated the issue:

1
2
void sort_by_weight( Element elements[], int number_of_elements );
int  binary_search_for_name( Element elements[], int number_of_elements, std::string name );

Hmm, binary searching for the name in an array sorted by weight? Non-sequitur.

I will stop here. The way you are reading the file, and all the single-letter variable names to do it, works, but it is not helping you any. But that is a topic for another thread.

Oh, I almost forgot, you do not need to #include <stdlib.h> if you have already #included <cstdlib>. (And you do not need ctime/time.h.)

Hope this helps.
You're sorting by weight and then trying to do a binary search by name. The sort key and search key must be the same for binary search to work.

You have some off-by-one errors in your binary search function (locate()). Define low and high such that the element you're looking for is inside the range [low,high). So low is the lowest known index that might contain the element and high is the highest known index that does not contain it. Now fix the code to ensure this invariant.
elements-by-abbreviation-and-atomic-weight.txt:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
H  1.008     Mo 95.95     Ag 107.87    Sb 121.76
Li 6.94      W  183.84    Au 196.97    Bi 208.98
Na 22.990    Sg 269       Rg 282       Mc 290
K  39.098    Nd 144.24    Tb 158.93    Tm 168.93
Rb 85.468    U  238.03    Bk 247       Md 258
Cs 132.91    Mn 54.938    Zn 65.38     O  15.999
Fr 223       Tc 97        Cd 112.41    S  32.06
Be 9.0122    Re 186.21    Hg 200.59    Se 78.971
Mg 24.305    Bh 270       Cn 285       Te 127.60
Ca 40.078    Pm 145       Dy 162.50    Po 209
Sr 87.62     Np 237       Cf 251       Lv 293
Ba 137.33    Fe 55.845    B  10.81     Yb 173.05
Ra 226       Ru 101.07    Al 26.982    No 259
Sc 44.956    Os 190.23    Ga 69.723    F  18.998
Y  88.906    Hs 269       In 114.82    Cl 35.45
La 138.91    Sm 150.36    Tl 204.38    Br 79.904
Ac 227       Pu 244       Nh 286       I  126.90
Ti 47.867    Co 58.933    Ho 164.93    At 210
Zr 91.224    Rh 102.91    Es 252       Ts 294
Hf 178.49    Ir 192.22    C  12.011    Lu 174.97
Rf 267       Mt 278       Si 28.085    Lr 266
Ce 140.12    Eu 151.96    Ge 72.630    He 4.0026
Th 232.04    Am 243       Sn 118.71    Ne 20.180
V  50.942    Ni 58.693    Pb 207.2     Ar 39.88
Nb 92.906    Pd 106.42    Fl 289       Kr 83.798
Ta 180.95    Pt 195.08    Er 167.26    Xe 131.29
Db 268       Ds 281       Fm 257       Rn 222
Pr 140.91    Gd 157.25    N  14.007    Og 294
Pa 231.04    Cm 247       P  30.974
Cr 51.996    Cu 63.546    As 74.922
Hello,

it's my first program with chemical elements. Interesting :) So I did a little work:

* reformating elements-by-abbreviation-and-atomic-weight.txt so it can easy parsed
* move code for parsing the files into function
* new datatypes for elements, groups and Molecule
* complete rewrite parser for formular.txt as a line-by-line state machine
* replace erroneous C-like array with std::vector
* sort elements with std::sort function
* short interface for sorting elements by weight and name
* find elements with std::find_is for the linear search
* find elements with std::lower_bound for the binary search
* add checks for properly sorted data
* reorder program
1) readin elements
2) readin formular
3) then do the search
* litte bit code for nice output
* litte bit error handling with exceptions
* and a nice C++17 interface with std::optional for std::find_if and std::lower_bound

I also found a format for chemical species named "SMILES" https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <optional>
#include <cassert>

struct Element {
    std::string name;
    double weight = 0;
};

struct ElementGroup {
    std::string name;
    int number;
};

struct Molecule : std::vector<ElementGroup> {

};

std::ostream& operator<<(std::ostream& s, const Molecule& mol) {
    for(const ElementGroup& ele : mol) {
        s << ele.name;
        if(ele.number > 1) {
            s<< "(" << ele.number << ")";
        }
    }
    return s;
}


std::vector<Element> readElements(const std::string& fileName) {
    std::ifstream fs;
    fs.open(fileName);
    if(not fs) throw std::system_error(errno, std::system_category(), "failed to open " + fileName + " ");

    std::string name;
    double weight;
    std::vector<Element> elements;
    while(fs >> name >> weight){
        elements.push_back(Element{name, weight});
    }

    return elements;
}

std::vector<Molecule> readFormula(const std::string& fileName) {
    std::ifstream fs;
    fs.open(fileName);
    if(not fs) throw std::system_error(errno, std::system_category(), "failed to open " + fileName + " ");

    std::vector<Molecule> molecules;
    std::string line;
    while(fs >> line) {
        // format: Al(2)O(3)SiO(2)
        // The end of an atom is eiter indicated by '(' or an upper case char
        // state 0: parse name
        // state 1: parse number

        Molecule mol;
        int state = 0;
        std::string substring, atomstring;
        for(char c : line) {
            if(state == 0) {
                if(c == '(') {
                    atomstring = substring;
                    substring.clear();
                    state = 1;
                } else if(std::isupper(c)) {
                    if(not substring.empty()) {
                        mol.push_back(ElementGroup{substring, 1});
                        substring.clear();
                    }
                    substring.push_back(c);
                } else if(std::islower(c)) {
                    substring.push_back(c);
                } else {
                    std::cout << "corrupt format: " << line << "\n";
                }
            } else if(state == 1) {
                if(c == ')') {
                    int number = std::stoi(substring);
                    mol.push_back(ElementGroup{atomstring, number});
                    atomstring.clear();
                    substring.clear();
                    state = 0;
                } else {
                    substring.push_back(c);
                }
            }
        }

        if(state == 1) {
            std::cout << "missing closing ')'\n";
        }

        if(not substring.empty()) {
            mol.push_back(ElementGroup{substring, 1});
        }

        molecules.push_back(mol);
    }

    return molecules;
}

void sort_weight(std::vector<Element>& elements) {
    std::sort(elements.begin(), elements.end(), [](const Element& lhs, const Element&rhs){ return lhs.weight < rhs.weight; });
}

void sort_name(std::vector<Element>& elements) {
    std::sort(elements.begin(), elements.end(), [](const Element& lhs, const Element&rhs){ return lhs.name < rhs.name; });
}

std::optional<Element> find(const std::vector<Element>& elements, const std::string& word) {
    auto res = std::find_if(elements.begin(), elements.end(), [&](const Element& element){ return element.name == word; } );
    if(res != elements.end()) {
        return *res;
    } else {
        return {};
    }
}

std::optional<Element> find_binary(const std::vector<Element>& elements, const std::string& word) {
    // check is elements are properly sorted, otherwise the binary search std::lower_bound dosen't work
    assert(std::is_sorted(elements.begin(), elements.end(), [](const Element& lhs, const Element&rhs){ return lhs.name < rhs.name; }));

    auto res = std::lower_bound(elements.begin(), elements.end(), word, [](const Element& element, const std::string& word){ return element.name < word; } );
    if(res != elements.end()) {
        return *res;
    } else {
        return {};
    }
}


int main() {
    try {
        std::vector<Element> elements = readElements("/home/ite/elements-by-abbreviation-and-atomic-weight.txt");

        //        sort_weight(elements);
        //        std::cout << "Sort elements by weight\n";
        //        for(const Element& ele : elements) {
        //            std::cout << ele.name << " " << ele.weight << "\n";
        //        }
        //        std::cout << "\n";

        sort_name(elements);
        //        std::cout << "Sort elements by name\n";
        //        for(const Element& ele : elements) {
        //            std::cout << ele.name << " " << ele.weight << "\n";
        //        }
        //        std::cout << "\n";

        std::vector<Molecule> molecules = readFormula("/home/ite/formula.txt");

//        std::cout << "formulas\n";
//        for(const Molecule& mol : molecules) {
//            std::cout << mol << "\n";
//        }

        std::cout << "calc molecul weight\n";
        for(const Molecule& mol : molecules) {
            double sumWeight = 0;
            for(const ElementGroup& ele : mol) {
                if(std::optional<Element> res = find_binary(elements, ele.name); res.has_value()) {
                    sumWeight += res.value().weight;
                } else {
                    throw std::runtime_error("Element not found in database: " + ele.name);
                }
            }
            std::cout << mol << " " << sumWeight << "\n";
        }

        return EXIT_SUCCESS;
    }
    catch(std::system_error& ex) {
        std::cerr << ex.what() << "\n";
    }
    catch(std::runtime_error& ex) {
        std::cerr << ex.what() << "\n";
    }

    return EXIT_FAILURE;
}
elements-by-abbreviation-and-atomic-weight.txt

H 1.008
Li 6.94
Na 22.990
K 39.098
Rb 85.468
Cs 132.91
Fr 223
Be 9.0122
Mg 24.305
Ca 40.078
Sr 87.62
Ba 137.33
Ra 226
Sc 44.956
Y 88.906
La 138.91
Ac 227
Ti 47.867
Zr 91.224
Hf 178.49
Rf 267
Ce 140.12
Th 232.04
V 50.942
Nb 92.906
Ta 180.95
Db 268
Pr 140.91
Pa 231.04
Cr 51.996
Mo 95.95
W 183.84
Sg 269
Nd 144.24
U 238.03
Mn 54.938
Tc 97
Re 186.21
Bh 270
Pm 145
Np 237
Fe 55.845
Ru 101.07
Os 190.23
Hs 269
Sm 150.36
Pu 244
Co 58.933
Rh 102.91
Ir 192.22
Mt 278
Eu 151.96
Am 243
Ni 58.693
Pd 106.42
Pt 195.08
Ds 281
Gd 157.25
Cm 247
Cu 63.546
Ag 107.87
Au 196.97
Rg 282
Tb 158.93
Bk 247
Zn 65.38
Cd 112.41
Hg 200.59
Cn 285
Dy 162.50
Cf 251
B 10.81
Al 26.982
Ga 69.723
In 114.82
Tl 204.38
Nh 286
Ho 164.93
Es 252
C 12.011
Si 28.085
Ge 72.630
Sn 118.71
Pb 207.2
Fl 289
Er 167.26
Fm 257
N 14.007
P 30.974
As 74.922
Sb 121.76
Bi 208.98
Mc 290
Tm 168.93
Md 258
O 15.999
S 32.06
Se 78.971
Te 127.60
Po 209
Lv 293
Yb 173.05
No 259
F 18.998
Cl 35.45
Br 79.904
I 126.90
At 210
Ts 294
Lu 174.97
Lr 266
He 4.0026
Ne 20.180
Ar 39.88
Kr 83.798
Xe 131.29
Rn 222
Og 294


formular.txt

Al(2)O(3)SiO(2)
NiC(4)H(6)O(4)
Na(2)VO(4)
Na(3)P(5)O(4)
TeCl(4)
V(2)O(5)
HCONHCH(2)CH(2)OH
ZnC(4)H(6)O(4)
UO(2)N(2)O(6)H(2)O
FeTiO(3)
MgC(4)H(6)O(4)
C(16)N(33)OH
AgSiF(6)MgOH(6)O(3)
NiSO(4)N(2)H(4)SO(4)H(12)O(6)
Tl(2)S(3)O(12)H(14)O(7)
CBr(4)
ZnCO(3)
UO(2)CO(3)
ThS(2)O(8)
H(4)P(2)O(5)
H(2)SeO(5)
K(2)Cr(2)O(7)
K(2)PtI(6)

Topic archived. No new replies allowed.