Reading in one character at a time

Hey guys, I have an assignment where we have to read in chemical formulas, use a binary search to find the specific elements, and multiply by the atom counts to get the correct molecular weight. Please, please, please be nice... I have gotten some rough responses on here. This is only my second semester, and I am trying to study as best I can and Google things before coming to forums.

Instructions:

The Chemistry Dept. has asked you to develop a program that will calculate molecular weights of compounds given the chemical formula. For example, H2O (water) contains 2 hydrogen atoms weighing 1.008 each and 1 oxygen atom weighing 15.999. The molecular weight of water is 2 x 1.008 + 1 x 15.999 = 18.015.

The input to your program will be in the following form:
element(number of atoms) element(number of atoms) ...
For example, H2O would be represented as H(2)O. If the parentheses are not present, assume one (1) atom. As another example, acetic acid would be represented as CH(3)COOH or C(2)H(4)O(2).

If an element has a two-letter symbol (e.g., silver is Ag), the second letter will be lowercase, indicating that it is part of the symbol.

A list of all the chemical elements can be found in the data file 'Element.dat'. There is one element per line: the element symbol appears first, followed by its atomic weight.

Ex. Al 26.98
Sb 121.75
S 32.06
Ba 137.34
...

A second input file, 'Formula.dat', contains the test formulae to use in testing your program, one formula per line. For the output, print the formula and its molecular weight in a neat table (i.e., line up the columns).


Restrictions: You are to use an array of structures to hold each symbol and its weight. ‘Formula.txt’ file.
Use a binary search to look up elements in the element table.
You are to use functions/procedures in your implementation.
Format your output in table form (i.e., headings and straight columns).

Element.txt file:

H 1.008
C 12.011
N 14.0067
O 15.9994
F 18.9984
Na 22.9898
Mg 24.305
Al 26.9815
Si 28.086
P 30.9738
S 32.06
Cl 35.453
K 39.102
Ti 47.9
V 50.9414
Cr 51.996
Fe 55.847
Ni 58.71
Zn 65.37
As 74.9216
Se 78.96
Br 79.904
Ag 107.868
I 126.905
Te 127.6
Pt 195.09
Au 196.967
Tl 204.37
Th 232.038
U 238.029


formula text file:

Na(3)P(5)O(4)
TeCl(4)
V(2)O(5)
HCONHCH(2)CH(2)OH
ZnC(4)H(6)O(4)
UO(2)N(2)O(6)H(2)O
FeTiO(3)
MgC(4)H(6)O(4)
C(16)N(33)OH
AgSiF(6)MgOH(6)O(3)
NiSO(4)N(2)H(4)SO(4)H(12)O(6)
Tl(2)S(3)O(12)H(14)O(7)
CBr(4)
ZnCO(3)
UO(2)CO(3)
ThS(2)O(8)
H(4)P(2)O(5)
H(2)SeO(5)
K(2)Cr(2)O(7)
K(2)PtI(6)

***********************************************************************
You are to assume that if there are no () the count is one. We are supposed to do this with an array of structs. We HAVE NOT learned pointers, vectors, or other advanced code; we just learned structs this week.

I have already made a struct called elementProp (element properties) that stores the element name and atomic weight, and the array has already been sorted from lowest to highest for the binary search. I also read each formula as a complete line and stored them in a string array so they could be displayed in my table.

My problem is this: we are supposed to read the formula file one character at a time. Some element symbols are a single uppercase letter and some are an uppercase letter followed by a lowercase one. I also have to read the number inside the parentheses and use it to know how many times to multiply. How do I read one character at a time and flag it so the program knows either to store it temporarily in an array (my idea) or to look it up with the binary search as soon as it's read, while still being able to read the (number)?


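struct elementProp  //element properties
{
    string name;    // element symbol, e.g. "Ag"
    float weight;   // atomic weight
};

//writing all the formulas to an array
string formulas[25];
string formula;
int numFormulas = 0;    // how many formulas were actually read

// stop at the end of the file or when the array is full
while (numFormulas < 25 && getline(infile2, formula))
{
    formulas[numFormulas] = formula;
    cout << "     " << formulas[numFormulas] << endl;
    numFormulas++;
}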


Someone else online shows doing this, but it only builds the whole line into one string (someone else's code, not mine):
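// Reads one formula from inputFile2 one character at a time
// and returns it as a string (stops at the end of the line).
string StringMake()
{
    string NEL;   // the formula built so far
    char FR;      // the character just read
    while (inputFile2.get(FR))
    {
        if (FR == '\n')
        {
            break;    // end of this formula
        }
        // keep letters, digits and punctuation such as ( and )
        if (isalpha(FR) || isdigit(FR) || ispunct(FR))
        {
            NEL = NEL + FR;
        }
    }
    return NEL;
}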
What do you do for H(23)S(11), or anything with 2 digits' worth of atoms? (Dunno if such things really exist, my chemistry is rusty... but some carbon compounds get fat.)


you can read one letter with char ch; cin >> ch;

hang on... rereading

I do not see the one-letter-at-a-time restriction, and that is a self-imposed complexity that makes the problem MUCH harder.
read a string and parse it,
something like:
for each uppercase letter
    if the next letter is lowercase, consume it also
    look up the element
    check if the next thing is (
        if so, get the integer
    multiply
etc.

you have to load your array up front so you have it to search.
Use getline() to read successive whole lines into a string - each will be a molecule.

Split this line/molecule into new strings at each UPPER-case letter - these will be individual element contributions.

Use string find routines to split off any bracketed quantity in that element contribution (if there is one). If you stream in the bracketed substring as char - int - char then you will pick up the multiplier.

Personally, I'd store element symbol and atomic weight in a map<string,double>, but it's up to you.

I definitely wouldn't bother with your friend's code.
I can't use any sort of vector or map since we haven't learned those. Could you guys show me an example, or point me to what I need to look up to parse a string? The teacher hasn't shown us how to do any of that. Chegg Study has all the solutions wrong as well. Thank you so much for replying.
On the one-character-at-a-time restriction: he didn't write it in the instruction sheet, but it's in the lecture instructions on Blackboard. I'll post it here.

A note: You have to read in the 'Formula' data file one character at a time. After reading one char, say an element, you don't know if it is a one-char or two-char name. You also don't know if a number in parentheses follows the element or not. So to help you see what the next char is (without actually reading it), C++ has a 'peek' command that returns the next char from the input file but does not advance the position in the input file, i.e., it looks at the next char without consuming it. Syntax: khar = infile.peek(). Seeing the next char of your input may save you some coding in solving the Molecular program. Cool!

So I'm assuming it is required, since he mentions it here.
I'm trying to figure out how to use the peek function to check the next character before storing it. And then what do I do with it if it meets one of the conditions in my if statements?
Using peek() smacks of voyeurism. I would do it the easy way and read a line at a time. That will give you the whole of a molecule.
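#include <iostream>
using namespace std;

int main()
{
    char x[100] = {0};   // all zeros, so x stays null-terminated
    int dx = 0;
    // read one non-whitespace char at a time until end of input (user must type
    // Ctrl+Z in Windows, or Ctrl+D in Unix); leave room for the final '\0'
    while (dx < 99 && cin >> x[dx])
        dx++;

    cout << x << endl;
}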


there are other ways. this one is simple.

or with string, you read a char variable and += that to the string every time. same thing. I'm so used to seeing schoolwork use C-strings that I used one here.
So I found some code online and I am trying to decipher it to see what they did. What's weird to me is that only 4 of the values are right (I calculated them all by hand).

This is not mine; I found it on Chegg to help. I just need some explanation of what is going on in this code. I tested the getname function and saw that it only returned single uppercase letters, so I made my own version to return Ag, etc. But I cannot get the digits working no matter what I do. We haven't learned how to pass a string and pass back an int, so I'm not sure how this works, but the numbers are wrong. They should be 2, 3, 1, 2 for the first line, but when I remove all the *10+in[i]-48 crap it returns things like 50, 52, 49... big numbers for some reason.

#include <iostream>
#include <string>
#include <fstream>
#include <iomanip>
#include <cctype>    // isupper, islower
#include <cstdlib>   // system

using namespace std;

struct element {
string name;
double weight;
};

/******************************************/
int getchemicals(element[]);
int binarySearch(element [],int ,string );
int docompounds(element[],int);
string getname(string,int&);
int getnum(string,int&);
/********************************************/

int main ()
{
element chemical[50];
int numchemicals=0;


numchemicals=getchemicals(chemical);

   
docompounds(chemical,numchemicals);


   return 0;
}
int getnum(string in,int& i)
{int num=0;
i++;
while(in[i]!=')')
    {num=num*10+(in[i]-48);  //this weird calculation gives me the wrong numbers 
    i++;
    }
return num;
}
int docompounds(element e[],int n)
{ifstream in;
string input,c;
int num,i,index;
double weight=0;
in.open("C:\\data\\input\\Formula.txt");      
if(in.fail())          
   { cout<<"formula file did not open please check it\n";
   system("pause");
 
   }
cout<<"compound\tweight\n";
in>>input;
while(in)
   {
   cout<<setw(15)<<left<<input;
   weight=0;
    c="";
    for(i=0;input[i]!='\0';i++)
        {if(isupper(input[i]))
             {c=getname(input,i);
             //cout<<c<<" "<<i<<endl;
             index=binarySearch(e,n,c);
             //cout<<"***"<<index<<endl;
             num=1;
              }
         else
            if(input[i]=='(')
                {num=getnum(input,i)-1;
                //cout<<num<<"^^^^ "<<i<<endl;  //I use this occasionally to check my num
              
                }
        //cout<<"&&&"<<e[index].weight*num<<" "<<num<<"%%%%%"<<weight<<endl;
        weight+=e[index].weight*num;
        //cout<<"&&&"<<e[index].weight*num<<" "<<num<<"@@@@@"<<weight<<endl;      
        }
    cout<<setw(15)<<left<<" "<<weight<<endl;
    in>>input;
   }
in.close();
return 0;
}
string getname(string in,int& i)
{string c;
c=in[i];
i++;
if(isupper(in[i])||in[i]=='(')
    {i--;
    c[1]='\0';
     return c;
     }
if(islower(in[i]))
    {c[1]=in[i];
     c[2]='\0';
     i++;
     return c;
     }
}
  
   


int getchemicals(element c[])
{ifstream in2;
int n=0,i,j;
element t;
in2.open("C:\\data\\input\\Element.txt");      
if(in2.fail())          
   { cout<<"element file did not open please check it\n";
   system("pause");
   return -1;
   }
in2>>c[n].name;
while(in2)
{in2>>c[n].weight;
   in2.ignore();
   n++;
   in2>>c[n].name;
   }
   
   //bubble sort the names 
for(i=0;i<n-1;i++)
    for(j=i+1;j<n;j++)
        if(c[i].name.compare(c[j].name)>0)
            {t=c[i];
            c[i]=c[j];
            c[j]=t;
            }
//for(i=0;i<n;i++)
    //cout<<c[i].name<<" "<<c[i].weight<<endl;
in2.close();
return n;
}
int binarySearch(element c[],int max,string key)
{int low=0,mid;
//cout<<key<<endl;
max--;
while(low<=max )
    {mid=(low+max)/2;
    //cout<<"*"<<c[mid].name<<"*"<<key<<"*\n";
    if(c[mid].name.compare(key)<0)
       low = mid + 1;
    else
        {if( c[mid].name.compare(key)>0 )
            max = mid - 1;                  
        else
           return mid;
        }
     }
return -1;
}



compound                       weight
Al(2)O(3)SiO(2)                112.118
NiC(4)H(6)O(4)                 126.072
Na(2)VO(4)                     156.959
Na(3)P(5)O(4)                  260.887
TeCl(4)                        0
V(2)O(5)                       181.88
HCONHCHCOH                     88.0865
ZnC(4)H(6)O(4)                 70.0456
UO(2)N(2)O(6)H(2)O             412.053
FeTiO(3)                       85.995
MgC(4)H(6)O(4)                 70.0456
C(16)N(33)OH                   672.413
AgSiF(6)MgOH(6)O(3)            168.037
NiSO(4)N(2)H(4)SO(4)H(12)O(6)  314.2
Tl(2)S(3)O(12)H(14)O(7)        414.281
CBr(4)                         12.011
ZnCO(3)                        47.9982
UO(2)CO(3)                     330.037
ThS(2)O(8)                     127.995
H(4)P(2)O(5)                   145.977
H(2)SeO(5)                     162.316
K(2)Cr(2)O(7)                  226.233
K(2)Pt(1)I(6)                  932.552




should be:

Formula:                       Molecular Weight:

Al(2)O(3)SiO(2)                162.046
NiC(4)H(6)O(4)                 176.7996
Na(2)VO(4)                     160.9186
Na(3)P(5)O(4)                  287.836
TeCl(4)                        269.412
V(2)O(5)                       181.8789
HCONHCHCOH                     89.0945
ZnC(4)H(6)O(4)                 183.4596
UO(2)N(2)O(6)H(2)O             412.053
FeTiO(3)                       151.7452
MgC(4)H(6)O(4)                 1034.721
C(16)N(33)OH                   142.3946
AgSiF(6)MgOH(6)O(3)            294.1918
NiSO(4)N(2)H(4)SO(4)H(12)O(6)
Tl(2)S(3)O(12)H(14)O(7)
CBr(4)                         etc....
ZnCO(3)
UO(2)CO(3)
ThS(2)O(8)
H(2)SeO(5)
K(2)Cr(2)O(7)
K(2)Pt(1)I(6)


****my code
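string getname(string input, int& i)  //I changed this so if it's lowercase it will add it to the uppercase letter.
{
   string c;
   c = input[i];                 // the uppercase letter that starts the symbol

   // look at the NEXT character without moving i: only a lowercase letter
   // belongs to this symbol (an uppercase letter starts the next element)
   if (i + 1 < (int)input.size() && islower(input[i + 1]))
   {
      i++;                       // consume the lowercase letter
      c = c + input[i];
   }
   return c;                     // i stays on the last letter of the symbol,
                                 // so the caller's i++ moves to the next character
}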

infile2>>input;
while(infile2)
{
    
    for(i=0;input[i]!='\0';i++)
    {
    if(isupper(input[i])||islower(input[i]))
      {
        c=getname(input, i);
//cout<<c<<" ";                              //this lets me see what outputs and its Al O Si O...
             index=binarySearch(All_elements,30,c);
             //cout<<"***"<<index<<endl;
             digit=1;
       }
    
    
    else if(input[i]=='(')
          {
           digit=getNum(input,i)-1;
           
          }
the character '1' does not have an integer value of 1; its value is its character code (49 in ASCII, where '0' is 48, which is where the 48 in that code comes from).
you can use built-in stuff:
stoi and stod (string to int, string to double) for this.
you can do it yourself too: a hand-rolled integer one can be faster than the built-in one, especially if you have a subset of the full problem (e.g., only base-10 inputs), but writing your own for doubles is best avoided without a very good reason.

if you only have 1 digit, the numeric value is simply charval - '0', making '0' give zero, and so on.
Well, feel free to add
- structs instead of map
- binary search (preceded by sorting) instead of map
- single character read and peek() instead of getline and strings

Or not.

#include <iostream>
#include <fstream>
#include <sstream>
#include <iomanip>
#include <string>
#include <map>
#include <cctype>
using namespace std;

map<string,double> elements;                     // Global (since one-off) map of atomic weights

//======================================================================

void getElements()                               // Read atomic weights into map
{
   string symbol;
   double weight;
   ifstream in( "element.dat" );
   while( in >> symbol >> weight ) elements[symbol] = weight;
}

//======================================================================

double elementWeight( string symbol )            // Deal with one element (possibly with multiplier)
{
   int multiplier = 1;                           
   size_t p = symbol.find( '(' );                // npos if there is no "(n)"
   if ( p != string::npos )
   {
      stringstream( symbol.substr( p + 1 ) ) >> multiplier;
      symbol = symbol.substr( 0, p );
   }
   return multiplier * elements[symbol];      
}

//======================================================================

double molecularWeight( const string &molecule ) // Deal with one molecule
{
   double weight = 0.0;
   size_t p = 0, q = 1;                          // p = start of current element, q = scan position
   while ( q < molecule.size() )
   {
      if ( isupper( molecule[q] ) )
      {
         weight += elementWeight( molecule.substr( p, q - p ) );
         p = q;
      }
      q++;
   }
   weight += elementWeight( molecule.substr( p, q - p ) );
   return weight;
}

//======================================================================

int main()
{
   getElements();
// for ( auto e : elements ) cout << e.first << ' ' << e.second << '\n';

   ifstream in( "formula.dat" );
   for ( string molecule; getline( in, molecule ); ) 
   {
      cout << setw( 30 ) << left << molecule << " " << molecularWeight( molecule ) << '\n';
   }
}


Al(2)O(3)SiO(2)                162.046
NiC(4)H(6)O(4)                 176.8
Na(2)VO(4)                     160.919
Na(3)P(5)O(4)                  287.836
TeCl(4)                        269.412
V(2)O(5)                       181.88
HCONHCHCOH                     86.0705
ZnC(4)H(6)O(4)                 183.46
UO(2)N(2)O(6)H(2)O             412.053
FeTiO(3)                       151.745
MgC(4)H(6)O(4)                 142.395
C(16)N(33)OH                   671.405
AgSiF(6)MgOH(6)O(3)            344.295
NiSO(4)N(2)H(4)SO(4)H(12)O(6)  390.963
Tl(2)S(3)O(12)H(14)O(7)        823.021
CBr(4)                         331.627
ZnCO(3)                        125.379
UO(2)CO(3)                     330.037
ThS(2)O(8)                     424.153
H(2)SeO(5)                     160.973
K(2)Cr(2)O(7)                  294.192
K(2)Pt(1)I(6)                  1034.72