Need assistance with input from file with delimiters

I am working on an assignment and I have to read in data from a file about chemicals and store the data as chemical objects in an array. I'm having trouble regarding the format of some of the lines of the file. Here is a small sample:

Chemical,CAS #/Comp ID,Clean Air Act Chemical,Classification,Metal,Metal Category,Carcinogen,Form Type,Unit of Measure,Risk Factor
STYRENE,100425,YES,NON-PBT,NO,0,YES,R,Pounds,Unknown
N-METHYLOLACRYLAMIDE,924425,NO,NON-PBT,NO,0,NO,R,Pounds,Unknown
"TOLUENE-2,4-DIISOCYANATE",584849,YES,NON-PBT,NO,0,YES,A,Pounds,Unknown
"HYDROCHLORIC ACID (1995 AND AFTER ""ACID AEROSOLS"" ONLY)",7647010,YES,NON-PBT,NO,0,NO,R,Pounds,Unknown


I am ignoring the first line, and the majority of the lines in the file look like lines 2 and 3. My entire program is actually finished, and if I use an input file that only has lines similar to 2 and 3 (and 5) and parse the data using the comma as a delimiter, it works fine. However, some of the lines look like line 4 above, where there are commas in the chemical name. I cannot use the comma as a delimiter for these lines, or the full name will not be stored in the name variable and all the other variables for that chemical object will contain the wrong data. All chemicals that have commas in their name do have quotes around the name, so I need to be able to read in the name of the chemical using something other than a comma delimiter when a quotation mark is encountered. BUT, some lines of the file look like line 5 above. So I cannot just read in the name of the chemical until I reach another quotation mark, because some chemicals have quotation marks within the name (and I need to include the entire name).

I've considered using the peek() function and, if a quotation mark is encountered, using the get() function to read in the quotation mark without storing it (as I don't want the names of chemicals to start or end with quotes). But I can't figure out how to get the program to read in all of the name. The one consistency between lines 4 and 5 is that a quotation mark followed by a comma (",) denotes the end of the chemical name. But the peek() function only looks at a single character, so I can't just read in data until ", is peeked. I have absolutely no idea what to do at this point.
State NORMAL: read one character at a time. If a comma or the end of the line is encountered, the current token ends; do not add the comma to the result. If a quotation mark is encountered, do not add it to the result; change state to INSTRING. Any other character is added to the result.

State INSTRING: read one character at a time. If a quotation mark is encountered, peek at the next character. If that is a quotation mark too, read it and add a single quotation mark to the result. If not, do not add the quotation mark to the result; change state to NORMAL. Any other character (including a comma) is added to the result.
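
The two states above could be sketched roughly like this. This is only a minimal sketch, not a full CSV parser: the function name splitCsvLine and the State enum are made up for illustration, and it assumes each record sits on a single line (no line breaks inside quoted names), which holds for the sample shown.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name not from the thread): splits one line of the
// file into fields using the NORMAL / INSTRING states described above.
std::vector<std::string> splitCsvLine(const std::string& line) {
    enum class State { Normal, InString };
    State state = State::Normal;

    std::vector<std::string> fields;
    std::string current;            // field being built
    std::istringstream in(line);

    char c;
    while (in.get(c)) {
        if (state == State::Normal) {
            if (c == ',') {
                // End of a field: store it, do not keep the comma.
                fields.push_back(current);
                current.clear();
            } else if (c == '"') {
                // Opening quote: drop it and switch state.
                state = State::InString;
            } else {
                current += c;
            }
        } else {  // State::InString
            if (c != '"') {
                // Commas inside quotes are ordinary characters.
                current += c;
            } else if (in.peek() == '"') {
                // Doubled quote ("") stands for one literal quote.
                in.get(c);
                current += '"';
            } else {
                // Closing quote: drop it and go back to NORMAL.
                state = State::Normal;
            }
        }
    }
    fields.push_back(current);      // last field has no trailing comma
    return fields;
}
```

Assuming the file is read one line at a time with std::getline, each line can be passed to this function and the resulting vector used to fill in the chemical object: element 0 would be the name, element 1 the CAS number, and so on.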

Hope it helps