Modifying and converting a txt file to a csv file.

I need help modifying and converting a txt file whose fields are separated by the vertical bar ("|"). The first step is to search each field between the "|" characters for words in double quotes and put another pair of double quotes around them. Ex 1: "quote" is supposed to look like ""quote"". The next step is to put double quotes around the whole field if it contains a quoted word or a comma. Ex 2.1: |Beginning "quote"| is supposed to look like |"Beginning ""quote"""|. Ex 2.2: If what's between the vertical bars contains a comma, like |Seattle, Washington|, it is supposed to look like |"Seattle, Washington"|. The last step is just to change each vertical bar into a comma, but I believe I can do that myself. I am not supposed to use the sstream library, since the professor wants us to write the code ourselves. Not all of these need to be solved; if I can get help with just one, I will be very thankful. Thank you.

I do not have code to post because it is way too long and I do not know what part to show.
Is this for fun and practice, or a practical application? Excel will let you choose the delimiter when you import a text file: you can turn off comma, turn on pipe, and import it without writing a program.
@jonnin This is for a project I am working on. I have to write a program for it.
I do not have code to post because it is way too long and I do not know what part to show.

Well, you're going to need to show relevant content. For example, show a small sample of your input file; 10 complete records would be adequate. Also show a small, minimal program that demonstrates how you're trying to open and read the input file, open the output file, and write the data in comma-separated format. You will also need to show what output you expect for the supplied input.

Ah, the old 'professor made us do it in C' routine. A lot of that going around lately.

"quote": this one can be solved with a blanket replace. If the whole file were in one big string variable, I believe you could just replace every " with "" (?). Check whether this works for your data. If it does, you can convert | to comma later the same way.

Ex 2.1 is nonsense: it has mismatched quotes, it should be "Beginning ""quote""" and makes no sense as given. Clarify this!

The comma is no special case here. If you put quotes around the whole field, comma inside or not, it's covered. This is a red-herring requirement as far as I can tell.
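
To illustrate the point about the comma, here is a minimal C++ sketch of quoting a single field. It assumes the field has already been split out of the line, and the name `quoteField` is hypothetical. Because every field is wrapped in quotes and embedded quotes are doubled, a comma inside the field needs no extra handling.

```cpp
#include <string>

// Hypothetical helper: wrap one field in double quotes and double every
// double quote found inside it. Commas pass through unchanged because the
// surrounding quotes already protect them.
std::string quoteField(const std::string& field) {
    std::string out = "\"";          // opening quote
    for (char c : field) {
        if (c == '"') out += '"';    // emit an extra quote before the original one
        out += c;
    }
    out += '"';                      // closing quote
    return out;
}
```

With this sketch, `Seattle, Washington` becomes `"Seattle, Washington"` and `Beginning "quote"` becomes `"Beginning ""quote"""`.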

So that leaves me thinking: dump it all into one big string, replace each " with "" (the order of operations is important), then replace all | with "|". Next, replace every end of line with " + end of line + ". I believe this covers all the requirements. Do you agree? Be aware: the Windows end of line is usually 2 characters, \r and \n. Look at your file in a hex editor to see what it uses so this final replacement is correct!


** The above assumes I am right about the """ vs "" problem in Ex 2.1.
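
The blanket-replacement idea could look roughly like the sketch below. The names `replaceAll` and `pipeTextToCsv` are hypothetical, and the code assumes the whole file is already in one `std::string` with plain `\n` line endings. The quote at the very start and very end of the text is not covered by the replacements, so the sketch adds it separately.

```cpp
#include <string>

// Replace every occurrence of `from` in `text` with `to`, scanning left to right.
void replaceAll(std::string& text, const std::string& from, const std::string& to) {
    if (from.empty()) return;
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // skip the inserted text so it is never rescanned
    }
}

// Hypothetical driver for the steps described above.
// Assumes "\n" line endings; adjust the newline step for "\r\n" files.
std::string pipeTextToCsv(std::string text) {
    bool endsWithNewline = !text.empty() && text.back() == '\n';
    if (endsWithNewline) text.pop_back();   // avoid opening a quote after the last line

    replaceAll(text, "\"", "\"\"");         // 1. double the existing quotes first
    replaceAll(text, "|", "\"|\"");         // 2. close and reopen quotes around each bar
    replaceAll(text, "\n", "\"\n\"");       // 3. same around each end of line
    replaceAll(text, "|", ",");             // 4. finally turn bars into commas

    text = "\"" + text + "\"";              // quotes at the very start and very end
    if (endsWithNewline) text += '\n';
    return text;
}
```

Two limitations of this sketch are worth checking against the real data: every field is quoted, so an empty field comes out as `""`, and a bar or newline that sits inside a quoted part of a field is still treated as a separator.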
I'd just read the input file one character at a time and run it through a finite state machine. No strings needed at all.
I just wrote some code for this and there are some wicked special cases. Consider this input:
"quote"|no quote|Beginning "quote"|Seatle, Washington|
This line|has a "quoted | bar" in|the middle
Field 1|This field "has a quoted
newline" in it| field 3
Empty fields|||there
|starts with empty field
|
||
one field

blank line above
File ends without a newline

The cases it shows are:
- quoted "|" character
- quoted newline
- line ending with |
- line beginning with |
- empty fields
- blank line
- line consisting of a single bar
- line consisting of two bars
- No newline at the end of file

I believe the correct output is:
"""quote""","no quote","Beginning ""quote""","Seatle, Washington",
"This line","has a ""quoted | bar"" in","the middle"
"Field 1","This field ""has a quoted
newline"" in it"," field 3"
"Empty fields",,,"there"
,"starts with empty field"
,
,,
"one field"

"blank line above"
"File ends without a newline"

My code uses a finite state machine and two boolean state variables. One indicates whether you're inside a quote. The other indicates whether you're at the beginning of a field. The whole program is less than 50 lines of code.
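
The poster's code was not shown, so the following is only one possible sketch of such a state machine, written to match the expected output above. All names (`PipeToCsv`, `feed`, `finish`, `convert`) are hypothetical. For easy testing it appends to a `std::string`; the same `feed` logic could write each character straight to an output stream instead. It assumes plain `\n` line endings, so a `\r` would be copied through as ordinary field text.

```cpp
#include <string>

// Hypothetical state machine: two booleans, one character at a time.
struct PipeToCsv {
    bool inQuote = false;      // inside a "..." run of the input
    bool atFieldStart = true;  // nothing has been emitted yet for this field

    // Consume one input character and append the resulting CSV text to `out`.
    void feed(char c, std::string& out) {
        bool separator = !inQuote && (c == '|' || c == '\n');
        if (separator) {
            if (!atFieldStart) out += '"';   // close a non-empty field
            out += (c == '|') ? ',' : '\n';  // bar becomes comma, newline stays
            atFieldStart = true;
            return;
        }
        if (atFieldStart) {                  // first character of a non-empty field
            out += '"';
            atFieldStart = false;
        }
        if (c == '"') {
            out += "\"\"";                   // double the quote
            inQuote = !inQuote;              // quoted bars and newlines are literal
        } else {
            out += c;
        }
    }

    // Call once at end of input: closes a field left open by a missing final newline.
    void finish(std::string& out) {
        if (!atFieldStart) out += '"';
        atFieldStart = true;
    }
};

// Convenience wrapper for trying the machine on an in-memory string.
std::string convert(const std::string& input) {
    PipeToCsv fsm;
    std::string out;
    for (char c : input) fsm.feed(c, out);
    fsm.finish(out);
    return out;
}
```

In a real program the loop in `convert` would be replaced by reading characters from the input file, for example with `std::ifstream::get`, which avoids sstream entirely. Empty fields produce no quotes at all, which is why `|` alone becomes `,` and a blank line stays blank.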