removing commas in web data

Hi, I was wondering how I could remove the commas and organize the data so it's not all over the place in the console.

The data I'm using is from this site: https://www.kaggle.com/mylesoneill/game-of-thrones#battles.csv. The file I chose is battles.csv.



#include <iostream>
#include <string>
#include <array>
#include <fstream>

using namespace std;

int main() {
    char comma;
    string battles[975];

    ifstream input;

    string junk;

    input.open("battles.csv");

    if (input) {
        for (int i = 0; i < 200; i++) {
            input >> battles[i];
            input.ignore();

            cout << battles[i];
        }
    }

    return 0;
}

I don't know how the *.csv files are formatted, but here is some code that removes all commas from a file:
#include <fstream>
#include <string>

int main()
{
    std::ifstream input("battles.csv");
    if (!input) return 1;

    std::string result;

    while( true )
    {
        char tmp = input.get();
        if( !input.good()) break;
        if (tmp == ',') continue;
        result += tmp;
     }
     // The whole file content without commas is now within 'result'.
}
Hello arodrigues245,

Why would you want to remove the commas from a CSV file? They are there for a reason: they separate the fields, which makes the file easier to read.

Instead of removing the commas, you need to learn how to read the file using "std::getline". The proper usage is std::getline(input, name, ','); which reads everything up to and including the comma, but discards the comma.

Your use of "input >> battles[i]" will read up to the first white space and stop.
Given this line (only part of the full record):
Battle of the Golden Tooth,298,1,Joffrey/Tommen Baratheon,Robb Stark
only "Battle" would be read. Using "getline" with ',' as the delimiter would consume
Battle of the Golden Tooth,
store
Battle of the Golden Tooth
in the string, and leave the file pointer ready to read the "298".

There are two things I do not know: what the input file you are using looks like, and what exactly you want to do with the information.

Looking at the link I can see that each line has empty cells. Not a problem, but it does mean that you will need to account for the empty cells.

My first thoughts are to replace #include <array> , which you are not using, with #include <vector> , create a struct to hold the information and put the struct into the vector for later processing. Also using a string stream may be useful.

The first line of the table is where you can get the names of the variables to use in the struct. After that I do not know what is in the file that you are working with, line two may be something you will need to read and ignore to get to the real information.
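
As a rough sketch of the struct-plus-vector idea, something like the following could be a starting point. Battle, splitSimple and makeBattle are made-up names, only the first five columns are kept, and every value is stored as text. The split is deliberately naive: it assumes no cell contains a comma of its own, so it is not a full CSV reader.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type: only the first five columns of the table.
struct Battle
{
    std::string name;
    std::string year;           // kept as text; the cell may be empty
    std::string battleNumber;
    std::string attackerKing;
    std::string defenderKing;
};

// Split one line at every comma using a string stream.
// Naive on purpose: it does not understand double-quoted cells.
std::vector<std::string> splitSimple(const std::string& line)
{
    std::vector<std::string> cells;
    std::istringstream iss(line);
    std::string cell;

    while (std::getline(iss, cell, ','))
        cells.push_back(cell);

    // getline yields nothing for an empty cell after a trailing comma,
    // so add that last empty cell by hand.
    if (!line.empty() && line.back() == ',')
        cells.push_back("");

    return cells;
}

// Build a Battle from the first five cells; missing cells stay empty.
Battle makeBattle(const std::vector<std::string>& cells)
{
    Battle b;
    if (cells.size() > 0) b.name         = cells[0];
    if (cells.size() > 1) b.year         = cells[1];
    if (cells.size() > 2) b.battleNumber = cells[2];
    if (cells.size() > 3) b.attackerKing = cells[3];
    if (cells.size() > 4) b.defenderKing = cells[4];
    return b;
}
```

Each line read from the file would go through splitSimple and makeBattle, and the result would be pushed into a std::vector<Battle> for later processing.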

Hope that helps,

Andy
Hello arodrigues245,

Forgot this last time.


PLEASE ALWAYS USE CODE TAGS (the <> formatting button), to the right of this box, when posting code.

It makes it easier to read your code and also easier to respond to your post.

http://www.cplusplus.com/articles/jEywvCM9/
http://www.cplusplus.com/articles/z13hAqkS/

Hint: You can edit your post, highlight your code and press the <> formatting button.
You can use the preview button at the bottom to see how it looks.

I found the second link to be the most help.



I put this together to give you an idea of what can be done with a CSV file.

#include <iostream>
#include <string>
#include <array>
#include <fstream>
#include <chrono>
#include <thread>


//using namespace std;  // <--- Best not to use.

int main()
{
	std::string battles;

	std::string inFileName{ "battles.csv" };

	std::ifstream inFile(inFileName);

	if (!inFile)
	{
		std::cout << "\n File " << inFileName << " did not open" << std::endl;
		std::this_thread::sleep_for(std::chrono::seconds(3));  // <--- Needs header files "chrono" and "thread".
		return 1;
	}

	while (std::getline(inFile, battles, ','))
	{
		std::cout << battles << std::endl;
	}

	// <--- Used mostly for testing in Debug mode. Removed if compiled for release.
	// <--- Used to keep the console window open in Visual Studio Debug mode.
	// The next line may not be needed. If you have to press Enter to see the prompt, it is not needed.
	//std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');  // <--- Requires header file <limits>.
	std::cout << "\n\n Press Enter to continue";
	std::cin.get();

	return 0;
}

The output I get is:

Battle of the Golden Tooth
298
1
Joffrey/Tommen Baratheon
Robb Stark
Lannister



Tully



win
pitched battle
1
0
15000

4000
Jaime Lannister



 Press Enter to continue:


The blank lines are intentional. They occur wherever you see ",,,,,," in the file. Since each comma marks the end of a cell, consecutive commas mean there is nothing in that cell, but you still have to account for it.
This is a simple program, but it could be expanded to do more. Note that it splits only on commas, so the last cell of one line and the first cell of the next line are read together as one string. It is also a way to remove the commas and lets you create a different output file.
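
One possible way to account for empty cells in a numeric column (the troop sizes, for example) is a small conversion helper. This is only a sketch: cellToInt and the -1 fallback are assumptions chosen for illustration, and any value that cannot occur in the data would serve as the "no value" marker.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a cell to int, or return `fallback`
// when the cell is empty or is not entirely a whole number.
int cellToInt(const std::string& cell, int fallback = -1)
{
    if (cell.empty())
        return fallback;            // empty cell: nothing to convert

    try
    {
        std::size_t used = 0;       // number of characters std::stoi consumed
        const int value = std::stoi(cell, &used);
        return used == cell.size() ? value : fallback;
    }
    catch (const std::exception&)   // std::invalid_argument or std::out_of_range
    {
        return fallback;
    }
}
```

A cell such as "15000" converts normally, while an empty cell gives the fallback instead of throwing.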

Hope that helps,

Andy
Proper csv file handling must deal with fields that are contained in double-quotes (and therefore may have commas in them that are not field separators, and may also have escaped double-quotes).
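
A minimal sketch of quote-aware splitting for a single record, under stated assumptions: splitCsvRecord is a made-up name, the record has already been read as one line (no line breaks inside quoted fields), and a double-quote inside a quoted field is escaped by doubling it ("").

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split one CSV record into fields, honouring double quotes.
// Inside a quoted field a comma is ordinary text and "" is one literal ".
std::vector<std::string> splitCsvRecord(const std::string& record)
{
    std::vector<std::string> fields;
    std::string field;
    bool inQuotes = false;

    for (std::size_t i = 0; i < record.size(); ++i)
    {
        const char ch = record[i];

        if (inQuotes)
        {
            if (ch != '"')
                field += ch;                    // ordinary text, commas included
            else if (i + 1 < record.size() && record[i + 1] == '"')
            {
                field += '"';                   // doubled quote: one literal quote
                ++i;
            }
            else
                inQuotes = false;               // closing quote
        }
        else if (ch == '"')
            inQuotes = true;                    // opening quote
        else if (ch == ',')
        {
            fields.push_back(field);            // separator ends the field
            field.clear();
        }
        else
            field += ch;
    }

    fields.push_back(field);                    // last field, possibly empty
    return fields;
}
```

With this, a record containing "Clement Piper, Vance" in quotes keeps that text as one field instead of splitting it at the inner comma.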

I have no idea whether your csv file has this extra complexity or not, since I'm not going to join a site just to download a file. If you want help you should make it available without that hassle.
I have no idea whether your csv file has this extra complexity

Looking at the last field I would say that it probably does have extra complexity.

From the fourth record:
Roose Bolton, Wylis Manderly, Medger Cerwyn, Harrion Karstark, Halys Hornwood


I have no idea whether your csv file has this extra complexity or not since I'm not pathetic enough to join a site just to download a file.


I agree.

If you want help from the better programmers here, you should make it available without that hassle.

Or at least post a small sample (10 - 20 records should be sufficient for a start) of the file, inside code tags to preserve formatting, that illustrates most of the variations that are possible.

I'm never able to remove the snarkiest parts of my posts before someone quotes them! I apologize to Andy for suggesting it was "pathetic" to join the site to download the file. It was actually very kind of him to do that for you. But it is not very kind/intelligent of you to expect us to do that.

However, as penance for my rudeness I joined the site and downloaded the file, so here it is (all but the last 7 lines). But I've discovered something a little odd. The line endings in the file are '\r' (the old Mac line ending). If you copy/paste the text below, you will probably end up with whatever the default line ending is on your system. But to read a line from the original file you need to say getline(file, line, '\r').
name,year,battle_number,attacker_king,defender_king,attacker_1,attacker_2,attacker_3,attacker_4,defender_1,defender_2,defender_3,defender_4,attacker_outcome,battle_type,major_death,major_capture,attacker_size,defender_size,attacker_commander,defender_commander,summer,location,region,note
Battle of the Golden Tooth,298,1,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Tully,,,,win,pitched battle,1,0,15000,4000,Jaime Lannister,"Clement Piper, Vance",1,Golden Tooth,The Westerlands,
Battle at the Mummer's Ford,298,2,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Baratheon,,,,win,ambush,1,0,,120,Gregor Clegane,Beric Dondarrion,1,Mummer's Ford,The Riverlands,
Battle of Riverrun,298,3,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Tully,,,,win,pitched battle,0,1,15000,10000,"Jaime Lannister, Andros Brax","Edmure Tully, Tytos Blackwood",1,Riverrun,The Riverlands,
Battle of the Green Fork,298,4,Robb Stark,Joffrey/Tommen Baratheon,Stark,,,,Lannister,,,,loss,pitched battle,1,1,18000,20000,"Roose Bolton, Wylis Manderly, Medger Cerwyn, Harrion Karstark, Halys Hornwood","Tywin Lannister, Gregor Clegane, Kevan Lannister, Addam Marbrand",1,Green Fork,The Riverlands,
Battle of the Whispering Wood,298,5,Robb Stark,Joffrey/Tommen Baratheon,Stark,Tully,,,Lannister,,,,win,ambush,1,1,1875,6000,"Robb Stark, Brynden Tully",Jaime Lannister,1,Whispering Wood,The Riverlands,
Battle of the Camps,298,6,Robb Stark,Joffrey/Tommen Baratheon,Stark,Tully,,,Lannister,,,,win,ambush,0,0,6000,12625,"Robb Stark, Tytos Blackwood, Brynden Tully","Lord Andros Brax, Forley Prester",1,Riverrun,The Riverlands,
Sack of Darry,298,7,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Darry,,,,win,pitched battle,0,0,,,Gregor Clegane,Lyman Darry,1,Darry,The Riverlands,
Battle of Moat Cailin,299,8,Balon/Euron Greyjoy,Robb Stark,Greyjoy,,,,Stark,,,,win,pitched battle,0,0,,,Victarion Greyjoy,,1,Moat Cailin,The North,
Battle of Deepwood Motte,299,9,Balon/Euron Greyjoy,Robb Stark,Greyjoy,,,,Stark,,,,win,siege,0,0,1000,,Asha Greyjoy,,1,Deepwood Motte,The North,
Battle of the Stony Shore,299,10,Balon/Euron Greyjoy,Robb Stark,Greyjoy,,,,Stark,,,,win,ambush,0,0,264,,Theon Greyjoy,,1,Stony Shore,The North,"Greyjoy's troop number based on the Battle of Deepwood Motte, in which Asha had 1000 soldier on 30 longships. That comes out to ~33 per longship. In the Battle of the Stony Shore, Theon has 8 longships, and just we can estimate that he has 8*33 =265 troops."
Battle of Torrhen's Square,299,11,Robb Stark,Balon/Euron Greyjoy,Stark,,,,Greyjoy,,,,win,pitched battle,0,0,244,900,"Rodrik Cassel, Cley Cerwyn",Dagmer Cleftjaw,1,Torrhen's Square,The North,Greyjoy's troop number comes from the 264 estimate to have arrived on the stony shore minus the 20 Theon takes to attack Winterfell. Thus 264-20=244
Battle of Winterfell,299,12,Balon/Euron Greyjoy,Robb Stark,Greyjoy,,,,Stark,,,,win,ambush,0,1,20,,Theon Greyjoy,Bran Stark,1,Winterfell,The North,"It isn't mentioned how many Stark men are left in Winterfell, other than ""very few""."
Sack of Torrhen's Square,299,13,Balon/Euron Greyjoy,Balon/Euron Greyjoy,Greyjoy,,,,Stark,,,,win,siege,0,1,,,Dagmer Cleftjaw,,1,Torrhen's Square,The North,
Sack of Winterfell,299,14,Joffrey/Tommen Baratheon,Robb Stark,Bolton,Greyjoy,,,Stark,,,,win,ambush,1,0,618,2000,"Ramsay Snow, Theon Greyjoy ","Rodrik Cassel, Cley Cerwyn, Leobald Tallhart",1,Winterfell,The North,"Since House Bolton betrays the Starks for House Lannister, we code this battle as between these two houses. Greyjoy men, numbering only 20, don't play a major part in the fighting and end up dying anyway."
Battle of Oxcross,299,15,Robb Stark,Joffrey/Tommen Baratheon,Stark,Tully,,,Lannister,,,,win,ambush,1,1,6000,10000,"Robb Stark, Brynden Tully","Stafford Lannister, Roland Crakehall, Antario Jast",1,Oxcross,The Westerlands,
Siege of Storm's End,299,16,Stannis Baratheon,Renly Baratheon,Baratheon,,,,Baratheon,,,,win,siege,1,0,5000,20000,"Stannis Baratheon, Davos Seaworth","Renly Baratheon, Cortnay Penrose, Loras Tyrell, Randyll Tarly, Mathis Rowan",1,Storm's End,The Stormlands,
Battle of the Fords,299,17,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Tully,,,,loss,pitched battle,0,0,20000,10000,"Tywin Lannister, Flement Brax, Gregor Clegane, Addam Marbrand, Lyle Crakehall, Leo Lefford","Edmure Tully, Jason Mallister, Karyl Vance",1,Red Fork,The Riverlands,
Sack of Harrenhal,299,18,Robb Stark,Joffrey/Tommen Baratheon,Stark,,,,Lannister,,,,win,ambush,1,0,100,100,"Roose Bolton, Vargo Hoat, Robett Glover",Amory Lorch,1,Harrenhal,The Riverlands,
Battle of the Crag,299,19,Robb Stark,Joffrey/Tommen Baratheon,Stark,,,,Lannister,,,,win,ambush,0,0,6000,,"Robb Stark, Smalljon Umber, Black Walder Frey",Rolph Spicer,1,Crag,The Westerlands,
Battle of the Blackwater,299,20,Stannis Baratheon,Joffrey/Tommen Baratheon,Baratheon,,,,Lannister,,,,loss,pitched battle,1,1,21000,7250,"Stannis Baratheon, Imry Florent, Guyard Morrigen, Rolland Storm, Salladhor Saan, Davos Seaworth","Tyrion Lannister, Jacelyn Bywater, Sandor Clegane, Tywin Lannister, Garlan Tyrell, Mace Tyrell, Randyll Tarly",1,King's Landing,The Crownlands,
Siege of Darry,299,21,Robb Stark,Joffrey/Tommen Baratheon,Darry,,,,Lannister,,,,win,siege,0,0,,,Helman Tallhart,,1,Darry,The Riverlands,
Battle of Duskendale,299,22,Robb Stark,Joffrey/Tommen Baratheon,Stark,,,,Lannister,,,,loss,pitched battle,1,0,3000,,"Robertt Glover, Helman Tallhart","Randyll Tarly, Gregor Clegane",1,Duskendale,The Crownlands,
Battle of the Burning Septry,299,23,,,Brotherhood without Banners,,,,Brave Companions,,,,win,pitched battle,0,0,,,,,1,,The Riverlands,
Battle of the Ruby Ford,299,24,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Stark,,,,win,pitched battle,0,0,,6000,Gregor Clegane,"Roose Bolton, Wylis Manderly",,Ruby Ford,The Riverlands,
Retaking of Harrenhal,299,25,Joffrey/Tommen Baratheon,,Lannister,,,,Brave Companions,,,,win,pitched battle,1,0,,,Gregor Clegane,Vargo Hoat,1,Harrenhal,The Riverlands,
The Red Wedding,299,26,Joffrey/Tommen Baratheon,Robb Stark,Frey,Bolton,,,Stark,,,,win,ambush,1,1,3500,3500,"Walder Frey, Roose Bolton, Walder Rivers",Robb Stark,1,The Twins,The Riverlands,"This observation refers to the battle against the Stark men, not the attack on the wedding"
Siege of Seagard,299,27,Robb Stark,Joffrey/Tommen Baratheon,Frey,,,,Mallister,,,,win,siege,0,1,,,Walder Frey,Jason Mallister,1,Seagard,The Riverlands,
Battle of Castle Black,300,28,Stannis Baratheon,Mance Rayder,Free folk,Thenns,Giants,,Night's Watch,Baratheon,,,loss,siege,1,1,100000,1240,"Mance Rayder, Tormund Giantsbane, Harma Dogshead, Magnar Styr, Varamyr","Stannis Baratheon, Jon Snow, Donal Noye, Cotter Pyke",0,Castle Black,Beyond the Wall,
Fall of Moat Cailin,300,29,Joffrey/Tommen Baratheon,Balon/Euron Greyjoy,Bolton,,,,Greyjoy,,,,win,siege,0,0,,,Ramsey Bolton,,0,Moat Cailin,The North,
Sack of Saltpans,300,30,,,Brave Companions,,,,,,,,win,razing,0,0,,,Rorge,,0,Saltpans,The Riverlands,
Retaking of Deepwood Motte,300,31,Stannis Baratheon,Balon/Euron Greyjoy,Baratheon,Karstark,Mormont,Glover,Greyjoy,,,,win,pitched battle,0,0,4500,200,"Stannis Baratheon, Alysane Mormot",Asha Greyjoy,0,Deepwood Motte,The North,
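
If you would rather not care which line ending a copy of the file uses, a helper along these lines might do. It is only a sketch: getlineAnyEnding and splitLines are made-up names, not standard functions. When reading from a file, it assumes the stream was opened in binary mode so the platform does not translate line endings first.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read one line that ends in "\n", "\r\n" or a lone "\r".
// Returns false only when nothing at all could be read.
bool getlineAnyEnding(std::istream& in, std::string& line)
{
    line.clear();
    bool gotAny = false;
    char ch;

    while (in.get(ch))
    {
        gotAny = true;
        if (ch == '\n')
            return true;                        // *nix ending
        if (ch == '\r')
        {
            if (in.peek() == '\n')
                in.get();                       // swallow the LF of a CR LF pair
            return true;                        // old Mac or Windows ending
        }
        line += ch;
    }
    return gotAny;                              // last line without an ending
}

// Convenience wrapper: split a block of text into lines.
std::vector<std::string> splitLines(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;

    while (getlineAnyEnding(in, line))
        lines.push_back(line);
    return lines;
}
```

The same getlineAnyEnding call works on a std::ifstream, so the original file and a copy/pasted version can be read by the same loop.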

Here's a program that will convert the line endings of a text file from *nix ('\n') or Windows ("\r\n") to old Mac ('\r'), in case someone wants to copy/paste the above text but would like to work with the original's line endings.
#include <fstream>

int main() {
    std::ifstream fin("battles.csv", std::ios::in | std::ios::binary);
    std::ofstream fout("battles2.csv", std::ios::out | std::ios::binary);
    char ch;
    while (fin.get(ch)) {
        if (ch == '\r' || ch == '\n') {
            if (ch == '\r' && fin.peek() == '\n') fin.get();
            fout.put('\r');
        }
        else
            fout.put(ch);
    }
}

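Here's a simple program that will parse a CSV file using a good old-fashioned state machine. It handles quoted fields and backslash-escaped characters. It calls process() on each field value and calls newline() when it sees a new line. Note that a doubled quote ("") inside a quoted field is dropped rather than kept as a literal quote.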

A better implementation would be to put this in a class and make process() and newline() virtual methods.

#include <iostream>
#include <string>

using std::string;
using std::cin;
using std::cout;
using std::cerr;

// Called when a token has been assembled
void process(const string &token)
{
    cout << "token: " << token << '\n';
}


// Called when a newline is seen
void newline()
{
    cout << "newline\n\n";
}


int
main()
{
    char ch;
    bool inQuote = false;
    string token;
    
    while (ch=cin.get(), cin.good()) {
	switch(ch) {
	case '\\':
	    // Backslash escape: append the next character literally,
	    // but only if there actually is one.
	    if (cin.get(ch)) {
		token += ch;
	    } else {
		cerr << "Warning: unterminated \\ at end of input\n";
	    }
	    break;
	case '"':
	    inQuote = !inQuote;
	    break;
	case ',':
	case '\n':
	    if (inQuote) {
		token += ch;
	    } else {
		process(token);
		token.clear();
		if (ch == '\n') {
		    newline();
		}
	    }
	    break;
	default:
	    token += ch;
	}
    }
    
    if (inQuote) {
	cerr << "Warning: unterminated string at end of input\n";
    }
    if (token.size()) {
	process(token);
    }
    return 0;
}
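
A sketch of the class-based variant suggested above, with process() and newline() as virtual methods. CsvParser, RowCollector and parseCsvText are made-up names. The state machine follows the program above (backslash escapes, quote toggling, '\n' as end of record) but reads from any std::istream instead of cin.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical base class: derive from it and override the two hooks.
class CsvParser
{
public:
    virtual ~CsvParser() = default;

    // Run the state machine over the whole stream.
    void parse(std::istream& in)
    {
        std::string token;
        bool inQuote = false;
        char ch;

        while (in.get(ch))
        {
            switch (ch)
            {
            case '\\':                          // escape: take next char literally
                if (in.get(ch)) token += ch;
                break;
            case '"':                           // toggle quoted state
                inQuote = !inQuote;
                break;
            case ',':
            case '\n':
                if (inQuote)
                    token += ch;                // separator is plain text in quotes
                else
                {
                    process(token);
                    token.clear();
                    if (ch == '\n') newline();
                }
                break;
            default:
                token += ch;
            }
        }
        if (!token.empty()) process(token);     // trailing field without newline
    }

protected:
    virtual void process(const std::string& token) = 0;  // one field assembled
    virtual void newline() = 0;                           // end of a record
};

// Example subclass: collect every record as a vector of fields.
class RowCollector : public CsvParser
{
public:
    std::vector<std::vector<std::string>> rows;

protected:
    void process(const std::string& token) override { current.push_back(token); }
    void newline() override { rows.push_back(current); current.clear(); }

private:
    std::vector<std::string> current;
};

// Convenience wrapper for trying the parser on a string.
std::vector<std::vector<std::string>> parseCsvText(const std::string& text)
{
    std::istringstream in(text);
    RowCollector collector;
    collector.parse(in);
    return collector.rows;
}
```

One design caveat: RowCollector stores a row only when newline() is called, so a last record that is not followed by '\n' is not added to rows.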
