Reading from csv file

I am trying to read from an Excel file, then print out only the unique values. My code, however, prints out all values. This code works when using an istringstream of letters, so why not when I'm reading in from a file? What do I have to change to make it work?

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <cstdio>
#include <vector>
#include <algorithm>

using namespace std;

string trim(const string& s)
{
	if (s.empty())
		return "";

	size_t i = 0;
	while (s[i] == ' ' && i < s.size())
		i++;

	return s.substr(i);
}

void convert(string& s)
{
	for (int i = 0; i < s.length(); i++)
	{
		s[i] = tolower(s[i]);
	}
}

int main()
{
	ifstream src("sample.csv"); {
		if (!src)
		{
			perror("Error opening file sample.csv: ");
			return -1;
		}

		std::vector<std::string> names;
		string name;

		//std::istringstream name;
		//std::istream& src = name;

		while (getline(src, name, ','))
			names.push_back(name);

		std::sort(names.begin(), names.end());

		std::size_t i = 0;
		while (i < names.size() - 1)
		{
			if (names[i] != names[i + 1])
				std::cout << names[i] << '\n';

			while (i < names.size() - 1 && names[i] == names[i + 1])
				++i;

			++i;
		}

		if (i == names.size() - 1) std::cout << names[i] << '\n';
	}
}
Your code seems unnecessarily complicated to achieve a trivial task:
http://stackoverflow.com/questions/1041620/whats-the-most-efficient-way-to-erase-duplicates-and-sort-a-vector
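For reference, the idiom discussed at that link is usually written as sort, then std::unique, then erase. Here is a minimal sketch, assuming the names are already in a vector; the function name dedupe is made up for illustration. Note that it keeps one copy of each name, which may not be enough if both copies of a duplicate have to go.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts the names and keeps exactly one copy of each.
std::vector<std::string> dedupe(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end());
    // std::unique moves the distinct elements to the front and returns
    // an iterator to the new logical end; erase trims the leftovers.
    names.erase(std::unique(names.begin(), names.end()), names.end());
    return names;
}
```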

Also, the two functions trim() and convert() are defined but never called.
For unique words you would be better off using a std::set rather than a vector.
http://www.cplusplus.com/reference/set/set/
What happened to this part,
Anon99 wrote:
ie if sam appears twice I want to discard this name entirely.

http://www.cplusplus.com/forum/general/205684/
Does that still apply or was it a misunderstanding of the requirements?
The reason for the convert is that I have sam and SAM and I need to get rid of both. That is correct; however, the code I have at the minute just prints out all the values and doesn't remove the two sams.
The reason for the convert is because I have sam and SAM and I need to get rid of both.

So where do you call convert()? Hint: You should be calling convert() at line 47 of your listing, just before names.push_back(name).

with the code I have at the minute it just prints out all the values and doesn't remove the two sams.

So where do you remove the duplicate entries?
int main()
{
    ifstream src("sample.csv");
    if (!src)
    {
        perror("Error opening file sample.csv");
        return -1;
    }

    std::vector<std::string> names;
    string name;

    while (getline(src, name, ','))
    {
        convert(name);      // lower-case so "sam" and "SAM" compare equal
        names.push_back(name);
    }
    std::sort(names.begin(), names.end());

    std::size_t i = 0;
    while (i < names.size())
    {
        // Find the end of the run of entries equal to names[i].
        std::size_t j = i + 1;
        while (j < names.size() && names[j] == names[i])
            ++j;

        if (j - i > 1)
        {   // Duplicated name: erase every copy, not just the first two.
            names.erase(names.begin() + i, names.begin() + j);
            continue;   //  Don't increment i
        }
        std::cout << names[i] << '\n';
        ++i;
    }
}



@AbstractionAnon,

Wouldn't it make more sense to check whether a name is already in the vector before inserting it?
Sorting and removing doesn't sound efficient.
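The check-before-insert idea could look something like the sketch below; add_if_new is a made-up name, not something from the posted code. It keeps the first copy of each name and scans the whole vector on every insertion, so on its own it would not drop both "sam" and "SAM".

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: appends name only if an equal string is not
// already stored. Returns true when the name was added.
bool add_if_new(std::vector<std::string>& names, const std::string& name)
{
    if (std::find(names.begin(), names.end(), name) != names.end())
        return false;           // already present, skip it
    names.push_back(name);
    return true;
}
```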
The reason for the convert is because I have sam and SAM and I need to get rid of both


try this ...
#include <iostream>
#include <locale>
#include <vector>
#include <algorithm>
#include <string>

// Return a lower-cased copy of s; the argument itself is left untouched.
std::string toLower (const std::string& s)
{
        std::locale loc;
        std::string result{};
        for (size_t i = 0; i < s.size(); i++)
        {
            result += std::tolower(s[i], loc);
        }
        return result;
}
int main()
{
    std::vector <std::string> vec { "john", "John", "Mary", "james", "Charles", "chaRles", "Julia"};
    // Normalise the case up front: a sort comparator must not modify its arguments.
    std::transform(vec.begin(), vec.end(), vec.begin(), toLower);
    std::sort(vec.begin(), vec.end());
    // Count against an unmodified copy, because remove_if shuffles vec while it runs.
    const std::vector <std::string> all = vec;
    vec.erase(std::remove_if(vec.begin(), vec.end(), [&](const std::string& x)
            {return (std::count(all.begin(), all.end(), x) > 1);}), vec.end());
    for (auto& elem : vec)
    {
        std::cout << elem << " ";
    }
}

Output
 
james julia mary 
Thomas1965 wrote:
wouldn't it make more sense to check if a name is in the vector before inserting it? Sorting and removing doesn't sound efficient.

This is a beginner's exercise with only a few names, so I doubt efficiency is an issue. Scanning the vector for duplicates on every insertion is probably less efficient than sorting it once.

However, you make a good point. With efficiency in mind, I would not use a vector at all.
I would use a std::set. It's not clear whether the OP has covered std::set.
Thomas, Anon: you're interpreting it somewhat differently from the OP's statements. We don't check whether a name is already present in the container to decide whether to insert a newly arrived name. Rather, if a newly arrived name matches (case-insensitively) one already in the container, we discard the new name and also take out the one that is already IN the container.
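One way to get that behaviour, sketched here under the assumption that the names have already been lower-cased, is to count occurrences and keep only the names seen exactly once. std::map is a swapped-in technique that nobody has used so far in this thread, and names_seen_once is a hypothetical name.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns the names that occur exactly once,
// in sorted order (std::map iterates its keys in order).
std::vector<std::string> names_seen_once(const std::vector<std::string>& names)
{
    std::map<std::string, int> counts;
    for (const std::string& n : names)
        ++counts[n];            // operator[] value-initialises a new count to 0

    std::vector<std::string> result;
    for (const auto& entry : counts)
        if (entry.second == 1)
            result.push_back(entry.first);
    return result;
}
```

Because every copy is counted before anything is reported, a name that appears two, three or more times is dropped entirely.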

The reason for the convert is because I have sam and SAM and I need to get rid of both
So I took your advice and started using sets. This removes the duplicate, but it still doesn't remove both copies. Is there any way to amend this so that it erases both from the set?

if (seenAlready.find(name) != seenAlready.end())
{
	continue;
}
else
{
	seenAlready.insert(name);
}

OP: your last post suggests that you might have missed my earlier one, sent about 2.5 hrs back (I wish cplusplus would number the posts on a thread like cboard does, for example). Take a look.
I am trying to read from an Excel file, then print out only the unique values. My code, however, prints out all values. This code works when using an istringstream of letters, so why not when I'm reading in from a file? What do I have to change to make it work?

It doesn't work, because your requirements are different here than they were in the previous thread, given subsequent posts in this thread.


Modifying the original code:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <cstdio>
#include <vector>
#include <algorithm>

using namespace std;

string trim(const string& s)
{
    if (s.empty())
        return "";

    size_t i = 0;
    while (i < s.size() && s[i] == ' ')   // check the bound before reading s[i]
        i++;

    return s.substr(i);
}

void convert(string& s)
{
    for (std::size_t i = 0; i < s.length(); i++)
    {
        s[i] = static_cast<char>(tolower(static_cast<unsigned char>(s[i])));
    }
}

std::string sanitize(std::string s) {
    s = trim(s);
    convert(s);
    return s;
}

int main()
{
    std::istringstream stream{ "SAM, saM, Grace, MaRy, MarY, julio" };
    // std::ifstream stream{ "sample.csv" };
    if (!stream)
    {
        perror("Error opening file sample.csv");   // perror appends ": " and the reason itself
        return -1;
    }

    std::istream& src = stream;

    std::vector<std::string> names;
    string name;

    //std::istringstream name;
    //std::istream& src = name;

    while (getline(src, name, ','))
        names.push_back(sanitize(name));

    std::sort(names.begin(), names.end());

    std::size_t i = 0;
    while (i + 1 < names.size())   // i + 1 avoids unsigned wrap-around when names is empty
    {
        if (names[i] != names[i + 1])
            std::cout << names[i] << '\n';

        while (i + 1 < names.size() && names[i] == names[i + 1])
            ++i;

        ++i;
    }

    if (i + 1 == names.size()) std::cout << names[i] << '\n';
}


The primary change is the introduction of the utility function sanitize, which uses trim and convert to produce a sanitized version of a name, and the call to that function on line 56 (the names.push_back line inside the getline loop). To use your file for input, comment out line 39 (the std::istringstream definition) and uncomment line 40 (the std::ifstream definition). I used the string stream so I could quickly test the code.

In essence, this is what AbstractionAnon recommended in his first post in this thread.

Note that this code does not actually remove the duplicate names from the container, it just skips them when producing output. It's still not clear to me if that's what you were trying to accomplish or not, but it should be fairly straightforward to populate a new container using the same logic - just add to the container where a unique name is being fed to the output stream currently.
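A sketch of that suggestion, assuming the vector has already been sanitized and sorted: it uses the same skip-the-run idea, but pushes each unique name into a new container instead of writing it to std::cout. collect_unique is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper. Precondition: sorted_names is sorted, so equal
// names sit next to each other.
std::vector<std::string> collect_unique(const std::vector<std::string>& sorted_names)
{
    std::vector<std::string> unique_names;
    std::size_t i = 0;
    while (i < sorted_names.size())
    {
        // Find the end of the run of names equal to sorted_names[i].
        std::size_t j = i + 1;
        while (j < sorted_names.size() && sorted_names[j] == sorted_names[i])
            ++j;

        if (j - i == 1)                     // run of length one: a unique name
            unique_names.push_back(sorted_names[i]);
        i = j;                              // jump past the whole run
    }
    return unique_names;
}
```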
I've just seen it now. It does not work when I try to read in from my csv file, though.
That would be a file-reading problem, then, as long as you've got your std::vector<std::string> set up.
Edit: assuming the OP's post above was a reply to me and not to @cire, of course.
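One possible cause, offered as a guess since the contents of sample.csv were not posted: a real CSV file contains line breaks, and getline(src, name, ',') does not treat '\n' as a separator, so the last field of one row and the first field of the next end up in a single token. The sketch below reads row by row and then splits each row on commas. split_csv is a hypothetical name, and quoted fields are not handled.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits simple CSV input into fields, treating both
// ',' and line breaks as separators. Quoted fields are not handled.
std::vector<std::string> split_csv(std::istream& in)
{
    std::vector<std::string> fields;
    std::string line;
    while (std::getline(in, line))              // one row at a time
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();                    // drop the CR of a Windows line ending
        std::istringstream row(line);
        std::string field;
        while (std::getline(row, field, ','))   // then one field at a time
            fields.push_back(field);
    }
    return fields;
}
```

Each field would still need the trim and lower-case treatment before names are compared.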
if (seenAlready.find(name) != seenAlready.end())
{
	seenAlready.erase(name);
	continue;
}
else
{
	seenAlready.insert(name);
}


Why does this keep the first name from my duplicate? How, using sets, do I erase both?
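It is hard to tell from the fragment alone why the first name is kept; perhaps the names are printed or stored somewhere else before the set is consulted. Separately, the insert/erase toggle has a pitfall of its own: erasing on the second sighting drops the name, but a third sighting inserts it again, so a name with an odd number of copies survives. The sketch below reproduces that behaviour. toggle_names is a hypothetical name, and the loop body is assumed to match the fragment.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper reproducing the insert/erase toggle.
std::set<std::string> toggle_names(const std::vector<std::string>& names)
{
    std::set<std::string> seenAlready;
    for (const std::string& name : names)
    {
        if (seenAlready.find(name) != seenAlready.end())
            seenAlready.erase(name);    // even-numbered sighting: remove it
        else
            seenAlready.insert(name);   // odd-numbered sighting: (re)insert it
    }
    return seenAlready;
}
```

Keeping a second set of names that have been seen more than once, and never re-inserting those, would avoid the problem.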
#include <iostream>
#include <string>
#include <cctype>
#include <vector>
#include <set>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <iomanip>

std::string trim( std::string str )
{
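    // the <cctype> functions need a value representable as unsigned char
    while( !str.empty() && std::isspace( static_cast<unsigned char>( str.back() ) ) ) str.pop_back() ;

    std::size_t beg = 0 ;
    while( beg < str.size() && std::isspace( static_cast<unsigned char>( str[beg] ) ) ) ++beg ;

    return str.substr(beg) ;
}

std::string to_lower( std::string str )
{
   for( char& c : str ) c = static_cast<char>( std::tolower( static_cast<unsigned char>(c) ) ) ;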
   return str ;
}

std::string sanitise( std::string str ) { return to_lower( trim(str) ) ; }

std::vector<std::string> get_unique_values_from( std::istream& stm, char delimiter = ',' )
{
    std::vector<std::string> unique_values ;

    std::set<std::string> all_values ;
    std::set<std::string> duplicated_values ;

    std::string token ;
    while( std::getline( stm, token, delimiter ) )
    {
        const std::string stok = sanitise(token) ;
        if( stok.empty() ) continue ; // ignore empty tokens
        if( !all_values.insert(stok).second ) // if it is a duplicate
            duplicated_values.insert(stok) ;
    }

    // http://en.cppreference.com/w/cpp/algorithm/set_difference
    // http://en.cppreference.com/w/cpp/iterator/back_inserter
    std::set_difference( all_values.begin(), all_values.end(),
                         duplicated_values.begin(), duplicated_values.end(),
                         std::back_inserter(unique_values) ) ;
    return unique_values ;
}

int main()
{
    std::istringstream file{ "SAM, saM , , Grace, MaRy , Anon99  ,  ,, MarY, julio" };

    for( std::string str : get_unique_values_from(file) )
        std::cout << std::quoted(str) << '\n' ;
}

http://coliru.stacked-crooked.com/a/e45bbb79ba84c732
http://rextester.com/SET43127