Trying to analyze the Bible

So I'm trying to make a program that analyzes a text document and outputs the most frequent words. I'm using Visual Studio 2011. The file paths are correct; I have double-checked them. I'm getting "vector subscript out of range". I realize that this would normally mean I'm trying to access an element of a vector that doesn't exist, but I don't think that is the case, as I have checked the code over several times. Is there some sort of maximum size on a vector? If so, how do I get around it?


#include <iostream>
#include <fstream>
#include <string>
#include <cstring>
#include <vector>
#include <array>
#include <sstream>
#include <cctype>

using namespace std;

void top_X(vector <int> &freq_vect, int num_words, vector <string*> &name_vector)
{
    int tempfreq;
    string* tempname;
    for (int i = 0; i < num_words; i++)
    {
        for (int h = i; h < num_words; h++) // searches freq_vect for the largest frequency, then stores that into the top_ten array
        {
            if (freq_vect[i] < freq_vect[h])
            {
                tempfreq = freq_vect[i];
                freq_vect[i] = freq_vect[h];
                freq_vect[h] = tempfreq;
                tempname = name_vector[i];
                name_vector[i] = name_vector[h];
                name_vector[h] = tempname;
            }
        }
    }
}

int main()
{
    int desired_outputs = 10;
    ifstream in_file;
    in_file.open("C:\\bible.txt"); //get file
    string line; //process file into individual words
    vector <string*> name_vector;
    vector <int> freq_vect;
    int* freq_point;
    int num_words = 0;
    string tmpstr;
    bool duplicate = false;

    while (in_file >> tmpstr) // analyzes document and stores words once each in a vector of string*
    {
        duplicate = false;
        string newstr = "";
        for (int strpos = 0; strpos < tmpstr.length(); strpos++)
        {
            if (isalpha(tmpstr[strpos]))
            {
                char foo = tolower(tmpstr[strpos]);
                newstr = newstr + foo;
            }
        }

        for (int i = 0; i < num_words; i++)
        {
            if (newstr == *name_vector[i])
            {
                duplicate = true;
                freq_vect[i]++;
            }
        }

        if (duplicate == false)
        {
            string* heapstr = new string;
            *heapstr = newstr;
            name_vector.push_back(heapstr); //use vector of string* to store word addresses
            freq_vect.push_back(1); //use vector of integers to store word frequencies
            num_words++;
        }
    }

    top_X(freq_vect, num_words, name_vector); //calls organizing function
    ofstream out_file;
    out_file.open("C:\\top_bible_words.txt");
    for (int j = 0; j < 100; j++) // displays top ten words
    {
        cout << j+1 << ": " << *name_vector[j] << "; " << freq_vect[j] << endl;
    }
    for (int j = 0; j < 100; j++) // displays top ten words
    {
        out_file << j+1 << ": " << *name_vector[j] << "; " << freq_vect[j] << endl;
    }
    system("pause");
    return 0;
}
First, edit your post so it uses code tags - the <> button on the right

IMO, a vector is not a good container for this when there is a lot of data, because every lookup has to scan through the whole vector. If there are lots of words to count, you might be better off with a <map> or <set>.

A map stores a look up value (in your case the word) and some other value (the count).
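To make the map suggestion concrete, here is a minimal sketch of counting words with std::map. The helper names normalize and count_words are made up for illustration, and unlike the original program this version skips tokens that contain no letters at all; treat it as one possible approach rather than a drop-in fix.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>

// Hypothetical helper: keep only the letters of a token, lowercased,
// mirroring the cleanup loop in the original post.
std::string normalize(const std::string& token)
{
    std::string word;
    for (std::string::size_type i = 0; i < token.size(); ++i)
    {
        // The <cctype> functions expect a value representable as unsigned char.
        const unsigned char ch = static_cast<unsigned char>(token[i]);
        if (std::isalpha(ch))
            word += static_cast<char>(std::tolower(ch));
    }
    return word;
}

// Hypothetical helper: reads whitespace-separated tokens from any input
// stream (an std::ifstream works too) and tallies each cleaned-up word.
std::map<std::string, int> count_words(std::istream& in)
{
    std::map<std::string, int> counts;
    std::string token;
    while (in >> token)
    {
        const std::string word = normalize(token);
        if (!word.empty())      // assumption: tokens with no letters are ignored
            ++counts[word];     // operator[] inserts a zero count on first sight
    }
    return counts;
}
```

Passing the already opened in_file to count_words would replace both the duplicate search loop and the two parallel vectors, and there are no raw string* allocations left to clean up.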

A set stores objects that are always kept sorted. You could make a class that stores a word and a count, then put objects created from this class into the set.
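A rough sketch of the class-in-a-set idea follows. One caveat: elements of a std::set cannot be modified in place, so a count cannot simply be incremented once the object is in the set. The sketch therefore assumes the counts have already been tallied (here in a std::map) and uses the set only to keep the results sorted. WordCount, ByCountDesc and rank_words are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical class holding one word and how often it occurred.
struct WordCount
{
    std::string word;
    int count;
};

// Orders by descending count. Ties are broken alphabetically so that two
// different words with the same count are not treated as duplicates.
struct ByCountDesc
{
    bool operator()(const WordCount& a, const WordCount& b) const
    {
        if (a.count != b.count)
            return a.count > b.count;
        return a.word < b.word;
    }
};

// Hypothetical helper: copies already tallied counts into a set, which
// keeps them sorted from most frequent to least frequent.
std::set<WordCount, ByCountDesc> rank_words(const std::map<std::string, int>& counts)
{
    std::set<WordCount, ByCountDesc> ranked;
    for (std::map<std::string, int>::const_iterator it = counts.begin();
         it != counts.end(); ++it)
    {
        WordCount wc = { it->first, it->second };
        ranked.insert(wc);
    }
    return ranked;
}
```

Iterating the returned set from begin() visits the most frequent words first, so no hand-written sort like top_X is needed.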

Both of these STL containers find values efficiently: a lookup takes logarithmic time instead of a scan over every element.

Hope all goes well.
I believe you are dereferencing a pointer before allocating memory, here: if (newstr == *name_vector[i])
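Another place worth checking is the pair of output loops: they index name_vector[j] and freq_vect[j] for j from 0 to 99 no matter how many words were actually stored. If the file did not open, or it held fewer than 100 distinct words, that alone would produce "vector subscript out of range". A minimal sketch of a bounds-safe version is below; format_top is a hypothetical helper, and it assumes the words are held as plain std::string values rather than string*.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats at most `limit` entries and never indexes
// past the end of either vector.
std::string format_top(const std::vector<std::string>& names,
                       const std::vector<int>& freqs,
                       std::size_t limit)
{
    // Clamp the loop bound to what is really stored.
    const std::size_t n = std::min(limit, std::min(names.size(), freqs.size()));
    std::ostringstream out;
    for (std::size_t j = 0; j < n; ++j)
        out << j + 1 << ": " << names[j] << "; " << freqs[j] << '\n';
    return out.str();
}
```

The same string can be sent to both cout and out_file, which also removes the duplicated loop. Testing the stream right after in_file.open(...), for example with if (!in_file), would show whether the file was opened at all.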