Create matrix of chars

Hi there,

I have been trying to write a function that can read in a certain type of file. This file contains genotype data and therefore has a number of columns with single letters denoting the genotype (the takeaway is that these are chars).

So I have created this struct to store this type of file:
typedef struct{
  char** familyID; 
  char** individualID; 
  char** paternalID; 
  char** maternalID; 
  int* sex;
  double* phenotype;
  char** genotypes;
  int sites;
  int individuals;
  
}pFile;


To store all these chars, I thought of creating a matrix for them, as I have n lines with m chars each. I am used to the approach of a pointer to an array of pointers, which I have used previously for double and int, like this:
char** allocChar(int x,int y){
  char** ret = new char*[x]; // (LINE 25)
  for(int i=0;i<x;i++){
    ret[i] = new char[y]; 
  }
  return ret;
}


But when I try to allocate this matrix, I get this error:
ped.genotypes = allocChar(lines,columns-6); // (LINE 148)

==2536== Invalid read of size 8
==2536==    at 0x401060: dallocChar (readPlink.cpp:36)
==2536==    by 0x401060: dallocPFile(pFile&) (readPlink.cpp:90)
==2536==    by 0x400D6D: main (readPlink.cpp:250)
==2536==  Address 0x5a234a0 is 0 bytes after a block of size 80 alloc'd
==2536==    at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2536==    by 0x40148C: allocChar (readPlink.cpp:25)
==2536==    by 0x40148C: readPFile(char const*) (readPlink.cpp:148)
==2536==    by 0x400CDA: main (readPlink.cpp:233) 



So why does this happen? And is this a viable way to build a matrix of chars?

And please do say if anything is unclear.
closed account (48T7M4Gy)
It would be a good idea to show us a small sample of the data file that is being read in.

PS Are you sure pedFile is a good naming choice? It's not funny if that was the intent.
Hi,

It is a .ped file from this program:

http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped

And the format looks like this:

FAM001 1 0 0 1 2 A A G G A C
FAM001 2 0 0 1 2 A A A G 0 0

Why on earth would I try to be funny about that?? That would be quite sick...
(But OK, I have changed this; I had not really noticed this unfortunate coincidence.)
closed account (48T7M4Gy)
So, I am guessing you have many lines of this type of coding.

I think your matrix idea is perhaps flawed or too complicated, and a vector or array of structs would be better. Each struct would be made up of string, int, int, ... char, char, corresponding to the elements of one row, or perhaps a string followed by chars, depending on what you plan to do with the data.

The whole array of lines is just a single dimensioned array/vector. The profusion of pointers doesn't seem necessary. Columns are just a formatting artefact, nothing special as far as data structure is concerned.

http://www.cplusplus.com/doc/tutorial/structures/
But surely it must be possible?

And my allocChar does produce a matrix-like structure of chars, right?

And thanks a lot for the feedback!

PS: the file might have more columns. The first 6 are always there, but from column 7 on there can be any number of columns.
Since you're writing a C++ program, why all the pointers? Why not just use C++ strings and std::vector instead?

#include <string>
#include <vector>

struct Pfile{
  std::string familyID; 
  std::string individualID; 
  std::string paternalID; 
  std::string maternalID; 
  int sex; // male = 1 female = 2 anything else unknown.
  int phenotype; // -9 unknown,  0 missing, 1 unaffected, 2 affected
  std::vector<char> genotypes;
  int sites;
  int individuals;
};
The way of the pointers is the way I was taught C++, it sure makes the program run faster right?

And could you point to an example of how to create a matrix structure with std::vector?
closed account (48T7M4Gy)
A matrix using vectors is simply a vector of vectors. I suggest you google it, because it is stock Stroustrup material. It's the same principle as a 2D matrix of chars built with double pointers, but with double pointers you're asking for problems.
closed account (48T7M4Gy)
You can store the Pfile objects, as they are read in, as elements of a vector of Pfile objects, and within each object is a vector of genotypes as described above. All very simple, manageable, extensible and direct.
closed account (48T7M4Gy)
Pointers don't make your program run faster especially with what you have divulged so far.
The way of the pointers is the way I was taught C++, it sure makes the program run faster right?

Probably not faster, but it is definitely much more error prone.

And could you point to an example of how to create a matrix structure with std::vector?

kemort already explained how to go about this: you would have a vector of your structure to hold each record (line). This structure has a vector to hold the genotypes.

By the way, the structure I provided is probably incorrect: sex and phenotype should probably be strings.

closed account (48bpfSEw)
As far as I understand, you are allocating memory for a 2D matrix:

char** allocChar(int x,int y){
  char** ret = new char*[x]; // (LINE 25)
  for(int i=0;i<x;i++){
    ret[i] = new char[y]; 
  }
  return ret;
}



Why do you allocate memory dynamically for each row separately?

I would do this:

char** allocChar(int x,int y){
  return (char**) malloc (x*y);
  }


It's the same result. Don't forget to clean up the allocated memory.
It's the same result.

No, not really the same, similar maybe. And you really should be using new/delete in a C++ program.

With the second snippet you lose the ability to use array notation (allocChar[x][y]) to access the array elements.
closed account (48bpfSEw)
Thanks, jlb, for the clarification.


#include <stdio.h>
#include <iostream.h>


/*** compiles but does not work!!!

int main(void) {

  int xDim = 10;
  int yDim = 11;
  char** ptr = (char**)malloc (xDim * yDim);
  
  for (int x=0;x<xDim;x++) {
    for (int y=0;y<yDim;y++) {
      ptr[x][y] = (char) (x*y);
      std::cout << (int) ptr[x][y] << " ";
      }
    std::cout << std::endl;
    }

  delete []ptr;
  return 0;
  }
***/



// works:

int main(void) {

  int xDim = 10;
  int yDim = 11;
  char* ptr = (char*)malloc (xDim * yDim);
  
  for (int x=0;x<xDim;x++) {
    for (int y=0;y<yDim;y++) {
      ptr[x*xDim + y] = (char) (x*y);
      std::cout << (int) ptr[x*xDim + y] << " ";
      }
    std::cout << std::endl;
    }

  delete []ptr;
  return 0;
  }
So then why would you suggest the second "working" snippet? IMO ptr[x*xDim + y] is much harder to read and much more error prone than ptr[x][y].

Your "working" snippet is also incorrect: you are trying to delete a pointer that was not allocated with new.

Watch out for those C-style casts. If you must use a cast in a C++ program, you should be using the proper C++-style cast (i.e. static_cast<char>()) to maintain type safety; C-style casts are not necessarily type safe. Also be careful with the char cast: a value outside char's range does not fit, and if char is signed the result is implementation-defined (before C++20), so you can easily end up with values you did not expect.

You really need to come into this century and find and download a more modern compiler. Any compiler that accepts <iostream.h> should be abandoned. And stop using malloc() in C++ programs, except in very very rare circumstances.

closed account (48bpfSEw)
I understand now and agree with all your words!
Thank you again, jlb!

OK, I guess I will construct a vector of vectors for my char matrix. So thanks for the help! :)

And also thanks a lot for the swift responses, most appreciated.

But one final question that still befuddles me: should it be possible to construct a char matrix the way I devised? And if so, is the function I wrote correct?
OK, so the issue actually seemed to be that I was printing my char matrix to check whether I had read in the data correctly, like this:

fprintf(stdout,"First geno: %s\n",ped.genotypes[0][0]);

Instead it should be %c for printing a char:

fprintf(stdout,"First geno: %c\n",ped.genotypes[0][0]);

So is this because the char has no null terminator ('\0')? And fprintf then goes out of bounds when trying to print a char as a string?
fprintf(stdout,"First geno: %s\n",ped.genotypes[0][0]);

Why are you using fprintf()? You seem to be writing a C++ program, so you really should be using the C++ streams instead of the more error-prone C stdio functions.

So is this because the char has no null terminator ('\0')? And fprintf then goes out of bounds when trying to print a char as a string?

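Not exactly. %s expects a pointer to a null-terminated string, but you passed a single char, so fprintf treats that char's value as an address and reads from wherever it points. Remember that using the incorrect specifier for the type invokes Undefined Behavior. If you were using C++ streams, you wouldn't be having this kind of issue, because the C++ streams know the variable type, unlike the error-prone C stdio functions.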

So when you say C++ streams, are you thinking of cout, cerr and all that?

And once again thanks a lot for the help! :)