popen or fopen freezes my code

Hello,

I am using chrony to track the offset from my NTP server. I have code that writes this value to a CSV. For some reason, after running for a few hours (roughly 4 to 5), it freezes my program at random. It's not consistent either: it'll freeze on a random iteration and at a random spot. My main code works without it. I have plenty of storage, so that's not the issue. I've tried doing it via both popen and fopen (as shown in my commented-out code).

I feel like there's a memory leak going on, but I have no idea where or how.

Any guidance would be appreciated.

void ntpSync(){
	char ntp[]= <insert NTP IP>;
	FILE *jitter;
	char line[100];
	char *start;
	char *i;
	
	system("chronyc sources|cat > offset.txt");
	jitter = fopen("offset.txt","r");
//	jitter = popen("chronyc sources |cat", "r");

	fgets(line,100,jitter);
	while(!feof(jitter))
{	
	i=strstr(line,ntp);
	if(i!=NULL){
		start = strpbrk(line,"+-");	
		myfile<<start;	
	}

	fgets(line,100,jitter);
}

	fclose(jitter);
//     pclose(jitter);
}
So when your program 'freezes', do you see offset.txt continuing to grow in size with data from chronyc?

The "|cat" filters seem completely unnecessary in all the places you've used it.

Using feof() in the way you have is clunky.
while ( fgets(line,sizeof(line),jitter) ) {
  if ( strstr(line,ntp) ) {
    char *start = strpbrk(line,"+-");
    myfile<<start;	
  }
}

No, because I am overwriting offset.txt every time instead of appending to it.
The "|cat" filter is unnecessary due to a bug from chrony/Linux: it won't output "chronyc sources" unless you do it (or are sudo).

I tried it as while(fgets(line,100,jitter)) before, but I haven't tried the if(strstr(line,ntp)), so I'll try that. Thanks!

What's also odd is that it works on one identical computer but not the other.
If there is an error reading jitter, then feof() may never get set and the program will loop forever.
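
A minimal sketch of that point: driving the loop from the return value of fgets() ends it on either end-of-file or a read error, and ferror() then tells the two apart. The helper name countLines and its path argument are hypothetical, used only for illustration.

```cpp
#include <cstdio>

// Hypothetical helper: returns the number of fgets() reads that succeeded
// (one per line, as long as lines fit in the buffer), or -1 if the file
// cannot be opened or a read error occurs.
int countLines(const char *path) {
    FILE *fp = std::fopen(path, "r");
    if (fp == nullptr) {
        return -1;                       // never hand a NULL FILE* to fgets
    }
    char line[100];
    int count = 0;
    // fgets returns NULL on end-of-file AND on error, so the loop always ends.
    while (std::fgets(line, sizeof(line), fp) != nullptr) {
        ++count;
    }
    bool failed = std::ferror(fp) != 0;  // tell a read error from plain EOF
    std::fclose(fp);
    return failed ? -1 : count;
}
```

Checking the fopen() result matters here too: if offset.txt is missing for any reason, the original code passes a null pointer to fgets().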

Did you run it in a debugger so you can see exactly where it's hanging? What makes you think the problem is in the code you've shown?

I assume line 25 is a typo and should be pclose(jitter); Edit: fixed in original post. See below.
I tried using a debugger and everything worked accordingly; it just "stopped." It hangs at random spots during each run (when I open my CSV, it'll freeze at random write parts).

I think the issue is with this code segment because when I comment out the call, it doesn't freeze.

And yes, that is a typo. I'll fix it in the original post. Apologies.
> No because I am overwriting offset.txt every time instead of appending it.
Then I think your while loop needs rewind whenever it thinks it gets to the end.
https://www.cplusplus.com/reference/cstdio/rewind/

Or maybe, just do the whole open/read/close cycle each time.

Certainly leaving a single open handle on a file that something else is busy creating and deleting, only to create it again isn't something I would recommend.

Maybe it just stops when the file system decides it's worn out some particular inode and just moves your offset.txt to another part of the disk.

> the "|cat" filter is unnecessary due to a bug from chrony/linux that it wont output "chronyc sources"
I'm assuming you mean 'necessary' here.
Wait, so you're saying if you just type in
chronyc sources
at your normal user shell, nothing happens?



> Then I think your while loop needs rewind whenever it thinks it gets to the end.

I don't think so? It does what it needs to do for the first couple thousand iterations (each iteration takes a second). It generally freezes about 4 hours in (which only adds to my frustration, since it makes it harder to debug).

I tried just using popen so that it doesn't have to write to a file, with no success either (the commented-out segments).

> at your normal user shell, nothing happens?

Correct. There was a bug report on it, and adding |cat was a solution (I'm also a Linux rookie here).
> I don't think so?
Don't confuse apparent early success with being bug-free.

The fact that it goes wrong at some point is telling you something is wrong with your approach.

> system("chronyc sources|cat > offset.txt");
> jitter = fopen("offset.txt","r");
The fact that you need '|cat' to work-around some issue is suspicious.

What if the real underlying problem is that chronyc actually spawns another process to run asynchronously to your program, causing system() to return before the file has been written?

> I tried using a debugger and everything worked accordingly, it just "stopped." It hangs at random spots during each run (when I open my csv, it'll freeze at random write parts).
What I mean is: where is the program in the code when it freezes? You should be able to interrupt the program in the debugger when it hangs and see what it's doing.
> The fact that you need '|cat' to work-around some issue is suspicious.
https://bugzilla.redhat.com/show_bug.cgi?id=1575002
It seems like it's an acceptable response given the bug report? (I can't disable SELinux due to rules, so that's not an acceptable workaround, unfortunately.)

> Don't confuse apparent early success with being bug-free.
Very true, but when I run the code by itself, it still freezes, which makes me think this is the main culprit.

> What if the real underlying problem is that chronyc actually spawns another process to run asynchronously to your program, causing system() to return before the file has been written?

That's why I tried the popen call, with no success either :(


All my code does is essentially get the time, floor it, add a second, and wait until the half second before reading some registers. It also writes those values to a CSV file and then waits until the next second starts.
Regarding cat, the bug says that chrony can't output to the console. Since you're directing output to a file or a pipe, it should work fine.
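
A rough sketch of what reading through a pipe without the extra cat could look like. The helper name readCommandLines and the ok flag are hypothetical, and whether "chronyc sources" really writes to a pipe on the affected machines is an assumption taken from the bug description, not something verified here.

```cpp
#include <stdio.h>   // popen/pclose are POSIX and declared here
#include <string>
#include <vector>

// Hypothetical helper: runs cmd through the shell and returns its output lines.
// ok is set to false if the command could not be started or exited non-zero.
std::vector<std::string> readCommandLines(const char *cmd, bool &ok) {
    std::vector<std::string> lines;
    FILE *pipe = popen(cmd, "r");
    if (pipe == nullptr) {
        ok = false;
        return lines;
    }
    char buf[256];
    // Loop on the fgets result so EOF and read errors both end the loop.
    while (fgets(buf, sizeof(buf), pipe) != nullptr) {
        lines.emplace_back(buf);
    }
    int status = pclose(pipe);           // waits for the command to finish
    ok = (status == 0);
    return lines;
}
```

A call such as readCommandLines("chronyc sources", ok) would then replace the system()/fopen() pair, and a false ok or an unexpectedly short result is worth logging before parsing anything.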

Have you checked yet in a debugger to see where the program is when it hangs?
I tried using gdb, but I'm not sure I was using it right. And from my understanding, it hangs at random times and is inconsistent.

If you could provide guidance on how to use gdb, I would appreciate it. I tried doing a backtrace where it froze, but I'm not sure I'm doing it right. The output I got was:

#0 0x00007ffff6c8b840 in __nanosleep_nocancel () from /lib64/libc.so.6
#1 0x00007ffff6c8b6f4 in sleep () from /lib64/libc.so.6
#2 0x00007ffff7560eb9 in std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /lib64/libstdc++.so.6
#3 0x0000000000402ada in void std::this_thread::sleep_for<long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) ()
#4-6 removed due to character length
Doing a backtrace is a good idea, but what's this!?
> #4-6 removed due to character length

All that you've managed to show are the various levels of wrapper around 'sleep'.
The really interesting things are in the frames not shown.

gdb man page wrote:
backtrace
bt
    Print a backtrace of the entire stack: one line per frame for all frames in the stack.
    You can stop the backtrace at any time by typing the system interrupt character, normally Ctrl-c.

backtrace n
bt n
    Similar, but print only the innermost n frames.

backtrace -n
bt -n
    Similar, but print only the outermost n frames.

backtrace full
bt full
bt full n
bt full -n
    Print the values of the local variables also. n specifies the number of frames to print, as described above.

The names where and info stack (abbreviated info s) are additional aliases for backtrace.

In a multi-threaded program, gdb by default shows the backtrace only for the current thread. To display the backtrace for several or all of the threads, use the command thread apply. For example, if you type thread apply all backtrace, gdb will display the backtrace for all the threads; this is handy when you debug a core dump of a multi-threaded program.


> All that you've managed to show are the various levels of wrapper around 'sleep'.
and since there's nothing in the code you've shown that would cause a call to sleep(), it indicates that the problem is actually somewhere else. This is why it's so critical to verify where the problem occurs.
> Doing a backtrace is a good idea, but what's this!?

It was just the trace from main to my other code. I'll rerun it and share it.

Also, just to make sure I'm understanding: I run gdb with my code, wait until it freezes, press Ctrl+C, and then run backtrace -n? Seems straightforward enough if I understand it right. Thanks. I'll report back when it freezes.

> and since there's nothing in the code you've shown that would cause a call to sleep()


My code floors the current time and figures out the next second. At the end of the loop, it updates the values.

So at the beginning of the loop, there's this:
nextTime = currTime + chrono::seconds(1); //incrementing for next time 
    nextTimeSeconds = chrono::duration_cast<chrono::seconds>(nextTime.time_since_epoch()).count();
    nextTimeMicroSeconds = nextTimeSeconds * 1000000;


And then at the end:

        currTime = nextTime; //comment out if testing with one sec
        iteration++; 
	this_thread::sleep_until(nextTime);


So, from running the code and backtracing, this is the output:
#0  0x00007ffff5b4d85d in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff5b4d6f4 in sleep () from /lib64/libc.so.6
#2  0x00007ffff6422eb9 in std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /lib64/libstdc++.so.6
#3  0x0000000000404c5c in void std::this_thread::sleep_for<long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) ()
#4  0x0000000000404b92 in void std::this_thread::sleep_until<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >(std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) ()
#5  0x0000000000403ba9 in resync() ()
#6  0x00000000004040b5 in main ()
Sorry for the double post, but I hit the character limit.

I wasn't able to step into any of the frames, so I wasn't sure where to go or what to do from there. My code seems to be working fine? What is odd is that it runs fully on half the computers (2 of 4) but not on these two. Two of them are identical computers, too, and only one works.
You should do the "thread apply all backtrace" command, because what you show is just the main thread sleeping.

Also, try a minimal program, such as
int main ( ) {
  while ( true ) {
    ntpSync();
    sleep(1);
  }
}
I have essentially done that, which is why I think ntpSync() is the culprit.

This code freezes too:
for(int j = 0; j < (runs); j++) //change this to change how many iterations the FPGA timestamp resyncs for
{
	nextTime = currTime + chrono::seconds(1); //incrementing for next time 
    nextTimeSeconds = chrono::duration_cast<chrono::seconds>(nextTime.time_since_epoch()).count();
    nextTimeMicroSeconds = nextTimeSeconds * 1000000;


	cout << "	Time cycle "<< chrono::duration_cast<chrono::microseconds>(currTime.time_since_epoch()).count() << " to "<< nextTimeMicroSeconds<<"\n\n";

        myfile<<iteration<<",";
       	myfile<< nextTimeSeconds<<","; //nextMicroSec
	//Add easy to read time here
//	writeTime();

	chrono::high_resolution_clock::time_point epochTime = chrono::high_resolution_clock::now();
	double epocTime = chrono::duration_cast<chrono::nanoseconds>(epochTime.time_since_epoch()).count();	
	epocTime /= 1E9;

	myfile<<fixed<<epocTime<<",";	

    this_thread::sleep_until(currTime + chrono::nanoseconds(500000000));    
    
ntpSync();

currTime = nextTime; //comment out if testing with one sec
        iteration++; 
	this_thread::sleep_until(nextTime);

	cout<<"Current time at end  :                            			 	"<< chrono::duration_cast<chrono::microseconds>(chrono::high_resolution_clock::now().time_since_epoch()).count()<<endl;


> that which is why I think ntpSync() is the culprit
You need to find evidence of things happening (or not happening).
You're not going to guess your way out of this.

Start by adding some instrumentation code to your function.
int numSyncCalls = 0;
void ntpSync()
{
  numSyncCalls++;
  char ntp[] = <insert NTP IP >;
  FILE *jitter;
  char line[100];
  int numLinesRead = 0;

  system("chronyc sources|cat > offset.txt");
  jitter = fopen("offset.txt", "r");
  while ( fgets(line,sizeof(line),jitter) ) {
    ++numLinesRead;
    if ( strstr(line,ntp) ) {
      char *start = strpbrk(line,"+-");
      myfile<<start;
    }
  }
  fclose(jitter);
}


OK, so you press ctrl-c when it apparently locks up, and you get another nice stack trace of it sleeping, which is where it's going to be 99% of the time anyway.

But before you type 'continue', try 'print numSyncCalls'.
Make a note of the result, continue for say 20 seconds and ctrl-c again.
Now try 'print numSyncCalls' again.
Does it look like it ran as many times as you would expect, given the interval you let it run for?

Another useful command to try would be
break 19 if numLinesRead < 10
Where you
- replace 19 with the actual line number in your source file of the fclose(jitter) call.
- replace 10 with whatever the usual number of lines of text you expect to see in the output of chronyc

If you do hit this conditional breakpoint, then go and open up offset.txt in your text editor to find out what actually got written to the file, which you're presently not taking into account.
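
Along the same lines, the parsing step can be made to tolerate unexpected file contents. This is only a sketch: findOffset is a hypothetical helper, and it assumes the line and ntp buffers are used the same way as in ntpSync() above.

```cpp
#include <cstring>

// Hypothetical helper: returns a pointer to the first '+' or '-' in a line
// that mentions ntp, or nullptr if the line does not mention ntp or contains
// no sign character at all.
const char *findOffset(const char *line, const char *ntp) {
    if (std::strstr(line, ntp) == nullptr) {
        return nullptr;                  // line is about a different source
    }
    return std::strpbrk(line, "+-");     // may itself be nullptr
}
```

The caller would then write to the CSV only when the result is non-null, for example if (const char *start = findOffset(line, ntp)) myfile << start;, because streaming a null char pointer is not valid. It is also worth checking in offset.txt that the first '+' or '-' on the matching line really is the offset: as far as I know, the state column that chronyc prints before the address can itself be '+' or '-'.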


This is very difficult because you keep giving us little pieces of your code. Can you show the whole thing? Or can you reduce the problem to a program that's small enough to show? This is almost certainly something simple, but without the info, it's nearly impossible to tell.

For example, maybe one of the variables is overflowing. But since you haven't shown us the type nextTime, or nextTimeSeconds, or currTime, it's impossible to know.

My guess is that by the time you call this_thread::sleep_until(nextTime);, nextTime has already passed and sleep_until() doesn't account for that possibility.
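
One way to test that guess is to check the deadline before sleeping. A minimal sketch, assuming nextTime is a std::chrono::system_clock::time_point (the thread never shows its type); sleepUntilIfFuture is a hypothetical name.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: sleeps until deadline only if it is still in the future.
// Returns false, without sleeping, when the deadline has already passed,
// so the caller can log or count the missed cycle.
bool sleepUntilIfFuture(std::chrono::system_clock::time_point deadline) {
    if (deadline <= std::chrono::system_clock::now()) {
        return false;
    }
    std::this_thread::sleep_until(deadline);
    return true;
}
```

A conforming sleep_until() should return right away for a deadline in the past, so the guard mainly makes a missed deadline visible and rules out a library quirk on the machines that hang. If it ever returns false, an ntpSync() call that took longer than the remaining half second would be a plausible cause.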