SOCK_STREAM ENOMEM Help

Hi,

I am trying to understand what would cause an ENOMEM error from a 'recv(fd, buf, sizeof(buf), 0);' call on an AF_INET, SOCK_STREAM socket, fd. I would be grateful if anybody could help. I have a client application that (seemingly) randomly triggers this error, which causes repeated disconnect/reconnect cycles against the never-ending server data stream. I would like to write a simple client streaming program that reproduces the ENOMEM error, but I am not sure I fully understand its cause.

The documentation says it means "Could not allocate memory for recvmsg()". Since it's a streaming socket, my understanding is that 'recv' will copy the pending data into buf. If there is a big queue of data waiting to be read, and the read cycle is slow, I guess a queue of unread data will grow. Is it the inability to allocate memory for this queue that triggers ENOMEM?

If so, is there a simple example which would force an ENOMEM error from a streaming socket recv call? I have tried limiting the size of SO_RCVBUF via setsockopt(), having a slow 'recv' cycle and monitoring the queue size via 'ioctl(fd, SIOCINQ, &QueueSize)'. The queue size grew until it reached the limit set via setsockopt(), then reduced by sizeof(buf), before growing again. I thought that once the SO_RCVBUF limit was reached it would trigger ENOMEM, but nothing happened.
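Roughly what I tried, as a minimal sketch (host/port and buffer sizes are placeholders of my own, error checking omitted):

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/ioctl.h>
#include <linux/sockios.h>   // SIOCINQ
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main()
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    int rcvbuf = 4096;                                // deliberately small receive buffer
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(5000);                        // placeholder port
    sa.sin_addr.s_addr = inet_addr("127.0.0.1");      // placeholder host
    connect(fd, (struct sockaddr *)&sa, sizeof(sa));

    char buf[512];
    for (;;)
    {
        int queued = 0;
        ioctl(fd, SIOCINQ, &queued);                  // bytes still waiting in the kernel queue
        int n = recv(fd, buf, sizeof(buf), 0);
        printf("queued=%d  read=%d  errno=%d\n", queued, n, n < 0 ? errno : 0);
        if (n <= 0)
            break;
        sleep(1);                                     // deliberately slow read cycle
    }
    close(fd);
    return 0;
}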

Any helpful insights would be much appreciated.
I presume you are looping round when calling recv(...) and checking for when you have reached the end of the data being returned from the socket? Are you allocating memory on the heap and also freeing it? Some code in which you are getting the error would help us to help you further.
I am trying to understand what would cause an ENOMEM error from a recv
Out of memory.
http://pubs.opengroup.org/onlinepubs/009695399/functions/recv.html

But really, you need to post your code so we can see what else is going on. For example, it's pointless showing variables unless we can see their declarations, and you could have corrupted the heap (pretty likely). ...

It might help if you explained the circumstances in which it occurs; does it happen after a while or immediately ...

Hi, thanks for the reply. Yes, I loop when calling recv(...). I'm not sure about checking for end of data or heap allocation / freeing, as I did not write the API, but I assume that is OK since the process runs without issue or ENOMEM error for many hours.

Unfortunately I don't have a concise code snippet that reproduces the error since, as I mentioned, it's part of an API that I didn't write, so I have just 'summarised' the looping process below as best I can. As far as I can tell, the socket exception - cannot allocate memory - is triggered immediately after the 'recv' call. I understand that without the exact code it is difficult to assess, so I was wondering: for any given socket client, what could cause such an ENOMEM error (i.e. things I could look out for)? Is there any simple example socket recv snippet that reproduces this ENOMEM error (I have tried but failed to produce one)? Or am I approaching the problem in the wrong way?

It is a seemingly random occurrence - on startup the application runs fine for many hours - then suddenly (and inexplicably to me) an ENOMEM error is triggered - the application disconnects from the server, destroys / recreates the socket and auto-reconnects - sometimes it then continues without issue for another few hours, and sometimes it repeatedly triggers the ENOMEM error immediately upon the new socket reconnect.


 BytesVec m_inBuffer;
  char buf[8192];

  // starting to connect to server
  m_fd = socket(AF_INET, SOCK_STREAM, 0);
  struct sockaddr_in sa;
  memset( &sa, 0, sizeof(sa));
  sa.sin_family = AF_INET;
  sa.sin_port = htons( port);
  sa.sin_addr.s_addr = inet_addr( host);

  // try to connect
  connect( m_fd, (struct sockaddr *) &sa, sizeof( sa));

for(;;)
{
  int nResult = ::recv( m_fd, buf, sz, 0);
  if( nResult > 0 && handleSocketError()) 
  {
    m_inBuffer.insert( m_inBuffer.end(), &buf[0], &buf[0] + nResult);
  }
  const char* beginPtr = &m_inBuffer[0];
  const char*	ptr = beginPtr;
  const char* endPtr = ptr + m_inBuffer.size();
  try 
  {
    while( (m_connected ? processMsg( ptr, endPtr) : processConnectAck( ptr, endPtr)) > 0) 
    {
      if( (ptr - beginPtr) >= (int)m_inBuffer.size())
	break;
    }  
  }
  
  catch (...) 
  {
    CleanupBuffer( m_inBuffer, (ptr - beginPtr));
    throw;
  }
  CleanupBuffer( m_inBuffer, (ptr - beginPtr));
}

bool handleSocketError()
{
  // no error
  if( errno == 0)
	return true;

  // Socket is already connected
  if( errno == EISCONN) {
	return true;
  }

  if( errno == EWOULDBLOCK)
	return false;

  if( errno == ECONNREFUSED) 
  {
  }
  else 
  {
      // "Socket exception: Cannot allocation memory" appears here
      cout << strerror(errno) << endl;
  } 
  // reset errno
  errno = 0;
  // Disconnect socket
  eDisconnect();
  return false;
}
recv(...)

will return the number of bytes read from the socket, which the buffer argument will hold; when the connection is closed and all the data has been passed back it will return 0, and when it returns a value < 0 this flags an error which you need to handle. So instead of having a continuous for(;;) loop you should have a do{...} while (nResult > 0);. Also, you haven't initialised the buffer (the variable named 'buf'), and where is sz initialised? So, I would suggest that you cater for when recv(...) returns 0.
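Something along these lines (just a sketch; handleData() is a hypothetical stand-in for whatever your processing step actually is):

#include <sys/socket.h>
#include <cstdio>

// Hypothetical stand-in for your processing step.
static void handleData(const char* data, int len)
{
    std::printf("got %d bytes\n", len);
    (void)data;
}

static void readLoop(int fd)
{
    char buf[8192];
    int nResult;
    do
    {
        nResult = ::recv(fd, buf, sizeof(buf), 0);
        if (nResult > 0)
        {
            handleData(buf, nResult);   // consume exactly nResult bytes
        }
        else if (nResult == 0)
        {
            // peer closed the connection cleanly -- stop reading
        }
        else
        {
            // nResult < 0: inspect errno (EWOULDBLOCK / EINTR vs. real errors)
        }
    } while (nResult > 0);
}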
I can't see where sz is initialised.
Thanks for the interest, and for the comments about the posted code. As mentioned, unfortunately I do not have a simple snippet which reproduces the error; it's part of a heavily inherited multi-class API, so what I posted is not the verbatim code that is implemented and is probably of limited diagnostic use. Sorry, I forgot to add that 'sz = sizeof(buf)'. The stream never ends, hence a return of 0 is not really looked for; whatever data is available is read (up to sizeof(buf)) and sent to processMsg(), as far as I can see.

Is it possible to create a code snippet that reads a socket and triggers the ENOMEM system error after some time (to mimic my application's problem)? As I mentioned in my initial post, I tried to limit the socket buffer size with SO_RCVBUF, but that wouldn't trigger the ENOMEM error. If ENOMEM could be triggered by a "corrupted heap" as mentioned, I guess it is possible that the error could (horrifically) lie anywhere within the entire code and may not be related to 'recv' or the socket at all? Why does any memory need to be allocated at all for a socket read? I thought 'buf' already provides a place to put the data. Is the socket allocating memory for the queued data that is waiting to be read, and at some stage its inability to do so triggers the ENOMEM error?
Well, if you are expecting recv(...) never to reach the end of the data stream, then catering for it returning 0 should not cause a problem; but not catering for a return of 0 when it ACTUALLY returns 0 would be a major problem and may cause the sort of error you are encountering. Are you able to debug this "library" you refer to and locate where the error is being generated? Or do you not have access to the code?
Oh sorry, I didn't realise that could trigger an ENOMEM. On further looking I see that it does in fact cater for a zero return. The lowest-level function for the socket read is below. I do have access to the code; it's just lots of wrappers for reading the data stream. After a random amount of time, handleSocketError reports a 'socket exception - cannot allocate memory' and kills the socket. But if it is heap corruption, that means the cause may not necessarily be the socket recv, right? Are there any socket-related flags I could print upon the error to pinpoint why it says it cannot allocate?

int ClientSocket::receive(char* buf, size_t sz)
{
	if( sz <= 0) return 0;
	int nResult = ::recv( m_fd, buf, sz, 0);
	if( nResult == -1 && !handleSocketError()) {
		return -1;
	}
	if( nResult <= 0) {
		return 0;
	}
	return nResult;
}
Despite the kind interest shown previously, I have yet to make any progress. In summary my questions were:

1. Exactly what memory is a socket referring to when it returns an ENOMEM error, and why does a socket need to allocate any memory at all on a read? The documented "out of memory", to me, is not very informative. When I see the error on my system, there is ample memory available as far as I can see (via 'top').

2. Is it possible to create a code snippet that will force an ENOMEM error from a socket? Is there only one cause of such an error, or possibly many?

3. As hinted by a contributor above, "heap corruption" may be a possible cause of this error. Any expansion on this would be much appreciated. My application's memory consumption is stable over many continuous hours of operation.

Thanks.

1. Exactly what memory is a socket referring to when it returns an ENOMEM error, and why does a socket need to allocate any memory at all on a read? The documented "out of memory", to me, is not very informative. When I see the error on my system, there is ample memory available as far as I can see (via 'top').
Why don't you just look at the source? Linux is open source, right?

From the linux kernel file net/ipv4/tcp_input.c from https://github.com/torvalds/linux
int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
{
    struct sk_buff *skb = NULL;
    struct tcphdr *th;
    bool fragstolen;

    if (size == 0)
        return 0;

    skb = alloc_skb(size + sizeof(*th), sk->sk_allocation);
    if (!skb)
        goto err;

    if (tcp_try_rmem_schedule(sk, skb, size + sizeof(*th)))
        goto err_free;

    th = (struct tcphdr *)skb_put(skb, sizeof(*th));
    skb_reset_transport_header(skb);
    memset(th, 0, sizeof(*th));

    if (memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size))
        goto err_free;

    TCP_SKB_CB(skb)->seq = tcp_sk(sk)->rcv_nxt;
    TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + size;
    TCP_SKB_CB(skb)->ack_seq = tcp_sk(sk)->snd_una - 1;

    if (tcp_queue_rcv(sk, skb, sizeof(*th), &fragstolen)) {
        WARN_ON_ONCE(fragstolen); /* should not happen */
        __kfree_skb(skb);
    }
    return size;

err_free:
    kfree_skb(skb);
err:
    return -ENOMEM;
}


2. Is it possible to create a code snippet that will force an ENOMEM error from a socket? Is there only one cause of such an error, or possibly many?
You haven't shown us all the relevant code; you're pretty much on your own until then.

3. As hinted by a contributor above, "heap corruption" may be a possible cause of this error. Any expansion on this would be much appreciated. My application's memory consumption is stable over many continuous hours of operation.
You could try running a heap checker like valgrind.
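For example, even something as small as this (purely illustrative) is enough for valgrind's memcheck to report an invalid write, although the program itself may appear to run fine for hours:

#include <cstring>

int main()
{
    char* p = new char[16];
    std::memset(p, 0, 32);   // writes 16 bytes past the end of the allocation
    delete[] p;              // may seem to work, crash, or quietly poison later allocations
    return 0;
}

Build with -g and run the whole application under valgrind; it will point at the line doing the bad write.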


EDIT: Thinking more about it, that code is probably for send. The principle is sound, though; when in doubt, look at the source.
Thanks, I believe the problem has been solved (at least 10 continuous hours of streaming so far without any socket ENOMEM error), very much as a result of this forum and the 'heap corruption' comment, and as it turns out, nothing to do with the socket recv.

If it is of any interest / use to anybody: the only part of the application looking out for system errors was the socket handling, so naturally an ENOMEM would show up there. Obvious to most, probably, but not to me - in fact any part of the application could have been screwing up the memory long before the socket tried to read. I put ENOMEM checks all over the application and waited for one to pop up. Essentially the problem came from the code below:

#include <vector>
using std::vector;

class Streamer
{
public:  // members need to be public (or wrapped in accessors) for main() below
  vector< vector<data> > DataStreams;  // Data container for each stream
  void RecvSocketData(); // Read once from socket and append any data to relevant stream
};

int main()
{
  Streamer myStreamer;
  vector< vector<data> > ProcessStreams;

  // Init number of data and process streams
  vector<data> Stream;
  for(int num = 0; num < 10; num++)
  {
    myStreamer.DataStreams.push_back(Stream);
    ProcessStreams.push_back(Stream);
  }
 
  for(;;)
  {
     // Update process streams from streamer
     ProcessStreams = myStreamer.DataStreams;
   
     // Get latest stream data
     myStreamer.RecvSocketData();

     // Do whatever to ProcessStreams
   }
}


After a few hours, an ENOMEM was triggered right after 'ProcessStreams = myStreamer.DataStreams'. I don't really know why; it's sequential, there is no multiple access and no mutexing required, and it only caused an issue with the socket reading (simply because the application was told to disconnect on ENOMEM). It seemingly did not affect the data containers, subsequent processing, storage, results or display - no noticeable memory leak, no segmentation fault, no crash. All I had to do was create a new socket, reconnect, and it would carry on without issue until the next ENOMEM.
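For what it's worth, the checks I scattered around were essentially of this shape (a simplified, hypothetical sketch, not the exact code):

#include <cerrno>
#include <cstdio>
#include <cstring>

// Report (and clear) errno if it has become ENOMEM after a suspect operation,
// e.g. checkEnomem("after DataStreams copy");
inline void checkEnomem(const char* where)
{
    if (errno == ENOMEM)
    {
        std::fprintf(stderr, "ENOMEM detected at %s: %s\n", where, std::strerror(errno));
        errno = 0;
    }
}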

Anyways, thanks again, I learned quite a bit.
Do you really mean to copy all those 2D vectors?
Unfortunately, yes I do.

Stupid and unnecessary as it is, any idea why doing such a copy would screw up the heap? And even then, it's very indeterminate: once the 2D vector copy generated an ENOMEM and a subsequent socket read error was triggered, I would create a new socket and reconnect. After that, there might be no ENOMEM triggered for another few hours. I would have assumed that once ENOMEM was triggered after the 2D copy it could not 'fix' itself, yet creating a new socket seemingly did temporarily fix it. I don't really understand it, but the issue certainly seems fixed now that I have removed the 2D vector copy.
Copying stuff shouldn't corrupt the heap.

I take it data doesn't have pointers that are incorrectly handled during copy/assignment.
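For example (purely illustrative, not your actual type), a payload like this would corrupt the heap as soon as the vectors get copied around:

#include <vector>

// Rule-of-three violation: owns a raw pointer but relies on the
// compiler-generated copy constructor / assignment operator.
struct data
{
    char* p;
    data() : p(new char[64]) {}
    ~data() { delete[] p; }
    // no user-defined copy: every copy shares (and later deletes) the same buffer
};

int main()
{
    std::vector< std::vector<data> > DataStreams(10);
    DataStreams[0].push_back(data());      // the temporary is copied in, then destroyed,
                                           // so the stored element already points at freed memory

    std::vector< std::vector<data> > ProcessStreams;
    ProcessStreams = DataStreams;          // the 2D copy duplicates every pointer again

    // when ProcessStreams and DataStreams are destroyed, the same buffers are
    // deleted more than once: heap corruption that may only surface much later
    return 0;
}

If data is just a plain value type, that whole class of problem goes away.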
It seems you are right and I spoke too soon. After about 28 hours, an ENOMEM appeared again, but removing that 2D vector copy certainly seems to have improved things a lot.

Yeah, data is just a plain typedef. The system can handle it; now that its frequency of occurrence is so low, I'll just have to keep an eye on it, I suppose.

Topic archived. No new replies allowed.