localhost performance question

I wrote my own TCP/IP send/receive benchmark utility, and its results are consistent with iperf. One reason I wrote my own was to better understand why localhost performance is so much slower than RAM (especially given that localhost isn't even supposed to touch the NIC -- you can disable ALL your network adapters and still ping and send TCP over 127.0.0.1). I assume localhost is implemented by the host OS (Windows 10 Pro 64-bit in this case) and NOT by the NIC drivers or motherboard drivers. I noticed localhost has an MTU of 4GB, but otherwise I'm not sure what size send/receive buffers are allocated to localhost.

I am seeing localhost performance of about 4000-8000 Mbps (or about 600-900 MBps) on i7/DDR3-based hardware (less than 2 years old), while memtest86+ shows RAM speeds much faster than this (say 20k-40k Mbps). I assume the difference is due to TCP/IP packaging overhead (though the localhost MTU is 4GB, I assume the host OS still has to partition data into packets). Note, I am using boost 1.60 and VS2015 Community (boost was also compiled with my VS2015) for the io service library.
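
For a user-space cross-check of the memory side of that comparison, a minimal C++ sketch follows. It times repeated std::memcpy calls and includes the bits-versus-bytes conversion used above. The helper names (mbps_to_MBps, MBps_to_mbps, memcpy_MBps) are made up for illustration. This is not how memtest86+ measures memory, and 1 MB is assumed to mean 1024*1024 bytes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Megabits/s <-> megabytes/s (8 bits per byte).
constexpr double mbps_to_MBps(double megabits_per_sec) { return megabits_per_sec / 8.0; }
constexpr double MBps_to_mbps(double megabytes_per_sec) { return megabytes_per_sec * 8.0; }

// Hypothetical helper: copy `bytes` between two heap buffers `passes` times and
// return the observed copy rate in MBps (assumption: 1 MB = 1024*1024 bytes).
double memcpy_MBps(std::size_t bytes, int passes) {
    assert(bytes > 0 && passes > 0);
    std::vector<char> src(bytes, 'x');
    std::vector<char> dst(bytes, 0);
    unsigned checksum = 0;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < passes; ++i) {
        src[0] = static_cast<char>(i);               // vary the source each pass
        std::memcpy(dst.data(), src.data(), bytes);
        checksum += static_cast<unsigned char>(dst[bytes - 1]);  // read the copy back
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    // Keep the result observable so the copies are less likely to be optimized out.
    volatile unsigned sink = checksum;
    (void)sink;

    const double megabytes =
        static_cast<double>(bytes) * passes / (1024.0 * 1024.0);
    return megabytes / elapsed.count();
}
```

Calling memcpy_MBps(256u * 1024 * 1024, 10) gives a single-threaded copy rate to set beside the localhost numbers. Small buffers mostly measure the CPU caches, so large buffers are the safer choice.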

My main question is this: I have tested this benchmark on 12 different systems (various laptops and desktops). One particular system consistently gets HALF the localhost network I/O performance of my other systems. I HAVE tested some older/slower 2007-2009 era DDR2-based systems, and CPU/memory performance does indeed affect the benchmark throughput (as expected!). But my original question stands, because the unexpected performance is on a fairly modern system. So I suspect my question is more of a motherboard/chipset architecture question. Here are the details:

SYSTEM A: (named OANH, average result is 880/940 MBps send/recv performance)
OS Win10 64-bit, 16GB DDR3, i7-4770/3.4GHz CPU, ASUS B85M-E/CSM mainboard
mainboard link: http://www.asus.com/Motherboards/B85ME/

SYSTEM B: (named BLACKJACK, average result is 334/342 MBps send/recv performance)
OS Win10 64-bit, 16GB DDR3, i7-3770K/3.5GHz CPU, MSI Z77A-G45 mainboard
mainboard link: http://us.msi.com/product/motherboard/Z77AG45.html#hero-overview

These results hold with both 125MB and 1024MB (1GB) payloads. And again, I got similar results with iperf3 64-bit. I have several other i5/i7 DDR3-based systems that get performance similar to SYSTEM A/OANH. So that's my question -- memtest shows SYSTEM B/BLACKJACK has DDR3 main memory performance similar to all the other DDR3 machines, and they are all using the same OS (Win10 64-bit). Architecturally, what could be causing localhost/127.0.0.1 traffic to be so much slower (almost half) on SYSTEM B? Does sending across localhost (under Win10) involve the north or south bridge, or is any part of the bus involved?

While this is not a specific C++ question, eventually I would like to share my C++/boost implementation of a network benchmark. But I'm hoping to come across someone with insight into the localhost implementation under Windows. I suppose in addition I should try to run this under Linux. But again, all my other Win10 64-bit systems are getting over 600MBps send/receive performance -- it's just this one machine that is getting the 300MBps half-performance, and I haven't yet come up with a rationale for why.
note, for iperf3 I used...

server: iperf3 -p 2228 -s
client: iperf3 -p 2228 -c 127.0.0.1 -n 131072000 -f M


I use 131072000 since that is 125MB, which across an actual gigabit Ethernet LAN should take only about 1 second to transfer (as 1000Mbps = 125MBps). Across localhost, you'd expect the 125MB to transfer much faster than 1 second. At 300MBps that's still pretty fast, but with the same OS and similar h/w, I'd expect the 600-900MBps performance that the other configurations are getting. I can change -n to 1024*1024*1024 (1GB) and the MBps throughput should be the same (which it is).
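
The payload arithmetic above can be sketched in a few lines of C++. The names (kBytesPerMB, payload_bytes, transfer_seconds) are hypothetical, and the sketch assumes 1 MB = 1024*1024 bytes, which is what makes 125MB come out to exactly 131072000 bytes. With decimal megabytes the gigabit figure would be slightly over 1 second.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: "MB" means 1024*1024 bytes, matching 125MB == 131072000 bytes.
constexpr std::uint64_t kBytesPerMB = 1024ull * 1024ull;

// Byte count to pass to iperf3 -n for a payload given in MB.
constexpr std::uint64_t payload_bytes(std::uint64_t megabytes) {
    return megabytes * kBytesPerMB;
}

// Ideal time in seconds to move `bytes` at a sustained rate of `rate_MBps`.
double transfer_seconds(std::uint64_t bytes, double rate_MBps) {
    assert(rate_MBps > 0.0);
    return static_cast<double>(bytes) /
           (rate_MBps * static_cast<double>(kBytesPerMB));
}
```

For example, transfer_seconds(payload_bytes(125), 125.0) is the gigabit case and works out to 1 second, while the same payload at 300 MBps takes roughly 0.42 seconds and at 900 MBps roughly 0.14 seconds.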

edit: also, to clarify, the averages reflect over 1000 iterations/runs, with Windows configured the same across the machines (e.g. Windows Search disabled, all applications closed, etc.) to the extent possible; there are still variations in video drivers and motherboard drivers.


edit: here are iperf3 results and my ANT benchmark results for comparison (on SYSTEM B). Note I match the ~320MBps result, though my benchmark also shows main memory performance of ~3200MBps (which is one of my main questions: why is localhost TCP/IP performance so much slower than main memory performance?)

F:\iperf-3.0.11-win64>iperf3 -p 2228 -c 127.0.0.1 -n 1073741824 -f M
Connecting to host 127.0.0.1, port 2228
[ 4] local 127.0.0.1 port 56410 connected to 127.0.0.1 port 2228
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 314 MBytes 314 MBytes/sec
[ 4] 1.00-2.00 sec 321 MBytes 321 MBytes/sec
[ 4] 2.00-3.00 sec 329 MBytes 329 MBytes/sec
[ 4] 3.00-3.18 sec 60.0 MBytes 327 MBytes/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-3.18 sec 1.00 GBytes 322 MBytes/sec sender
[ 4] 0.00-3.18 sec 1.00 GBytes 322 MBytes/sec receiver

iperf Done.


F:\ANT_0_1_0>ant_64 -b 1073741824
A Network (Performance) Tester v0.1.0
Initializing program options...
Logging data transfer content to file..: DISABLED
Status tempo...........................: 21 (OFF)
Working Buffer size....................: 65536 bytes
Execution mode.........................: CLIENT (requests data from an ANT Server)
Statistical history depth..............: 8
Attempting to resolve host [127.0.0.1:2228]
Host resolved to [127.0.0.1:2228]
Issuing command as a client named [BLACKJACK]
This client will be requesting [1073741824 bytes] repeated [20 times]
Performance result summary will be written to [results_BLACKJACK.txt]
ANT Test Command has been sent (33/33 bytes)
(000001/000020) [memory=3,405.792 SEND=000,344.265 RECV=000,312.626 MBps] [SEND=000,002.974449 RECV=000,003.275478 sec]
(000002/000020) [memory=3,314.526 SEND=000,332.801 RECV=000,319.411 MBps] [SEND=000,003.076912 RECV=000,003.205898 sec]
(000003/000020) [memory=3,330.506 SEND=000,326.892 RECV=000,320.404 MBps] [SEND=000,003.132535 RECV=000,003.195965 sec]
(000004/000020) [memory=3,335.661 SEND=000,326.325 RECV=000,322.805 MBps] [SEND=000,003.137973 RECV=000,003.172197 sec]
(000005/000020) [memory=3,339.928 SEND=000,326.043 RECV=000,324.354 MBps] [SEND=000,003.140693 RECV=000,003.157040 sec]
(000006/000020) [memory=3,345.167 SEND=000,325.893 RECV=000,325.457 MBps] [SEND=000,003.142133 RECV=000,003.146344 sec]
(000007/000020) [memory=3,347.297 SEND=000,323.656 RECV=000,324.082 MBps] [SEND=000,003.163854 RECV=000,003.159691 sec]
(000008/000020) [memory=3,348.379 SEND=000,322.546 RECV=000,323.620 MBps] [SEND=000,003.174744 RECV=000,003.164209 sec]
(000009/000020) [memory=3,343.477 SEND=000,319.824 RECV=000,325.442 MBps] [SEND=000,003.201765 RECV=000,003.146494 sec]
(000010/000020) [memory=3,360.573 SEND=000,320.140 RECV=000,325.937 MBps] [SEND=000,003.198602 RECV=000,003.141710 sec]
(000011/000020) [memory=3,356.278 SEND=000,319.559 RECV=000,325.156 MBps] [SEND=000,003.204419 RECV=000,003.149256 sec]
(000012/000020) [memory=3,357.076 SEND=000,318.083 RECV=000,323.702 MBps] [SEND=000,003.219286 RECV=000,003.163399 sec]
(000013/000020) [memory=3,348.365 SEND=000,316.350 RECV=000,321.851 MBps] [SEND=000,003.236924 RECV=000,003.181597 sec]
(000014/000020) [memory=3,345.184 SEND=000,314.819 RECV=000,320.344 MBps] [SEND=000,003.252658 RECV=000,003.196560 sec]
(000015/000020) [memory=3,342.649 SEND=000,315.298 RECV=000,320.870 MBps] [SEND=000,003.247716 RECV=000,003.191325 sec]
(000016/000020) [memory=3,325.359 SEND=000,316.684 RECV=000,322.179 MBps] [SEND=000,003.233512 RECV=000,003.178357 sec]
(000017/000020) [memory=3,322.083 SEND=000,316.288 RECV=000,321.919 MBps] [SEND=000,003.237553 RECV=000,003.180928 sec]
(000018/000020) [memory=3,320.656 SEND=000,316.328 RECV=000,321.986 MBps] [SEND=000,003.237148 RECV=000,003.180259 sec]
(000019/000020) [memory=3,323.505 SEND=000,316.304 RECV=000,322.014 MBps] [SEND=000,003.237390 RECV=000,003.179990 sec]
(000020/000020) [memory=3,320.593 SEND=000,316.271 RECV=000,322.091 MBps] [SEND=000,003.237731 RECV=000,003.179221 sec]
( TOTAL ) [memory=3,338.120 SEND=000,318.963 RECV=000,322.877 MBps] [SEND=000,003.210402 RECV=000,003.171487 sec] (TOTAL)
All iterations completed
Summary of results written to [results_BLACKJACK.txt]
Client side data transfer logging was not enabled.
Shutting down client connection
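
As a sanity check on the log above, the MBps figures can be reproduced from the byte count and the elapsed seconds on each line. The C++ sketch below uses hypothetical helper names (throughput_MBps, buffer_passes) and assumes 1 MB = 1024*1024 bytes, which is the convention that reproduces the reported numbers. The second helper counts how many times a payload has to pass through the 65536-byte working buffer. Whether each pass maps to exactly one send or receive call is an assumption, not something the log confirms.

```cpp
#include <cassert>
#include <cstdint>

// Throughput in MBps (assumption: 1 MB = 1024*1024 bytes) for `bytes` moved
// in `seconds`.
double throughput_MBps(std::uint64_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / (1024.0 * 1024.0) / seconds;
}

// Number of passes needed to move `bytes` through a fixed-size working buffer,
// rounding up when the payload is not a multiple of the buffer size.
constexpr std::uint64_t buffer_passes(std::uint64_t bytes,
                                      std::uint64_t buffer_bytes) {
    return (bytes + buffer_bytes - 1) / buffer_bytes;
}
```

With the TOTAL line's values, throughput_MBps(1073741824, 3.210402) comes to about 318.96, in line with the reported SEND figure, and buffer_passes(1073741824, 65536) is 16384. Per-pass overhead (system calls, copies between user and kernel buffers) is one plausible contributor to the gap between the memory and SEND/RECV columns.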