Too much CPU usage from my thread pool

closed account (SECMoG1T)
So I have written a command-line utility program that scans a directory for files
["dat", "txt", ...] and uses a regex to find a search string in all searchable files (I am testing it using text files).

The main thread scans the directory and submits the found files to a custom thread pool, which uses several threads to search for the keyword in all the files.

The search hits are then pushed into a vector and printed to the console.
The code works fine, but the custom thread pool seems to be pushing my CPU to the edge, so could anyone look at the code and tell me what I might be doing wrong?

The code consists of four files:

main.cpp // test code
file_handle.h
thread_pool.h
thread_safe_queue.h


Usage for testing: pass the directory you want to search; you can set the regex pattern in the variable patt in main().

e.g. C:\users\admin\desktop
patt = "c(\\w)*ion" searches for all words starting with 'c' and ending with "ion", e.g. caution, corporation, ...

Results are printed with the file name at the top; all search hits follow, together with the line numbers in which they occur.

link:
https://pastebin.com/FXZDvYZ2

Thanks in advance.


Does changing the yield to a sleep_for of 1 ms do anything?

If it's the logic, then comment out the regex, getline, and file I/O, and find the location that causes the CPU usage.

Also, please benchmark. I can bet that reading files in parallel is more than twice as slow as a single-threaded program reading the files one by one.

This should explain why (it's from 2009, but it should still be relevant; page 2 has the graphs):
http://www.drdobbs.com/parallel/multithreaded-file-io/220300055?pgno=1

If you want something smart, throw away all your code and start from scratch with two threads (you only create one): the main thread reads each file into a string and moves (or shared_ptrs) it into a plain queue protected by a mutex. The second thread then runs the regex over the lines (using a spinlock similar to the one you have, but with a sleep_for) while the main thread is loading the next file. Check whether the thread is beating the I/O or the I/O is beating the thread, then continue to optimize from there. It's crude, but you shouldn't care about writing something that is awesome for someone with 20 cores; you should only care about whether the tool works for you (unless you're some aspiring library writer, in which case give me my progress bar and command options, those features are way more important than threads!).
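
A rough sketch of that layout, assuming a plain std::queue behind a mutex, placeholder file names, and a 1 ms sleep_for poll standing in for your spinlock:

#include <chrono>
#include <fstream>
#include <iostream>
#include <iterator>
#include <mutex>
#include <queue>
#include <regex>
#include <sstream>
#include <string>
#include <thread>

std::queue<std::string> work;   // file contents waiting to be searched
std::mutex work_mtx;
bool done = false;              // only touched while holding work_mtx

void searcher(const std::regex& rex)
{
    for (;;)
    {
        std::string text;
        {
            std::lock_guard<std::mutex> lk(work_mtx);
            if (!work.empty()) { text = std::move(work.front()); work.pop(); }
            else if (done) return;       // producer finished and queue drained
        }
        if (text.empty())                // nothing to do yet (an empty file also lands here)
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(1)); // give the CPU air
            continue;
        }
        std::istringstream lines(text);
        std::string line;
        for (int n = 1; std::getline(lines, line); ++n)
            if (std::regex_search(line, rex))
                std::cout << n << ": " << line << '\n';
    }
}

int main()
{
    std::regex rex("c(\\w)*ion");
    std::thread t(searcher, std::cref(rex));
    for (const std::string& name : {"1.txt", "2.txt"}) // placeholder names
    {
        std::ifstream in(name);
        std::string text((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
        std::lock_guard<std::mutex> lk(work_mtx);
        work.push(std::move(text));
    }
    { std::lock_guard<std::mutex> lk(work_mtx); done = true; }
    t.join();
}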

You should start with the bare minimum and see what makes the code significantly faster; if something doesn't make your code noticeably faster while making it more complex and less readable than it needs to be, remove it.

You can't multi-thread without benchmarking, because speed really is unpredictable unless you are a master.
closed account (SECMoG1T)
@poteto thank you very much for the help, I appreciate it. Before going deeper into that, I tested how long the significant code takes to load the files, feed them into the thread pool, and display the results, as you suggested, and I am really surprised.

Yesterday when I was testing, I used 5 text files with about 200k words each; the time was OK, but the CPU usage was really high (50%+).

I just ran the tests again now, using a timer class and 80 text files each with about 200k+ words, and it took 10 seconds on average over 20+ runs with different keywords and high search-hit counts. The surprising thing is that my CPU usage peaked at only about 3-4% per run, way below yesterday's 50%+, so maybe my computer was just acting up yesterday, or the performance is only horrible for low file counts, or it was just a marginal case.

What do you think of the results from today's test? I'll also try the dual-threaded approach you suggested to see if I can get better, more consistent performance, and I'll look through everything you suggested so I can learn as much as possible.

Edit: previously I had used a normal single thread and it was a horrible wait, with characteristics similar to an infinite loop, so I thought multithreading was a solution.

Thanks again,
Yolanda.
Isn't 100% CPU usage what you want? My guess is that when you used only 5 files, the operating system was able to keep most, if not all, of them in cache, so when your program read the files it was very fast. 80 files, on the other hand, is probably too much to keep in cache, so your program probably had to spend a lot of time waiting for the data to be read in from disk, which is very slow and would explain the lower CPU usage.
closed account (SECMoG1T)
@Peter87 I am not very sure, but at 50% for a single app, all the other applications in the system slow down and some even stop responding. I still have a lot to learn, but I will try to understand why it behaves that way.

Thank you for your explanation.
Yolanda.
If you don't want to use all of your CPU, why are you using a thread pool? I thought that was the main reason they're used.

If you use a smaller percentage of your CPU, it means the CPU is spending more time doing nothing useful. You could of course change the program to use less CPU, but that probably means the task will take longer to complete.

3-4% makes it sound like your program is so-called "I/O bound", so you are probably not benefiting much (if at all) from multiple threads in that case. Most of the time (the other 96%) is probably spent just waiting for the files to be read.

https://en.wikipedia.org/wiki/IO_bound
closed account (SECMoG1T)
Thanks @Peter87, I definitely needed to reduce the wait time. As I mentioned earlier, my single-threaded version would go dead silent for 5+ minutes, which is a really long time to stare at a blank screen, and very inefficient too.

I'm still learning, and I will try to optimize the code as much as I can. Thanks for your advice.
What I think happened is that you have 4 cores (maybe not), and when the first job finished, that thread picked up the 5th file, while the other 3 threads were waiting in a spinlock with the yield (you can google "yield using high cpu").

But the fact that you can't reproduce your problem is disturbing. I still recommend you use sleep_for instead of yield, because you aren't doing anything time-critical (which is what yield is good for; sleep_for is very inaccurate and can lag much longer than 1 ms, but it gives the CPU some air).
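
Concretely, in the pool's idle wait that's the difference between these two loops (assuming your thread_safe_queue has some try_pop-style call; adjust the name to whatever yours uses):

// before: every idle worker burns a whole core spinning
while (!queue.try_pop(task))
    std::this_thread::yield();

// after: still polls, but each miss gives the scheduler ~1 ms of air
while (!queue.try_pop(task))
    std::this_thread::sleep_for(std::chrono::milliseconds(1));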

I also don't understand how the single-threaded version is inefficient. What speed did the multi-threaded version get, was it 10 seconds? It really doesn't make sense because of caching: getline only pulls a small chunk of the full file at a time, but caching should make accessing the rest of the data as fast as reading RAM (for the allocation-granularity size).

I may be wrong, maybe getline is much less efficient than I think, but there is really no other easy way around it; even with something more bare-metal, like finding the delimiter yourself, a 2x speedup isn't that big.

Are you using more than one disk, with RAID 0 or RAID 5?

Can you do a test like this (you can reuse the logic of the Dr. Dobb's test source for simplicity):

Restart your computer, then try this multiple times:
// time start
open_file("1.txt");
// time end


Then restart your computer and try this multiple times:
// time start
std::thread t1([](){ open_file("1.txt"); });
std::thread t2([](){ open_file("2.txt"); });
t1.join();
t2.join();
// time end


Don't forget to try it and tell us what the numbers are.
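
If it helps, here is a self-contained version of that test; open_file here just reads the whole file into a string, the file names are placeholders, and the command-line argument picks which half to run after each restart:

#include <chrono>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <thread>

// read the whole file into memory and report its size
void open_file(const char* name)
{
    std::ifstream in(name, std::ios::binary);
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    std::cout << name << ": " << data.size() << " bytes\n";
}

int main(int argc, char** argv)
{
    using clock = std::chrono::steady_clock;
    const bool parallel = (argc > 1 && std::string(argv[1]) == "parallel");

    auto start = clock::now();
    if (!parallel)
    {
        open_file("1.txt");                        // one read, one thread
    }
    else
    {
        std::thread t1([]{ open_file("1.txt"); }); // two reads in parallel
        std::thread t2([]{ open_file("2.txt"); });
        t1.join();
        t2.join();
    }
    auto end = clock::now();

    std::cout << (parallel ? "parallel: " : "sequential: ")
              << std::chrono::duration<double, std::milli>(end - start).count()
              << " ms\n";
}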
closed account (SECMoG1T)
Guys, I think I'll have to rewrite the single-threaded code and benchmark everything; I just might be doing things the wrong way.
closed account (SECMoG1T)
@Peter87 and @poteto I'm getting discouraging results after rewriting the single-threaded test code.

The single-threaded test code using the 80 files runs to completion in 1 minute,
i.e. from the time I create the regex object to after displaying the results on the console; the same takes 10 seconds in the multi-threaded version.

The average file-I/O read speed in both the multi- and single-threaded versions: 1500 bytes/millisecond.

From my analysis, the part using the most time in both the multi- and single-threaded versions is the code printing the results to the console.

Overall, the performance of my multi-threaded code is discouraging now; the single-threaded version seems to run faster.

I'm starting to think this performance is too inconsistent; I can't really tell how the code will run tomorrow (faster or slower).

Test file source: stripped from a dictionary on GitHub.
Computer details: 4 cores / 1 TB HDD / 4 GB RAM

I'd like to know why the code's performance keeps fluctuating.
"the performance of my multi-threaded code is discouraging now; the single-threaded version seems to run faster"

I don't understand why you say this. Didn't you just say that the single-threaded code was 6 times slower (i.e. 1 minute vs. 10 seconds)?
Would you be willing to try a new approach using std::async?
A very basic demo:
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <sstream>
#include <future>
#include <regex>
#include <ctime>
#include <stdexcept>
#include <cerrno>   // errno
#include <cstring>  // strerror

using namespace std;

struct SearchResult
{
  string filename;
  vector<int> line_numbers;
};

const vector<string> FILE_NAMES = 
{
  "C:\\Test\\File1.txt",
  "C:\\Test\\File2.txt",
  "C:\\Test\\File3.txt",
  "C:\\Test\\File4.txt",
  "C:\\Test\\File5.txt",
  "C:\\Test\\File6.txt",
};

const string PATTERN = "Gipsbein|Gips|Lisa";

SearchResult search_file(const string& filename, const string& pattern);
void report(const vector<SearchResult>& search_results);
string read_file(const string& filename);

int main()
{
  cout << "Starting sequential search...."; 
  clock_t start = clock();
  vector<SearchResult> results;

  for (const string& fname : FILE_NAMES)
  {
    results.emplace_back(search_file(fname, PATTERN));
  }
  clock_t stop = clock();
  cout << "done. Took " << double(stop - start) / CLK_TCK << " s\n";
  report(results);

  cout << "Starting asynchronous search....";
  results.clear();
  clock_t start2 = clock();
  vector<future<SearchResult>> futures;

  for (const string& fname : FILE_NAMES)
  {
    futures.push_back(async(launch::async, search_file, fname, PATTERN));
  }
  for (auto& f : futures)
  {
    SearchResult res = f.get();
    results.push_back(res);
  }
  clock_t stop2 = clock();
  cout << "done. Took " << double(stop2 - start2) / CLK_TCK << " s\n";
  report(results);
}

SearchResult search_file(const string& filename, const string& pattern)
{
  SearchResult result;
  ifstream src(filename); // open for reading only
  if (!src)
    throw runtime_error(strerror(errno));
  int line_number = 1;
  string line;
  regex rex(pattern);
  result.filename = filename;
  while (getline(src, line))
  {
    if (regex_search(line, rex)) // regex_search finds the pattern anywhere in the line; regex_match would require the whole line to match
    {
      result.line_numbers.push_back(line_number);
    }
    line_number++;
  }

  return result;
}

void report(const vector<SearchResult>& search_results)
{
  for (const auto& result : search_results)
  {
    cout << "File: " << result.filename << '\n' << '\t'; 
    if (!result.line_numbers.empty())
      for (int i: result.line_numbers)
        cout << i << ' ';
    else
      cout << "No results.";

    cout << '\n';
  }
}
Accessing files is slow. One of the most expensive things an OS can do is open a file, and this is especially true on Windows.

Also, searching those files concurrently will be slow and will impact system performance, as it flushes cached file info that's currently in use by other applications.

I suggest you make the following changes to improve performance.
1. Use a queue to schedule the work to be done, then process that queue with a single thread (see the sketch below).
2. Use an event to check for file changes, rather than re-scanning directories for updates.
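
For point 1, a minimal sketch of such a work queue (the names are illustrative; the condition variable lets the worker sleep instead of spinning):

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class WorkQueue
{
    std::queue<std::string> jobs_;
    std::mutex mtx_;
    std::condition_variable cv_;
    bool closed_ = false;
public:
    void push(std::string job)
    {
        { std::lock_guard<std::mutex> lk(mtx_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
    void close()                               // no more work will arrive
    {
        { std::lock_guard<std::mutex> lk(mtx_); closed_ = true; }
        cv_.notify_all();
    }
    bool pop(std::string& job)                 // blocks; false once closed and drained
    {
        std::unique_lock<std::mutex> lk(mtx_);
        cv_.wait(lk, [this] { return closed_ || !jobs_.empty(); });
        if (jobs_.empty()) return false;
        job = std::move(jobs_.front());
        jobs_.pop();
        return true;
    }
};

int main()
{
    WorkQueue q;
    std::thread worker([&q] {
        std::string file;
        while (q.pop(file))                    // sleeps inside wait(), no CPU burned
            std::cout << "searching " << file << '\n'; // call your search here
    });
    q.push("File1.txt");                       // placeholder names
    q.push("File2.txt");
    q.close();
    worker.join();
}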
closed account (SECMoG1T)
@kbw thank you very much for the suggestions, I'll definitely try that; I've been away for some time.

Let me see if this can improve the performance.