It depends on what the bottlenecks are. If it's slow mainly because it's waiting for the data to be read from the hard drive you will not see much benefit using multiple threads.
The files are small component files residing in different "category" folders in hierarchical order. Each directory has atleast one file or a sub-directory. So unfortunately we can't skip any folder in the scan process.
I have tried to time the application, and I see that maximum time is consumed will trying to recurse the directory structure, not while reading the files.
void FileElementsCategory::reload() {
// ... some code
QStringList dirs = cat_dir.entryList(QStringList(), QDir::AllDirs | QDir::NoSymLinks | QDir::NoDotAndDotDot, QDir::Name);
//Maximum time is consumed in this loop
foreach(QString dir, dirs) {
// this creates a new element category and calls its "reload" function again.
categories_.insert(dir, new FileElementsCategory(cat_dir.absoluteFilePath(dir), this, file_parent_collection_));
}
// Process the file (does not take much time)
QStringList elmts = cat_dir.entryList(QStringList("*.elmt"), QDir::Files, QDir::Name);
foreach(QString elmt, elmts) {
elements_.insert(elmt, new FileElementDefinition(cat_dir.absoluteFilePath(elmt), this, file_parent_collection_));
}
}
Be VERY careful what you use multi-threading for. If you use multi-threading to load data, and you write it wrong, you can have a race-condition nightmare. I would suggest each thread load 1 file.
Another thing you can do is change the file-structure so that it loads, reads, or writes differently. In other words: you could have some sore of reading mechanism in place that allows you to skip to different areas in the file.
#include <iostream>
#include <vector>
#include <string>
#include <chrono>
#include <boost/filesystem.hpp>
int main()
{
usingnamespace std::chrono ;
constauto start = steady_clock::now() ;
std::vector<std::string> file_list ;
auto begin = boost::filesystem::recursive_directory_iterator("/usr/local");
constauto end = boost::filesystem::recursive_directory_iterator() ;
unsignedlonglong cnt = 0 ;
for( ; begin != end ; ++begin )
{
++cnt ;
const std::string path = begin->path().string() ;
if( path.find( ".a" ) != std::string::npos ) file_list.push_back(path) ;
}
constauto dur = steady_clock::now() - start ;
std::cout << cnt << " directory entries were searched\n"
<< file_list.size() << " entries with '.a' in the path were added to the vector\n"
<< "elapsed wall-clock time " << duration_cast<milliseconds>(dur).count() << " milliseconds.\n" ;
}
g++-4.8 -std=c++11 -O2 -Wall -Wextra -pedantic-errors -pthread main.cpp -lboost_system -lboost_filesystem && ./a.out
13923 directory entries were searched
226 entries with '.a' in the path were added to the vector
elapsed wall-clock time 108 milliseconds.
#include <iostream>
#include <vector>
#include <string>
#include <chrono>
#include <boost/filesystem.hpp>
#include <atomic>
#include <future>
std::atomic<unsignedint> nthreads{1} ;
std::vector<std::string> find_files( boost::filesystem::path p, std::string substr )
{
++nthreads ;
std::vector<std::string> file_list ;
auto begin = boost::filesystem::recursive_directory_iterator(p);
constauto end = boost::filesystem::recursive_directory_iterator() ;
for( ; begin != end ; ++begin )
{
const std::string path = begin->path().string() ;
if( path.find( substr ) != std::string::npos ) file_list.push_back(path) ;
}
return file_list ;
}
int main()
{
usingnamespace std::chrono ;
constauto start = steady_clock::now() ;
std::vector<std::string> file_list ;
auto begin = boost::filesystem::directory_iterator("/usr/local");
constauto end = boost::filesystem::directory_iterator() ;
std::vector< std::future< std::vector<std::string> > > futures ;
for( ; begin != end ; ++begin )
{
const boost::filesystem::path p = begin->path() ;
if( boost::filesystem::is_directory(p) )
futures.emplace_back( std::async( std::launch::async, find_files, p, ".a" ) ) ;
else
{
const std::string path_str = p.string() ;
if( path_str.find( ".a" ) != std::string::npos ) file_list.push_back(path_str) ;
}
}
for( auto& f : futures )
{
auto v = f.get() ;
file_list.insert( file_list.end(), v.begin(), v.end() ) ;
}
constauto dur = steady_clock::now() - start ;
std::cout << file_list.size() << " entries with '.a' in the path were added to the vector\n"
<< "elapsed wall-clock time " << duration_cast<milliseconds>(dur).count() << " milliseconds.\n"
<< "#threads: " << nthreads << '\n' ;
}
g++-4.8 -std=c++11 -O2 -Wall -Wextra -pedantic-errors -pthread main.cpp -lboost_system -lboost_filesystem && ./a.out
226 entries with '.a' in the path were added to the vector
elapsed wall-clock time 104 milliseconds.
#threads: 13
> Can you please explain what this line does:
> futures.emplace_back( std::async( std::launch::async, find_files, p, ".a" ) ) ;
Breaking it down:
1.
1 2
std::async( std::launch::async /* execute in a separate thread*/,
find_files /* this callable object (function) */, p, ".a" /* with these arguments */ )
2. std::aync() returns a std::future<>; through which we can access the result of asynchronous operation (in this case, a vector of strings) later. http://en.cppreference.com/w/cpp/thread/future
3. futures.emplace_back( std::async( ... ) ) ;
Save the returned std::future<> in a vector of futures for later use.
Moves the std::future<> into the vector; future is not CopyConstructible, but it is MoveConstructible
4. And later:
1 2 3 4 5 6 7 8 9
for( auto& f : futures ) // for each future that was saved
{
auto v = f.get() ; // move the result (vector of strings) into v
// if the result is not yet available, wait for it to become available
// http://en.cppreference.com/w/cpp/thread/future/get
// append the vector of strings returned by the async call to file_list
file_list.insert( file_list.end(), v.begin(), v.end() ) ;
}
Note: The code snippet makes use of the knowledge that /usr/local contains just a bunch of sub-directories.