I was sitting down the other day talking to a friend of mine, and he mentioned that his wife was giving me her old laptop but wanted me to copy her pictures, music, and movies onto her external HDD first. Since I've been away from programming for a little while, I've been wanting to get back into it. I thought this would make an awesome project, but I ran into some serious problems the more I thought about it.
The proposal is to create a program that takes any number of file extensions, searches a given directory (C: on Windows, root on Linux, or any other top-level or specific directory) for all matching files, displays them to the user, and copies them to a destination directory. You could specify a separate folder for each set of extensions, and the program would place the files accordingly.
For example, I set the program to look for .jpg, .gif, and .png files and tell it to copy them to E:\Media\Pictures. The program scours the C:\ drive for any and all files that match those extensions, displays all results (I understand that could be a huge list), and copies them while displaying a progress bar.
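A minimal sketch of the extension-to-folder idea, assuming C++17's std::filesystem (the standardized descendant of Boost.Filesystem v3; the boost::filesystem spelling is very similar). The names DestinationRules, add_rule, and destination_for are hypothetical, made up for illustration. Extensions are lower-cased before lookup so that .JPG and .jpg hit the same rule.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <initializer_list>
#include <map>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical rule table: lower-case extension (including the dot) -> destination folder.
using DestinationRules = std::map<std::string, fs::path>;

// Lower-cases an extension so that ".JPG" and ".jpg" match the same rule.
std::string normalize_extension(std::string ext) {
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}

// Registers one destination folder for a whole group of extensions.
void add_rule(DestinationRules& rules, std::initializer_list<std::string> extensions,
              const fs::path& destination) {
    for (const std::string& ext : extensions) rules[normalize_extension(ext)] = destination;
}

// Returns the folder a file should be copied to, or std::nullopt if it is not wanted.
std::optional<fs::path> destination_for(const DestinationRules& rules, const fs::path& file) {
    const auto it = rules.find(normalize_extension(file.extension().string()));
    if (it == rules.end()) return std::nullopt;
    return it->second;
}
```

For the example above, add_rule(rules, {".jpg", ".gif", ".png"}, R"(E:\Media\Pictures)") registers the picture folder once, and destination_for() then answers "where does this file go?" for each file the scan finds, or "nowhere" for files you don't want.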
One issue I ran into right off the bat is that there is no standard C++ way of listing or searching files. System calls on Windows are dangerous, and the WinAPI is brutal to learn, IMO. I came up with two alternatives, neither of which I have used: the Boost libraries or wxWidgets. I'm leaning more towards Boost, since wxWidgets feels like learning a new language altogether.
Another issue I ran into was how to store the information. Since I'm essentially building a list of files and then moving them, I figured std::list would be ideal. Does anyone foresee any issues with that? I thought about duplicate names being copied, e.g. two pictures both named image, but I believe that depends more on the filesystem itself, or on the library I choose for filesystem access. I figure I could check at copy time or at scan time and adjust file names if needed.
Another big obstacle I ran into was displaying a progress bar. I'm not even 100% sure this is feasible on the console, but I'd love to let the user know how long they may need to wait without printing a new line every 20 seconds and scrolling a million lines.
I would also like to eventually turn this into a GUI; however, a functional program is more important to me at the moment than its appearance. Of the utmost importance is that the program itself runs as fast as possible, so that the only thing slowing it down is the read/write speed of the hard drive.
I'm aware this is a huge jump from learning projects to a real, complete project, but we must all start somewhere. I would like to hear how you guys weigh in on each of my potential issues, along with any ideas to help me along. Also, if anyone has used Boost.Filesystem and/or the wxWidgets library, how steep is the learning curve? I figure Boost would be the easier of the two unless you have prior experience with wxWidgets.
As per http://isocpp.org/std/status, one may expect the committee to release a technical specification for std::filesystem, based on v3 of boost::filesystem, sometime next year. So you can't really go wrong with Boost, as your experience with it will prepare you for what will (eventually) be standardized.
I'm honestly impressed. I didn't know it was physically possible to do something like that. On the plus side, that gives me something to work with whenever I get there.
Back to Boost.Filesystem: I was doing some digging and thinking of a way to traverse an entire filesystem, i.e. C:\ or /, without losing my place. I had forgotten about recursion until I stumbled across this beauty, boost::filesystem::recursive_directory_iterator. It is almost perfect; the only thing I'm wondering is how well it will work when scanning an entire drive if a file is added, deleted, etc. while the scan is in progress.
Also, would it be best to build a list of path pointers whenever I find a file with an extension I want? As in, would that be the fastest way to store everything I need for later?
Learning project, and because I know C++. I took up programming for the challenge, and the harder something appears, the greater the appeal. Refer to my work on recreating std::list. It wasn't perfect, and I had a lot of help from the generous members here, but I learned more about pointers and the way the STL containers work than I would have if I had just used them.
I'm already trying to figure out a way to force myself into learning threads with this project, and I'm looking for the most appropriate way to do so.
Although I'd vote for cire's solution given his information on the upcoming standard, I'd just like to say something in defense of the WinAPI. It is not difficult to learn; it is written for C so that it can be used from either language, and as long as you keep that in mind, things become more obvious.
Take your program here, for example: it would only require you to know how to use FindFirstFile() and FindNextFile() (and their associated WIN32_FIND_DATA struct), CopyFileEx(), whatever you want to use as the CopyProgressRoutine callback for CopyFileEx(), and sprintf() from the cstdio header. As an added bonus, you'd get the progress bar that cire mentioned without fussing around with threads. In case you did want threads, the function is CreateThread().
Something I pride myself on is trying to keep everything as portable as possible. The Win32 API doesn't achieve that, while the Boost library I'm using does. Unfortunately, MinGW doesn't support C++11 threads, so I have to use Boost threads to achieve essentially the same thing.
So far, everything seems fairly easy to understand and implement; the only issue I'm running into is the concept of threading and deciding what should actually run in its own thread as opposed to staying sequential.
My concept so far, without fully understanding the performance trade-offs, is to create a separate thread for each major function. One function will find all files, evaluate their extensions, and add wanted files to a list of path pointers. Another will scan the directory, let's say starting with C:\, and find all subdirectories. The last will run a progress bar showing the number of files found and the number of files scanned (I'm not sure this is needed, since I'm not reading each file, just checking its extension).
Once the list of desired files is built, I would like to copy them to a destination directory and display a progress bar for that. cire's example seems very helpful, and while it might take me a little while to convert it from a fixed-time progress bar to a data-size progress bar, I believe it should be simple enough.
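Converting a time-based bar to a data-size bar mostly means copying in chunks and reporting bytes as you go. A minimal sketch, assuming C++17; copy_with_progress and its callback signature are hypothetical, and the 1 MiB default chunk size is an arbitrary choice. The callback can feed any bar-drawing routine directly.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <functional>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: copies `from` to `to` in chunks and calls
// on_progress(bytes_copied, total_bytes) before the first chunk and after each one,
// so a progress bar can be driven by data size rather than by a timer.
// Returns false if the source can't be read or the destination can't be written.
bool copy_with_progress(const fs::path& from, const fs::path& to,
                        const std::function<void(std::uintmax_t, std::uintmax_t)>& on_progress,
                        std::size_t chunk_size = 1 << 20) {
    std::error_code ec;
    const std::uintmax_t total = fs::file_size(from, ec);
    if (ec) return false;

    std::ifstream in(from, std::ios::binary);
    if (!in) return false;
    std::ofstream out(to, std::ios::binary);   // opened only after the source checks out
    if (!out) return false;

    std::vector<char> buffer(chunk_size);
    std::uintmax_t copied = 0;
    on_progress(copied, total);
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();   // the last chunk may be short
        if (got <= 0) break;
        if (!out.write(buffer.data(), got)) return false;
        copied += static_cast<std::uintmax_t>(got);
        on_progress(copied, total);
    }
    if (in.bad()) return false;                    // a real read error, not just end of file
    return static_cast<bool>(out.flush());
}
```

This shows per-file progress; for one overall bar across the whole job, sum fs::file_size() of every queued file up front and add each chunk to a running total. For plain copies without progress, std::filesystem::copy_file is simpler and may be faster, since the library is free to use OS-level copy facilities.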
I'm going to start a new thread asking for help specifically with my concurrency concept, since I believe that's my best chance of getting help. Thanks for the ideas so far; they've helped me out quite a bit already.
MinGW supports threads, just not the build you get from the home site. You can get MinGW builds from here: http://sourceforge.net/projects/mingwbuilds/ . They even have 64-bit builds and newer GCC versions such as 4.8.1. Just choose POSIX threads when installing.
In short, for this version of MinGW, the threads-posix release uses the POSIX API and allows the use of std::thread, while the threads-win32 release uses the Win32 API and disables the std::thread part of the standard library.
OK, so I considered your thoughts and have made it slightly simpler, since I figure just using threads should be adequate practice for now. I've settled on asking the user for the starting directory, having a preset list of extensions that fall under pictures, music, and movies (I may move this into a .ini file in the future), and spawning a thread to scan the directory and all subdirectories while the main thread runs the function that copies files over. I have since run into two problems.
First, if I'm understanding threads correctly, I'll need a list that holds each path pointer, protected by a mutex. Whenever either thread accesses the list, it will need to lock the mutex to do its job and unlock it after pushing or popping an item, correct?
Second, is there any way for my main thread to keep running until my scan thread finishes? There will be times, mainly at the start of the program, when the list is empty, so I can't use while (!myList.empty()). Another issue: if my main thread handles copying files, some of which may be over a GB, is it more realistic to create a new thread for each file being copied so that main doesn't completely halt the scanning process?
I'm beginning to realize that this might be better off staying a sequential program at this point, but I am determined to use threads for it. Performance matters to me, but in my mind there is a place for threading in this program, even if it's just to use cire's progress bar or something similar.
Yeah, I laughed a little inside after I posted that.
I don't get how you expect threads to improve the performance of this program *unless* you copy those files between several pairs of physical hard disks, not just one pair (or, even worse, two partitions of the same device). And even in that case, it is possible to write a program that copies to many disks in parallel from one single thread (e.g. with asynchronous I/O). Also, that won't gain much for SSDs, as a single SSD-to-SSD copy will almost saturate a SATA III link (about 600 MB/s).
Two threads writing to the same filesystem simultaneously will make performance worse, not better, especially on Windows which is not very smart about prefetching and I/O scheduling.
Of course, for educational reasons it makes sense. Otherwise it does not (I assume you're *not* on a supercomputer with a beefy RAID array, but just a home PC with a single SATA III link).
Yes, Windows seems to be far behind on the technology side compared to POSIX systems; however, it has become the public's OS of choice. Ideally, I want to use threads, since I've never played with them before and I'm anxious to get started with them.
As far as I know, my netbook is about worthless: its disk probably manages about 50 MB/s, and it's just naturally slow at everything. My original idea was to scan any directory and copy to another specified directory, and I was assuming separate drives (internal HDD and external HDD); I never thought about a destination on the same drive, even though that is a feasible option too.
Either way, I'm beginning to side with you that this concept is better left as a sequential program; however, I'm dying to start on a practical concurrent program. I just want to see something I create get that much closer to a real-world project, but I have no idea where to start.
I was looking over cire's code, and one of the first things I noticed was that it looked very professional: all the variables seemed properly typed, and it was well laid out and clearly thought out before it was written, even though it was a simple program. There are still some things I have trouble understanding, notably a specific spot where he used const. I know I still have a lot to learn, and it seems like no matter how much I read and apply, there are always better coders out there.
Anyway, venting aside, I'd like some honest suggestions for threads in a practical program. Not necessarily a thousand-thread application, but not a two-thread application either. I want some serious ideas that will require me to think and coordinate effectively.
I just wanted to say thank you for sharing your idea, and code, for the progress bar. I designed a very basic one myself, but it uses file sizes as input as opposed to a simple timer. Now that I have the basics down, I'm on to learning boost::filesystem and implementing threads.