Parsing in C++


I need to parse multiple lines from a file so that I only get a substring from each line. I have my sample data and my rough algorithm below.

Sample Data:

Input: 05/29/2014 08:03 PM 6,385,700 Intro-to-Political-Science.pdf

Expected Output: Intro to Political Science

Input: 14,655,232 Google.com.Regular.Expression.2001.pdf

Expected Output: Regular Expression

Input: Bing.com_Cplusplus Programming, Dale and Weems.pdf

Expected Output: Cplusplus Programming

My algorithm so far

Declare filename as string
Declare lineinput as string
Prompt user to enter filename
Input filename
Get lineinput from filename

After this, I am not sure what regular expression would give me the desired output. I read about regex_match and match_results, but I am not sure how to use them here.
For simple input like that, regex seems like overkill.

What are the criteria for the substring?
My first thought was that this is too complicated for regex, and rather problematic in general, based on the limited information provided so far.
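
On the regex_match / match_results question: the part of the job that does look mechanical is dropping the directory-listing fields (date, time, size) and the .pdf extension. Here is a minimal sketch of that step only, assuming the fields always look like the ones in the sample data. The name strip_listing_fields and the pattern are my own guesses, not a rule stated anywhere in the thread, and std::smatch is just match_results for std::string.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: remove an optional "MM/DD/YYYY HH:MM AM|PM" stamp,
// an optional size such as "6,385,700", and a trailing ".pdf".
// Returns the line unchanged if the pattern does not match.
std::string strip_listing_fields(const std::string& line) {
    static const std::regex pattern(
        R"(^\s*(?:\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}\s+[AP]M\s+)?)"  // date and time
        R"((?:[\d,]+\s+)?)"                                       // file size
        R"((.*?)(?:\.pdf)?\s*$)");                                // name, extension
    std::smatch m;  // match_results specialised for std::string
    if (std::regex_match(line, m, pattern)) {
        return m[1].str();  // first capture group: the bare file name
    }
    return line;
}
```

regex_match has to match the whole line, which is why the pattern is anchored at both ends. One known weakness: a title that itself starts with a number followed by a space would lose that number to the size field. What is left afterwards (for example "Google.com.Regular.Expression.2001") still needs the title rule that nobody has defined yet.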

OK... here you somehow spot that Bing.com is a web site and that Dale and Weems are the authors...

Input: Bing.com_Cplusplus Programming, Dale and Weems.pdf

Expected Output: Cplusplus Programming


but the other examples are just book titles, without authors, so how about:

Object-oriented Analysis, Design and Implementation: An Integrated Approach Paperback by Brahma Dathan, Sarnath Ramnath (dropping the subtitle...)

Input: Bing.com_Object-oriented Analysis, Design and Implementation.pdf

Expected Output: Object-oriented Analysis, Design and Implementation


rather than just Object-oriented Analysis

And how about:

Justinguitar.com Beginner's Songbook by Justin Sandercoe

Input: 05/29/2014 08:03 PM 6,385,700 Justinguitar.com Beginner's Songbook

Expected Output: Justinguitar.com Beginner's Songbook


how do you know to keep Justinguitar.com here but throw away Bing.com and Google.com elsewhere?

Andy
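
To make the Justinguitar.com problem concrete, here is a small sketch of the obvious naive rule: drop one leading "something.com" token and the separator after it. The name drop_leading_site and the rule itself are assumptions for illustration only; the original question never states such a rule.

```cpp
#include <regex>
#include <string>

// Hypothetical naive rule: strip a leading "<word>.com" followed by
// '.', '_' or a space. It cannot tell a site prefix from a real title.
std::string drop_leading_site(const std::string& name) {
    static const std::regex site(R"(^\w+\.com[._ ])");
    return std::regex_replace(name, site, "");
}
```

This turns "Bing.com_Cplusplus Programming, Dale and Weems" into "Cplusplus Programming, Dale and Weems" and "Google.com.Regular.Expression.2001" into "Regular.Expression.2001", but it also turns "Justinguitar.com Beginner's Songbook" into "Beginner's Songbook", which is not the expected output. Without an explicit list of sites to strip, or some other stated criterion, a pattern alone cannot make that distinction.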