about regex

First, in the method scan2c I want to check whether the user has supplied the string "-c".
If so, the regular expression extensions takes the form
(c|cpp|h|hpp|hxx), which describes the extensions of C/C++ source and header files.

Then I check whether the file currently being scanned has an extension that matches one of (c|cpp|h|hpp|hxx). I would like to report the total number of files for each of these extensions and their respective total file sizes.

I am getting an error for the line:

if (is_regular_file(dir->status()) && regex_match(currentfileextensionstr , extensions))

  void scan2c(string swit,  path const& folder)
{
	cout << "\nScanning current folder:\n ";
	directory_iterator dir(folder);
	directory_iterator end;



	if (swit == "-c")
	{
		regex extensions(".(c|cpp|h|hpp|hxx)");
	}

	else {
	
		regex extensions(".(c|cpp|h|hpp|hxx)");
	
	}

	//regex extensions(".(c|cpp|h|hpp|hxx)");

	while (dir != end)
	{
		//cout << dir->path();

		 currentfileextension = dir->path().extension();

		  currentfileextensionstr = dir->path().extension().string(); 

		if (is_regular_file(dir->status()) && regex_match(currentfileextensionstr , extensions))
		{
			// do some computations 

		}

		++dir;
	}
	cout << endl;

	
}
> I am getting an error for the line:
> if (is_regular_file(dir->status()) && regex_match(currentfileextensionstr , extensions))

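The name extensions is declared inside the if and else blocks, so it goes out of scope at each closing brace and is not visible outside them (it is not visible on line 30, where regex_match is called).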

Note: for it to be interpreted as a literal '.', the '.' in the regex must be escaped.
I did this:


regex extensions("");

if (swit == "-c")
{
extensions = (".(c|cpp|h|hpp|hxx)");
}

else {

//regex extensions(".(c|cpp|h|hpp|hxx)");

}


I noticed that if I do not escape the "." in front of the extensions, it still works, and I am not sure why. So is it better to escape the "." and why?

extensions = ("\.(c|cpp|h|hpp|hxx)");

Also, I have seen these regular expressions and I am not sure what they mean.

extensions with a double backslash \\

extensions = ("\\.(c|cpp|h|hpp|hxx)");

and also this:

regex extensions("\\.(cpp|c|h|hpp$)");

Why is the dollar sign "$" only applied to hpp?

If I am checking whether the string ends with .cpp or .c or .h or .hpp, should there not be a dollar sign after each possible case?





> so is it better to escape the "." and why ?

In a regular expression, the period '.' is a special character (a metacharacter).
To use it literally, it needs to be escaped.
.cpp - any character, followed by cpp (matches acpp, bcpp, xcpp, .cpp, ycpp)
\.cpp - the period, followed by cpp (matches .cpp)


> with a double slash \\

In C++ string literals too, the backslash is treated specially as an escape character.
To specify a single literal backslash, we would have to write "\\"
So to specify the regular expression \.cpp we would write const std::regex re( "\\.cpp" )

When many backslashes are involved, using raw string literals would be easier: std::regex re( R"(\.cpp)" )
http://www.stroustrup.com/C++11FAQ.html#raw-strings


> why is the dollar sign "$" only applied for hpp

To make it applicable to all of the alternatives, place the $ outside the closing parenthesis for the alternatives.
 // a. literal '.' b. one of cpp|c|h|hpp c. end of line/string
regex extensions( "\\.(cpp|c|h|hpp)$" );
If I follow the logic,

would you not need three backslashes to represent \.cpp?

One backslash to escape the backslash character itself,

that gives 2 backslash characters,

plus one more backslash to escape the ".",

so it will look like this:

const std::regex re( "\\\.cpp" ) ??

Try this:
#include <iostream>

int main()
{
    std::cout << "\\.cpp" << '\n' ; // prints \.cpp
}
but is my logic correct ?
No.
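The sequence \\ in a C++ string literal yields one \
That one \ escapes the . in the regular expression

"\\\.cpp" is not valid: once \\ has been consumed, the remaining \. is not an escape sequence that C++ recognises, and the compiler diagnoses it as an unknown escape sequence.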

int main() 
{
    "\\\.cpp" ; // error: unknown escape sequence: '\.'
}

http://coliru.stacked-crooked.com/a/57c4faf18cc989f8

So do I need two backslashes "\\" whenever I want to escape a symbol?

For instance, if I want to check whether the user enters "\.(cpp|c|h|hpp)" with the quotation marks,

the regex extensions would be:


regex extensions( "\\"\\.(cpp|c|h|hpp)\\"" )

or would it be

regex extensions( "\\"\\.\\(cpp|c|h|hpp\\)\\"" )

because I need to escape the parentheses "(" and ")". Again, I am not sure whether I have to escape the parentheses.


> i am not sure if i have to escape the parenthesis

Yes, because we want them to be interpreted as literal ( and ).
Parentheses (both the opening parenthesis and the closing parenthesis) are metacharacters.

In the common flavours of regular expressions:
there are 12 characters with special meanings: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {.
http://www.regular-expressions.info/characters.html
I still don't get it. What is the reason I have to escape the dot "." in my regular expression?

So to specify the regular expression \.cpp we would write const std::regex re( "\\.cpp" )

Can I just write:

regex re( ".cpp" )
> what is the reason i have to escape the dot "." in my regular expression :

We are meandering around in circles, aren't we?
See: http://www.cplusplus.com/forum/beginner/219493/#msg1011201


Do these first:

Go through the lessons, one by one, in this tutorial:
https://www.regexone.com/lesson/introduction_abcs

Try your hand at solving the practice problems starting from: https://www.regexone.com/problem/matching_decimal_numbers
These links are awesome, but they don't show the solution when I click on them.



Task Text
Match 3.14529 Success
Match -255.34 Success
Match 128 Success
Match 1.9e10 Success
Match 123,340.00 Success
Skip 720p Failed


regex: \d



Exercise 2: Matching With Wildcards
Task Text
Match cat. Success
Match 896. Success
Match ?=+. Success
Skip abc1 Failed


regex: ... ( three dots)




Exercise 3: Matching Characters
Task Text
Match can Success
Match man Success
Match fan Success
Skip dan Failed
Skip ran Failed
Skip pan Failed

regex: \w+
> they don't show the solution when i click on it

You appear to be using a script blocker (like ScriptSafe).
To see the solutions, allow scripts to run on regexone.com and cdnjs.cloudflare.com


Match        cat.
Match        896.
Match        ?=+.

Skip         abc1

regex: ...\.


Match        can
Match        man
Match        fan

Skip         dan
Skip         ran
Skip         pan

regex: [cmf]an
I am using https://regex101.com/

I have entered \(w\) as my regex

and (e) as the test string,

and I am not getting a match.

Would \(w\) not mean that I am looking for any word enclosed in an opening and a closing parenthesis?

The delimiter I am using is / with no regex flags, and the flavor is PHP.
Any word enclosed within opening and closing parentheses: \(\w+\)