if (x in xs) ...

A few years ago on SO someone asked for a “Compact way to write if(..) statement with many equalities”.
https://stackoverflow.com/questions/35984196/compact-way-to-write-if-statement-with-many-equalities/35984558

I responded with some Radical Language Modification* and I have been thinking about finalizing a nice design to do just that ever since.

The goal is to write statements like this:

    if (x in xs)
    if (x in {1, 2, 3})

And maybe the inverse too:

    if (x ni xs)
    if (x ni {1, 2, 3})

There are several significant problems with that, though, because the C++ Standard makes RLM rather difficult. Specifically, all of the following are issues:

  • new operators cannot be created in C++
  • std::initializer_list requires object construction or one of the assignment operators
  • assignment gets ye olde -Wparentheses warning (and lots of it)
  • you cannot overload binary operators for built-in types
    (specifically const char* and const char[N])
  • operator precedence limits underlying operator choice
  • minimum standard requirements
  • and of course, evil macros

These all collude to make things potentially inconsistent, syntax-wise.
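
To make the initializer_list point concrete, here is a minimal sketch (C++11 or later). The probe type is hypothetical and exists only for illustration: a braced list is accepted on the right of an overloaded assignment operator, but not on the right of an ordinary binary operator such as |, even when a matching overload exists.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

// Hypothetical probe type, for illustration only.
struct probe {
    int x;

    // An assignment operator may take a braced list on its right-hand side.
    bool operator=(std::initializer_list<int> xs) const {
        return std::find(xs.begin(), xs.end(), x) != xs.end();
    }

    // The same overload for | can be declared, but the infix form
    // cannot be used with a braced list (see probe_demo).
    bool operator|(std::initializer_list<int> xs) const {
        return std::find(xs.begin(), xs.end(), x) != xs.end();
    }
};

inline void probe_demo() {
    probe p{2};
    assert((p = {1, 2, 3}));            // OK: assignment accepts { ... }
    // assert((p | {1, 2, 3}));         // error: { ... } is not an expression
    assert(p.operator|({1, 2, 3}));     // OK, but only as a plain function call
}
```

This is why the underlying operator in the options below ends up involving either an assignment or an explicitly typed object.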

I could, of course, forget the pretty and create a function:

    if (x_in_xs( x, xs ))
    if (is_member_of( x, xs ))
    if (contains( x, xs ))
    
etc.

But I would not be posting if I didn’t want pretty, so forget the function.
I will create it anyway, just for the pretty operator overload and macro to use. Heh.
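
A minimal sketch of what such a plain function could look like, assuming a linear search is acceptable. The namespace and the second overload are my own illustration, not a finished design. The braced-list overload is needed because a template parameter cannot be deduced from { ... } directly.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <iterator>
#include <vector>

namespace sketch {

// Linear search over anything std::begin/std::end accept.
template <typename T, typename Seq>
bool contains(const T& x, const Seq& xs) {
    return std::find(std::begin(xs), std::end(xs), x) != std::end(xs);
}

// A braced list cannot deduce Seq, so it gets its own overload.
template <typename T, typename U>
bool contains(const T& x, std::initializer_list<U> xs) {
    return std::find(xs.begin(), xs.end(), x) != xs.end();
}

} // namespace sketch

inline void contains_demo() {
    const std::vector<int> xs{1, 2, 3};
    assert(sketch::contains(2, xs));
    assert(sketch::contains(3, {1, 2, 3}));
    assert(!sketch::contains(7, {1, 2, 3}));
}
```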

As for the macro, in my favor is that a macro named “in” is not likely to collide with anything, and it is easy to avoid using it if desired. (Seriously, though, don’t use “in” as a name in your code!)

So, here are my options as I see them:
OPTION ONE

Require parentheses as part of the expression.
This would make everything very consistent and pretty:

    if ((x in xs))
    if ((x in {1, 2, 3}))
    if ((c in "abc"))
    if (("hello" in {"hello", "world"}))
    bool ok = (x in xs);
    my_function( (x in xs) );
    if ((x in xs) and (y in ys))

I am leaning toward using this solution, actually. It solves almost all the problems with every other solution. It just needs those obnoxious extra parentheses. And an extra-unpretty version of the in macro.

For this solution, the underlying operator could be written with the original comma operator as in my SO post, but I would actually prefer (and it would work fine with) the bitwise OR operator, such that the following could also be available:

    if (x|xs)
    my_function( x|xs and y|ys )

but not:

    if (x|{1, 2, 3})

The underlying #define would be something along the lines of:

    #define in , duthomhas::in =

or

    #define in | duthomhas::in =

Where duthomhas::in is a template thunk to pass the RHS object along. (Operator precedence does not matter here — it can be managed either way.)
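
A rough sketch of how the | variant could be wired up, assuming C++17. The real duthomhas::in is not shown in this thread, so sketch::in_op and sketch::bound_lhs are hypothetical stand-ins. Because = binds more loosely than |, the expansion x | in_op = xs parses as (x | in_op) = xs: the | captures the left operand and the = does the search.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <iterator>
#include <vector>

namespace sketch {

struct in_thunk {};                      // hypothetical stand-in for duthomhas::in
inline constexpr in_thunk in_op{};

// Holds the left operand until the assignment supplies the right one.
template <typename T>
struct bound_lhs {
    const T& x;

    template <typename Seq>
    bool operator=(const Seq& xs) const {
        return std::find(std::begin(xs), std::end(xs), x) != std::end(xs);
    }

    // Assignment is what lets a braced list appear on the right.
    bool operator=(std::initializer_list<T> xs) const {
        return std::find(xs.begin(), xs.end(), x) != xs.end();
    }
};

template <typename T>
bound_lhs<T> operator|(const T& x, in_thunk) { return {x}; }

} // namespace sketch

#define in | sketch::in_op =

inline void macro_demo() {
    const std::vector<int> xs{1, 2, 3};
    assert((2 in xs));
    assert((2 in {1, 2, 3}));
    assert(!(5 in xs));
    assert((2 in xs) and (3 in {3, 4}));
}

#undef in   // keep the macro's reach as small as possible
```

The extra parentheses are doing real work here: without them, x in xs and y in ys would chain the two assignments together instead of combining two boolean results. String operands would need more care than this sketch gives them.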

This incidentally makes an inverse operation rather trivial (and readable) as well:

    if (!(x in xs))

in addition to a not-in operator complement:

    if ((x ni xs))
OPTION TWO

Make macros evil again.
Define a variadic macro.

    #define in(...) | duthomhas::in{ __VA_ARGS__ }

However, this does introduce a syntactic inconsistency that I do not like:

    if (x in (xs))

The list of things works fine though... even if it does not look like C++:

    if (x in (1, 2, 3))

Naturally, the type of x and xs’s elements remain important, so that the following both work properly:

    if (c in ("abc"))
    if (s in ("hello", "world"))

I do not like this solution.
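
For completeness, one way the variadic form might be made to work, assuming C++17. Everything in namespace sketch is hypothetical. The guess made here is that a single argument that cannot be compared with x is a sequence to search, while anything else is a list of candidates to compare one by one.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Detects whether a == b compiles for the given types.
template <typename A, typename B, typename = void>
struct is_eq_comparable : std::false_type {};

template <typename A, typename B>
struct is_eq_comparable<A, B, std::void_t<decltype(
    std::declval<const A&>() == std::declval<const B&>())>> : std::true_type {};

// Holds references to the macro arguments for the length of the expression.
template <typename... Ts>
struct in_args {
    std::tuple<const Ts&...> items;
    in_args(const Ts&... ts) : items(ts...) {}
};

template <typename T, typename... Ts>
bool operator|(const T& x, const in_args<Ts...>& rhs) {
    if constexpr (sizeof...(Ts) == 1 && !(is_eq_comparable<T, Ts>::value && ...)) {
        // One argument that x cannot be compared with: search it as a sequence.
        for (const auto& elt : std::get<0>(rhs.items))
            if (elt == x) return true;
        return false;
    } else {
        // Otherwise compare x with each argument in turn.
        return std::apply(
            [&x](const auto&... item) { return ((item == x) || ...); },
            rhs.items);
    }
}

} // namespace sketch

#define in(...) | sketch::in_args{ __VA_ARGS__ }

inline void variadic_demo() {
    const std::vector<int> xs{4, 5, 6};
    const std::string s = "world";
    assert((5 in (xs)));                 // single sequence argument
    assert((2 in (1, 2, 3)));            // list of candidates
    assert(('b' in ("abc")));            // a literal is searched char by char
    assert((s in ("hello", "world")));
    assert(!(7 in (1, 2, 3)));
}

#undef in
```

Note that searching a string literal this way also visits its terminating NUL.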
OPTION THREE

Forget assignment operators and forbid magic initializer_list syntax.
This would allow the omission of those pesky extra parentheses and still be very nicely-behaved, precedence-wise. It also is a bit more pedantic on C++ syntax.

This means that all XS must now have an explicit type:

    if (x in std::set{1, 2, 3}) // C++17
    if (x in std::set <int> {1, 2, 3}) // C++11

This can be made more readable with another evil macro, though:

    #define L std::set
    if (x in L{1, 2, 3})

The utility of that kind of reusable solution is limited to C++17 and later, though; before that it is no better than just using a local value:

    std::set <int> xs{1, 2, 3};
    if (x in xs)

Which I think kind of kills the pretty readability thing.

Likewise we hit the ‘no strings allowed’ problem, which must be remedied with an explicit string type:

    if (c in "abc"s)


https://www.google.com/search?q=i'm+a+frayed+knot
Well, I think that’s all I’ve got for the moment. Thoughts? Is it reasonable to use ((lots of parentheses)) for a petty vanity?
Like in Python?
x = 12
if x in { 1, 2, 3 }:
   print( "Yes!" )
else:
   print( "No!" )


Trouble is, I use "in" for the name of an istream rather a lot.

For c++, maybe a heavily overloaded/templated function
is_element_of( x, S )
simplifying any use of std::find().
I would do something like this:

#include <iostream>
#include <functional>
#include <iterator>
#include <initializer_list>
#include <algorithm>

template < typename T, typename SEQ, typename EQ = std::equal_to<void> >
bool in( const T& value, const SEQ& seq, const EQ& eq = {} )
{ return std::any_of( std::begin(seq), std::end(seq), [&]( auto&& v ) { return eq(v,value) ; } ) ; }

template < typename T, typename U, typename EQ = std::equal_to<void> >
bool in( const T& value, const std::initializer_list<U>& seq, const EQ& eq = {} )
{ return std::any_of( std::begin(seq), std::end(seq), [&]( auto&& v ) { return eq(v,value) ; } ) ; }

// TO DO: specialise for associative sequences (optimisation)

#include <vector>
#include <string>   // std::string
#include <cstring>  // std::strcat
#include <cctype>   // std::toupper
#include <iomanip>

int main()
{
    std::cout << std::boolalpha << in( 'a', { 100, 94, 1, 224, 97, 100 } ) << '\n' ;

    const std::vector<std::string> vec { "abcd", "efgh", "ijkl", "mnop" } ;
    char cstr[100] = "ij" ;

    std::cout << "case-sensitive: " ;
    if( in( std::strcat( cstr, "KL" ), vec ) ) std::cout << "found " << std::quoted(cstr) << '\n' ;
    else std::cout << "did not find " << std::quoted(cstr) << '\n' ;

    const auto eq_ncase = [] ( const std::string& a, const std::string& b )
    {
        if( a.size() != b.size() ) return false ;
        for( std::size_t i = 0 ; i < a.size() ; ++i )
            if( std::toupper( (unsigned char)a[i] ) != std::toupper( (unsigned char)b[i] ) ) return false ;
        return true ;
    };

    std::cout << "case-insensitive: " ;
    if( in( cstr, vec, eq_ncase ) ) std::cout << "found " << std::quoted(cstr) << '\n' ;
    else std::cout << "did not find " << std::quoted(cstr) << '\n' ;
}

http://coliru.stacked-crooked.com/a/174859836478b3ed
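
The "TO DO" about associative sequences could be approached roughly as follows, assuming C++17. The names member_of and has_iterator_find are hypothetical. The trait asks whether seq.find(x) can be compared with seq.end(), which holds for std::set and std::map but not for std::string, whose find returns an index.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// True when seq.find(x) yields something comparable with seq.end().
template <typename Seq, typename T, typename = void>
struct has_iterator_find : std::false_type {};

template <typename Seq, typename T>
struct has_iterator_find<Seq, T, std::void_t<decltype(
    std::declval<const Seq&>().find(std::declval<const T&>())
        != std::declval<const Seq&>().end())>> : std::true_type {};

template <typename T, typename Seq>
bool member_of(const T& value, const Seq& seq) {
    if constexpr (has_iterator_find<Seq, T>::value)
        return seq.find(value) != seq.end();      // container's own lookup
    else
        return std::find(std::begin(seq), std::end(seq), value)
               != std::end(seq);                   // linear fallback
}

} // namespace sketch

inline void member_of_demo() {
    assert(sketch::member_of(2, std::set<int>{1, 2, 3}));
    assert(sketch::member_of(3, std::vector<int>{1, 2, 3}));
    assert(sketch::member_of('b', std::string("abc")));
    assert(!sketch::member_of(9, std::set<int>{1, 2, 3}));
}
```

One caveat: the member find uses the container's ordering or hash rather than ==, so the two paths can disagree for types where those differ.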
Much to think about.

@JLBorges
Yes, you have noticed the std::find_if / std::any_of connection. Too much rainbow in that cookie, though. Comparability magic should pre-exist, IMO.

    if (contains( x, xs ))      ←→    if ((x in xs))
    if (in( x, xs, []{...} ))   ←→    if (any_of( x, xs, []{...} ))
                   ↑
                   er, what?
                   Does not read well without having to look up what “in” means beyond “membership”...?


Remember, “in” is strict membership, fudging only on is_comparable( x, any element of xs ).

lastchance wrote:
Trouble is, I use "in" for the name of an istream rather a lot.

Crud.

I knew that, too. Using languages where “in” actually is an operator (like Pascal and Python), has made me hesitant to ever use it as an object identifier to the point where I forget that people do.

And such use is perfectly valid and reasonable, too.

(I tend to use “input” and “inf” where it makes a difference over just “f” for file meta names.)

    if (x|xs) still has the readability problems... (“What does that mean...?”)

    if ((x in xs)) would still work, but it would break any place you try to use “in” as an identifier...

Though, and this is probably a conceit of mine, since C++ does not have anything like Python’s set comprehensions, the (x|xs) syntax is still close enough to set builder notation that it works for my mind. Maybe even better as ((x|xs)), since the extra parentheses do actually indicate that we are going boolean with the expression.

In Pascal, you can say “x in xs” for a simple membership predicate, and “xs <= ys” for a subset predicate (xs ⊆ ys), so I have considered <= as well, but it just doesn’t flow, and is no better than a simple vertical pipe. That is, if (x <= xs) just doesn’t feel right, and makes no more sense than if ((x|xs)).

Sigh. If only there were in and ni operators in C++. (Something I have always missed.)

Vanity projects can be both rewardingly fun and tear-your-hair-out frustrating.


Y’all’ve given me [more stuff] to think about. Thanks!
duthomhas wrote:
new operators cannot be created in C++


I wish they could, though. Then you could write (say):
if ( x .in. S )
which would effectively call
template<typename T, typename SEQ> operator(.in.) ( T x, SEQ S )
Maybe not enough demand but a true set with union/intersection/etc would be helpful here, or algorithms that do those things to the various containers (?).

The only place where I have needed this in bulk (clearly I have to search for one item at times, but I mean answering whether 100 things are all in one million things, etc.), I was working with a database and just let the DB do the work via SQL.

Set manipulation would clearly not cover everything in the original scope, but it gets some of it. Or am I X/Y-ing your problem into something else here?
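
For the bulk case, the standard library already has algorithms that work on sorted ranges. A small sketch, with hypothetical helper names all_in and common, using std::includes and std::set_intersection on vectors of int:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

namespace sketch {

// Are all needles present in haystack? Takes copies so it can sort them.
// Duplicates count: two 2s in needles need two 2s in haystack.
inline bool all_in(std::vector<int> needles, std::vector<int> haystack) {
    std::sort(needles.begin(), needles.end());
    std::sort(haystack.begin(), haystack.end());
    return std::includes(haystack.begin(), haystack.end(),
                         needles.begin(), needles.end());
}

// Elements present in both inputs, in sorted order.
inline std::vector<int> common(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

} // namespace sketch

inline void set_ops_demo() {
    assert(sketch::all_in({2, 5}, {5, 1, 2, 9}));
    assert(!sketch::all_in({2, 7}, {5, 1, 2, 9}));
    assert((sketch::common({3, 1, 2}, {2, 3, 4}) == std::vector<int>{2, 3}));
}
```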
Not really. The Y is pretty much the motivating rationale for the X.

But, simply, I just want a simple, in-language way to test for membership instead of having to be all C++-cryptic about it.

I mean, how often have we all written something like:

if (std::string( ", \r\n" ).find( c ) != std::string::npos)

That hurts to read as much as it hurts to write.

if ((c in ", \r\n"))

or even:

if (is_member_of( c, ", \r\n" ))

hurts a lot less.
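
For the character case specifically, a function like that can be tiny. A sketch assuming C++17 and std::string_view; the name is_member_of comes from the line above, the signature is my assumption:

```cpp
#include <cassert>
#include <string_view>

namespace sketch {

// A string_view built from a literal stops at the terminating NUL,
// so '\0' is never reported as a member.
constexpr bool is_member_of(char c, std::string_view chars) {
    return chars.find(c) != std::string_view::npos;
}

} // namespace sketch

inline void is_member_of_demo() {
    assert(sketch::is_member_of(',', ", \r\n"));
    assert(sketch::is_member_of('\n', ", \r\n"));
    assert(!sketch::is_member_of('x', ", \r\n"));
}
```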

A membership predicate is a common operator in languages supporting some kind of set.
Hi,

jonnin wrote:
Maybe not enough demand but a true set with union/intersection/etc would be helpful here, or algorithms that do those things to the various containers (?).


This is just a mere thought: boost::hana has a set container, maybe one could use that in some way?

https://www.boost.org/doc/libs/1_61_0/libs/hana/doc/html/structboost_1_1hana_1_1set.html

Perhaps create two sets: The first with the items to be looked for; the other of all the items. Then use boost::hana::set::intersection.

Not sure about an empty set notion.

Hana does have a bunch of concepts though.

The other thing about Hana is that it is supposed to work on heterogeneous containers with types and values.

Good Luck :+) !!

Edit: I guess the main problem is that it's all compile time :+|
boost is a good answer. Though it feels like a bulldozer potting a flower here.
Boost::Hana is to (x in xs) as World War I is to ‘my kid frowned at me today.’
I would absolutely not export any macro named in. IMO, that's a complete non-starter.

Of course, given a target of expression in expression, there aren't many decent choices for in.

You noticed that in could be replaced with a form like operator in operator, but I don't understand why you chose , and = when any binary operators would do. So that's my take on it -- requiring the user to write elt *in* sequence. A little ugly, but it eliminates the need for any macro.

I chose binary multiplication because it does not have minimum precedence. The goal is to allow
if (auto const& o = foo; o *in* collection && test(o)) { /* ... */ }

The main issue is that it could easily look like a multiplication.

The idea is to write a binary function named contains(element, sequence) and curry it. In this case I just hacked in a way to use '*' as the binding form instead of the typical postfix ().

#include <utility>
template <typename Fn>
class invoke_with_multiply
{
  Fn fn_;

public:
  constexpr explicit invoke_with_multiply(Fn&& f)
    : fn_(std::forward<Fn>(f))
  {}

  template <typename Arg>
  friend constexpr decltype(auto) 
  operator*(invoke_with_multiply<Fn> r, Arg&& arg)
  { return r.fn_(std::forward<Arg>(arg)); } 
};

inline constexpr auto contains
  ([](auto const& x) {
     return [x](auto&& seq) {
       for (auto&& elt: seq)
         if (elt == x) return true;
       return false;
     };
   });

inline constexpr struct infix_contains_t {} in;

template <typename Lhs>
constexpr auto operator*(Lhs&& lhs, infix_contains_t)
{ return invoke_with_multiply(contains(std::forward<Lhs>(lhs))); }

int main()
{
  static_assert('2' *in* "1234");
}


It's sloppy code, but hopefully it gets the idea across.

Edit: You could overload operator ->* in this context. That's a clear flag that something unusual is going on, although it's really ugly.
Very nice, but wrapping a class with leading and trailing operators is what my ugly macro does. Still, I somehow had not thought of it in exactly this manner.

So how about:

    if (x <in> xs)

Eh? Eh?

Choice of macro import has always been up to the user. So, if user wants to use

    #define in <in>

then the expression becomes the even prettier

    if (x in xs)
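
A minimal sketch of the <in> form, assuming C++17. The names in namespace sketch are hypothetical. The < captures the left operand and the > performs the search. Relational operators bind more tightly than &&, so no extra parentheses are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

namespace sketch {

struct in_tag {};
inline constexpr in_tag in{};

// Result of "x < in": remembers the left operand.
template <typename T>
struct in_lhs { const T& value; };

template <typename T>
in_lhs<T> operator<(const T& x, in_tag) { return {x}; }

// "(x < in) > xs": linear search for the remembered operand.
template <typename T, typename Seq>
bool operator>(in_lhs<T> lhs, const Seq& seq) {
    return std::find(std::begin(seq), std::end(seq), lhs.value) != std::end(seq);
}

} // namespace sketch

inline bool angle_demo(int x, int y) {
    using sketch::in;
    const std::vector<int> xs{1, 2, 3};
    // x <in> {1, 2, 3} would not compile: a braced list needs an explicit type.
    return x <in> xs && y <in> std::set{4, 5, 6};
}

inline void angle_check() {
    assert(angle_demo(2, 5));
    assert(!angle_demo(2, 7));
}
```

If the #define in <in> macro is wanted on top of this, any using-declaration for the in object has to appear before the macro is defined.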


There is one issue that is still unresolved, though: You cannot use an initializer_list with non-assignment binary operators, required for constructs like:

    if (s in {"hello", "world"})

or:

    if (s *in* {"hello", "world"})

This is the reason for the ugly ‘<pick a binary operator> <thunk class> =’ macro definition — otherwise you cannot have the above construct. You are left requiring the user to create a specific type:

    if (x <in> std::set{ 1, 2, 3 })

Part of the original concept was to allow the magic literal construct.

However, this might be palatable in the end.