istream extraction of unsigned values

I am trying to extract unsigned values from an input stream. I expect the extraction to fail when an invalid character is encountered. It fails correctly when I try to extract an unsigned int from "abc", but when I try to extract an unsigned int from "-1", the extraction succeeds and the maximum unsigned int value is stored (as if -1 were cast to unsigned int). I would expect the '-' to cause the extraction of an unsigned value to fail.

The code I am using is below.

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    unsigned int value = 8;
    std::string negString = "-1";
    std::istringstream negStream(negString);
    negStream >> value;  // expected to fail, since "-1" is not a valid unsigned value
    std::cout << "Value:  " << value << std::endl;
    std::cout << "Failed: " << std::boolalpha << negStream.fail() << std::endl;
    return 0;
}

Is this standard behavior for an istream extractor?

I am trying this both on Linux (gcc 4.4.3) and on Windows with Code::Blocks (whatever came with CB 13.12, apparently gcc 4.7.1).

iQChange (374)

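Yes. Signed and unsigned integers of the same size share bit patterns: in two's complement, -1 has all bits set, which read as unsigned is the maximum value, and (max value of unsigned / 2) + 1 has the same bits as the most negative signed value. I used to validate the input with a regex: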

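#include <iostream>
#include <sstream>
#include <regex>
#include <string>
int main()
{
    unsigned my_num = 0;
    while(true)
    {
        std::string input;
        std::cout << "Enter an unsigned number: ";
        if(!std::getline(std::cin, input))
            return 1; // end of input: give up instead of looping forever
        // Accept an optional leading '+' followed by one or more digits; a '-' never matches.
        if(!std::regex_match(input, std::regex("\\+?\\d+")))
        {
            std::cout << "Input what I want! >:(\n\n";
            continue;
        }
        // Still check the extraction: a long enough digit string overflows unsigned.
        if(std::istringstream(input) >> my_num)
            break;
        std::cout << "That number is too big!\n\n";
    }
    //Your code here
    return my_num; //I have just put this to test ;). In CodeBlocks, it returns the input.
}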

Regexes are kinda heavy. You can do the check yourself with a function like:

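#include <cctype>
#include <string>
// Returns true if str is an optional leading '+' followed by one or more decimal digits.
bool valid_number(const std::string& str)
{
    std::string::size_type start = 0;
    if(!str.empty() && str[0] == '+') start = 1;   // optional leading plus sign
    if(start == str.size()) return false;          // empty string, or just "+"
    for(std::string::size_type i = start; i < str.size(); ++i)
        if(!std::isdigit(static_cast<unsigned char>(str[i]))) return false;
    return true;
}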

Hope this helps!

Duthomhas (13125)

Be aware that <regex> is only really available in the most recent versions of compilers. If you have an older compiler, Boost Regex is a handy drop-in replacement.

Personally, I prefer to get all user input as strings, then validate and convert them.
(The user will always press Enter at the end of every input!)

Also, I agree that the -1 thing is a really stupid rule. The only possible reason for using unsigned with input is to ask the compiler to validate the input as non-negative. But the rule makes that meaningless. I guess someone on the standards committee thought that being able to say -1 to get all bits set no matter the size of the value was worth the dumb rule.

doug4 (1535)

So that is the way it's supposed to work?

I love C++, but this rule is a strike against it in my book.

iQChange (374)

Well, in almost all programs (at least, that has been my experience) you will be using a framework, toolkit, or native GUI (e.g. Win32). Some of them provide this kind of regex validation and checking, but others don't. So if you don't want to write those checks yourself, I'm sorry: you'll have to look for another language.

MiiNiPaa (8886)

It looks like this behavior violates the standard. The extraction operator is defined in terms of num_get, and the num_get description contains this passage:

Standard wrote:

22.4.2.1.2.3
The numeric value to be stored can be one of:
— zero, if the conversion function fails to convert the entire field. ios_base::failbit is assigned to err.
— the most positive representable value, if the field represents a value too large positive to be represented in val. ios_base::failbit is assigned to err.
— the most negative representable value or zero for an unsigned integer type, if the field represents a value too large negative to be represented in val. ios_base::failbit is assigned to err.
— the converted value, otherwise.
The resultant numeric value is stored in val.

It is the same for both C++11 and C++14.

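On a side note, assigning a negative number to an unsigned variable is not implementation-defined: the value is converted modulo 2^N (N being the width of the unsigned type), so -1 always becomes the maximum value whatever the representation. It is converting an out-of-range value to a signed type whose result is implementation-defined.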

Cubbi (4774)

Clang produces ~~the expected~~ zero: http://coliru.stacked-crooked.com/a/c1af407dbb0c1005 -- check (and possibly post) gcc bugzilla (Edit: actually it's an open LWG issue 1169, so don't rush with bug reports)

(note: the reasoning is a bit more complex than MiiNiPaa's quote indicates, see http://stackoverflow.com/a/13046160/273767 for some discussion)

Duthomhas (13125)

Hmm, I haven't taken the time (and I don't care to now) to look at the Standard, but I'm pretty sure there was a recent discussion about the same issue, where someone pointed out where... oh, I know what it was: someone expected the failbit to be set for "-1" input to an unsigned, and the previous value to be left unmodified, and neither happened.

...

MiiNiPaa (8886)

Hm. This line is actually misleading, as there is no way to apply it.
The preceding part reads:

The sequence of chars accumulated in stage 2 (the field) is converted to a numeric value by the rules of one of the functions declared in the header <cstdlib>:
— For a signed integer value, the function strtoll.
— For an unsigned integer value, the function strtoull.
— For a floating-point value, the function strtold.

strtoull cannot possibly return a negative value. The C standard's description of strtoull seems prone to misreading, but it looks like the negation of the positive part, computed in the unsigned return type, should be returned. Still,

the most positive representable value, if the field represents a value too large positive to be represented in val. ios_base::failbit is assigned to err.

should apply (unless the value returned by strtoull is within the range of the result type) and failbit should be set.
However, the heavily cross-referenced standard is not easy to read, the C++ standard does not mention which version of the C standard library (and which parts of it) applies, and the

suitably adjusted to ensure static type safety

part is too vague. So I no longer have any idea what should really happen.

JLBorges (13770)

AFAIK, the clang++ / libc++ behaviour is conforming.

The sequence of chars accumulated in stage 2 (the field)

This extracts (consumes) the characters making up the negative number

converted to a numeric value by ... the function strtoull.

The characters accumulated are converted to an unsigned integer as per the rules of C.

The numeric value to be stored ... the most negative representable value or zero for an unsigned integer type, if the field represents a value too large negative to be represented in val. ios_base::failbit is assigned to err.

The extracted and converted value is not stored; instead it is discarded, zero is stored, and failbit is set.

#include <iostream>
#include <string>

int main()
{
    unsigned int u = 77 ;
    
    std::cout << "std::cin >> u\n" ;
    if( std::cin >> u ) std::cout << "success: value " << u << " was read\n" ; 
    else 
    {
        std::cout << "input failure: value was set to " << u << '\n' ;
        // the characters making up the negative number ("-1") were extracted, converted and discarded
        // and the stream was put into a failed state
        std::cin.clear() ;
    }
    
    std::string str ;
    std::cin >> str ; // read the string remaining in the input buffer ("test")
    std::cout << "the string read is: " << str << '\n' ;
}

echo 'clang++ libc++' && clang++ -std=c++14 -stdlib=libc++ -O2 -Wall -Wextra -pedantic-errors main.cpp -lsupc++ && ./a.out <<< -1test
echo -e '\ng++ libstdc++' && g++ -std=c++14 -O2 -Wall -Wextra -pedantic-errors main.cpp && ./a.out <<< -1test
clang++ libc++
std::cin >> u
input failure: value was set to 0
the string read is: test

g++ libstdc++
std::cin >> u
success: value 4294967295 was read
the string read is: test

http://coliru.stacked-crooked.com/a/595c78b7db698c4a

Note: The behaviour of std::scanf (and scanf in C) is different.
http://coliru.stacked-crooked.com/a/61ccd6058b67a140
http://coliru.stacked-crooked.com/a/65a7ac65748b73c9

Cubbi (4774)

This is actually the open LWG issue http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#1169 -- both zero and UINT_MAX may be valid depending on interpretation.

JLBorges (13770)

I think the question raised is not whether the stored value could be interpreted as either zero or UINT_MAX under the current standard, but whether the behaviour should be changed to become compatible with C.

On the other hand, the result of num_get conversion of negative values to unsigned integer types is zero.
This raises a compatibility issue.
...
The issue is what to do with -1. Should it match 'C' or do the "sane" thing?
