I need a code review for Caesar cipher

Pages: 1 23

ctype::narrow() before comparison with your translation table.

Could you give more detail, please?

ctype::narrow() (like its C analog, wctob()), converts a wide character into locale-defined multibyte (UTF-8, GB18030, what have you) if the result fits in one byte.
Which means, for Unicode, that U+0041 (the letter 'A') narrows into 0x41, but, for example, U+FF21 (the letter 'Ａ') fails to narrow because its UTF-8 representation is three bytes (0xEF 0xBC 0xA1).. Which is why I mentioned the fullwidths.

Last edited on

cire (8284)

If you want to hide ASCII arithmetic, add helper functions for it such as,

He's not hiding ASCII arithmetic. He's not using ASCII arithmetic. And because he's not, his code will work with the EBCDIC character encoding (which is, thankfully, vanishingly rare,) whereas code that relies on ASCII arithmetic will not. I've been confused about people saying "hardcoded ASCII alphabet" when it was clear to me that he purposefully avoided making the assumption he was dealing with ASCII.

IIRC, the only character set "arithmetic" guaranteed to work by the C++ standard is that involving the characters '0' to '9'.

hamsterman (4538)

@Cubbi, indeed, apparently there is support for Unicode somewhere in there. Could you point to a tutorial on the subject? They seem scarce and I don't get it at all.

Ceasar is not defined for non-Latin alphabets

Not sure where you got that definition. Regardless, Latin alphabet does not mean English alphabet. For instance, my own aąbcčdeęėfghiįyjklmnoprsštuųūvzž is a Latin alphabet, but I'm guessing the "narrow" trick won't work here.

Catfish3 (666)

@ hamsterman: maybe this is relevant somewhat?
http://en.wikipedia.org/wiki/Combining_character

Cubbi (4774)

Ceasar is not defined for non-Latin alphabets

Not sure where you got that definition.

Granted, I don't have Svetonius on my bookshelf, but I'm pretty sure that was the language Caesar used

aąbcčdeęėfghiįyjklmnoprsštuųūvzž

Lithuanian, hm? That's easy, you can still use shift ciphers because all your collation units are single-character. How would you generalize Caesar to Czech alphabet, which is also based on latin, but has a two-character collation unit between the letters h and i: a á b c č d ď e é ě f g h ch i í j k l m n ň o ó p q r ř s š t ť u ú ů v w x y ý z ž?

hamsterman (4538)

By "Latin alphabet" you could be either referring to the actual Latin alphabet (which does not exactly match English alphabet and is thus not used in this thread) or a family of derived alphabets (for which your suggestion to use ctype::narrow, I assume, wouldn't work). That's all I wanted to point out. In general, I don't see why any alphabet would be a problem - some order can always be enforced and that's all this cipher asks for.
As for Czechs, I don't have a good enough understanding of things to see how much of a problem this is. Is there no function get_me_the_next_biggest_collation_unit_from(char* this_string) anywhere?
It's odd to have 'ch' in the alphabet though... We have a somewhat special sound for ch too, but why add a new character when c and h are already there.

Cubbi (4774)

hamsterman wrote:
By "Latin alphabet" you could be either referring to the actual Latin alphabet (which does not exactly match English alphabet and is thus not used in this thread) or a family of derived alphabets (for which your suggestion to use ctype::narrow, I assume, wouldn't work).

I really meant the 26-letter ISO basic Latin alphabet, although it's true that I haven't seen any sort of definitive reference. I mostly see Caesar as a generalized ROT13, which is for 26-letter alphabets by definition.

ctype::narrow() works just fine for Lithuanian (if you feel like temporarily resurrecting the ISO-8859-4 locale to provide the conversions from UCS4), but indeed, there is no standard C++ way to obtain the alphabet from a given locale. If you're going to generalize that way, you're gonna need ICU with its "exemplar sets". I think a more useful generalization would be to provide a function that translates elements of a sequence according to a predefined table (the D language had that function, but they dropped it, apparently)

Last edited on

Catfish3 (666)

About the example code, I'm thinking about replacing The Ugly:

    const char ca[]{"Sqjzanxwn!"};
    char       *pc{new char[sizeof ca]};

    caesar_cipher(std::begin(ca), std::end(ca), pc, 4);
    std::cout << pc << '\n';
    assert(std::strcmp(pc, "Wunderbar!") == 0);
    delete[] pc;

with:

    const char              ca[]{"Sqjzanxwn!"};
    std::unique_ptr<char[]> upc(new char[sizeof ca]);

    caesar_cipher(std::begin(ca), std::end(ca), upc.get(), 4);
    std::cout << upc.get() << '\n';
    assert(std::strcmp(upc.get(), "Wunderbar!") == 0);

Now, would I be going too far? (Also, tell me if I'm misusing the smart pointer.)

Cubbi (4774)

with naked pointer, that code leaks if ceasar_cipher or operator<< throw an exception.

Catfish3 (666)

Indeed, thanks.

Catfish3 (666)

http://cplusplus.com/forum/general/89081/

Looks like there's a bug in caesar_cipher().
The code below fails when char c is part of the extended ASCII table, with values above 0x7F.
And string::at() throws an exception.

Any suggestions for fixing this? Thanks.

    return std::transform(src_begin, src_end, dest_begin, [ab, rot_ab](char c) -> char {
        if (std::isalpha(c))
        {
            if (std::isupper(c))
                return std::toupper(rot_ab.at(ab.find(std::tolower(c))));

            return rot_ab.at(ab.find(c));
        }

        return c;
    });

Last edited on

hamsterman (4538)

I'm compelled to say "I told you so", even though that's not exactly what I told you.
You could (programmatically) rework your alphabet to contain all of these chars. Or you could add code that deals with failure of find. Something like

if (std::isalpha(c)) {
   bool upper = std::isupper(c);
   char ch = std::tolower(c);
   int pos = ab.find(ch);
   if (pos != std::string::npos) {
      ch = rot_ab.at(ch);
      if (upper) return std::toupper(ch);
      else return ch;
   }
}
return c;

(I didn't look up and don't remember stuff, so that might be a bit wrong. Take it as pseudocode).

I still think you should quit the alphabet...

Last edited on

cire (8284)

The code below fails when char c is part of the extended ASCII table, with values above 0x7F.

http://stackoverflow.com/questions/6693654/isalpha-giving-an-assertion

You mean values below 0, maybe?

Catfish3 (666)

Thanks for the link, cire.

You mean values below 0, maybe?

I thought it's unspecified whether char is signed, or unsigned.

Also I'm not happy with adding that unsigned, as it looks rather misplaced, but oh well.
The other way would've been adding a static_cast<unsigned char> in std::isalpha(). I can't decide which is worse.

Topic archived. No new replies allowed.

Pages: 1 23

C++

Forum

I need a code review for Caesar cipher