I need a code review for Caesar cipher

Pages: 123
As opposed to adding an argument string alphabet. Or wstring maybe. This does raise a problem of how to rotate lower and upper case characters separately. Maybe have two alphabets?

Example http://ideone.com/R75P6v
Lo and behold, "transform" doesn't look like it threw up any more. The "rotator" itself could use some polish though. Especially the std::toupper(rot_ab.at(ab.find(std::tolower(c)))); line...

Why would stream iterators be a bottleneck? If you iterate a string, they are just pointers. A file might be more of a problem. Though they have buffering to help.
Lo and behold, "transform" doesn't look like it threw up any more.

In other words, you think it looks better. Whilst I don't.

The "rotator" itself could use some polish though. Especially the std::toupper(rot_ab.at(ab.find(std::tolower(c)))); line...

What's wrong with it?

As opposed to adding an argument string alphabet. Or wstring maybe. This does raise a problem of how to rotate lower and upper case characters separately. Maybe have two alphabets?

I absolutely do not understand how come you are bothered by the alphabet ab being hard-coded, yet you're alright with the use of cctype functions. Or are you?
Last edited on
For both quote 1 and 2. It is generally considered that cutting large expressions into several pieces is a good idea. That is sort of the only reason why things like transform exist. If I could I'd move rotator into a different function, but C++ won't allow partially applied functions (and I can't decide if writing a class is bloated). Anyway, you should agree that, semantically, encryption of one character and repetition of the former are well separable.

3. I am bothered by cctype a bit. That is expressed by my Unicode related ramblings. Though the difference between hard coded alphavet and use of ASCII is the difference between implemented but unused functionality and a welcome improvement.
[...cutting large expressions into several pieces...] That is sort of the only reason why things like transform exist.


I was under the impression that things like transform exist primarily to reduce the chance of error for frequently performed tasks, much like the range-based for.

As the code is intended to show off language features, and not deal with all possible character encodings, trying to generalize to that point seems like a lot more hassle than it's worth. It might be more fitting to write a separate article that addresses your character encoding concerns.

[Edit: spelling]
Last edited on
In my opinion, std::transform() and the lambda are not meant to be separated. Semantically, the lambda is part of the algorithm.
Last edited on
In my opinion, std::transform() and the lambda are not meant to be separated.

That's awfully academic, IMO. It seems to me that transform is a manifestation of the strategy pattern, and it makes sense for a strategy to be separate from the machinery applying it. Aesthetically, I prefer hamsterman's refactored version. Practically, I think either version is perfectly readable and that this is largely a style issue.
@cire, that too, but readability is also a goal (in fact the two are very much related).

Note, I'm nit picking so much in large part because the article says "proficient" (maybe I'm taking that word for more than it means?). Though the Unicode bit is unrelated. I think everyone should feel bad about not using it.

A Unicode article doesn't seem doable since stdlib has no good support for it (I think?) and there must be many third party choices. Note, there already is one article about it: http://www.cplusplus.com/articles/2w6AC542/

Often some well meaning beginner tries to demonstrate what comments are with something like cout << "hello"; //this line prints hello. and is then told that this is not a good way to use comments. I feel that the transform/lambda situation here is very much the same.

@Catfish, so then you do think that std::transform(src_begin, src_end, dest_begin, [ab, rot_ab](char c) -> char { is significantly prettier than for ( ; src_begin != src_end; src_begin++, dest_begin++) {? I guess I'll stop if you do. I would love to hear a rationale behind it though.
I think it's agreeable that it really doesn't matter, and that this is all largely semantics. It all decays down to the same exact thing, considering transform is defined as:
1
2
for(; _First != _Last; ++_First, ++_Dest)
    *_Dest = _Func(*_First);

so this
1
2
auto func = [ab, rot_ab](char)->char { /*....*/ };
std::transform(beg, end, dest, func);

is the same thing as this
 
std::transform(beg, end, dest, func, [ab, rot_ab](char)->char{ /*...*/ });

which is the same thing as this
1
2
3
auto func = [ab, rot_ab](char)->char{ /*...*/ };
for(; beg != end; ++beg, ++dest)
    *dest = function(*beg);


I don't think there's any notable performance hit by doing it one specific way over the next (although i am unaware of the memory requirements it takes to store a function, or the time it takes to allocate space for it, or if that's even necessary when creating pointers to functions.) And I do believe transform() was included as an algorithm to simplify the commonly used method of applying a function to a container, much like ranged based for, as cire mentioned.
Catfish3's preference is equally as readable as hamsterman's in my opinion. It's not like the lambda spans more then a few lines.
Semantics will be the death of us.

EDIT:
Now that i'm into it, i prefer just dealing with ASCII:
1
2
3
4
5
6
7
8
9
10
11
12
template<class InputIter, class OutputIter>
OutputIter CaesarCipher(InputIter beg, InputIter end, OutputIter dest, int shift)
{
    return std::transform(beg, end, dest, [&shift](char &c)->char {
        if(islower(c))
            return (((c-97) + shift) % 26) + 97;
        else if(isupper(c))
	    return (((c-65) + shift) % 26) + 65;
	else
	    return c; //isn't a letter
    }); 
}


That seems to use the least amount of operations.
And I suppose I like putting the lambda in the call to transform. It flaunts C++11, which is newer and therefore more badass. :D
Last edited on
@ hamsterman: std::transform() is more telling more than a for() loop. Its name hints at its goal, then the attached lambda shows how it accomplishes it.
@Thumper, this was never about performance (except for the "find" bit). Also, I think you're confusing semantics with syntax... (Also, it feels a bit as though you don't know ASM. For someone who claims to <3 performance, that should be fixed)

Note also that things I suggested are improvements of least effort. My actual preference is probably http://ideone.com/dAh6tu (kept ascii and find because I'm lazy).

About your code. I do understand the joy of knowing the values of 'a' and 'A', but this could actually be considered obfuscation. And you don't handle negative shifts. Though I do agree (and I have expressed this earlier), if you do want to stick to ASCII, this is a better way to go. However note that this program does not encrypt text, it only encrypts English text.
You're probably right. My major point being: they all accomplish the same thing. They do. So does it really matter how it's lain out? Everyone's entitled to their own personal preference.
And I never claimed to love performance. Well. I do love performance, but I made it very obvious that I don't know all the little bits as to not over qualify myself.

I thought that was the goal? Perhaps I misunderstood it. You can only make a universal cipher for all languages if you know the letters for all languages (and I don't) and create a zero point for them. You can create a caesar cipher for ASCII text, as you can create a caesar cipher for unicode text, and any other standard of data representation. It doesn't make one any less of a caesar cipher than another.
Considering we speak English, and @OP's first post was a cipher for English, i figured it was safe to assume our goal was to critique his cipher for English.

And you're right, i didn't handle negative shifts in that example. Apologies.

EDIT:
Hmm, I do really like the generic caesar cipher in your example. Perhaps that is the best route to take. Makes it portable.
Last edited on
*cough* is lowercase only.
So does it really matter how it's lain out?
Well, if he had written it all on one line, or in some other hideous way, you wouldn't be saying that, right? I can't argue with you if you think I'm being too picky. I just don't think this use of a lambda counts as proficient when it is there to flaunt C++11 rather than because it is needed.

The ASM bit was not there to somehow ridicule you. I honestly think it would do you good.

I keep complaining about Unicode because it is a wonderful thing. I admit that I mostly use English on my computer but it shouldn't be like this. If I use a text processor I expect it to (have a configuration to) handle my language. It is not a thing of good C++, it is a thing of basic usability.
And again, my complaint here is not so much that he used ASCII, but that he pretended not to know what he's using for half of the program. (I hadn't noticed it earlier. It's not only the alphabet that's hard coded. All of the function knows that the alphabet contains lowercase a-z. If alphabet was anything else, the program would probably crash, because he doesn't handle failure of find).
Last edited on
It's not only the alphabet that's hard coded. All of the function knows that the alphabet contains lowercase a-z. If alphabet was anything else, the program would probably crash, because he doesn't handle failure of find
at() should throw an exception.
We think differently about the hard-coded ASCII alphabet. It was added because C++ doesn't have it as a global. It's a convenience. It's not something that the user should interfere with.
at() should throw an exception
yes, that's called a crash.
You see, your code does about 13 more memory accesses for each character than Thumper's. At first I though that's the price for extra flexibility (and complained that you didn't do anything with it). But now I'm entirely lost.
It's the price to pay for code that is easier to understand.
It's the price to pay for not relying on an unseen ASCII table, and magic numbers (disguised as letters).
hamsterman wrote:
The ASM bit was not there to somehow ridicule you. I honestly think it would do you good.
I completely agree with you there. I haven't the time or motivation to do so at the moment, though.

hamsterman wrote:
I just don't think this use of a lambda counts as proficient when it is there to flaunt C++11
Not only to flaunt C++11, although I'll admit, i really like the use of lambda there because it's really aesthetically pleasing to me. It would be equally proficient to give it it's own variable, or define it as another function outright, or create a class that handles it instead of using a lambda. The feature was added and I intend to use it. But, as noted earlier, I'm unaware of performance hits of using lambda functions as opposed to the traditional method of defining a predicate.

hamsterman wrote:
I keep complaining about Unicode because it is a wonderful thing.
I agree completely. ASCII is way outdated when compared to Unicode. C++11 supports Unicode literals and strings, but (as far as I was aware) the STL doesn't completely support Unicode as of right now (please correct me if i'm wrong on this).
It's easier to go about settling for ASCII characters when you're certain English, or another language that uses the same alphabet, is all you'll need.

Catfish3 wrote:
It's the price to pay for not relying on an unseen ASCII table, and magic numbers (disguised as letters).
That's not a particularly good rationale. You should be aware of how the data types you are using are encoded. It's not like ASCII is difficult to understand. It's just an integer assigned to represent a character, which is why you can treat chars as ints.
@Catfish, efficiency to readability is not a very good trade off. Not because readability is unimportant, but because generally you can have both. If you want to hide ASCII arithmetic, add helper functions for it such as,
1
2
3
4
5
6
7
8
9
int char_to_index(char c) {
   if (isupper(c)) return c-'A';
   else return c-'a';
}

char index_to_char(int i, bool uppercase) {
   if (uppercase) return i+'A';
   else i+'a';
}
Note that you already rely on hidden magic of isalpha, isupper, etc.
Last edited on
C++11 supports Unicode literals and strings, but (as far as I was aware) the STL doesn't completely support Unicode as of right now (please correct me if i'm wrong on this).

Define "support Unicode". C++98 (and C89 for that matter) supported Unicode just fine in all library functions (except on Windows, which didn't provide a locale), C++11 added locale-independent requirements with which MS complied, but they are so minimal that a library such as boost.locale becomes necessary anyway.

In any case, Ceasar is not defined for non-Latin alphabets, so making this support non-ASCII text is trivial: just template on the character type and ctype::narrow() before comparison with your translation table.

Or do you think it should accept U+0FF2 - U+FF5A (fullwidth latin) as well? I wonder if there's an ICU/boost.locale function that would unify them with latin..
hamsterman wrote:
Note that you already rely on hidden magic of isalpha, isupper, etc.

Those functions aren't your code to scrutinize and change.

hamsterman wrote:
If you want to hide ASCII arithmetic, add helper functions for it such as,

The problem is that you "hide" them in your code. Which someone will still have to read and understand...

Cubbi wrote:
In any case, Ceasar is not defined for non-Latin alphabets, so making this support non-ASCII text is trivial: just template on the character type and ctype::narrow() before comparison with your translation table.

Could you give more detail, please?
Pages: 123