I need a code review for Caesar cipher

Pages: 123
I'm planning to add this to the Source Code section, especially because the Caesar cipher seems popular as a homework.

I tried to write high quality, bare C++11 code.

Now is the time to shoot it down. Be ruthless!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
///
/// @file
/// @author Julius Pettersson
/// @copyright MIT/Expat License.
/// @brief Caesar cipher implementation.
///
/// This program demonstrates proficient usage of C++11's features to implement the Caesar cipher.
/// It was written with Doxygen comments.
///
/// http://en.wikipedia.org/wiki/Caesar_cipher
/// http://en.cppreference.com/
/// http://www.doxygen.org/
///

#include <algorithm>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <iomanip>
#include <ios>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

///
/// @brief Encrypts letters in the range [`src_begin`, `src_end`) using `shift` and writes to `dest_begin`.
/// @param src_begin    source range beginning
/// @param src_end      source range end
/// @param dest_begin   destination range beginning
/// @param shift        desired shift of the alphabet
/// @return `OutputIterator` to the element past the last element written.
///
template <typename InputIterator, typename OutputIterator>
OutputIterator caesar_cipher(InputIterator src_begin, InputIterator src_end, OutputIterator dest_begin, int shift)
{
    const std::string ab("abcdefghijklmnopqrstuvwxyz"); // AlphaBet
    std::string rot_ab(ab); // ROTated AlphaBet

    shift %= static_cast<int> (ab.length()); // bring the shift within the length of the alphabet

    if (shift < 0)
        std::rotate(rot_ab.rbegin(), rot_ab.rbegin() - shift, rot_ab.rend());
    else
        std::rotate(rot_ab.begin(), rot_ab.begin() + shift, rot_ab.end());

    return std::transform(src_begin, src_end, dest_begin, [ab, rot_ab](char c) -> char {
        if (std::isalpha(c))
        {
            if (std::isupper(c))
                return std::toupper(rot_ab.at(ab.find(std::tolower(c))));

            return rot_ab.at(ab.find(c));
        }

        return c;
    });
}

///
/// @brief Prints usage information and a custom message.
/// @param s    custom message to be printed
///
void print_usage(const std::string &s = "")
{
    std::cerr << "Usage example:\n";
    std::cerr << "\tprogram.exe input_file output_file shift\n\n";
    std::cerr << "Where `input_file' and `output_file' are distinct files, and\n";
    std::cerr << "`shift' is an integer representing the desired alphabet shift.\n";

    if (!s.empty())
        std::cerr << "\nERROR: " << s << '\n';

    std::cerr << std::endl;
}

#ifdef DEBUGGING

#include <cassert>
#include <cstring>

///
/// @brief Debugging program entry point.
///
int main()
{
    std::string s = "Wkh Txlfn Eurzq Ira Mxpsv Ryhu Wkh Odcb Grj";

    caesar_cipher(s.begin(), s.end(), s.begin(), -3);
    std::cout << s << std::endl;
    assert(s == "The Quick Brown Fox Jumps Over The Lazy Dog");

    char ca[] = "Exxego ex srgi!";

    caesar_cipher(ca, ca + std::strlen(ca), ca, -56); // note: -56 modulo 26 == -4
    std::cout << ca << std::endl;
    assert(std::strcmp(ca, "Attack at once!") == 0);
}

#else

///
/// @brief Actual program entry point.
/// @param argc             number of command line arguments
/// @param [in] argv        array of command line arguments
/// @retval EXIT_FAILURE    for failed operation
/// @retval EXIT_SUCCESS    for successful operation
///
int main(int argc, char *argv[])
{
    if (argc != 4)
    {
        print_usage("Wrong number of arguments.");
        return EXIT_FAILURE;
    }

    std::ifstream input_file(argv[1], std::ios_base::binary);

    if (!input_file.is_open())
    {
        print_usage(std::string("input_file `") + argv[1] + "' could not be opened.");
        return EXIT_FAILURE;
    }

    std::ofstream output_file(argv[2], std::ios_base::binary);

    if (!output_file.is_open())
    {
        print_usage(std::string("output_file `") + argv[2] + "' could not be opened.");
        return EXIT_FAILURE;
    }

    int shift;

    if (!(std::istringstream(argv[3]) >> shift))
    {
        print_usage("shift conversion error.");
        return EXIT_FAILURE;
    }

    try
    {
        input_file.exceptions(std::ios_base::badbit | std::ios_base::failbit);
        output_file.exceptions(std::ios_base::badbit | std::ios_base::failbit);

        std::istreambuf_iterator<char> src_begin(input_file);
        std::istreambuf_iterator<char> src_end;
        std::ostreambuf_iterator<char> dest_begin(output_file);

        caesar_cipher(src_begin, src_end, dest_begin, shift);
    }
    catch (const std::ios_base::failure &f)
    {
        print_usage(std::string("File input/output failure: ") + f.what() + '.');
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

#endif 
Last edited on
Semi-shameless bump, after moving here from Generic.
It may have been better to post this in my Source Code thread, but I thought, why ruin whatever it derailed into?

If you see any bugs, or have ideas for a better way to do this, let me know. Thanks.
Might make ab static in caesar_cipher, but it looks good to me.
Nothing but this: Why not pass argv[0] to print_usage so that it can use that instead of "program.exe"? I usually do it like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
static void show_usage(const char* argv0)
{
    std::cout << "Usage: " << argv0 << " [options] file" << std::endl;
    // ...
}

int main(int argc, char* argv[])
{
    if (argc < 2) {
        show_usage(argv[0]);
        exit(EXIT_FAILURE);
    }
}
Might make ab static in caesar_cipher, but it looks good to me.

Could you please explain what the benefit would be, in doing that?

Nothing but this: Why not pass argv[0] to print_usage so that it can use that instead of "program.exe"?

Because argc could be 0, and argv[0] could be NULL. Otherwise that's how I would've done it, too.
cire wrote:
Might make ab static in caesar_cipher, but it looks good to me.
Catfish3 wrote:
Could you please explain what the benefit would be, in doing that?


The string would only be constructed once per program instance instead of once per function invocation. Since this program doesn't invoke the function more than once (outside of the debug driver,) it doesn't make any practical difference.
Last edited on
Because argc could be 0, and argv[0] could be NULL. Otherwise that's how I would've done it, too.


Wouldn't argc always be at least 1 on successful entrance?
For reference, 3.6.1.2 says:
...argc shall be the number of arguments passed to the program from the environment in which the program is run. If argc is nonzero these arguments shall be supplied in argv[0] through argv[argc-1] as pointers to the initial characters of null-terminated multibyte strings (ntmbss) (17.5.2.1.4.2) and argv[0] shall be the pointer to the initial character of a ntmbs that represents the name used to invoke the program or "". The value of argc shall be non-negative. The value of argv[argc] shall be 0.

Wouldn't argc always be at least 1 on successful entrance?

Not necessarily. If you're on Linux, check out the exec() functions.
You should be able to pass zero arguments while successfully starting the new process replacing the "current process image with a new process image". (My Unix is rusty.)

http://linux.die.net/man/3/exec
Last edited on
On a strictly compliant POSIX implementation (POSIXLY_CORRECT=1 for GNU), argc is guaranteed to be >= 1. On any other system, argc can be any positive integer.
closed account (3hM2Nwbp)
@ line 48, why aren't you capturing the two variables as references in your lambda? It doesn't appear to me that you're mutating either of them within your lambda expression.

[&ab, &rot_ab](char c) -> char

alternatively,
[&](char c) -> char

to capture them by reference implicitly. Shaving off the microseconds, we are.

Now I'm just nitpicking, but you could add a bit of documentation for what happens if a shift larger than the alphabet size is provided. If I was simply given the function header and that documentation, I'd expect the function to puke if I passed in 100,000,000 and as such, I'd be truncating the shifts in my own user code before passing them.
Last edited on
@ line 48, why aren't you capturing the two variables as references in your lambda? It doesn't appear to me that you're mutating either of them within your lambda expression.

Not knowing of a way to capture by constant reference, I reached the conclusion that capturing by reference, just to avoid copying two 26 character std::strings, would be a misuse of the feature.

Now I'm just nitpicking, but you could add a bit of documentation for what happens if a shift larger than the alphabet size is provided.

Thanks, I'll do that soon.

If I was simply given the function header and that documentation, I'd expect the function to puke if I passed in 100,000,000 and as such, I'd be truncating the shifts in my own user code before passing them.

I thought that since I didn't document any constraints for the shift value, the user should assume that the function works for any value. Which it does. But I understand where you're coming from.
closed account (3hM2Nwbp)
catfish3 wrote:
Not knowing of a way to capture by constant reference, I reached the conclusion that capturing by reference, just to avoid copying two 26 character std::strings, would be a misuse of the feature.


I hadn't thought of capture by const-reference until you mentioned that...it seems very odd that the standards committee overlooked that. I read through the draft papers again to make sure that I didn't miss it, but it's definitely not there. I guess I still don't see how a reference capture would be considered misusing a feature in this case. At any rate, const-reference capture should definitely be added to the standard unless there's a damn good reason not to.
I wrote:
It's odd to copy the alphabet and rotate it in memory or to do find on an alphabet that matches ASCII subset (without providing any way to give another alphabet or use unicode), but I'm bugged the most by the combination of std::transform and a lambda which is possibly less concise than a for loop.
It's odd to copy the alphabet and rotate it in memory or to do find on an alphabet that matches ASCII subset

I am interested in sample code illustrating a more intuitive approach.

(without providing any way to give another alphabet or use unicode)

Honestly, I never touched Unicode, but I feel it has no place in this program. However, if UTF-8 encodes using eight bits, so I could add support for it while keeping only one version. Have you any comments in that regard? I don't think the functions from cctype will be usable anymore?

but I'm bugged the most by the combination of std::transform and a lambda which is possibly less concise than a for loop.

Here I disagree. If one is comfortable with std::transform(), it should be easier to understand than a for() loop iterating two ranges at the same time.
Again, post sample code so that I see better what you mean.
Last edited on
The more "intuitive" approach is
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
template <typename InputIterator, typename OutputIterator>
OutputIterator caesar_cipher(InputIterator src_begin, InputIterator src_end, OutputIterator dest_begin, int shift) {

    for ( ; src_begin != src_end; src_begin++, dest_begin++) {
        char c = *src_begin;
        
        if (std::isalpha(c)) {
            char a = std::isupper(c) ? 'A' : 'a';
            int index = (c - a + shift)%26;
            if(index < 0) index += 26;
            *dest_begin = a + index;
        
        } else *dest_begin = c;
    }
}


The thing to notice is that rot_ab.at(ab.find(c)) turned into (c -'a' + shift)%26 + 'a' and that a third of the code is gone.
Note, I'm not trying to encourage ASCII arithmetic. My point (one third of it) is that you wrote a more general piece of code and then restricted it to what I wrote above.
As for the in memory rotation, it looks odd (to precompute a trivial computation) but I can't find a real reason why that would be bad, so I guess it isn't...

As for Unicode, I don't really know how C++ deals with it. It seems that external libraries are used for that.
In general you shouldn't say that Unicode has no place in text manipulation program though. You may be from UK or US, but not everybody else is.
Also, you have some misconceptions about Unicode encodings. I suggest you read http://en.wikipedia.org/wiki/UTF-8#Description

As for "transform", compare
1
2
3
std::transform(src_begin, src_end, dest_begin, [ab, rot_ab](char c) -> char { //...
//with
for ( ; src_begin != src_end; src_begin++, dest_begin++) { //... 
I'd say they are very close. Both utterly disgusting...
Also, I suppose more people are comfortable with "for" than with "transform". Finally, I'll add that if you moved the rotation of one character into another (properly separated) function (lambda is fine as C++ function passing is such a pain), I'd be very much for using "transform". The problem is when you take things intended to make code clean or concise and achieve neither.
My point (one third of it) is that you wrote a more general piece of code and then restricted it to what I wrote above.

The D programming language has a global ASCII alphabet string.
If C++ had something similar, and I used it instead of handcrafting ab, would you have made the same observation?

Note, I'm not trying to encourage ASCII arithmetic.

Neither am I. Then could you rewrite your function so that it doesn't rely on it?

Finally, I'll add that if you moved the rotation of one character into another (properly separated) function (lambda is fine as C++ function passing is such a pain), I'd be very much for using "transform". The problem is when you take things intended to make code clean or concise and achieve neither.

This again revolves around ASCII arithmetic, doesn't it?
At some point I even thought about using an std::map<char, char>. But that doesn't actually simplify anything.
i wish i knew how to install c++11, then i could run it :/ happy xmas
quote 1. Yes, I would.

quote 2. I'm not criticizing use of an alphabet string. I'm criticizing use of an alphabet that is hard coded to match with ASCII characters. ASCII arithmetic is bad not because it is somehow unsafe, but because it's ASCII (and hard to change) (and a bit unreadable).

quote 3. No, this bit is about syntax only. I guess moving the lambda into its own variable would be a step forward.

By the way, have you considered doing binary search on the alphabet? Linear is kind of odd (even though you'd only gain a few compares with binary). I think it's okay to ask for it to be sorted.
I'm not criticizing use of an alphabet string. I'm criticizing use of an alphabet that is hard coded to match with ASCII characters.

As opposed to matching with what?

No, this bit is about syntax only. I guess moving the lambda into its own variable would be a step forward.

I do not understand what you mean. Could you give an example, please?

By the way, have you considered doing binary search on the alphabet? Linear is kind of odd (even though you'd only gain a few compares with binary). I think it's okay to ask for it to be sorted.

That's an interesting idea, but I think it's overkill. The real bottleneck ought to be the stream iterators, which read and write byte by byte (I think). And even that is irrelevant for the small text files this program is supposed to be used on.
Last edited on
Pages: 123