Underestimated Programming Problems

I was watching a talk from C++Now by Niko Matsakis, and at around this point (https://youtu.be/lO1z-7cuRYI?t=3608) he mentions the URL parser for Firefox and how it's a much more complicated piece of software than one would likely imagine. I find situations like this interesting, and I was trying to think of programming problems that people typically underestimate when they rely on intuition alone. I'm curious what other members of this forum know of these kinds of problems, whether it's something they've heard of or know from personal experience.
IMO, anything user-facing tends to be more complicated than intuition suspects.
User input doesn't follow design contracts.

Example:
Maybe you have a regular expression engine in the back-end. It's sometimes possible to craft an input which causes such an engine to break.

Try to run this Perl code:

#!/bin/perl
print("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" =~
      ("a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?".
       "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));

The code above tries to match the string "aaaa....aaaa" against "a?a?....a?a?aaaa...aaaa".

You'll be waiting forever, or close to it (it takes exponential time). Fundamentally this is not necessary: there's a well-known linear-time algorithm that would work for the pattern above, but PCREs support features for which a fast algorithm probably doesn't exist (e.g., backreferences, which make the matching problem NP-complete).
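
A minimal sketch of the idea behind such an algorithm, in Python rather than Perl: instead of backtracking, it tracks every pattern position that could currently be active and advances all of them together, one input character at a time. It makes strong assumptions: it understands only literal characters and `x?`, it matches the whole string instead of searching, and the names `parse`, `closure`, and `full_match` are made up for illustration. It is a toy, not how any particular engine is implemented.

```python
def parse(pattern):
    """Split a pattern made only of literals and 'x?' items into (char, optional) pairs."""
    tokens = []
    i = 0
    while i < len(pattern):
        optional = i + 1 < len(pattern) and pattern[i + 1] == "?"
        tokens.append((pattern[i], optional))
        i += 2 if optional else 1
    return tokens


def closure(states, tokens):
    """Extend a set of pattern positions by skipping over optional tokens."""
    result = set()
    for s in states:
        while s not in result:
            result.add(s)
            if s < len(tokens) and tokens[s][1]:
                s += 1  # an optional token may match nothing
    return result


def full_match(pattern, text):
    """Return True if the whole text matches the whole pattern."""
    tokens = parse(pattern)
    states = closure({0}, tokens)
    for ch in text:
        # Advance every active position that can consume this character.
        advanced = {s + 1 for s in states if s < len(tokens) and tokens[s][0] == ch}
        states = closure(advanced, tokens)
        if not states:
            return False
    return len(tokens) in states


n = 30  # assumed size, chosen for illustration
pattern = "a?" * n + "a" * n
print(full_match(pattern, "a" * n))  # True
```

The work per input character is bounded by the pattern length, so the running time grows with len(text) * len(pattern) instead of exponentially.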

Now if you ever want to process user input with PCREs or, more directly, take a regular expression as input, you have to make sure that you'll never run a query like this. Tricky!

Also, Moravec's paradox:
https://en.wikipedia.org/wiki/Moravec%27s_paradox


Writing a regex to validate email addresses.
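
A rough illustration of why, in Python: the pattern below is a made-up "obvious" attempt (`NAIVE_EMAIL` and `naive_is_email` are hypothetical names, not from any library). It rejects several addresses that are in real use and accepts one that the address grammar in RFC 5322 does not allow unquoted.

```python
import re

# Hypothetical first attempt: letters/digits/dots, "@", one domain label, short TLD.
NAIVE_EMAIL = re.compile(r"[A-Za-z0-9._]+@[A-Za-z0-9]+\.[a-z]{2,3}")


def naive_is_email(address):
    """Return True if the naive pattern accepts the whole address."""
    return NAIVE_EMAIL.fullmatch(address) is not None


samples = [
    "alice@example.com",       # accepted, as expected
    "user+tag@example.com",    # rejected: "+" is legal in the local part
    "bob@mail.example.co.uk",  # rejected: more than one dot in the domain
    "info@example.museum",     # rejected: TLD longer than three letters
    "a..b@example.com",        # accepted: consecutive dots are not allowed unquoted
]
for address in samples:
    print(address, naive_is_email(address))
```

Every fix to the pattern tends to uncover another case, which is probably why many people settle for a loose check plus a confirmation mail.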
Communication with other devices, especially when the external devices send output continuously.

You have to handle the user stopping the communication, errors, one request getting multiple responses, timing, etc.
@mbozzi
I've seen people face similar issues on StackOverflow before actually!
Also, thanks for the link to Moravec's paradox, I hadn't heard of that before but it's certainly interesting.

@coder777
I've seen a bit of that personally. The company I intern at does a lot of work with devices on network clusters, though I don't get my hands dirty with it much myself.
Everything that deals with dates and times (and timezones) in a more involved manner is a nightmare.
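
To make one such case concrete, here is a hedged Python sketch that counts the days from 1 May 1260 to 1 June 1855 in two ways. The day-of-month values are assumptions picked for illustration, and `julian_calendar_to_jdn` and `gregorian_to_jdn` are names made up here. The first reading treats both dates as proleptic Gregorian, which is what Python's `datetime.date` does. The second treats the 1260 date as a Julian calendar date, which is how it would have been written at the time.

```python
from datetime import date

# date.toordinal() counts days in the proleptic Gregorian calendar;
# adding this constant turns an ordinal into a Julian Day Number (JDN).
ORDINAL_TO_JDN = 1721425


def julian_calendar_to_jdn(year, month, day):
    """JDN of a date written in the Julian calendar (standard integer formula)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def gregorian_to_jdn(year, month, day):
    """JDN of a date written in the (proleptic) Gregorian calendar."""
    return date(year, month, day).toordinal() + ORDINAL_TO_JDN


start, end = (1260, 5, 1), (1855, 6, 1)  # day 1 of each month is an assumption

# Reading 1: both dates are proleptic Gregorian.
naive_days = gregorian_to_jdn(*end) - gregorian_to_jdn(*start)

# Reading 2: the start is a Julian calendar date, the end a Gregorian one.
historical_days = gregorian_to_jdn(*end) - julian_calendar_to_jdn(*start)

print(naive_days - historical_days)  # 7
```

The two readings disagree by a week, and that is before asking where the dates were written down: Britain switched calendars in 1752 and Russia only in 1918.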
The average person usually has no idea how many special cases there are.
"How many days between May 1260 and June 1855?" Head->Table!

Also UTF-8/UTF-16 and multi-byte strings. Total PITA.
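
A small Python sketch of where the pain comes from: the "length" of a string depends on what you count, cutting bytes at an arbitrary offset can split a character, and two strings that render identically can compare unequal. `sizes` is a helper name made up for this example.

```python
import unicodedata


def sizes(text):
    """Count the same string in code points, UTF-8 bytes, and UTF-16 code units."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


sample = "a\u00e9\u20ac\U0001F600"  # a, e-acute, euro sign, grinning face emoji
print(sizes(sample))

# Truncating the UTF-8 bytes after 2 bytes cuts the e-acute in half.
try:
    sample.encode("utf-8")[:2].decode("utf-8")
except UnicodeDecodeError as err:
    print("cut in the middle of a character:", err.reason)

# "e" + combining acute accent looks like the precomposed e-acute but is not equal to it
# until both are normalized to the same form.
decomposed = "e\u0301"
print(decomposed == "\u00e9")
print(unicodedata.normalize("NFC", decomposed) == "\u00e9")
```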
Character case conversion in a multi language environment. That's punishment.
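
A few examples in Python of why it hurts, as a sketch: case mapping can change the length of a string, it doesn't always round-trip, and the right answer depends on the language of the text. Python's `str.upper()` ignores the locale, so `upper_tr` below is a hand-rolled, hypothetical helper for the Turkish dotted/dotless i only, not a complete solution.

```python
def upper_tr(text):
    """Uppercase with Turkish rules for i (sketch): i -> dotted capital I, dotless i -> I."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


german = "stra\u00dfe"            # "strasse" spelled with sharp s
print(german.upper())             # one character becomes two: STRASSE
print(german.upper().lower() == german)  # the round trip does not restore the sharp s

dotted_capital_i = "\u0130"       # Turkish capital I with dot above
print(len(dotted_capital_i.lower()))     # lowercases to "i" + combining dot above

print("istanbul".upper())         # locale-independent result: ISTANBUL
print(upper_tr("istanbul"))       # Turkish expects a dotted capital I first
```

For comparisons that should ignore case, `str.casefold()` is usually a better starting point than `lower()`, but it still knows nothing about the language of the text.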