Posts of links from new members

Is someone running this forum watching?

There's a rash of bots making new accounts, then posting a page full of links.

Sometimes these "bots" post 5 or more of the same message.

A very simple heuristic would stop it.

For the first 10 posts, if there's a link....reject the post automatically.

They'd never appear again.
Your heuristic is awful; it's too prone to generating false positives.
How many first time posters include a page of 10 or more links?

Your reaction was just too quick, isn't that obvious?

Sure, one link, maybe two is normal.

10? 20? 50? Like many of those posts have?

Obvious.

Aside from that, in the first 10 posts most new users don't include links. Haven't you noticed?

There isn't any real problem with limiting new users to zero links for their first 10 posts. Just include an instruction.

Bots can't "understand" that, nor would they likely work around it.
Sure, but you didn't say that, you said "a link". New users will sometimes want to link a picture or code. If we clarified it to be 10+ links, that would have virtually zero false positives, I agree. Not that something like that would ever get implemented; one of us would have to write a user-side program to do it.
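For what it's worth, the two variants differ only in a threshold. Here is a rough Python sketch, purely illustrative: the names `count_links` and `should_reject`, the regex, and the default values are all assumptions, not anything this forum actually runs.

```python
import re

# Assumption: a "link" is any run of text starting with http://, https://, or www.
LINK_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def count_links(body: str) -> int:
    """Count link-like substrings in a post body."""
    return len(LINK_PATTERN.findall(body))


def should_reject(post_count: int, body: str,
                  probation_posts: int = 10, max_links: int = 9) -> bool:
    """Reject a post from a new account if it contains more than max_links links.

    post_count is the number of posts the account has already made.
    max_links=0 gives the strict "any link" rule; max_links=9 gives the "10+ links" rule.
    """
    if post_count >= probation_posts:
        return False  # established accounts are not filtered
    return count_links(body) > max_links
```

With `max_links=0` a newcomer's single pastebin link is rejected; with the default it goes through, and only the page-of-links posts are caught.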
I'm working on a client-side filter, albeit slowly :(

I wonder if there is any AUP/ToS policy with respect to such filter bots?
> in the first 10 posts most new users don't include links. Haven't you noticed?
I don't care about the majority. There is a small pool that does do it, and I don't see a good reason for them to be banned from posting.
for example, some links that I saw:
- to a pastebin with the code (because of the character limit here)
- to a file hosting with a .zip of their project (full of useless object files)
- to their github repository (yes, I saw that for new users)
- to a picture of the error message (they don't know better)
- to a picture of their assignment (can't bother to write a couple of lines)
- to a picture of a portion of their code (¿how did they survive this long?)

sure, a lot of those are not justified, but even in those cases, if you simply ban them from posting then they'll just leave to another site.
Modifying the site requires the approval and effort of the site owner.

Not gonna happen IMO. We are beyond lucky the semi-absentee owner keeps the site open and working as it is.

Besides, I take some small amount of pleasure in squashing the spammer insects.

I doubt those paying for the spammer to do the work see a return on their investment here.
There is an account named 'admin' who was obviously able to close threads. Its last appearance was on May 21, 2012 at 4:32pm. Whatever happened back then, it seems no updates have been made since.
> Is someone running this forum watching?

No.

If this forum had a FAQ, this would have to be the first question on it.
I'm surprised at the design discussion over a means of filtering these nonsense posts given this is a forum for developers.

Can't anyone take a napkin sketch and figure out the details anymore?

Seriously, does anyone really believe that the initial suggestion was the entire design?

Think a little, guys, all it takes to fake out a bot is a warning and instruction for a new user (within a few posts of starting an account, like the first 10 posts).

I've been on a number of forums with these filters, and they are widely deployed.

I get the fact that the developers at this site are MIA...fine, surprising since the documentation region is being updated.

Ah, this topic.

> There is an account named 'admin' who was obviously able to close threads. Its last appearance was on May 21, 2012 at 4:32pm. Whatever happened back then, it seems no updates have been made since.

Well, I'm not sure about that. This site did need an update for the GDPR, and it got one. That cookies banner wasn't always there. Also, there is C++14 stuff in the reference and C++14 support from cpp.sh.

> Seriously, does anyone really believe that the initial suggestion was the entire design?

Well... yes. There was not much left unspecified in your proposals. Your first was to reject newcomer posts with a link in them, then you moved on to rejecting newcomer posts with n links in them, with various values of n. A proposal that I don't think anyone would view as complete is "implement a heuristic that rejects posts from newcomers based on the quantity of links in them".

Also, determining who's a newcomer by post count is an insufficient metric for the purposes of spam prevention, at least without a mandatory delay between posts.
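To illustrate why the delay matters: without it, a bot can burn through its "newcomer" posts in seconds. A minimal Python sketch of such a delay check follows; the function name `may_post_now` and the numbers (10 posts, 300 seconds) are made up for illustration only.

```python
from typing import Optional


def may_post_now(post_count: int, last_post_time: Optional[float], now: float,
                 probation_posts: int = 10,
                 min_delay_seconds: float = 300.0) -> bool:
    """Enforce a minimum gap between posts while an account is still new.

    Times are seconds on any monotonically increasing clock.
    The defaults are hypothetical values, not tuned ones.
    """
    if post_count >= probation_posts:
        return True  # established accounts are not throttled
    if last_post_time is None:
        return True  # the very first post is always allowed
    return now - last_post_time >= min_delay_seconds
```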

> I get the fact that the developers at this site are MIA...fine, surprising since the documentation region is being updated.

Is it? It looks stuck on C++14 to me.

That said! There are users who have elevated privileges who can edit parts of the main site. For instance, there's a (semi-hidden and incomplete) FAQ section that Duothomas can edit freely. It's possible someone has those privileges for the reference section.

Otherwise, I do wish this site was better maintained/developed. Personally, I'd be willing to volunteer my time to help, though AFAIK the site admin has mixed feelings about volunteer work.

-Albatross
The site admin will silently clean up some spam issues from time to time.

> Well... yes. There was not much left unspecified in your proposals. Your first was to reject newcomer posts with a link in them, then you moved on to rejecting newcomer posts with n links in them...


If anyone gets something from this thread other than a complaint about some bot(s) that frequently post a list of links to underground video sites (sometimes 5 or 10 posts with dozens of links each), it should be this one point.

No proposal of any kind in engineering, be it mechanical, electrical, chemical or software, is ever the complete design.

In 40+ years of development work, I have never seen that happen, or even expected it to.

Even in architecture, the initial designs are not the complete designs.

One reason is the simple fact that engineers (if there are actually engineers involved) test, evaluate and then feed that back into the design.

I can't imagine any professional assuming that a napkin sketch proposal is the final design.

Think of it this way:

Say you're working in a company that makes electronic devices, and someone (say someone at the top, the boss) comes up with a proposed idea.

Now, imagine what happens if everyone else hearing or reading this idea takes it as the final, full, complete design, rejects it entirely on the basis that they find flaws, and focuses on those flaws as the point of rejection.

How does that move things forward? It doesn't. It stops everything.

That's how government works.

That's not how engineering works. We'd never have the technology we have today if that were how engineers thought about proposed ideas.

Seriously, if you get nothing else out of the exchanges here, get that part. How that exchange develops is key to how an industry based on engineering and design works. It is central to employment and career.

I know in part because I hire, sometimes as a consultant and often as the owner of my own firm.

Notice the one thing that stops any attempt to filter out bots posting junk - management; ownership.

There is nothing, absolutely nothing, which stops the idea of filtering bot behavior from being implemented appropriately. It is done elsewhere to good effect. Anyone starting a brand new account is easily suspected of being a bot. Those whose accounts have even a few posts beyond the first develop a history of behavior and become active, valued members.

I think most spam is cleared up automatically by the report system, isn't it?

I do know there have been occasions in the last couple of years where disruptive posters have been banned, which suggests the admin is still showing some interest. Unless some other users have been given that power too.
> I think most spam is cleared up automatically by the report system, isn't it?


That's people like me reporting the post, which removes it.
Yes, that's exactly what I meant. When a user with a high post count reports a post from a user with a very low post count, the report system automatically removes the offending post.
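In code terms, the rule as I understand it is something like the Python sketch below. The actual cutoffs the site uses aren't known to me, so `trusted_min_posts` and `new_max_posts` are guesses, and the function name is made up.

```python
def report_removes_post(reporter_posts: int, author_posts: int,
                        trusted_min_posts: int = 100,
                        new_max_posts: int = 5) -> bool:
    """Return True if a single report should remove the post immediately.

    Both thresholds are hypothetical; the real values are not public.
    """
    reporter_is_trusted = reporter_posts >= trusted_min_posts
    author_is_new = author_posts <= new_max_posts
    return reporter_is_trusted and author_is_new
```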
I think you've fundamentally misunderstood the context here, Niccolo.

This is not an engineering group with limited participation. This is a high-traffic public forum, where almost anyone can join and participate, and where one needs to decide where best to allocate their time and effort. Typically, this means that high-effort posts are rewarded with high-effort responses.

From our perspective, your proposal was about as low-effort as they get. We did not see any iterative process leading up to your initial proposal (as I'd hope there would be). We just saw you say:
> A very simple heuristic would stop it. For the first 10 posts, if there's a link....reject the post automatically.

...which sounds like one of the first things someone would think of. In fact, there was no indication that you wanted to iterate on this idea, just an implicit assertion that you thought it was a good idea. It's not unreasonable to take that at face value here.

And, as ne555 pointed out, that heuristic is unfit for purpose. A key requirement of anti-spam is that it needs to have a low false-positive rate. Your proposed heuristic would have one that's absurdly high, and it would interfere unreasonably with normal operation of this forum, unless you expect newcomers to play the www (dot) mysite (dot) com link-filter-bypass game when half of the newcomers don't even read the instructions for each forum.

No, it wasn't the most constructive response. No, there was no acknowledgement that the basic premise of "posts from new accounts with links are much more likely to be spam" is reasonable (it is). However, ne555 is not unjustified in dismissing the proposal outright without discussing how it might be improved, just because of the context in which it was provided. Most people in this thread are probably not here to engineer an anti-spam measure that is unlikely to ever get implemented. If that was the intended topic, you never made it clear.

Relatedly, you then pivoted to saying
> How many first time posters include a page of 10 or more links?

And then proceeded to give some reasoning behind your suggestion (along with a more relaxed version where a number of links past some cutoff in a post get the post rejected). That is much more reasonable, and I can't remember anyone saying anything negative about it. Ganado was even positive about it. Personally I tend to agree, though if we're discussing anti-spam, there's probably further discussion warranted about the overall moderation scheme of this forum and how spam prevention would tie into it.

@MikeyBoy
If a new account receives too many reports from older accounts too quickly, they're automatically banned. This has occasionally been abused to get newbies banned, sadly, and occasionally spam accounts get to the point where their posts can't be insta-removed. There's definitely room for improvement there.
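The "too many reports too quickly" part is essentially a sliding-window counter. A small Python sketch, assuming made-up limits (3 reports within an hour) and a hypothetical `ReportTracker` class, since the site's real parameters aren't documented anywhere I know of:

```python
from collections import deque


class ReportTracker:
    """Track reports against one new account inside a sliding time window."""

    def __init__(self, max_reports: int = 3, window_seconds: float = 3600.0):
        # Both limits are hypothetical values chosen for illustration.
        self.max_reports = max_reports
        self.window_seconds = window_seconds
        self._times = deque()

    def add_report(self, now: float) -> bool:
        """Record a report at time `now`; return True if the account should be banned."""
        self._times.append(now)
        # Drop reports that have fallen out of the window.
        while self._times and now - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) >= self.max_reports
```

Note that nothing in this sketch checks whether the reports are justified, which is exactly how a rule like this gets abused against newbies.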

-Albatross
> I think most spam is cleared up automatically by the report system, isn't it?

More like my blood, sweat, and tears.

Anyway, a better method would be if they've got 2+ links in their first posts. That would remove well over 90% of spam and stop most false positives. I've only seen a monthly spammer come in with 1 link in their spam.

The algorithm can be greatly enhanced by collecting the phrases that the spammers always copy and paste, so that posts containing them are automatically detected and deleted. With these two methods, I doubt there would be more than the occasional spam post once every few months. The only spammers who'd get away with it are those who ask legit-looking questions but are really just spammers with the link in their profile.
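Roughly, the two checks combined might look like this in Python. It's only a sketch: the phrases in `KNOWN_SPAM_PHRASES` are placeholders I invented, and the function names and the 10-post cutoff are assumptions too.

```python
import re

# Assumption: a "link" is any run of text starting with http://, https://, or www.
LINK_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# Placeholder phrases; a real list would be collected from actual spam posts.
KNOWN_SPAM_PHRASES = ("watch full movie online", "free hd streaming")


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial reformatting doesn't evade the match."""
    return " ".join(text.lower().split())


def looks_like_spam(post_count: int, body: str, first_posts: int = 10,
                    min_links: int = 2, phrases=KNOWN_SPAM_PHRASES) -> bool:
    """Flag early posts that have min_links or more links, or contain a known spam phrase."""
    if post_count >= first_posts:
        return False  # only an account's first posts are checked
    if len(LINK_PATTERN.findall(body)) >= min_links:
        return True
    text = normalize(body)
    return any(phrase in text for phrase in phrases)
```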


But still, no one here to implement.
That ^^^ @zapshe's post

That right there is what I'm talking about.

Not specifically about technique. That fluctuates with opinion.

This kind of stuff exists on web forums all over the place.

What I'm talking about is how zapshe said it, how it flowed from idea to extension.

Instead of declaring the original, short idea was awful and unworkable.

It's not that I take any offense, per se, but that functional discussion works the way @zapshe just posted it.



A better human verification system would be helpful since bots can easily read those stupid number things. (When not logged in, press the "register" button to see the numbers.)