Seb, I examined your wiki to augment my spammer database, then I unleashed my robotic minion on your wiki. All ~1500 pages in your wiki should shortly be spam-free.
-- RichardP
Wow, my wiki now feels so... clean! Dude, you rock. Thanks! -- SebPaquet
Thanks Seb. By the way, about 400 of the 1500 pages in your wiki appear to have been created by spammers. I've included a complete list of pages created by spammers at the end of [my page] on your wiki. You might consider using the administrator edit/delete command to remove those pages.
-- RichardP
Yes Richard, go ahead and run your bot on my wiki hourly if you like. -- Seb
Hey, while you're at it, I've got another one that is quite dirty: http://www2.iro.umontreal.ca/~paquetse/cgi-bin/wiki.cgi - let me know if you could use a list of good revisions to prime the pump.
Hi Richard!
I guess you're the one who cleaned Lee's wiki in the last couple of days? I was pleasantly surprised to see it cleaned (after being really annoyed to see it back online). Lee's wiki seems to be a very good source for spammers' IP addresses as it doesn't seem to be used for any other purpose anymore.
I'm really interested in your bot's code. Should I be watching this page? Or is there another place where you might announce it?
-- [Manni]
Manni, you're the operator of the [Cleaner] bot that was keeping Lee's abandoned wiki spam free, right? In any case, yes - after I noticed that Lee's wiki was back online, that it was being swamped with spammers again and that you apparently hadn't noticed yet, I went ahead and started sweeping Lee's wiki. Are you interested in starting Cleaner back on Lee's wiki? If so, I'd definitely be more than happy to have my bot stop cleaning Lee's wiki, particularly since it is probably not a good idea to have two bots patrolling the same wiki. Since your Cleaner bot is based on a whitelist it is bound to be more effective in suppressing spam on Lee's wiki with much less maintenance than my own bot (at the cost, of course, of stomping over any changes a legitimate editor might make until Lee returns).
With regard to my code, up to now Phil is the only one who has expressed any interest in it, so I was just planning on announcing the release here. If you're also interested, you're welcome to a copy as well. I'd be pleased to have your input, since you've been tackling the wiki spam problem longer than I have.
-- RichardP
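For anyone reading along, the whitelist idea discussed above can be sketched roughly as follows. This is only an illustration, not Cleaner's actual code: the names `WHITELIST`, `is_trusted` and `edits_to_revert` are hypothetical, and the author names and the IP address are example values.

```python
# Hypothetical sketch of a whitelist-based cleaner; not Cleaner's real code.
WHITELIST = {"Manni", "RichardP"}  # assumed set of trusted author names


def is_trusted(author, whitelist=WHITELIST):
    """Return True only when the edit's author is on the whitelist."""
    return author in whitelist


def edits_to_revert(recent_edits, whitelist=WHITELIST):
    """Given (page, author) pairs, list the pages last edited by an untrusted author."""
    return [page for page, author in recent_edits if not is_trusted(author, whitelist)]


# Every non-whitelisted edit is reverted, including legitimate ones.
pages = edits_to_revert([("HomePage", "Manni"), ("SandBox", "203.0.113.7")])
```

The trade-off mentioned above shows up directly: the check never looks at content, so it needs almost no maintenance, but an honest newcomer is treated the same as a spammer.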
Richard!
I certainly wouldn't mind your bot doing the cleaning on Lee's wiki. Yes, my bot is based on a whitelist. I'd just have to add your name and I could let it loose again. Problem is: I'm one of those conservative folks that turn off their computer during the night.
I'm interested in your code because your bot seems much better than mine (it can check content, e.g.). What language did you use?
Oh. And in case you are interested, here are two more wikis that seem to be under constant attack: [know-how wiki] and [Kayak wiki]
-- [Manni]
Manni, I'll clean those two wikis soon.
-- RichardP
I just noticed that Lee's wiki was suffering from a UseMod hiccup again that makes it impossible to save any changes. The trick is to perform the unlock action.
-- Manni
Thanks for the tip Manni. I noticed that my bot was reporting that its edits were unsuccessful due to a software error involving locks, but I didn't know what to do about it. I really appreciate you letting me know how to fix it.
-- RichardP
I had to do some extensive research on the usemod wiki. The information is there, but hidden quite efficiently.
-- Manni
(Discussion about bot comment style has been moved to /CommentStyle...)
http://www.voght.com/cgi-bin/pywiki?KmWiki
Thanks for your help, and sorry for sending a similar request to Use MOD. -- DenhamGrey
Further thoughts on the PyWiki? issue moved to /TheCode
Hi, RichardP. The Kwiki crew would love to adapt for [Kwiki]. Is the code available somewhere?
-- AdinaLevin
Adina, the first release of the code will occur shortly. There are still a few nagging issues I'm trying to resolve. However, from looking at Kwiki, I am not so sure that Kwiki would benefit from the approach my bot uses. Kwiki appears to provide anonymity to those who want it (i.e. it doesn't show the source IP address of anonymous edits - they all appear as AnonymousGnome?). Without source IP addresses, my approach of examining both source IP addresses and external links provides no benefit over the more organized content-only effort at [BannedContent]. You might consider looking at either [spamclean.py] or [BannedContentBot].
I mostly intended my bot as a temporary solution until wiki developers include these kinds of features in their wiki packages. I do like Kwiki's plug-in architecture; it makes it easier to incorporate anti-spam features directly into the wiki. I believe such an approach is likely to provide a better long-term solution than an external anti-spam bot. It might be possible to adapt my code to work on the server side as an administrator's tool. If Kwiki's internals record the source IP address for anonymous edits, then that would be the approach I recommend you explore.
-- RichardP
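To make the "source IP addresses plus external links" idea above concrete, here is a minimal sketch. It is not WikiMinion's code: `SPAM_IPS`, `SPAM_DOMAINS`, `external_domains` and `looks_like_spam` are hypothetical names, and the addresses and domains are placeholder values.

```python
import re

# Hypothetical example data; a real database would be built from observed spam.
SPAM_IPS = {"203.0.113.7"}
SPAM_DOMAINS = {"spam.example"}

# Capture the host part of http:// and https:// links.
LINK_RE = re.compile(r"https?://([^/\s:]+)", re.IGNORECASE)


def external_domains(text):
    """Return the set of lower-cased host names linked from the text."""
    return {host.lower() for host in LINK_RE.findall(text)}


def looks_like_spam(source_ip, text):
    """Flag an edit if its source IP or any linked domain is a known spam source."""
    if source_ip in SPAM_IPS:
        return True
    return bool(external_domains(text) & SPAM_DOMAINS)
```

When the wiki hides the source address (passed here as `None`), only the link check remains, which is why the approach adds nothing over a content-only blacklist in that case.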
BTW Richard, if it isn't going to take up too much more bandwidth / time, any chance of pointing WikiMinion at the Optimaes wiki? : http://www.nooranch.com/synaesmedia/optimaes/optimaes.cgi
cheers
-- PhilJones
Phil, I've added Optimaes wiki to the list it visits. It is very easy for me to add wikis that are currently clean to the list; it is more of a challenge to start cleaning wikis that are already quite dirty. Cleaning an already dirty wiki generally requires me to examine the wiki's historical edits and use the spam I find there to augment my database of spam sources and domains. If I don't do that work, then the bot will frequently do more damage than good.
-- GoJoMo
GoJoMo, the bot doesn't currently support HTTP authentication, simply because your wiki is the first I've heard of that requires it. I don't see any reason why I can't add that feature. I'll take a look at the crawler.archive.org wiki and let you know. Who should I contact about getting a username/password?
-- RichardP
Thanks for the offer to take a look. Email me at archive dot org for access info. The auth-challenge is only on edit-submits. Our latest versions are currently clean of spam, the most recent attack came from [195.175.37.xxx]. An earlier offender was CBL217-132-240-191.bb.netvision.net.il, though I think that their revs have all rotated out of the UseMod history by now.
-- GoJoMo
I've updated WikiMinion to support [Basic HTTP Authentication]. It now supplies basic HTTP credentials on all requests to the Heritrix Wiki. I created a fake spam link on the SandBox page and the bot successfully removed it. I've added Heritrix Wiki to the list of wikis visited by the bot. It will clean spam from your wiki approximately every two hours.
-- RichardP
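For reference, Basic HTTP Authentication amounts to sending an `Authorization` header containing the base64 encoding of `username:password`. A minimal sketch follows; the helper name `basic_auth_header` is hypothetical and this is not WikiMinion's own code. The credentials used are the well-known example pair from the HTTP authentication RFC.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic 'Authorization' header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Example with the RFC's sample credentials.
header_value = basic_auth_header("Aladdin", "open sesame")
```

A client would attach this value to each request that needs it, for example only on edit submissions if that is where the server issues its challenge.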
Hi Richard -- Brian found out about your WikiMinion from Seb Paquet. We were wondering if you would be willing to unleash your WikiMinion on our [UBCWiki]. We would be eternally grateful! Thank you so much.
-- BrianLamb? and MichelleChua?
Brian and/or Michelle, I've instructed my spam-fighting automaton to clean [UBCWiki]. It will return approximately every two hours.
-- RichardP
It's. So. Beautiful! Thank you so much, Richard. :) Brian and I are wondering if there is anything we can do to contribute to WikiMinion? Thanks again!
-- Michelle & Brian
Michelle & Brian, WikiMinion is definitely still a work in progress that is currently being tested. It would be helpful if you let me know if you notice it make a mistake. In particular, I would appreciate being notified if you see it regularly overlook a certain spammer. Similarly, let me know if you happen to notice that it has chosen the wrong revision when attempting to revert a spammed page back to a clean page.
-- RichardP
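The "wrong revision" risk mentioned above comes from the step that picks which older version to restore. One simple way to do it, sketched here under the assumption that page history is available newest-first and that some spam test exists, is to walk back until a revision passes the test. The names `last_clean_revision` and `is_spam` are hypothetical; this is not necessarily how WikiMinion chooses.

```python
def last_clean_revision(history, is_spam):
    """Return the id of the newest revision that is not spam.

    history: list of (revision_id, text) pairs, newest first.
    is_spam: callable taking the revision text and returning True for spam.
    Returns None when every kept revision is spam.
    """
    for revision_id, text in history:
        if not is_spam(text):
            return revision_id
    return None


# Example: two spam revisions stacked on top of one good revision.
history = [(3, "cheap pills spam.example"), (2, "spam.example"), (1, "real content")]
target = last_clean_revision(history, lambda text: "spam.example" in text)
```

If the spam test misses a spammer, the walk stops too early and restores a spammed revision, which is exactly the kind of mistake worth reporting.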
Grrr! Bastards have found SdiDesk wiki too :-(
If it's no trouble, Richard.
Thanks
-- PhilJones
Phil, I've added it to the list. Although, isn't SdiDesk using OddMuse? I think OddMuse natively supports a BannedContent? option that might work to keep out spammers.
-- RichardP
Robin, the île sans fil wiki appears to be using the MoinMoin wiki software and, unfortunately, WikiMinion doesn't currently support MoinMoin. It looks easy to add support for MoinMoin, so I'd be happy to start cleaning île sans fil wiki, assuming you are willing to wait a couple of days. However, I have to ask, why do you need WikiMinion to clean your wiki? It is my understanding that MoinMoin already includes an excellent anti-spam package as a standard feature. For details see [MoinMoin:AntiSpamGlobalSolution]. If you are not the administrator of the île sans fil wiki, perhaps you could refer the administrator to that page and ask him or her to install the anti-spam feature described on that page.
-- RichardP
Update: Robin didn't mention it, but it appears the île sans fil wiki folks installed MoinMoin's standard anti-spam package shortly after the previous note was posted here.
-- RichardP
Richard, I'm seeking some help over at http://www.kayakforum.com/cgi-sys/cgiwrap/guille/wiki.pl I'm being overwhelmed. NickSchade?
Nick, I've brought your wiki to the attention of my robotic minion. All 1400+ pages should shortly be clean of spam. It will return approximately every two hours to stomp any additional spam in the future.
-- RichardP
Richard! Here are two wikis that need cleaning, but I won't be able to run my bot in the next 8 days. Could you unleash WikiMinion? http://www.hacklabs.org/wiki.pl?Cambios_Recientes and http://cyber.law.harvard.edu/xdev/cgi-bin/wiki.pl?RecentChanges
-- Manni
Manni, I've added both TuesdayWiki and Hacklabs to the list of wikis regularly visited by WikiMinion. However, I am a little concerned about the Hacklabs wiki. Your bot Cleaner is much quicker than WikiMinion. WikiMinion tries very, very hard to always do the right thing; as a consequence, it is easily outperformed by spam bots. Of course, WikiMinion is very patient and will tirelessly continue to clean up after a spammer, even if it falls way behind. I am, however, a little concerned about the spammer on Hacklabs who is apparently willing to both create a thousand new spam pages and edit individual pages thousands of times.
-- RichardP
Thank you, Richard. I'll be offline for most of this week, my machine will be off and thus my Cleaner bot won't be on duty. Since the Hacklabs wiki is an unmodified version of UseMod that doesn't protect its kept pages, the cleaning is merely symbolic anyway. And it keeps those spammers busy. Soon, this one will have the username 'RichardP', just like the latest spam was from 'Cleaner'. I think that the Chinese spammer creating all those nnnnn.org pages has stopped doing this. At least for now. Our friend with the strange posting habits, however, is still busy as hell and on that wiki you can see that it is our old friend, the Russian cigarette spammer. Most importantly, I think that ThoughtStorms goes to show just how good WikiMinion is and that it is capable of also cleaning this sledgehammer spam. -- Manni
OK. This is scary. His username now actually is 'WikiMinion' and he switched to a zombie in my network. So the Cleaner bot would have a hard time figuring out whether it was looking at spam or not. In fact, it wouldn't be able to tell the difference. -- So, Mr. Spammer! Since you are obviously reading this, why not unlurk and tell us a little bit about yourself? -- Manni
Hello.
I sometimes come across some heavily spammed wikis. I wonder if you would take a look at them and possibly clean them up (just one time or on an ongoing basis). -- WikiTomos
Hi WikiTomos. Sure, by all means, if you make mention of the wikis here I'll see if WikiMinion can help. There are, however, a couple of caveats.
First, WikiMinion doesn't support that many wiki engines (see the compatibility chart). If the wiki uses an engine that isn't on the list I'll still consider adding support for it, but it might take a while and not all wiki engines support the features that WikiMinion requires to reliably despam a wiki.
Second, I prefer to obtain the permission of the administrator before running WikiMinion on a wiki (it can generate a lot of traffic when cleaning up after a major spam run). For most wikis that WikiMinion is cleaning, the wiki admin has given me explicit permission to use WikiMinion on their wiki. However, I'll unleash WikiMinion on a wiki if attempts to contact the administrator have failed and there are no objections from the regular users of the wiki. For example, before beginning to clean the News Monster wiki on which I've seen you, I attempted to contact Kevin Burton (the NM admin) several times to obtain permission. Unfortunately, he never responded.
Finally, as I mentioned above, WikiMinion tries very, very hard to always do the right thing and thus it is easily outperformed by spam bots. Because of this, I don't try to use WikiMinion to despam a wiki if the wiki has so much spam that all of the valid content has been lost due to it being forced out of the 'kept pages'. For those kinds of wikis I recommend you ask Manni to clean them with his Cleaner anti-spam bot.
-- RichardP
Thank you very much for your positive response. I understand the compatibility and other technical issues, and I find your stance very good and reasonable. And if it saves some of the valuable wikis, I am more than happy to contact wiki admins.
What do you think about this MoinMoin wiki, for example: markov.music.gla.ac.uk/cmt-wiki/RecentChanges?
I will try to contact admin if you would consider deploying your WikiMinion.
-- WikiTomos
Ouch, you are right, the Centre for Music Technology Wiki was beginning to lose content due to the influx of spam. I've never seen a wiki running a version of MoinMoin that old before (version 1.0) - I had to extend WikiMinion's code since that really old version of MoinMoin doesn't support either a revert action like newer versions of MoinMoin or the editing of previous page revisions like UseMod (I added code that did a revert by using the raw action to retrieve a previous revision and then an edit action to update the page). I've gone ahead and swept the wiki clean and added it to the list of wikis visited by WikiMinion. I'd appreciate it if you could make an attempt to contact the admin. Ideally the admin probably should either update to the latest version of MoinMoin, lock the wiki to prevent further spam, or shut the wiki down if they don't have the time or resources to keep it spam free.
-- RichardP
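The two-step revert described above (fetch the raw text of an old revision, then save it as a new edit) can be sketched as below. This is only an outline under stated assumptions: `revert_page`, `fetch_raw`, `save_page` and `FakeWiki` are hypothetical names, and the in-memory `FakeWiki` stands in for the real HTTP requests, whose exact URLs and parameters are not shown here.

```python
def revert_page(fetch_raw, save_page, page, revision):
    """Revert a page by re-saving the text of an earlier revision.

    fetch_raw(page, revision) -> text of that revision (stand-in for a raw action).
    save_page(page, text)     -> stores text as a new revision (stand-in for an edit action).
    """
    old_text = fetch_raw(page, revision)
    save_page(page, old_text)
    return old_text


class FakeWiki:
    """In-memory stand-in for a wiki, used only to exercise the sketch."""

    def __init__(self, revisions):
        self.revisions = revisions  # {page: [text of rev 0, text of rev 1, ...]}

    def fetch_raw(self, page, revision):
        return self.revisions[page][revision]

    def save_page(self, page, text):
        self.revisions[page].append(text)


wiki = FakeWiki({"FrontPage": ["good content", "spam spam spam"]})
revert_page(wiki.fetch_raw, wiki.save_page, "FrontPage", 0)
```

Note that the revert adds a new revision rather than deleting the spammed one, so the spam remains in the page history.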
Thank you for the operation :) I'll contact the admin for sure. That MoinMoin was harder to manually despam for the reasons you pointed out. So I was wondering if your bot could be more effective. Well, it took you extra work to deal with it. I appreciate it.
P.S. Email has just been sent to the admin.
I received a reply, and he said that running the bot is fine. The wiki directory is backed up so that it can be reinstated if needed. He also said that he would upgrade the wiki engine (as opposed to making it read-only or shutting it down) and that any suggestions would be welcome.
I think MoinMoin's later versions come with more anti-spam tools, such as a quick click-revert and an editable blacklist ("Bad Content"). But he seemed to be open even to the idea of migrating to another wiki engine. Any recommendations?
I think upgrading to the latest version of MoinMoin is the easiest path for him. By staying with MoinMoin he doesn't have to find a way to transition his data to a new engine. Besides, MoinMoin 1.3 and newer include support for an automatic global anti-spam blacklist, so wikis running that version or later get much less spam, even if the administrator never performs any maintenance.
-- RichardP
Thank you for your opinion. With a kind of thank you note, I told the administrator about your opinion (along with the url of this page).
Besides, I found you the other day on Tuesday Wiki (of Harvard). That was one of the hardest-hit wikis among the surviving ones I know. I am glad that you intervened. I hope it is not too late.
I noticed today that the admin of the Centre for Music Technology Wiki has accepted your suggestion and upgraded to version 1.3.4 of MoinMoin. It should be much easier to manually revert changes now. However, could you ask him to enable MoinMoin's anti-spam feature? To do so, ask him to uncomment the following line (remove the #) in his MoinMoin configuration file (probably "wikiconfig.py"):
#from MoinMoin.util.antispam import SecurityPolicy
With regard to Harvard's Tuesday Wiki, it was one of the two "ghost towns" that Manni asked me to clean earlier on this page. Unfortunately, I think it is too late for Tuesday Wiki. All of the valid content appears to have been destroyed by spammers; keeping the wiki clean now is mostly just a symbolic action of defiance against spammer vandalism.
-- RichardP
Just a short note to say that the CMT wiki is now up and running again, thanks to your fine efforts with the moin software.
The "official" way in is via http://cmt.gla.ac.uk/cmtwiki.html, but the fast way in (for experienced wiki users :) ) is to go straight to http://markov.music.gla.ac.uk/cmt-wiki as before
Please point any bots at it you like!
-- Nick Bailey
Nick, I just noticed an error on the page "http://cmt.gla.ac.uk/cmtwiki.html". The first sentence in the main content correctly says the wiki is at "http://markov.music.gla.ac.uk/cmt-wiki", but its hyperlink is broken. It uses a relative url to link to "/cmt-wiki/", so clicking on the link goes to "http://cmt.gla.ac.uk/cmt-wiki/" not "http://markov.music.gla.ac.uk/cmt-wiki".
-- RichardP
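The broken link described above is ordinary relative URL resolution: a link starting with "/" keeps the host of the page it appears on. Python's standard `urllib.parse.urljoin` reproduces what the browser does; the variable names below are just for illustration.

```python
from urllib.parse import urljoin

# The page that contains the link.
page_url = "http://cmt.gla.ac.uk/cmtwiki.html"

# A root-relative link is resolved against the host of the containing page.
resolved = urljoin(page_url, "/cmt-wiki/")

# An absolute link is left untouched, so it reaches the intended host.
absolute = urljoin(page_url, "http://markov.music.gla.ac.uk/cmt-wiki")
```

Here `resolved` ends up on cmt.gla.ac.uk rather than markov.music.gla.ac.uk, so writing the link as a full URL is the straightforward fix.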
Fixed, thanks Richard. I also had to fight hard to get the spell check to work. I blamed a python bug, but of course it was an internationalisation problem. This is a known issue: [utf problem] - but it took me a while to find it!
We're running the Digital Music Research Network conference at the end of July: hope we'll be able to use the Wiki to organise it now :)
-- Nick Bailey
Great! I saw some content coming back to the wiki. I hope things will go well with your wiki.
Hello Richard
Since I couldn't find any email address of yours I am putting my request here in the open: would you be so kind and include the wiki of the Swiss MD-PhD? Association at http://www.smpa.org/cgi-bin/wiki.cgi into WikiMinion, please?
That would be great! Thanks in advance.
-- DavidAndel
David, I've swept all pages of the SMPA wiki for spam and added the SMPA wiki to the list of wikis visited by WikiMinion. The wiki appears not to have had much of a spam problem; WikiMinion only found one page that needed to be fixed.
-- RichardP
Cool, thank you very much!
-- DavidAndel
Could you add http://easytopicmaps.com? It's been totally overrun :(
-- PeterV?
I am sorry Peter, Easy'TopicMaps wiki can't be cleaned by WikiMinion - it appears to be using the wakka wiki engine. Unfortunately, WikiMinion doesn't support the wakka wiki engine (see the compatibility chart). I have investigated adding support for wakka to WikiMinion in the past; however, wakka appears to lack the features that WikiMinion requires.
-- RichardP
Now that the Center for Music Technology seems to be running rather well, I wonder if I could draw your attention to another wiki. What do you think about this Blackbeltjones wiki?
http://www.blackbeltjones.com/cgi-bin/moin.cgi/RecentChanges?action=recall&date=1110002556 (the current version is spammed as of now)
Again, if you are interested in taking care of the wiki, I would be happy to contact the admin via email, ask permission for WikiMinion, and suggest upgrading, etc.
-- WikiTomos
Sure, not a problem. I'm happy to help, especially since you're doing all the hard work of tracking the admin down and obtaining permission. I've used WikiMinion to sweep all the pages of the blackbeltjones.com wiki clean and added it to the list of wikis visited by WikiMinion.
-- RichardP
Well, I am not sure which is harder work - writing, modifying, running the script or emailing the admins. But thank you anyway for your help.
I have just sent an email to the admin. I explained that the wiki has been spammed, that I asked for your help, and that we wanted his permission for WikiMinion. I suggested that upgrading to the latest version of MoinMoin might be an option, too. I explained that our communication takes place here.
I received a reply from Matt Jones, the administrator. He was quite thankful for our despamming. He did not tell me he was going to update his wiki or anything, but I infer he would not mind WikiMinion being there.
Wow, Metaweb is certainly running an ancient version of MediaWiki. I had no idea that there were any wikis still using the pre-formal release MediaWiki codebase. I'll update WikiMinion to support that version of MediaWiki and begin cleaning the MetaWeb? wiki. However, it might take a day or two before I get it up and running.
-- RichardP