More on the war against comment spam
As a follow-up to this article, I now elaborate on the additional steps needed to combat the ever-increasing problem of comment spam. In the previous article, I covered the merits of tokenising your site's forms to stop automated posting to form handlers. This is effective when your site has been crawled and a profile of your forms built, so that the spammer knows what to post into which input field. However, it does little against methods that actually visit your site, retrieve the form and then send it back with the comment spam enclosed. This is the case when programs akin to RoboForm are used, or where cheap labour markets are exploited to copy and paste into online forms.
Pre-moderation of comments will always prove effective, but it is also time- and labour-intensive and makes your site that little bit less interactive. It is also a snub to regular visitors who contribute to the content of your site. Verification is the answer here and it is this that I will focus on in this article. With verification we also bring user status into the equation, and this is a key part of tightening up your operation without making your contributors suffer too much.
Verified users
Firstly, we need to confirm that the user is a real person and that their motives for contributing are genuine. This involves some level of trust, but in these times of plentiful mailboxes that users can allocate for set purposes, it shouldn't really be an issue. As you will see on countless blog-oriented sites, it is common practice to collect a contributor's email address. It is up to the site owner not to abuse that small level of trust and not to exploit that email address.
Because an email address should, in principle, identify a particular person, we use it for that purpose. After a user has made their first post to your comments facility, they are sent an email at that address containing a link that they must follow to prove that they have access to it. Once they have done that, they are added to our verified list. Verification in itself shouldn't automatically grant them any privileges above those of an anonymous visitor; granting privileges should be a manual process, carried out by a human on behalf of the site. We store the email address on the server for future use, and also at the user's end, via a cookie, for their future convenience. As a matter of best practice, the copy stored in the cookie should be hashed or obscured in some way, as a step towards protecting their privacy. Hash comparisons are a trivial matter and can be sped up through use of secondary identifiers, which is explained in more detail in the Tokenised forms article. If you have the means, weak(ish) encryption might be better than hashing, and it simplifies checking on the server.
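To make the flow above a little more concrete, here is a minimal sketch in Python. It is not the code used in this CMS; the secret key, the in-memory stores, the example.com URL and every function name are assumptions made purely for illustration. It issues a one-time verification link, moves the address to the verified list when the link is used, and derives a keyed hash of the address to store in the cookie instead of the address itself.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in practice, load it from configuration.
SECRET_KEY = b"replace-with-a-long-random-secret"

# Hypothetical in-memory stores standing in for database tables.
pending = {}      # token -> email address awaiting verification
verified = set()  # email addresses whose owner followed the link


def normalise(email: str) -> str:
    """Trim and lower-case the address so comparisons are consistent."""
    return email.strip().lower()


def start_verification(email: str) -> str:
    """Create a one-time token and return the link to email to the user."""
    token = secrets.token_urlsafe(32)
    pending[token] = normalise(email)
    return f"https://example.com/verify?token={token}"  # placeholder URL


def confirm(token: str) -> bool:
    """Add the address to the verified list if the token is known.

    The token is removed on use, so a link only works once.
    """
    email = pending.pop(token, None)
    if email is None:
        return False
    verified.add(email)
    return True


def cookie_value(email: str) -> str:
    """Keyed hash of the address, so the cookie never holds it in the clear."""
    digest = hmac.new(SECRET_KEY, normalise(email).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def cookie_matches(email: str, cookie: str) -> bool:
    """Compare in constant time to avoid leaking how much of the hash matched."""
    return hmac.compare_digest(cookie_value(email), cookie)
```

Note that being in the verified set grants nothing by itself here; promotion to any higher status is left to a human, as described above.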
If the user doesn't verify the email address by clicking the emailed link within a set period of time, then we discard their post and flag that email address for further consideration. If they continue to make posts without verifying that email address, then we might decide to flag it as belonging to a potential spammer, or at least to somebody who isn't willing to play by the rules. That is for you to decide and, again, it should be a manual process.
Once a user is verified and their posts aren't ringing any alarm bells with you, we may decide that subsequent posts identified by that email address are reasonably trustworthy and can be displayed on our site without being pre-moderated. They are now a trusted user, which allows them to post to your comments facility without moderation and nothing else. This privilege can be revoked at a later date if they start to abuse it.
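As a rough illustration of the expiry rule, the sketch below discards posts whose verification window has lapsed and counts a strike against the address each time. The 48-hour window, the three-strike threshold and all of the names are my own assumptions; the point is only that the code produces a list for a human to review and does not pass judgement itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed values; pick whatever suits your site.
VERIFY_WINDOW = timedelta(hours=48)
FLAG_THRESHOLD = 3


@dataclass
class PendingPost:
    email: str
    body: str
    posted_at: datetime


def sweep(pending_posts, now, strikes):
    """Discard expired unverified posts and record a strike per address.

    Returns the posts still waiting for verification, plus a sorted list
    of addresses that have reached the threshold and need a human to look
    at them. `strikes` is a dict of address -> count, updated in place.
    """
    kept = []
    for post in pending_posts:
        if now - post.posted_at > VERIFY_WINDOW:
            strikes[post.email] = strikes.get(post.email, 0) + 1
        else:
            kept.append(post)
    review = sorted(e for e, n in strikes.items() if n >= FLAG_THRESHOLD)
    return kept, review
```

A scheduled job could call sweep() periodically and show the review list on a moderation screen.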
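One way to picture the status of a contributor is as a small set of states, with pre-moderation skipped only for the trusted one. The sketch below assumes three hypothetical states and an in-memory dictionary in place of a real user table; the promote() and revoke() functions stand for actions taken by a human moderator.

```python
from enum import Enum


class Status(Enum):
    UNVERIFIED = "unverified"  # address not yet confirmed
    VERIFIED = "verified"      # link followed, but not yet vetted by a human
    TRUSTED = "trusted"        # promoted manually by a moderator


# Hypothetical store of email address -> status.
users = {}


def needs_moderation(email: str) -> bool:
    """Only trusted users skip pre-moderation; unknown addresses never do."""
    return users.get(email, Status.UNVERIFIED) is not Status.TRUSTED


def promote(email: str) -> bool:
    """Moderator action: trust a user, but only if they are already verified."""
    if users.get(email) is Status.VERIFIED:
        users[email] = Status.TRUSTED
        return True
    return False


def revoke(email: str) -> None:
    """Moderator action: withdraw trust, dropping the user back to verified."""
    if users.get(email) is Status.TRUSTED:
        users[email] = Status.VERIFIED
```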
Privilege escalation
You may also have other facilities on your site that users can interact with, such as discussion boards or forums, personal journals (which are more or less sub-blogs), and so on. It is recommended that access to these facilities require a more detailed registration and verification scheme than simply verifying an email address. In addition to an email address, the user should provide a username and password, along with an optional display name and security question. The account should still be verified, and the elevated privileges should preferably be granted on a manual basis, when you are confident about allowing that person to post freely to your web site.
Least privilege should always be in effect, allowing that person to perform the actions permitted to them and no more. If you have administrative or other empowered users, then you need to make sure that your mechanisms for identifying them are secure and cannot be exploited. However, that is a subject of greater scope than this article permits and will be thoroughly covered in future articles.
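A minimal sketch of least privilege is a table of roles and the actions each one may perform, with everything else denied by default. The role and action names below are hypothetical and only loosely follow the levels discussed in this article; they are not taken from any real system.

```python
# Hypothetical roles and actions; anything not listed is denied.
PERMISSIONS = {
    "anonymous": frozenset({"read"}),
    "verified": frozenset({"read", "comment_moderated"}),
    "trusted": frozenset({"read", "comment_moderated", "comment_unmoderated"}),
    "member": frozenset({
        "read", "comment_moderated", "comment_unmoderated",
        "forum_post", "journal_post",
    }),
}


def is_allowed(role: str, action: str) -> bool:
    """Permit an action only if it is explicitly listed for the role.

    An unknown role gets an empty set, so the default answer is "no".
    """
    return action in PERMISSIONS.get(role, frozenset())
```

The useful property is the default: forgetting to list an action fails closed, so the action is not quietly allowed.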
In summary
So, using the principles explained above, we should now be able to confirm that our contributors are indeed human beings. We also have the ability to vet these users and get a feel for their intentions, before allowing them the freedom to post unmoderated content to our site. These users can be identified in the future, for their convenience and our peace of mind. When privileges are granted, we should also have the ability to revoke those privileges.
Things to keep in mind are shared or public computers, such as those in net cafes and libraries. Can we be sure that the same person is using the same computer with that cookie present? Since a permanent cookie doesn't work well in this scenario, you may want to set only temporary cookies, or at least let the user decide whether a permanent cookie is set on the machine.
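The difference between a temporary and a permanent cookie comes down to whether a lifetime is attached to it. Here is a minimal sketch using Python's standard http.cookies module; the cookie name "commenter", the one-year lifetime and the remember flag (imagine a "remember me on this computer" checkbox) are assumptions for the sake of the example.

```python
from http.cookies import SimpleCookie

ONE_YEAR = 365 * 24 * 60 * 60  # assumed lifetime, in seconds


def identity_cookie(value: str, remember: bool) -> str:
    """Build a Set-Cookie header line for the (hashed) identifier.

    Without Max-Age the browser treats it as a session cookie and drops
    it when the browser closes, which suits shared machines. With
    remember=True the cookie persists for the assumed lifetime.
    """
    cookie = SimpleCookie()
    cookie["commenter"] = value
    morsel = cookie["commenter"]
    morsel["path"] = "/"
    morsel["httponly"] = True   # keep it away from page scripts
    morsel["secure"] = True     # only send it over HTTPS
    if remember:
        morsel["max-age"] = ONE_YEAR
    return cookie.output(header="Set-Cookie:")
```

Defaulting remember to off, and letting the user opt in, is the safer choice when you cannot tell whether the machine is shared.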
At some stage I will publish an article on user accounts, which is the real meat of servicing repeat, regular contributors to your web site. Once I feel that I have covered all of the theories that need to be fully understood I may go on to publish code snippets, giving you examples of how to implement these theories. However, I would prefer to use parts of the code used in this CMS to do this and that means that my own code needs to be heavily scrutinised first. As I point out at the end of the Tokenised forms article, I want to be sure that showing the mechanics of my methods doesn't compromise their effectiveness or security. Depending on time factors, I may simply write short code snippets that aren't part of my existing code, to demonstrate certain points.



