The definitive guide to form based website authentication
We believe that Stack Overflow should not just be a resource for very specific technical questions, but also for general guidelines on how to solve variations on common problems. "Form based authentication for websites" should be a fine topic for such an experiment. 
Gino Medhurst Created at: 2013-11-13 17:07:49 UTC By Gino Medhurst
Why exclude HTTP Basic Authentication? It can work in HTML Forms via Ajax: - Chandler Von
HTTP Basic Auth has the property of being (comparatively) difficult to make a browser forget. It's also horribly insecure if you don't use it with SSL to secure the connection (i.e., HTTPS). - Brown Zboncak
I think it'd be worth talking about sessions (including fixation and hijacking) cookies (the secure and http only flags) HTTP based SSO - Ezra Ziemann
The super-useful HttpOnly cookie flag, which prevents JavaScript-based cookie theft (a subset of XSS attacks), should be mentioned somewhere too. - Haskell Maggio
Wow. Lengthy answers, dozens of upvotes for some of them, yet nobody mentions the common mistake of serving login forms over HTTP. I've even argued with people who said "but it submits to https://..." and only got blank stares when I asked if they were sure an attacker didn't rewrite the non-encrypted page the form was served over. - Trycia Eichmann
12 Answers
Simple protection Against Brute Force Attack

A very simple method to protect against brute force attack without much code. 

Make your server randomly reject a login, as if the user had mistyped the password. Rejecting one in every 40 logins gives a 2.5% chance that even the correct password is refused, so the attacker can never be certain that a password they already tried was wrong, while the real user has a 97.5% chance of not being bothered. State this on your website so that attackers know. It is far from perfect, but simple enough for anyone to implement.
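A minimal sketch of the random rejection described above, in Python. The function name and the injectable `rng` parameter are hypothetical, added only so the behaviour can be tested; the 1-in-40 rate is the figure from the answer.

```python
import random

FALSE_REJECT_RATE = 1 / 40  # the 2.5% figure from the answer


def check_login(password_ok: bool, rng=random.random) -> bool:
    """Return True only if the password matched AND this attempt was not
    picked for a deliberate false rejection."""
    if rng() < FALSE_REJECT_RATE:
        return False  # must look exactly like a mistyped password
    return password_ok
```

The caller should answer a false rejection with the same response it uses for a genuinely wrong password, otherwise the attacker can tell the two apart.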
User Authentication on the World Wide Web is old, but a good primer read nonetheless.
First, a strong caveat that this answer is not the best fit for this exact question. It should definitely not be the top answer!

I will go ahead and mention Mozilla’s proposed BrowserID (or perhaps more precisely, the Verified Email Protocol) in the spirit of finding an upgrade path to better approaches to authentication in the future.

I’ll summarize it this way:

Mozilla is a nonprofit with values that align well with finding good solutions to this problem.
The reality today is that most websites use form-based authentication
Form-based authentication has a big drawback, which is increased risk of phishing. Users are asked to enter sensitive information into an area controlled by a remote entity, rather than an area controlled by their User Agent (browser).
Since browsers are implicitly trusted (the whole idea of a User Agent is to act on behalf of the User), they can help improve this situation.
The primary force holding back progress here is deployment deadlock. Solutions must be decomposed into steps which provide some incremental benefit on their own.
The simplest decentralized method for expressing identity that is built into the internet infrastructure is the domain name.
As a second level of expressing identity, each domain manages its own set of accounts.
The form “account@domain” is concise and supported by a wide range of protocols and URI schemes. Such an identifier is, of course, most universally recognized as an email address.
Email providers are already the de-facto primary identity providers online. Current password reset flows usually let you take control of an account if you can prove that you control that account’s associated email address.
The Verified Email Protocol was proposed to provide a secure method, based on public key cryptography, for streamlining the process of proving to domain B that you have an account on domain A.
For browsers that don’t support the Verified Email Protocol (currently all of them), Mozilla provides a shim which implements the protocol in client-side JavaScript code.
For email services that don’t support the Verified Email Protocol, the protocol allows third parties to act as a trusted intermediary, asserting that they’ve verified a user’s ownership of an account. It is not desirable to have a large number of such third parties; this capability is intended only to allow an upgrade path, and it is much preferred that email services provide these assertions themselves.
Mozilla offers their own service to act as such a trusted third party. Service Providers (that is, Relying Parties) implementing the Verified Email Protocol may choose to trust Mozilla's assertions or not. Mozilla’s service verifies users’ account ownership using the conventional means of sending an email with a confirmation link.
Service Providers may, of course, offer this protocol as an option in addition to any other method(s) of authentication they might wish to offer.
A big user interface benefit being sought here is the “identity selector”. When a user visits a site and chooses to authenticate, their browser shows them a selection of email addresses (“personal”, “work”, “political activism”, etc.) they may use to identify themselves to the site.
Another big user interface benefit being sought as part of this effort is helping the browser know more about the user’s session – who they’re signed in as currently, primarily – so it may display that in the browser chrome.
Because of the distributed nature of this system, it avoids lock-in to major sites like Facebook, Twitter, Google, etc. Any individual can own their own domain and therefore act as their own identity provider.
This is not strictly “form-based authentication for websites”. But it is an effort to transition from the current norm of form-based authentication to something more secure: browser-supported authentication.
I do not think the above answer is "wrong", but there are large areas of authentication that are not touched upon (or rather, the emphasis is on "how to implement cookie sessions", not on "what options are available and what are the trade-offs").

My suggested edits / answers are

The problem lies more in account setup than in password checking.
The use of two-factor authentication is much more secure than more clever means of password encryption.
Do NOT try to implement your own login form or database storage of passwords, unless the data being stored is valueless at account creation and self-generated (that is, web 2.0 style like Facebook, Flickr, etc.)

Digest Authentication is a standards based approach supported in all major browsers and servers, that will not send a password even over a secure channel.

This avoids any need to have "sessions" or cookies as the browser itself will re-encrypt the communication each time. It is the most "lightweight" development approach.

However, I do not recommend this, except for public, low value services. This is an issue with some of the other answers above - do not try to re-implement server-side authentication mechanisms - this problem has been solved and is supported by most major browsers. Do not use cookies. Do not store anything in your own hand-rolled database. Just ask, per request, if the request is authenticated. Everything else should be supported by configuration and third-party trusted software.

So ...

First, we are confusing the initial creation of an account (with a password) with the re-checking of the password subsequently. If I am Flickr and creating your site for the first time, the new user has access to zero value (blank web space). I truly do not care if the person creating the account is lying about their name. If I am creating an account on the hospital intranet / extranet, the value lies in all the medical records, and so I do care about the identity (*) of the account creator.

This is the very very hard part. The only decent solution is a web of trust. For example, you join the hospital as a doctor. You create a web page hosted somewhere with your photo, your passport number and a public key, and hash them all with the private key. You then visit the hospital and the system administrator looks at your passport, sees if the photo matches you, and then hashes the web page / photo hash with the hospital private key. From now on we can securely exchange keys and tokens. As can anyone who trusts the hospital (there is the secret sauce BTW). The system administrator can also give you an RSA dongle or other two-factor authentication.

But this is a lot of hassle, and not very web 2.0. However, it is the only secure way to create new accounts that have access to valuable information that is not self-created.

Kerberos and SPNEGO - single sign on mechanisms with a trusted third party - basically the user verifies against a trusted third party. (NB: this is in no way the same as OAuth, which is not to be trusted.)
SRP - sort of clever password authentication without a trusted third party. But here we are getting into the realms of "it's safer to use two factor authentication, even if that's costlier"
SSL client side - give the clients a public key certificate (support in all major browsers - but raises questions over client machine security).
In the end it's a tradeoff - what is the cost of a security breach vs the cost of implementing more secure approaches. One day, we may see a proper PKI widely accepted and so no more own rolled authentication forms and databases. One day...
I just thought I'd share this solution that I found to be working just fine.

I call it the Dummy Field (though I haven't invented this so don't credit me).

In short: you just have to insert this into your <form> and check that it is empty when validating:

<input type="text" name="email" style="display:none" />

The trick is to fool a bot into thinking it has to insert data into a required field, that's why I named the input "email". If you already have a field called email that you're using you should try naming the dummy field something else like "company", "phone" or "emailaddress". Just pick something you know you don't need and what sounds like something people would normally find logical to fill in into a web form. Now hide the input field using CSS or JavaScript/jQuery - whatever fits you best - just don't set the input type to hidden or else the bot won't fall for it.

When you are validating the form (either client or server side) check if your dummy field has been filled to determine if it was sent by a human or a bot.


In case of a human:
The user will not see the dummy field (in my case named "email") and will not attempt to fill it. So the value of the dummy field should still be empty when the form has been sent.

In case of a bot: The bot will see a field whose type is text and a name email (or whatever it is you called it) and will logically attempt to fill it with appropriate data. It doesn't care if you styled the input form with some fancy CSS, web-developers do it all the time. Whatever the value in the dummy field is, we don't care as long as it's larger than 0 characters.
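The server-side half of the check could be sketched like this in Python. The function name and the plain dict standing in for the submitted form are assumptions for illustration; the field name "email" is the one used in the snippet above.

```python
def looks_like_bot(form: dict, dummy_field: str = "email") -> bool:
    """Flag a submission as automated when the hidden dummy field is non-empty."""
    # A human never sees the field, so anything longer than 0 characters is suspect.
    return len(form.get(dummy_field, "")) > 0
```

A form handler would call `looks_like_bot(submitted_fields)` before doing any other work and silently drop the submission when it returns True.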

I used this method on a guestbook in combination with CAPTCHA, and I haven't seen a single spam post since. I had used a CAPTCHA-only solution before, but eventually it resulted in about five spam posts every hour. Adding the dummy field in the form has stopped (at least till now) all the spam from appearing. 

I believe this can also be used just fine with a login/authentication form.

Warning: Of course this method is not 100% fool proof. Bots can be programmed to ignore input fields with the style display:none applied to it. You also have to think about people who use some form of auto-completion (like most browsers have built-in!) to auto-fill all form fields for them. They might just as well pick up a dummy field.

You can also vary this up a little by leaving the dummy field visible but outside the boundaries of screen, but this is totally up to you. 

Be creative!
See also Wikibooks PHP Programming: User login systems.
Definitive Article

Sending credentials

The only practical way to send credentials 100% securely is by using SSL. Using JavaScript to hash the password is not safe. (TODO citation required). There's another secure method called SRP, but it's patented (although it is freely licensed) and there are few good implementations available.

Storing passwords

Don't ever store passwords as plaintext in the database. Not even if you don't care about the security of your own site. Assume that some of your users will reuse the password of their online bank account. So, store the hashed password, and throw away the original. And make sure the password doesn't show up in access logs or application logs. The best hashing function seems to be bcrypt. 

Hashes by themselves are also insecure. For instance, identical passwords mean identical hashes. Instead, store the salted hash. A salt is a random string combined with the password before hashing - use a different (random) salt per user. The salt is a public value, so you can store it with the hash in the database.
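As a rough illustration of per-user salting, here is a sketch that uses only the Python standard library. Note the substitution: it uses PBKDF2 (`hashlib.pbkdf2_hmac`) rather than the bcrypt recommended above, purely because bcrypt is not in the standard library. The function names and the iteration count are assumptions, not recommendations from this answer.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # illustrative work factor only


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); both can be stored in the user's row."""
    salt = os.urandom(16)  # fresh random salt for every user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Because each call draws a new salt, two users with the same password end up with different stored digests.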

This means that you can't send the user their forgotten passwords (because you only have the hash). Don't reset the user's password unless you have authenticated the user (users must prove that they know the answer to the security question, or are able to read emails sent to the stored (and validated) email address.)

Security questions

Security questions are insecure - avoid using them. Why? Read PART III: Using Secret Questions in @Jens Roland answer here in this wiki.

Session cookies

After the user logs in, the server sends the user a session cookie. The server can retrieve the username or id from the cookie, but nobody else can generate such a cookie (TODO explain mechanisms). Cookies can be hijacked (TODO really? how?), so don't send persistent cookies. If you want to autologin your users, you can set a persistent cookie, but you should set a flag that the user has auto-logged in, and needs to login for real for sensitive operations (TODO is this correct? I think this is what Amazon does.)
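One possible mechanism for a cookie that only the server can generate is to attach a keyed signature (HMAC) to the user id. The sketch below assumes a server-side secret named `SECRET_KEY` and a `user_id.signature` cookie layout; both are hypothetical. A random session id that references server-side data is another common option.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-long-random-server-side-secret"  # hypothetical


def sign_cookie(user_id: str) -> str:
    """Build a cookie value of the form '<user id>.<hex HMAC of the user id>'."""
    mac = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{mac}"


def read_cookie(cookie: str):
    """Return the user id if the signature checks out, else None."""
    user_id, _, mac = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(mac.encode(), expected.encode()):
        return user_id
    return None
```

Signing stops forgery, but it does not stop a stolen cookie from being replayed, which is why the advice about persistent cookies above still applies.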

List of external resources

Dos and Don'ts of Client Authentication on the Web (PDF)
21 page academic article with many great tips.  
Ask YC: Best Practices for User Authentication
Forum discussion on the subject  
You're Probably Storing Passwords Incorrectly
Introductory article about storing passwords
Discussion: Coding Horror: You're Probably Storing Passwords Incorrectly
Forum discussion about a Coding Horror article.
Never store passwords in a database!
Another warning about storing passwords in the database.
Password cracking
Wikipedia article on weaknesses of several password hashing schemes.
Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes
Discussion about rainbow tables and how to defend against them, and against other threats. Includes extensive discussion.
Not quite 'definitive', but OWASP has some good stuff as well...
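When hashing, don't use fast hash algorithms such as MD5 (many hardware implementations exist). Use something like SHA-512, and since a single round of SHA-512 is itself fast, iterate it many times or use a purpose-built password hash such as bcrypt. For passwords, slower hashes are better.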

The faster you can create hashes, the faster any brute force checker can work. Slower hashes will therefore slow down brute forcing. A slow hash algorithm will make brute forcing impractical for longer passwords (8+ characters).
good article about realistic password strength estimation
My favourite rule in regards to authentication systems: use passphrases, not passwords. Easy to remember, hard to crack.
More info:
PART I: How To Log In

As a rule, CAPTCHAs should be a last resort. They tend to be annoying, often aren't human-solvable, most of them are ineffective against bots, all of them are ineffective against cheap third-world labor (according to OWASP, the current sweatshop rate is $12 per 500 tests), and some implementations are technically illegal in some countries (see link number 1 from the MUST-READ list). If you must use a CAPTCHA, use reCAPTCHA, since it is OCR-hard by definition (since it uses already OCR-misclassified book scans).
It is possible to prevent browsers from storing/retrieving a password with the autocomplete attribute for forms/input fields. However in the real world, your customers will have many accounts on different systems; it compromises their security if they use the same password for every site. Can you expect them to remember different passwords for every site? There are some good password managers out there, however there are also bad ones - which will become a target for attackers.
The only (currently practical) way to protect against login interception (packet sniffing) during login is by using a certificate-based encryption scheme (for example, SSL) or a proven & tested challenge-response scheme (for example, the Diffie-Hellman-based SRP). Any other method can be easily circumvented by an eavesdropping attacker.
On that note: hashing the password client-side (for example, with JavaScript) is useless unless it is combined with one of the above - that is, either securing the line with strong encryption or using a tried-and-tested challenge-response mechanism (if you don't know what that is, just know that it is one of the most difficult to prove, most difficult to design, and most difficult to implement concepts in digital security). Hashing the password is effective against password disclosure, but not against replay attacks, Man-In-The-Middle attacks / hijackings, or brute-force attacks (since we are handing the attacker both username, salt and hashed password).
After sending the authentication tokens, the system needs a way to remember that you have been authenticated - this fact should only ever be stored serverside in the session data. A cookie can be used to reference the session data. Wherever possible, the cookie should have the Secure and HttpOnly flags set when sent to the browser. The HttpOnly flag provides some protection against the cookie being read by an XSS attack. The Secure flag ensures that the cookie is only sent back via HTTPS, and therefore protects against network sniffing attacks. The value of the cookie should not be predictable. Where a cookie referencing a non-existent session is presented, its value should be replaced immediately to prevent session fixation.
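As a small illustration of the cookie advice above, this Python sketch builds a session cookie with an unpredictable value and both flags set, using the standard library's `http.cookies` and `secrets` modules. The cookie name `session` and the function name are assumptions.

```python
import secrets
from http.cookies import SimpleCookie


def new_session_cookie() -> str:
    """Return the value for a Set-Cookie header carrying a fresh session id."""
    cookie = SimpleCookie()
    cookie["session"] = secrets.token_urlsafe(32)  # unpredictable value
    cookie["session"]["secure"] = True    # only sent back over HTTPS
    cookie["session"]["httponly"] = True  # not readable from JavaScript
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()
```

Calling the same function again whenever an unknown session id is presented gives the replacement behaviour that guards against session fixation.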
PART II: How To Remain Logged In - The Infamous "Remember Me" Checkbox

Persistent Login Cookies ("remember me" functionality) are a danger zone; on the one hand, they are entirely as safe as conventional logins when users understand how to handle them; and on the other hand, they are an enormous security risk in the hands of most users, who use them on public computers, forget to log out, don't know what cookies are or how to delete them, etc.

Personally, I want my persistent logins for the web sites I visit on a regular basis, but I know how to handle them safely. If you are positive that your users know the same, you can use persistent logins with a clean conscience. If not - well, then you're more like me; subscribing to the philosophy that users who are careless with their login credentials brought it upon themselves if they get hacked. It's not like we go to our user's houses and tear off all those facepalm-inducing Post-It notes with passwords they have lined up on the edge of their monitors, either. If people are idiots, then let them eat idiot cake.

Of course, some systems can't afford to have any accounts hacked; for such systems, there is no way you can justify having persistent logins.

If you DO decide to implement persistent login cookies, this is how you do it:

First, follow Charles Miller's 'Best Practices' article. Do not get tempted to follow the 'Improved' Best Practices linked at the end of his article. Sadly, the 'improvements' to the scheme are easily thwarted (all an attacker has to do when stealing the 'improved' cookie is remember to delete the old one. This will require the legitimate user to re-login, creating a new series identifier and leaving the stolen one valid).
And DO NOT STORE THE PERSISTENT LOGIN COOKIE (TOKEN) IN YOUR DATABASE, ONLY A HASH OF IT! The login token is Password Equivalent, so if an attacker got his hands on your database, he/she could use the tokens to log in to any account, just as if they were cleartext login-password combinations. Therefore, use strong salted hashing (bcrypt / phpass) when storing persistent login tokens.
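A minimal sketch of "store only a hash of the token", in Python. The in-memory dict stands in for a database table, and the series/token split is only loosely modelled on the article mentioned above. Note the substitution: it uses a single SHA-256 instead of bcrypt / phpass, on the assumption that a long random token (unlike a human-chosen password) cannot be guessed offline; treat that as this sketch's choice, not the answer's.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

token_hashes = {}  # series id -> SHA-256 hex digest (hypothetical storage)


def issue_token() -> Tuple[str, str]:
    """Create a persistent-login pair; only the token's hash is kept server-side."""
    series = secrets.token_urlsafe(16)
    token = secrets.token_urlsafe(32)
    token_hashes[series] = hashlib.sha256(token.encode()).hexdigest()
    return series, token  # both go into the cookie


def check_token(series: str, token: str) -> bool:
    """Hash the presented token and compare it with the stored hash."""
    stored = token_hashes.get(series)
    if stored is None:
        return False
    candidate = hashlib.sha256(token.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored)
```

An attacker who dumps `token_hashes` gets digests that cannot be replayed as cookies.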
PART III: Using Secret Questions

Don't. Never ever use 'secret questions'. Read the paper from link number 5 from the MUST-READ list. You can ask Sarah Palin about that one, after her Yahoo! email account got hacked during the presidential campaign because the answer to her 'security' question was... (wait for it) ... "Wasilla High School"!

Even with user-specified questions, it is highly likely that most users will choose either:

A 'standard' secret question like mother's maiden name or favourite pet
A simple piece of trivia that anyone could lift from their blog, LinkedIn profile, or similar
Any question that is easier to answer than guessing their password. Which, for any decent password, is every question conceivable.
In conclusion, security questions are inherently insecure in all their forms and variations, and should never be employed in an authentication scheme for any reason.

A secondary question is often considered as adequate for fulfilling a requirement for two-factor authentication. While capturing some of the response via clicks rather than typing in theory provides protection against keylogger attacks, it is still just an extension to the password mechanism - and when users are presented with a text box instead of drop-downs on a phishing site, they rarely perceive this as abnormal. Note that you may be able to fulfill your two-factor obligations by using a long-lasting cookie (granted on submission of multiple authentication questions) in place of a security question asked each and every time - but at the expense of user convenience.

The only reason anyone still uses security questions by choice is that it saves the cost of a few support calls from users who can't remember their email passwords to get to their reactivation codes. At the expense of security and Sarah Palin's reputation, that is. Worth it? You be the judge.

PART IV: Forgotten Password Functionality

I already mentioned why you should never use security questions for handling forgotten/lost user passwords; it also goes without saying that you should never e-mail users their passwords. There are at least two more all-too-common pitfalls to avoid in this field:

Don't RESET user's passwords no matter what - 'reset' passwords are harder for the user to remember, which means he/she MUST either change it OR write it down - say, on a bright yellow Post-It on the edge of his monitor. Instead, just let users pick a new one right away - which is what they want to do anyway.
Always hash the lost password code/token in the database. AGAIN, this code is another example of a Password Equivalent, so it MUST be hashed in case an attacker got his hands on your database. When a lost password code is requested, send the plaintext code to the user's email address, then hash it, save the hash in your database -- and throw away the original. Just like a password or a persistent login token.
A final note: always make sure your interface for entering the 'lost password code' is at least as secure as your login form itself, or an attacker will simply use this to gain access instead. Making sure you generate very long 'lost password codes' (for example, 16 case sensitive alphanumeric characters) is a good start, but consider adding the same throttling that you do for logins.
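The lost-password flow above could be sketched as follows. The 16 case-sensitive alphanumeric characters come from the text; the one-hour expiry, the dict used as storage, the function names, and the use of SHA-256 for hashing are all assumptions made for illustration.

```python
import hashlib
import secrets
import string
import time

ALPHABET = string.ascii_letters + string.digits  # case-sensitive alphanumerics
CODE_TTL_SECONDS = 3600  # hypothetical expiry, not specified in the text

reset_codes = {}  # user id -> (code hash, expiry timestamp)


def create_reset_code(user_id: str, now=None) -> str:
    """Generate a code, keep only its hash, and return the plaintext for emailing."""
    now = time.time() if now is None else now
    code = "".join(secrets.choice(ALPHABET) for _ in range(16))
    code_hash = hashlib.sha256(code.encode()).hexdigest()
    reset_codes[user_id] = (code_hash, now + CODE_TTL_SECONDS)
    return code


def redeem_reset_code(user_id: str, code: str, now=None) -> bool:
    """Accept a code once, and only before it expires."""
    now = time.time() if now is None else now
    entry = reset_codes.get(user_id)
    if entry is None or now > entry[1]:
        return False
    ok = secrets.compare_digest(hashlib.sha256(code.encode()).hexdigest(), entry[0])
    if ok:
        del reset_codes[user_id]  # single use
    return ok
```

Calls to `redeem_reset_code` should sit behind the same throttling as the login form, as the note above says.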

PART V: Checking Password Strength

First, you'll want to read this small article for a reality check: The 500 most common passwords

Okay, so maybe the list isn't the canonical list of most common passwords on any system anywhere ever, but it's a good indication of how poorly people will choose their passwords when there is no enforced policy in place. Plus, the list looks frighteningly close to home when you compare it to the publicly available analyses of 40,000+ recently stolen MySpace passwords.

Well, enough MySpace-bashing for now. Moving on..

So: With no minimum password strength requirements, 2% of users use one of the top 20 most common passwords. Meaning: if an attacker gets just 20 attempts, 1 in 50 accounts on your website will be crackable.

Thwarting this requires calculating the entropy of a password and then applying a threshold.  The National Institute of Standards and Technology (NIST) Special Publication 800-63 has a set of very good suggestions.  That, when combined with a dictionary and keyboard layout analysis (for example, 'qwertyuiop' is a bad password), can reject 99% of all poorly selected passwords at a level of 18 bits of entropy.  Simply calculating password strength and showing a visual strength meter to a user is insufficient.  Unless it is enforced, users will ignore it.
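To make the entropy threshold concrete, here is a simplified sketch. The per-character bit values follow the length-based heuristic from the original NIST SP 800-63 appendix as I understand it (4 bits for the first character, 2 for the next seven, 1.5 up to the twentieth, 1 after that, plus 6 for mixed case with non-letters). The tiny word list and keyboard rows are stand-ins for real data, and every name here is hypothetical.

```python
COMMON_PASSWORDS = {"password", "123456", "letmein"}  # stand-in for a real dictionary
KEYBOARD_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm", "1234567890")
MIN_BITS = 18  # threshold quoted in the text


def estimated_bits(password: str) -> float:
    """Rough length-based entropy estimate; not a measurement of real strength."""
    bits = 0.0
    for position in range(1, len(password) + 1):
        if position == 1:
            bits += 4
        elif position <= 8:
            bits += 2
        elif position <= 20:
            bits += 1.5
        else:
            bits += 1
    has_upper = any(c.isupper() for c in password)
    has_non_alpha = any(not c.isalpha() for c in password)
    if has_upper and has_non_alpha:
        bits += 6  # composition bonus
    return bits


def is_acceptable(password: str) -> bool:
    """Enforce the dictionary check, the keyboard-run check, and the threshold."""
    lowered = password.lower()
    if lowered in COMMON_PASSWORDS:
        return False
    if any(lowered in row or lowered in row[::-1] for row in KEYBOARD_ROWS):
        return False  # a straight run along one keyboard row
    return estimated_bits(password) >= MIN_BITS
```

The point of `is_acceptable` is enforcement: the registration form refuses the password rather than merely showing a meter.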

PART VI: Much More - Or: Preventing Rapid-Fire Login Attempts

First, have a look at the numbers: Password Recovery Speeds - How long will your password stand up

If you don't have the time to look through the tables in that link, here's the gist of them:

It takes virtually no time to crack a weak password, even if you're cracking it with an abacus
It takes virtually no time to crack an alphanumeric 9-character password, if it is case insensitive
It takes virtually no time to crack an intricate, symbols-and-letters-and-numbers, upper-and-lowercase password, if it is less than 8 characters long (a desktop PC can search the entire keyspace up to 7 characters in a matter of days or even hours)
It would, however, take an inordinate amount of time to crack even a 6-character password, if you were limited to one attempt per second!
So what can we learn from these numbers? Well, lots, but we can focus on the most important part: the fact that preventing large numbers of rapid-fire successive login attempts (i.e. the brute force attack) really isn't that difficult. But preventing it right isn't as easy as it seems.
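The arithmetic behind that last point can be checked with a few lines of Python. The figures are assumptions chosen for illustration: a 6-character password drawn from 26 lowercase letters, one guess per second for the throttled case, and a hypothetical billion guesses per second for an unthrottled offline attacker.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(alphabet_size: int, length: int, attempts_per_second: float) -> float:
    """Worst-case time to try every password of exactly `length` characters."""
    return alphabet_size ** length / attempts_per_second / SECONDS_PER_YEAR


throttled = years_to_exhaust(26, 6, 1)    # one guess per second
offline = years_to_exhaust(26, 6, 1e9)    # hypothetical offline cracking rate
print(f"throttled: {throttled:.1f} years")
print(f"offline:   {offline * SECONDS_PER_YEAR:.2f} seconds")
```

With these assumptions the throttled search takes on the order of a decade, while the unthrottled one finishes in under a second.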

Generally speaking, you have three choices that are all effective against brute-force attacks (and dictionary attacks, but since you are already employing a strong passwords policy, they shouldn't be an issue):

Present a CAPTCHA after N failed attempts (annoying as hell and often ineffective -- but I'm repeating myself here)
Locking accounts and requiring email verification after N failed attempts (this is a DoS attack waiting to happen)
And finally, login throttling: that is, setting a time delay between attempts after N failed attempts (yes, DoS attacks are still possible, but at least they are far less likely and a lot more complicated to pull off).
Best practice #1: A short time delay that increases with the number of failed attempts, like:

1 failed attempt = no delay
2 failed attempts = 2 sec delay
3 failed attempts = 4 sec delay
4 failed attempts = 8 sec delay
5 failed attempts = 16 sec delay
DoS attacking this scheme would be very impractical, but on the other hand, potentially devastating, since the delay increases exponentially. A DoS attack lasting a few days could suspend the user for weeks.

  To clarify: The delay is not a delay before returning the response to the browser. It is more like a timeout or refractory period during which login attempts to a specific account or from a specific IP address will not be accepted or evaluated at all. That is, correct credentials will not result in a successful login, and incorrect credentials will not trigger a delay increase.
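A sketch of that refractory-period behaviour for a single account, in Python. The class and method names are hypothetical and the clock is passed in explicitly so the logic is easy to follow; the delay schedule is the one from the table above (0, 2, 4, 8, 16 seconds).

```python
def exponential_delay(failed_attempts: int) -> int:
    """Seconds of lockout after the given number of consecutive failures."""
    if failed_attempts < 2:
        return 0
    return 2 ** (failed_attempts - 1)


class AccountThrottle:
    def __init__(self):
        self.failures = 0
        self.locked_until = 0.0

    def attempt(self, password_ok: bool, now: float) -> str:
        if now < self.locked_until:
            return "ignored"  # not evaluated at all, even if the password is correct
        if password_ok:
            self.failures = 0
            return "ok"
        self.failures += 1
        self.locked_until = now + exponential_delay(self.failures)
        return "failed"
```

Note that an "ignored" attempt neither logs the user in nor bumps the failure count, matching the clarification above.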

Best practice #2: A medium length time delay that goes into effect after N failed attempts, like:

1-4 failed attempts = no delay
5 failed attempts = 15-30 min delay
DoS attacking this scheme would be quite impractical, but certainly doable. Also, it might be relevant to note that such a long delay can be very annoying for a legitimate user. Forgetful users will dislike you.

Best practice #3: Combining the two approaches - either a fixed, short time delay that goes into effect after N failed attempts, like:

1-4 failed attempts = no delay
5+ failed attempts = 20 sec delay
Or, an increasing delay with a fixed upper bound, like:

1 failed attempt = 5 sec delay
2 failed attempts = 15 sec delay
3+ failed attempts = 45 sec delay
This final scheme was taken from the OWASP best-practices suggestions (link 1 from the MUST-READ list), and should be considered best practice, even if it is admittedly on the restrictive side.
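The increasing-delay-with-upper-bound variant is just the previous idea with a cap. A one-function sketch in Python, with hypothetical parameter names and the 5 / 15 / 45 second values from the list above:

```python
def capped_delay(failed_attempts: int, base: int = 5, factor: int = 3, cap: int = 45) -> int:
    """Delay in seconds that triples per failure but never exceeds `cap`."""
    if failed_attempts < 1:
        return 0
    return min(base * factor ** (failed_attempts - 1), cap)
```

Because the delay stops growing at 45 seconds, a sustained attack cannot push a legitimate user's lockout into days or weeks.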

  As a rule of thumb however, I would say: the stronger your password policy is, the less you have to bug users with delays. If you require strong (case-sensitive alphanumerics + required numbers and symbols) 9+ character passwords, you could give the users 2-4 non-delayed password attempts before activating the throttling.

DoS attacking this final login throttling scheme would be very impractical. And as a final touch, always allow persistent (cookie) logins (and/or a CAPTCHA-verified login form) to pass through, so legitimate users won't even be delayed while the attack is in progress. That way, the very impractical DoS attack becomes an extremely impractical attack.

Additionally, it makes sense to do more aggressive throttling on admin accounts, since those are the most attractive entry points.

PART VII: Distributed Brute Force Attacks

Just as an aside, more advanced attackers will try to circumvent login throttling by 'spreading their activities':

Distributing the attempts on a botnet to prevent IP address flagging
Rather than picking one user and trying the 50,000 most common passwords (which they can't, because of our throttling), they will pick THE most common password and try it against 50,000 users instead. That way, not only do they get around maximum-attempts measures like CAPTCHAs and login throttling, their chance of success increases as well, since the number 1 most common password is far more likely than number 49,995.
Spacing the login requests for each user account, say, 30 seconds apart, to sneak under the radar
Here, the best practice would be logging the number of failed logins, system-wide, and using a running average of your site's bad-login frequency as the basis for an upper limit that you then impose on all users.

Too abstract? Let me rephrase:

Say your site has had an average of 120 bad logins per day over the past 3 months. Using that (running average), your system might set the global limit to 3 times that -- i.e. 360 failed attempts over a 24 hour period. Then, if the total number of failed attempts across all accounts exceeds that number within one day (or even better, monitor the rate of acceleration and trigger on a calculated threshold), it activates system-wide login throttling - meaning short delays for ALL users (still, with the exception of cookie logins and/or backup CAPTCHA logins).
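The running-average limit could be sketched like this in Python. The class name, the 90-day window, and the choice to count per calendar day are assumptions; the multiplier of 3 and the 120 / 360 figures come from the example above.

```python
from collections import deque


class GlobalFailureMonitor:
    def __init__(self, history_days: int = 90, multiplier: float = 3.0):
        self.daily_failures = deque(maxlen=history_days)  # completed days only
        self.multiplier = multiplier
        self.today = 0

    def limit(self) -> float:
        """Failed logins allowed per day before site-wide throttling starts."""
        if not self.daily_failures:
            return float("inf")  # no baseline yet, so never trigger
        average = sum(self.daily_failures) / len(self.daily_failures)
        return average * self.multiplier

    def record_failure(self) -> bool:
        """Count one failed login; return True if site-wide throttling should be on."""
        self.today += 1
        return self.today > self.limit()

    def end_of_day(self) -> None:
        """Roll today's count into the history and start a new day."""
        self.daily_failures.append(self.today)
        self.today = 0
```

With 90 days of 120 failures each, `limit()` is 360, and the 361st failure of the day switches throttling on.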

I also posted a question with more details and a really good discussion of how to avoid tricky pitfalls in fending off distributed brute force attacks.

PART VIII: Two-Factor Authentication and Authentication Providers

Credentials can be compromised, whether by exploits, passwords being written down and lost, laptops with keys being stolen, or users entering logins into phishing sites.  Logins can be protected with two-factor authentication, which uses out-of-band factors such as single-use codes received from a phone call, SMS message, or dongle. Several providers offer two-factor authentication services.
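The text does not say how single-use codes are produced. As one example, and only as an assumption about what a dongle or phone app might do, here is a sketch of the counter-based HOTP algorithm from RFC 4226 using the Python standard library. The function name is hypothetical.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Derive a one-time code from a shared secret and a moving counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    number = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)
```

The server and the device share `secret` and advance the counter together, so a code that was phished or overheard is useless once it has been used.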

Authentication can be completely delegated to a single-sign-on service such as OAuth, OpenID or Persona (nee BrowserID), where another provider handles collecting credentials.  This pushes the problem to a trusted third party.   Twitter is an example of an OAuth provider, while Facebook provides a similar proprietary solution.

MUST-READ LINKS About Web Authentication

OWASP Guide To Authentication
Dos and Don’ts of Client Authentication on the Web (very readable MIT research paper)
Charles Miller's Persistent Login Cookie Best Practice
Wikipedia: HTTP cookie
Personal knowledge questions for fallback authentication: Security questions in the era of Facebook (very readable Berkeley research paper)