PrimitiveType

Handling quote characters in HTML form input fields


In (X)HTML, attribute values should be enclosed in double or single quotes. A common source of errors and confusion arises when those values themselves contain double or single quotes. This happens often with form input fields, where the values might contain data obtained from a database or supplied previously by the user. This article looks at how to deal with the problem using PHP.

Consider the case of an input text field for last name:

<input type='text' name='last_name' value='' />

Usually, attribute values are surrounded by double quotes, but single quotes are also allowed, and serve to highlight the pitfall here. Say that the value of the last name text field is taken from a database of users, and this particular user's last name is "O'Reilly" - the PHP code will be:

<input type='text' name='last_name' value='<?php print $lastName; ?>' />

And the HTML output will be:

<input type='text' name='last_name' value='O'Reilly' />

This will make the last name appear as just "O" in a browser, and that is also the value that will be sent when the form is submitted. This is because the single quote in "O'Reilly" is taken as marking the end of the value. What we want is to encode the quote character so that the browser understands it as a literal single quote rather than as the end of the attribute. The encoded version of a single quote is "&#39;". The encoding can be done in a number of ways. For example, we can use the function str_replace() to replace all occurrences of "'" with "&#39;". But the most convenient and complete way is to use the htmlentities() function on the $lastName variable, as in the following PHP code:

<input type='text' name='last_name' value='<?php print htmlentities($lastName, ENT_QUOTES); ?>' />

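This will output the following (PHP writes the entity with a leading zero, "&#039;", which is equivalent to "&#39;"):

<input type='text' name='last_name' value='O&#039;Reilly' />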

Although "O'Reilly" no longer appears in its literal form in the HTML code, the browser will display it correctly on the page and send it correctly when the form is submitted.
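The two approaches mentioned above can be compared side by side. This is a minimal sketch; the value assigned to $lastName is a hypothetical stand-in for data that would normally come from a database.

```php
<?php
// Hypothetical value, standing in for data read from a database.
$lastName = "O'Reilly";

// Manual approach: replace only the single quote with its numeric entity.
$manual = str_replace("'", "&#39;", $lastName);

// Library approach: encode both kinds of quote plus other HTML special characters.
$encoded = htmlentities($lastName, ENT_QUOTES);

echo "<input type='text' name='last_name' value='" . $manual . "' />\n";
echo "<input type='text' name='last_name' value='" . $encoded . "' />\n";
```

Both lines produce a well-formed attribute. The str_replace() version only protects against single quotes, whereas htmlentities() also handles characters such as "<" and "&".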

The ENT_QUOTES parameter in the htmlentities() function ensures that single quote characters are encoded. Before PHP 8.1 the default flag (ENT_COMPAT) encoded double quote characters but not single ones; from PHP 8.1 onwards the default includes ENT_QUOTES, but passing it explicitly keeps the behaviour the same across versions. htmlentities() also encodes other characters and is useful for ensuring that characters such as "<" and "&" are not interpreted as HTML special characters.
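The effect of the flag can be seen by encoding the same string twice. This is a small sketch with a made-up value that contains a single quote, an ampersand and angle brackets.

```php
<?php
// Made-up value containing several characters that are special in HTML.
$value = "O'Reilly & <Sons>";

// ENT_COMPAT encodes double quotes but leaves single quotes untouched.
$compat = htmlentities($value, ENT_COMPAT);

// ENT_QUOTES encodes both double and single quotes.
$quotes = htmlentities($value, ENT_QUOTES);

echo $compat, "\n"; // O'Reilly &amp; &lt;Sons&gt;
echo $quotes, "\n"; // O&#039;Reilly &amp; &lt;Sons&gt;
```

In both cases "&", "<" and ">" are encoded; only the treatment of the single quote differs.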

Had the last name value been enclosed in double quotes, "O'Reilly" would not present a problem. However, a similar situation would arise if the user's last name was set to, say, '"The Legend" Wilson'. Here the browser would see the last name as empty, because the double quote at the very start of the name is taken as the end of the value. Using htmlentities() solves this problem as well.
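The double-quoted case can be sketched in the same way. The name below is the example from the text; the variable names are only illustrative.

```php
<?php
// The example name from the text, containing double quotes.
$lastName = '"The Legend" Wilson';

// Double quotes are encoded as &quot; so they cannot end the attribute early.
$encoded = htmlentities($lastName, ENT_QUOTES);

echo '<input type="text" name="last_name" value="' . $encoded . '" />', "\n";
// <input type="text" name="last_name" value="&quot;The Legend&quot; Wilson" />
```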

Note that the user can still enter the literal value "O'Reilly" in a text field - what we're looking at here is what goes on in the HTML behind the page.
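The encoding is only a transport form: the browser turns the entity back into the literal character when it reads the attribute. As a rough illustration, html_entity_decode() can stand in for that browser step. This is an approximation for demonstration purposes, not a description of how any particular browser is implemented.

```php
<?php
$original = "O'Reilly";

// What is written into the HTML source.
$inHtml = htmlentities($original, ENT_QUOTES);

// Approximates what the browser does when it parses the attribute value.
$seenByUser = html_entity_decode($inHtml, ENT_QUOTES);

echo $inHtml, "\n";     // O&#039;Reilly
echo $seenByUser, "\n"; // O'Reilly
```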

As a final note, remember that data to be inserted into a database might also contain quote characters, and may need to be escaped in a similar way. If using a MySQL database, the function mysql_real_escape_string() escapes quote characters with a backslash. Note that this function belongs to the old mysql extension, which was removed in PHP 7.0; the mysqli extension provides mysqli_real_escape_string() as its replacement. (On PHP versions before 5.4, quotes might already be escaped automatically depending on the PHP configuration. Search for "PHP magic quotes" for more information; the feature was removed in PHP 5.4.)
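An alternative to escaping quotes by hand is a prepared statement, where the value is passed separately from the SQL text. The sketch below is not the method described above: it swaps in PDO with an in-memory SQLite database so that it can run without a MySQL server, and it assumes the pdo_sqlite extension is available. The table and column names are hypothetical.

```php
<?php
// Assumption: pdo_sqlite is installed. An in-memory database stands in
// for a real MySQL server here.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table for the example.
$pdo->exec('CREATE TABLE users (last_name TEXT)');

$lastName = "O'Reilly";

// The placeholder keeps the value out of the SQL text, so the quote
// character does not need to be escaped manually.
$insert = $pdo->prepare('INSERT INTO users (last_name) VALUES (:last_name)');
$insert->execute([':last_name' => $lastName]);

// Read the value back to confirm it was stored unchanged.
$stored = $pdo->query('SELECT last_name FROM users')->fetchColumn();
echo $stored, "\n";
```

With MySQL the same pattern applies through PDO's MySQL driver or mysqli prepared statements; only the connection details change.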