
Counting Characters

Rick Brown edited this page Oct 7, 2015 · 22 revisions

Counting Characters - It's harder than you think!

TextArea and MaxLength

Modern browsers allow developers to restrict the maximum number of characters the user may enter into a [textarea](http://www.w3.org/TR/html5/forms.html#attr-textarea-maxlength) by setting the maxlength attribute.

For example, here is a simple HTML snippet that creates a form with a single textarea limited to 200 characters:

<form action="" method="post">
	<textarea maxlength="200"></textarea>
	<input type="submit"/>
</form>

The image below shows how this simple page looks when loaded in a web browser.

HTML textarea and submit button

Client Side Issues

A common scenario is for a user to copy text from another application into a textarea. In this example we'll assume the user copies text from Microsoft Word and pastes it into the textarea example above (with a limit of 200 characters).

The count according to MS Word

The image below shows the MS Word document we are copying from.

Selected text in MS Word showing 200 characters are selected

Note that, according to MS Word, the selected text is exactly 200 characters long.

The count according to Web Browsers

When we paste the above 200 characters into our simple HTML form and press the submit button, here is what we see:

HTML validation error reporting we have entered 203 characters

According to the browser we have entered 203 characters. How can this be? MS Word wouldn't lie to us, right? The browser wouldn't lie either, right? But someone is clearly not telling the truth.

The count according to Text Editors

Let's ask a simple text editor how many characters there are. The image below shows the same text pasted into Atom:

Text editor reporting the sample text is actually 206 characters

What the?!? The text editor says we have 206 characters!

The true character count

In summary, we have copied text from MS Word and pasted it into a web browser and a text editor. The three applications report the following character counts:

  • MS Word: 200
  • Web Browser: 203
  • Text Editor: 206

To find out who is telling the truth, we must examine the raw bytes that make up this text using a hex editor, shown below:

Hex editor proving the sample text is 206 characters

We can clearly see twenty rows of ten characters followed by one row of six: 20 × 10 + 6 = 206 characters.

  • MS Word: 200 <- LIE
  • Web Browser: 203 <- LIE
  • Text Editor: 206 <- TRUTH
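
The row arithmetic from the hex dump can be reproduced with a few lines of Python. This is only a sketch: the original Word document is not available here, so `sample` is a hypothetical stand-in (four lines of fifty `x` characters separated by carriage return + line feed), and `hex_rows` is an illustrative helper name, not part of any tool shown above.

```python
def hex_rows(data: bytes, width: int = 10) -> list[str]:
    """Split raw bytes into hex-editor style rows of `width` bytes each."""
    return [
        data[i:i + width].hex(" ").upper()
        for i in range(0, len(data), width)
    ]


# Hypothetical stand-in for the pasted text: 200 visible characters
# plus three CR LF line breaks (assumption, not the original sample).
sample = "\r\n".join(["x" * 50] * 4).encode("ascii")

rows = hex_rows(sample)
full_rows = len(rows) - 1             # rows containing ten bytes
last_row = len(rows[-1].split())      # bytes in the final, shorter row

print(f"{full_rows} rows of 10 + 1 row of {last_row} = {len(sample)} bytes")
print(rows[5])                        # this row begins with the first 0D 0A pair
```

With this stand-in text the dump has the same shape as the one described above: twenty full rows and a final row of six.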

What's Going On?

In the scenario above, MS Word and the web browser are counting newline characters differently.

When you hit Enter in MS Word it adds two characters to create the newline. These are shown in the hex editor as 0D 0A. In English, 0D is called carriage return and 0A is called line feed.

When counting characters, MS Word completely ignores these character sequences, while the web browser counts each sequence as if it were only one character. The sample text contains three line breaks, which is six characters in total: MS Word counts none of them (206 - 6 = 200) and the browser counts three (206 - 3 = 203). As we have proven above, both are wrong.
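
The three counting conventions can be compared side by side. The sketch below is an assumption-laden illustration: the function names are hypothetical, they only mimic the behaviour described on this page rather than the actual implementation of MS Word or any browser, and `sample` is a stand-in for the original text (200 visible characters plus three 0D 0A pairs).

```python
def count_ignoring_newlines(text: str) -> int:
    """Count characters with every CR and LF dropped (the MS Word figure)."""
    return len(text.replace("\r", "").replace("\n", ""))


def count_newline_as_one(text: str) -> int:
    """Count each CR LF pair as a single character (the browser figure)."""
    return len(text.replace("\r\n", "\n"))


def count_raw(text: str) -> int:
    """Count every character, including both CR and LF (the text editor figure)."""
    return len(text)


# Hypothetical stand-in for the pasted text (assumption, not the original).
sample = "\r\n".join(["x" * 50] * 4)

print("MS Word style:    ", count_ignoring_newlines(sample))
print("Web browser style:", count_newline_as_one(sample))
print("Text editor style:", count_raw(sample))
```

For this stand-in the three functions return 200, 203 and 206, matching the figures reported above. Any code that validates a length limit needs to pick one of these conventions deliberately.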
