Working with UTF-8 in PHP, MySQL and Apache

As I have written before on this website, I believe it is a good idea to standardize on a single character encoding across all parts of a system. My preferred character encoding is UTF-8, so when I build a PHP system I check the following things:

  • That the browser receives and interprets the output of my PHP scripts as UTF-8.
  • That the (X)HTML forms accept UTF-8.
  • That PHP treats the data received from a MySQL database as UTF-8.

Of course, we also need to set our editor of choice to the desired character encoding, and choose UTF-8 as the character encoding when we create the MySQL tables (I will write a post about the different character encodings and about collations in the future).

Make the browser interpret our document as UTF-8

We don't always have the ability to modify php.ini on our server, but let's say that we do. In that case we can set the default character encoding for our PHP scripts by changing this setting:

default_charset = "utf-8"

However, if we cannot edit php.ini, we can specify the character encoding in which the page is served by sending a header at the beginning of our script, before any output:

<?php header("Content-Type: text/html; charset=utf-8"); ?>
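
The header has to go out before any output reaches the browser, because after that PHP can no longer change it. Here is a minimal sketch of a guarded version. The helper names content_type_header() and send_utf8_header() are made up for this example and are not part of PHP:

```php
<?php
// Hypothetical helper: builds the header line for a given MIME type.
function content_type_header(string $mime = 'text/html', string $charset = 'UTF-8'): string
{
    return "Content-Type: {$mime}; charset={$charset}";
}

// Hypothetical helper: sends the header only if output has not started yet.
// Returns false when it is already too late to change the headers.
function send_utf8_header(string $mime = 'text/html'): bool
{
    if (headers_sent()) {
        return false;
    }
    header(content_type_header($mime));
    return true;
}
```

Calling send_utf8_header() as the first statement of a script would then have the same effect as the one-line version, with a false return value as a hint that something was printed too early.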

Although it may be redundant, we can also specify the encoding in a meta tag inside the resulting (X)HTML document:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Make the browser accept UTF-8 in the forms

Since we are handling everything as UTF-8, we can make sure that the forms accept this character encoding as well by adding the attribute accept-charset to the form tag:

<form accept-charset="utf-8">
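
The accept-charset attribute is only a request to the browser, so the receiving script may still want to check what it actually got. The sketch below is one way to do that, using the fact that preg_match() with the "u" modifier fails on malformed UTF-8. The helper name is_valid_utf8() and the form field name "comment" are assumptions made for this example:

```php
<?php
// Hypothetical helper: true when $value is well-formed UTF-8.
// With the "u" modifier, preg_match() returns false on malformed input,
// while the empty pattern matches any valid string (including '').
function is_valid_utf8(string $value): bool
{
    return preg_match('//u', $value) === 1;
}

// "comment" is an assumed form field name, used only for illustration.
$comment = $_POST['comment'] ?? '';
if (!is_string($comment) || !is_valid_utf8($comment)) {
    $comment = ''; // reject the input rather than store broken bytes
}
```

Whether to reject, re-request or try to convert invalid input is a design choice. Rejecting is simply the shortest option to show here.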

Treat the data we obtain from a MySQL database as UTF-8

We must make sure that the information we receive from the database is handled as UTF-8 by our PHP scripts. To do this we can use the functions mysql_set_charset() or mysqli_set_charset() (the older mysql_* functions were removed in PHP 7, so current code needs the mysqli version):

mysql_set_charset('utf8');
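
The same step with the mysqli extension might look like the sketch below. It assumes mysqli is installed. The helper name connect_utf8() is hypothetical, and the connection details are whatever the caller supplies:

```php
<?php
// Hypothetical helper: opens a mysqli connection and switches it to UTF-8.
// Host, user, password and database name are supplied by the caller.
function connect_utf8(string $host, string $user, string $password, string $database): mysqli
{
    // Make mysqli throw exceptions instead of returning false on errors.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $db = new mysqli($host, $user, $password, $database);

    // Same charset name as in the call above; it applies to this connection only.
    $db->set_charset('utf8');

    return $db;
}
```

A call such as connect_utf8('localhost', 'user', 'secret', 'mydb'), with placeholder values, would return a connection whose results arrive as UTF-8. Setting the charset through the API, rather than through a raw query, also lets mysqli's escaping functions know which encoding is in use.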

Now everything should be displayed and work correctly within our PHP script.

UTF-8 in plain text files and HTML files

Assuming that we have access to the configuration file of Apache, httpd.conf (it may be located in /etc/apache2/httpd.conf, or in /etc/httpd/conf/httpd.conf), we can add the following to this file:

AddDefaultCharset UTF-8

If we can't modify httpd.conf, we can still get the same result by adding the same directive to our .htaccess file, as long as the server allows overrides there.

Of course, we can add a meta tag to our HTML files as previously described, but this is a way to make sure that our plain text files are also sent as UTF-8.
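
To confirm what the server really sends, we can look at the Content-Type header of a response and read the charset out of it. The following is only a sketch. The helper name charset_from_content_type() is invented for this example, and the header value is assumed to have been obtained already (for instance from the output of curl -I or from PHP's get_headers()):

```php
<?php
// Hypothetical helper: extracts the charset parameter from a Content-Type
// header value, lowercased, or null when no charset is declared.
function charset_from_content_type(string $value): ?string
{
    if (preg_match('/;\s*charset\s*=\s*"?([^";\s]+)/i', $value, $m) === 1) {
        return strtolower($m[1]);
    }
    return null;
}
```

For a plain text file served with the directive above, passing the header value to this function should give back "utf-8". A null result would suggest that the directive is not being applied.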