PHP convert string encoding to UTF-8

Many web pages marked as using the ISO-8859-1 character encoding actually use the similar Windows-1252 encoding, and web browsers will interpret ISO-8859-1 web pages as Windows-1252. Windows-1252 features additional printable characters, such as the Euro sign (€) and curly quotes (“ ”), instead of certain ISO-8859-1 control characters. This function will not convert such Windows-1252 characters correctly. Use a different function if Windows-1252 conversion is required.
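As a minimal sketch of what "a different function" could look like, assuming the mbstring extension is loaded and the input really is Windows-1252, mb_convert_encoding() can be given 'Windows-1252' as the source encoding. The variable names are illustrative only.

```php
<?php
// Byte 0x80 is the Euro sign in Windows-1252 but a control character in ISO-8859-1.
$windows1252 = "\x80 100";

// Treating the input as ISO-8859-1 yields the control character U+0080.
$asLatin1 = mb_convert_encoding($windows1252, 'UTF-8', 'ISO-8859-1');

// Treating it as Windows-1252 yields the Euro sign U+20AC.
$asCp1252 = mb_convert_encoding($windows1252, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), PHP_EOL; // c28020313030
echo bin2hex($asCp1252), PHP_EOL; // e282ac20313030
```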

utf8_encode and utf8_decode functions deprecated

Version: 8.2

Type: Deprecation

The utf8_encode and utf8_decode functions, despite their names, convert strings between the ISO-8859-1 (also known as "Latin 1") and UTF-8 encodings. They do not attempt to detect the actual character encoding of a given text, and always convert between ISO-8859-1 and UTF-8, even if the source text is not encoded in ISO-8859-1.

Although PHP includes the utf8_encode and utf8_decode functions in its standard library, they cannot be used to detect and convert other character encodings such as Windows-1252, UTF-16, and UTF-32 to UTF-8. Passing arbitrary text to the utf8_encode function is prone to bugs that raise no warnings or errors but lead to undesired results.

Some frequent examples of bugs include:

  • The Euro sign (€, byte sequence "\xE2\x82\xAC"), when passed to the utf8_encode function as utf8_encode("€"), results in the garbled (also called "Mojibake") text output â¬.
  • The German Eszett character (ß, byte sequence "\xC3\x9F"), when passed through utf8_encode("ß"), results in Ã.

Neither of the examples above emits any warnings or errors, although the resulting text is wrong.
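The Euro sign case can be reproduced without calling the deprecated function. The sketch below assumes the mbstring extension is available and uses mb_convert_encoding() with 'ISO-8859-1' as the source encoding, which applies the same byte-for-byte mapping; the helper name latin1_to_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: the same ISO-8859-1 -> UTF-8 mapping that utf8_encode() applies.
function latin1_to_utf8(string $s): string {
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$euro = "\xE2\x82\xAC"; // "€", already valid UTF-8

// Each of the three bytes is treated as a separate Latin 1 character.
$garbled = latin1_to_utf8($euro);

echo bin2hex($euro), PHP_EOL;    // e282ac
echo bin2hex($garbled), PHP_EOL; // c3a2c282c2ac
var_dump(mb_check_encoding($garbled, 'UTF-8')); // bool(true): valid UTF-8, but the wrong text
```

The output is still well-formed UTF-8, which is why no warning or error can be expected from the conversion itself.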


Because of the misleading function names, the lack of error messages and warnings, and the lack of support for character encodings other than ISO-8859-1, the utf8_encode and utf8_decode functions are deprecated in PHP 8.2.

Using the utf8_encode and utf8_decode functions emits a deprecation notice in PHP 8.2, and the functions will be removed in PHP 9.0.

utf8_encode('foo');
utf8_decode('foo');

Function utf8_encode() is deprecated in ... on line ...
Function utf8_decode() is deprecated in ... on line ...

The utf8_encode function encodes an ISO-8859-1 encoded string into UTF-8. Most utf8_encode calls in legacy PHP applications use the function as an additional safeguard against potentially malformed text, but as shown in the examples above, it often produces undesired outcomes rather than fixing malformed text.

Similarly, calling the utf8_decode function on a string decodes that string to the ISO-8859-1 character encoding. The majority of web applications, web sites, and text formats in fact expect UTF-8 encoded text, not ISO-8859-1.
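To illustrate why decoding to ISO-8859-1 loses information, the sketch below converts two UTF-8 strings with mb_convert_encoding(), assuming the mbstring extension is loaded with its default settings. ISO-8859-1 covers only U+0000 to U+00FF, so anything beyond that range cannot survive the conversion.

```php
<?php
$eszett = "\xC3\x9F";     // "ß" (U+00DF) in UTF-8
$euro   = "\xE2\x82\xAC"; // "€" (U+20AC) in UTF-8

// U+00DF fits in ISO-8859-1 and becomes the single byte 0xDF.
echo bin2hex(mb_convert_encoding($eszett, 'ISO-8859-1', 'UTF-8')), PHP_EOL; // df

// U+20AC has no ISO-8859-1 equivalent; mbstring emits its configured
// substitute character instead (a question mark unless reconfigured).
echo bin2hex(mb_convert_encoding($euro, 'ISO-8859-1', 'UTF-8')), PHP_EOL;
```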

It might be ideal to reevaluate the need for utf8_encode and utf8_decode function calls before replacing them because, more often than not, these calls are not required and only result in undesired outcomes.

PHP does not bundle multi-byte character encoding functions in its core, but the PHP core mbstring, intl, and iconv extensions provide robust and accurate functionality to detect and convert character encodings. Both mbstring and iconv are core extensions, and mbstring is widely used in modern PHP applications and can be polyfilled as well.

If the actual use case of an existing utf8_encode function call is to convert a known ISO-8859-1 string to UTF-8, it is possible to use the iconv, intl, or mbstring extensions to properly convert the encoding. Alternatively, it is possible to convert code points to a UTF-8 string directly in user-land PHP, albeit with a small performance penalty.
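When it is not known in advance which extension is installed, one option is a small wrapper that tries each in turn. This is only a sketch: the function name convert_latin1_to_utf8 and the order of preference are assumptions, not something the extensions prescribe.

```php
<?php
/**
 * Hypothetical helper: converts a string known to be ISO-8859-1 to UTF-8
 * using whichever encoding extension is loaded.
 */
function convert_latin1_to_utf8(string $s): string {
    if (extension_loaded('mbstring')) {
        return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
    }
    if (extension_loaded('iconv')) {
        $out = iconv('ISO-8859-1', 'UTF-8', $s);
        if ($out !== false) {
            return $out;
        }
    }
    if (extension_loaded('intl')) {
        $out = UConverter::transcode($s, 'UTF-8', 'ISO-8859-1');
        if ($out !== false) {
            return $out;
        }
    }
    throw new RuntimeException('No usable encoding extension (mbstring, iconv, intl).');
}

echo bin2hex(convert_latin1_to_utf8("caf\xE9")), PHP_EOL; // 636166c3a9
```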

When the use case of utf8_encode is to automatically detect the character encoding and convert it to UTF-8 (even though the function never detected character encodings in the first place), the replacement is to detect the character encoding first, and then convert it to UTF-8.
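A conservative form of "detect first, then convert" is to check whether the input is already valid UTF-8 and convert only when it is not. The sketch below assumes the mbstring extension; the function name ensure_utf8 and its fallback parameter are hypothetical, and the fallback encoding remains the caller's best guess rather than a detected fact.

```php
<?php
/**
 * Hypothetical helper: returns $s unchanged when it is already valid UTF-8,
 * otherwise converts it from $fallback.
 */
function ensure_utf8(string $s, string $fallback = 'ISO-8859-1'): string {
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', $fallback);
}

echo bin2hex(ensure_utf8("\xC3\x9F")), PHP_EOL; // c39f (valid UTF-8, left as is)
echo bin2hex(ensure_utf8("\xDF")), PHP_EOL;     // c39f (invalid UTF-8, treated as ISO-8859-1)
```

This avoids the double-encoding shown earlier, because text that is already UTF-8 is never converted a second time.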


Approach                  ISO-8859-1 to UTF-8    Any encoding to UTF-8
PHP Standard Functions    Yes                    N/A
With mbstring             Yes                    Yes
With intl                 Yes                    N/A
With iconv                Yes                    N/A

The following is a function from a polyfill library that mimics the utf8_encode functionality using standard PHP functions. For better readability and to convey the meaning of the function, it is renamed to iso8859_1_to_utf8 in the example below.

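function iso8859_1_to_utf8(string $s): string {
    // Double the string: the copy in the second half is read as input
    // while the UTF-8 output is written from the start of the buffer.
    $s .= $s;
    $len = \strlen($s);

    for ($i = $len >> 1, $j = 0; $i < $len; ++$i, ++$j) {
        switch (true) {
            // 0x00-0x7F: ASCII, copied unchanged.
            case $s[$i] < "\x80": $s[$j] = $s[$i]; break;
            // 0x80-0xBF: lead byte 0xC2 followed by the same byte.
            case $s[$i] < "\xC0": $s[$j] = "\xC2"; $s[++$j] = $s[$i]; break;
            // 0xC0-0xFF: lead byte 0xC3 followed by the byte minus 0x40.
            default: $s[$j] = "\xC3"; $s[++$j] = \chr(\ord($s[$i]) - 64); break;
        }
    }

    return substr($s, 0, $j);
}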

With the function above declared in application code, it is now possible to replace all utf8_encode calls with the new iso8859_1_to_utf8 function to avoid the deprecation notice:

- utf8_encode($string);
+ iso8859_1_to_utf8($string);
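The constants "\xC2", "\xC3", and 64 in iso8859_1_to_utf8 follow from the two-byte UTF-8 layout 110xxxxx 10xxxxxx. As a worked illustration, the sketch below encodes a single ISO-8859-1 byte value with explicit bit operations; the function name latin1_byte_to_utf8 is hypothetical and exists only for this example.

```php
<?php
// Hypothetical helper: UTF-8 encoding of one ISO-8859-1 byte value (0-255).
function latin1_byte_to_utf8(int $byte): string {
    if ($byte < 0x80) {
        return chr($byte); // ASCII: one byte, unchanged
    }
    // U+0080 to U+00FF: the top two bits go into the lead byte (0xC2 or 0xC3),
    // the low six bits go into the continuation byte.
    return chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
}

echo bin2hex(latin1_byte_to_utf8(0x41)), PHP_EOL; // 41   ("A")
echo bin2hex(latin1_byte_to_utf8(0xA9)), PHP_EOL; // c2a9 ("©")
echo bin2hex(latin1_byte_to_utf8(0xE9)), PHP_EOL; // c3a9 ("é")
```

For bytes 0x80 to 0xBF the continuation byte equals the input byte, and for 0xC0 to 0xFF it equals the input byte minus 0x40 (64), which is exactly what the two non-ASCII branches of the polyfill compute.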

The mbstring extension, one of the most widely used optional PHP extensions, provides a cleaner and more straightforward approach to converting ISO-8859-1 encoded strings to UTF-8. It can be used to replace the utf8_encode function deprecated in PHP 8.2.

- utf8_encode($string);
+ mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');

When the actual character encoding of the input text is unknown, forcing PHP to detect it can lead to erroneous results. However, it is possible to make a reasonable guess of the source character encoding and convert it to UTF-8 using the mbstring extension.

- utf8_encode($string);
+ mb_convert_encoding($string, 'UTF-8', mb_list_encodings());

The UConverter class in the intl extension also provides a way to convert text from one character encoding to another. It follows a function signature similar to that of mb_convert_encoding. Using UConverter::transcode, it is possible to replicate the utf8_encode functionality:

- utf8_encode($string);
+ UConverter::transcode($string, 'UTF8', 'ISO-8859-1');

Applications that can use the iconv extension can replace the utf8_encode function with the iconv function:

- utf8_encode($string);
+ iconv('ISO-8859-1', 'UTF-8', $string);

The utf8_decode function decodes a UTF-8 encoded string to ISO-8859-1. With the utf8_decode function deprecated, it is possible to replicate this functionality using PHP standard functions, the mbstring extension, the intl extension, or the iconv extension.


UTF-8 to ISO-8859-1:

  • PHP Standard Functions
  • With mbstring
  • With intl
  • With iconv

Similar to the utf8_encode polyfill, the following is a function from a polyfill library that mimics the utf8_decode functionality:

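function utf8_to_iso8859_1(string $string): string {
    $s = (string) $string;
    $len = \strlen($s);

    // $i reads the UTF-8 input; $j writes the ISO-8859-1 output in place.
    for ($i = 0, $j = 0; $i < $len; ++$i, ++$j) {
        switch ($s[$i] & "\xF0") {
            // Two-byte sequence (lead byte 0xC0-0xDF).
            case "\xC0":
            case "\xD0":
                $c = (\ord($s[$i] & "\x1F") << 6) | \ord($s[++$i] & "\x3F");
                // Code points above U+00FF do not exist in ISO-8859-1.
                $s[$j] = $c < 256 ? \chr($c) : '?';
                break;

            // Four-byte sequence: skip one extra byte, then fall through.
            case "\xF0":
                ++$i;
                // no break

            // Three-byte sequence: not representable, replaced by '?'.
            case "\xE0":
                $s[$j] = '?';
                $i += 2;
                break;

            // ASCII and stray continuation bytes are copied unchanged.
            default:
                $s[$j] = $s[$i];
        }
    }

    return substr($s, 0, $j);
}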

With the function above included, it is now possible to replace utf8_decode calls with the new utf8_to_iso8859_1 function:

- utf8_decode($string);
+ utf8_to_iso8859_1($string);

Using mbstring, the following example replaces the deprecated utf8_decode function with mb_convert_encoding:

- utf8_decode($string);
+ mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8');

With the help of UConverter::transcode in the intl extension, the following example shows a utf8_decode replacement:

- utf8_decode($string);
+ UConverter::transcode($string, 'ISO-8859-1', 'UTF8');

The iconv function can also be used to mimic and replace the utf8_decode functionality to avoid the utf8_decode deprecation in PHP 8.2:

- utf8_decode($string);
+ iconv('UTF-8', 'ISO-8859-1', $string);
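One behavioral difference is worth keeping in mind when switching to iconv: the polyfill shown earlier substitutes '?' for characters that ISO-8859-1 cannot represent, whereas iconv() returns false when a conversion fails. The wrapper below is a sketch of one way to handle that; the name utf8_to_latin1_or_null is hypothetical, and what iconv does with unrepresentable characters depends on the underlying iconv implementation.

```php
<?php
/**
 * Hypothetical wrapper around iconv() for UTF-8 to ISO-8859-1 conversion.
 * Returns null instead of false when iconv() cannot convert the input.
 */
function utf8_to_latin1_or_null(string $s): ?string {
    // @ silences the notice iconv() may raise for unconvertible input.
    $out = @iconv('UTF-8', 'ISO-8859-1', $s);
    return $out === false ? null : $out;
}

echo bin2hex(utf8_to_latin1_or_null("\xC3\x9F") ?? ''), PHP_EOL; // df

// "€" has no ISO-8859-1 equivalent; the result here is implementation-dependent.
var_dump(utf8_to_latin1_or_null("\xE2\x82\xAC"));
```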

Backwards Compatibility Impact

The utf8_encode and utf8_decode functions are sometimes used in legacy PHP applications and in applications that process incoming data and files with various character encodings. These functions are deprecated in PHP 8.2 and will be removed in PHP 9.0 because they are misleadingly named and are prone to unexpected and undesired results that emit no warnings or errors.

Since PHP 8.2, using these functions results in a deprecation notice each time they are called.
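To find the affected call sites, one approach is to collect deprecation notices with a custom error handler rather than letting them print. The sketch below is an assumption about how an application might do this; it uses trigger_error() with E_USER_DEPRECATED as a stand-in for a real utf8_encode() call so that it behaves the same on any PHP version.

```php
<?php
// Collect deprecation notices instead of printing them.
$deprecations = [];

set_error_handler(
    function (int $errno, string $errstr, string $errfile, int $errline) use (&$deprecations): bool {
        $deprecations[] = sprintf('%s in %s on line %d', $errstr, $errfile, $errline);
        return true; // handled: skip PHP's default output
    },
    E_DEPRECATED | E_USER_DEPRECATED
);

// Stand-in for a legacy call; on PHP 8.2+ a real utf8_encode() call raises E_DEPRECATED.
trigger_error('Function utf8_encode() is deprecated', E_USER_DEPRECATED);

restore_error_handler();
print_r($deprecations);
```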

The utf8_encode and utf8_decode functions are to be removed in PHP 9.0.

A large number of applications that use these functions use them without being aware that they only work with ISO-8859-1 character encoding and nothing else for the source character encoding. It is possible that the ideal fix for the deprecation is to see why these functions are used in the first place, and determine if they are absolutely necessary.

Depending on the availability of PHP extensions and the willingness to use a somewhat slower PHP implementation, it is possible to replace utf8_encode and utf8_decode function calls.

How to set encoding to UTF

PHP UTF-8 encoding: modifications to your php.ini file. The first thing you need to do is to modify your php.ini file to use UTF-8 as the default character set: default_charset = "utf-8" (Note: You can subsequently use phpinfo() to verify that this has been set properly.)
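Where editing php.ini is not an option, the same setting can usually be applied at runtime. This is a minimal sketch assuming the hosting environment permits ini_set() for this directive.

```php
<?php
// Runtime equivalent of the php.ini line above.
ini_set('default_charset', 'UTF-8');

// Read the value back to confirm it took effect.
echo ini_get('default_charset'), PHP_EOL; // UTF-8
```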

What can I use instead of UTF

Replacements for utf8_encode: if the actual use case of an existing utf8_encode function call is to convert a known ISO-8859-1 string to UTF-8, it is possible to use the iconv, intl, or mbstring extensions to properly convert the encoding.

How to encode string in PHP?

The base64_encode() function is a built-in PHP function that encodes data with MIME base64. MIME (Multipurpose Internet Mail Extensions) base64 is used to encode the string in base64.

How to convert ASCII to UTF

If we know that the current encoding is ASCII, the 'iconv' function can be used to convert ASCII to UTF-8. The original string can be passed as a parameter to the iconv function to encode it to UTF-8.
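A minimal sketch of that call, assuming the iconv extension is loaded and the input is pure ASCII; the variable names are illustrative:

```php
<?php
$ascii = 'Hello, world';

// ASCII is a subset of UTF-8, so the bytes come back unchanged.
$utf8 = iconv('ASCII', 'UTF-8', $ascii);

var_dump($utf8 === $ascii); // bool(true)
```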