How to remove Unicode characters in PHP

Given a string that contains both printable and non-printable characters, the task is to remove all non-printable characters from it. Space ( ) is the first printable ASCII character and tilde (~) is the last, so the task is to drop every character that does not fall in that range and keep only the characters in the range 32-126 (character 127, DEL, is a control character). This can be done with several different regular expressions. Example:

Input: str = "\n\nGeeks \n\n\n\tfor Geeks\n\t"
Output: Geeks for Geeks

Note: Newline (\n) and tab (\t) are control characters, not printable characters.

Method 1: Using a general regular expression: Many regular expressions can do this. A simple solution is to strip all control characters and non-ASCII bytes from the input string, which can be done with preg_replace. Example:

<?php

// PHP program to remove all non-printable
// characters from a string

// String with non-printable characters
// (the end of this sample string is cut off in the source;
// "Geeks" is assumed here so that it matches the output below)
$str = "Geeks šžfor Geeks";

// Use preg_replace to remove control characters
// and all bytes above the ASCII range
$str = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $str);

// Display the modified string
echo $str;

?>

Output:

Geeks for Geeks

Method 2: Use the ‘print’ regex: Another possible solution is to use the print character class. [:print:] stands for “any printable character”, so its negation, [:^print:], matches everything that should be removed.
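
A minimal sketch of this method, assuming the sample input from the first example. The variable names are illustrative, and for bytes outside the ASCII range the result can depend on the locale.

```php
<?php
// Sample input from the first example: newlines and tabs around the words.
$str = "\n\nGeeks \n\n\n\tfor Geeks\n\t";

// [[:^print:]] matches any byte that is not a printable character,
// so replacing it with '' keeps only the printable ones.
$clean = preg_replace('/[[:^print:]]/', '', $str);

echo $clean; // Geeks for Geeks
```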

I don’t know of any built-in PHP functions to remove all non-printable characters from a string, so the solution is to use the preg_replace function with an appropriate regular expression.

Solution: Allow only ASCII characters

For my purposes I don’t have to work with Unicode characters, so one of the best solutions is to strip all control characters and non-ASCII characters from the input string. That can be done with this preg_replace code:

$result = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);

That code removes any bytes in the decimal ranges 0-31 and 128-255 (hex 00-1F and 80-FF), leaving only the characters 32-127 in the resulting string, which I call $result in this example.

You can see how this works in the interactive PHP shell. In this example I just want to get rid of the characters ‘ and ’, which don’t work well in my current application:

myprompt> php -a
Interactive shell

php > $string = "‘Hello,’ she said.";
php > $result = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);
php > echo $result;
Hello, she said.

As you can see, the characters ‘ and ’ are not in the $result string.
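
One detail worth noting: the pattern above keeps character 127 (DEL), which is a control character rather than a printable one. Here is a minimal sketch of a stricter variant, assuming you want only space through tilde (32-126). The function name strip_to_printable_ascii is made up for this illustration.

```php
<?php
// Hypothetical helper (not from the original article): keep only the
// printable ASCII range, space (0x20) through tilde (0x7E).
function strip_to_printable_ascii(string $input): string
{
    // The negated class matches every byte outside 0x20-0x7E,
    // including control characters, DEL (0x7F), and bytes 0x80-0xFF.
    return preg_replace('/[^\x20-\x7E]/', '', $input);
}

echo strip_to_printable_ascii("Hello,\x7F she\t said.\n"); // Hello, she said.
```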

Note: You can read more about hex and octal character sequences on this php.net page.

Also note that if you prefer octal characters to hexadecimal characters, this code should work as well (decimal 31 is octal 037, and decimal 128-255 is octal 200-377):

$result = preg_replace('/[\000-\037\200-\377]/', '', $string);

I just tested that on my example and it worked fine, but I haven’t tested it with other strings. (This page is a good resource for basic octal and hex values.)
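
To test beyond a single example string, one option is to run both spellings over every possible byte value and compare the results. This is only a sketch: the variable names are illustrative, and it assumes the octal upper bound \037 (decimal 31) for the first range.

```php
<?php
// Build a string containing every possible byte value, 0 through 255.
$allBytes = implode('', array_map('chr', range(0, 255)));

// Hex and octal spellings of the same two ranges (0-31 and 128-255).
$viaHex   = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $allBytes);
$viaOctal = preg_replace('/[\000-\037\200-\377]/', '', $allBytes);

var_dump($viaHex === $viaOctal); // bool(true)
echo strlen($viaHex), "\n";      // 96, i.e. bytes 32 through 127
```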

Possible solution: Use the 'print' regex

Another possible solution is to use the ‘print’ regular expression shown in this example with preg_replace:

$result = preg_replace('/[[:^print:]]/', "", $string);

Per the PHP regex doc, the [:print:] regex stands for “any printable character,” so for my example I thought it would leave the ‘ and ’ characters in the resulting string, but to my surprise the output looks like this:

php > $string = "‘Hello,’ she said.";
php > $result = preg_replace('/[[:^print:]]/', "", $string);
php > echo $result;
?Hello,? she said.

I don’t know why that regex ends up putting ? characters in the resulting string, so at the moment I’m calling this a “possible solution” rather than a solution. Note that if you just echo out the original string, it prints fine:

php > echo $string;
‘Hello,’ she said.
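
A plausible explanation, offered as an assumption rather than a confirmed diagnosis: without the u modifier, preg_replace works on individual bytes, and each curly quote is three bytes in UTF-8. If the class treats only some of those bytes as non-printable (which can depend on the locale), an invalid partial sequence is left behind and the terminal shows it as ?. The sketch below prints the bytes involved and retries with the u modifier so the pattern sees whole characters. Whether the quotes are then kept or removed may vary with the PHP/PCRE build, so no output is claimed for that part.

```php
<?php
// "\u{2018}" and "\u{2019}" are the curly quotes (PHP 7+ escape syntax).
$string = "\u{2018}Hello,\u{2019} she said.";

// Each curly quote is a three-byte UTF-8 sequence.
echo bin2hex("\u{2018}"), "\n"; // e28098
echo bin2hex("\u{2019}"), "\n"; // e28099

// With the u modifier the subject is treated as UTF-8, so a match
// can never remove only part of a multi-byte character.
$result = preg_replace('/[[:^print:]]/u', '', $string);

// An empty pattern with the u modifier only matches valid UTF-8,
// which makes it a cheap validity check on the result.
var_dump(preg_match('//u', $result) === 1);
```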

More solutions (Unicode)

As I mentioned, I don’t currently have to concern myself with Unicode characters, so the original ASCII character solution I showed works fine for me. If you do need to handle Unicode characters, this SO page shows a possible solution.
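
As one hedged illustration of a Unicode-aware approach (not necessarily the one on the linked page), the sketch below assumes UTF-8 input and removes only control characters, using the \p{Cc} property with the u modifier. Characters such as ‘ and ’ are left intact. The function name strip_control_chars is hypothetical.

```php
<?php
// Hypothetical helper: strip Unicode control characters (category Cc)
// from a UTF-8 string while keeping all other characters.
function strip_control_chars(string $utf8): string
{
    // The u modifier makes PCRE read the pattern and subject as UTF-8.
    // preg_replace returns null on error (for example, invalid UTF-8
    // input), in which case the input is handed back unchanged.
    return preg_replace('/\p{Cc}+/u', '', $utf8) ?? $utf8;
}

echo strip_control_chars("\u{2018}Hello,\u{2019} she\x07 said.\n");
// ‘Hello,’ she said.
```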

More PHP regular expressions

Finally, while I’m in the neighborhood, here’s a list of PHP “range” regular expressions from the php.net regex page. As the “range” name implies, these patterns can be used to match ranges of characters in PHP strings:

[:digit:]      Only the digits 0 to 9
[:alnum:]      Any alphanumeric character 0 to 9 OR A to Z or a to z.
[:alpha:]      Any alpha character A to Z or a to z.
[:blank:]      Space and TAB characters only.
[:xdigit:]     Hexadecimal digits 0 to 9, A to F, or a to f.
[:punct:]      Punctuation symbols . , " ' ? ! ; :
[:print:]      Any printable character.
[:space:]      Any space characters.
[:graph:]      Any printable character except space.
[:upper:]      Any alpha character A to Z.
[:lower:]      Any alpha character a to z.
[:cntrl:]      Control characters.

As shown in my earlier example, you actually need to use two brackets with these regex patterns when using preg_replace:

$result = preg_replace('/[[:^print:]]/', "", $string);
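
A few of the other classes can be tried the same way. This is a small sketch with a made-up sample string, using the default locale and only ASCII input:

```php
<?php
// Illustrative sample string (not from the original article).
$text = "Room 101, Floor 3!";

// Remove every digit.
echo preg_replace('/[[:digit:]]/', '', $text), "\n"; // Room , Floor !

// Remove punctuation.
echo preg_replace('/[[:punct:]]/', '', $text), "\n"; // Room 101 Floor 3

// Count uppercase letters.
echo preg_match_all('/[[:upper:]]/', $text), "\n";   // 2
```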

Summary

In summary, if you wanted to see how to remove non-printable characters from strings in PHP, I hope these examples are helpful.