Given a string that contains both printable and non-printable characters, the task is to remove all non-printable characters from the string. Space ( ) is the first printable ASCII character and tilde (~) is the last. So the task is to remove every character that does not fall in that range, keeping only the characters with codes 32 through 126. This can be done with several different regular expressions.

Example:
Input: str = "\n\nGeeks \n\n\n\tfor Geeks\n\t"
Output: Geeks for Geeks

Note: Newline (\n) and tab (\t) are control characters, not printable characters.

Method 1: Using a general regular expression
Many regular expressions are possible. A good solution is to strip the ASCII control characters (codes 0-31) and all non-ASCII bytes (codes 128-255) from the input string, which can be done with preg_replace.

Example:
<?php
// String containing non-printable characters
$str = "\n\nGeeks \n\n\n\tfor Geeks\n\t";

// Remove control characters (0x00-0x1F) and non-ASCII bytes (0x80-0xFF)
$str = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $str);

// Display the modified string
echo $str;
?>

Output:
Geeks for Geeks

Method 2: Use the ‘print’ regex
Another possible solution is to use the print regular expression. The [:print:] class stands for “any printable character”, so the negated class [[:^print:]] matches every character that is not printable.

I don’t know of any built-in PHP functions to remove all non-printable characters from a string, so the solution is to use the preg_replace function.

Solution: Allow only ASCII characters
For my purposes I don’t have to work with Unicode characters, so one of the best solutions for my purposes is to strip all non-ASCII characters from the input string. That can be done with this preg_replace statement:

$result = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);

That code removes any characters with codes 0-31 (hex 00-1F) and 128-255 (hex 80-FF), leaving only the characters with codes 32-127 (hex 20-7F) in the resulting string, which I call $result in this example.

You can see how this works in the interactive PHP shell. In this example I just want to get rid of the characters ‘ and ’, which don’t work well in my current application:

myprompt> php -a
Interactive shell

php > $string = "‘Hello,’ she said.";
php > $result = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);
php > echo $result;
Hello, she said.

As you can see, the characters ‘ and ’ are not in the $result string.
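The same pattern can be wrapped in a small reusable function. This is a minimal sketch, and the function name strip_to_printable_ascii is hypothetical. Unlike the pattern above, it also removes DEL (0x7F), which is a control character rather than a printable one, so only codes 32-126 survive. Like the original, it works byte by byte, so every byte of a multibyte UTF-8 character such as ‘ is removed.

```php
<?php
// Hypothetical helper: keep only printable ASCII (0x20-0x7E).
// Removes control characters 0x00-0x1F, DEL (0x7F), and every byte
// 0x80-0xFF (which covers all bytes of multibyte UTF-8 characters).
function strip_to_printable_ascii(string $s): string
{
    // No /u modifier: the character class matches individual bytes.
    return preg_replace('/[\x00-\x1F\x7F-\xFF]/', '', $s);
}

// \u{2018} and \u{2019} are the curly quotes ‘ and ’ (PHP 7+ escape syntax).
echo strip_to_printable_ascii("\n\nGeeks \n\n\n\tfor Geeks\n\t"), "\n"; // Geeks for Geeks
echo strip_to_printable_ascii("\u{2018}Hello,\u{2019} she said."), "\n"; // Hello, she said.
```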
Also note that if you prefer octal escapes to hexadecimal ones, this code should work as well. Hex 1F is octal 37, hex 80 is octal 200, and hex FF is octal 377:

$result = preg_replace('/[\000-\037\200-\377]/', '', $string);

I just tested that on my example and it worked fine, but I haven’t tested it with other strings. (This page is a good resource for basic octal and hex values.)

Possible solution: Use the 'print' regex
Another possible solution is to use the ‘print’ regular expression, shown in this example with preg_replace:

$result = preg_replace('/[[:^print:]]/', "", $string);

Per the PHP regex doc, the [:print:] regex stands for “any printable character,” so for my example I thought it would leave the ‘ and ’ characters in the resulting string. To my surprise, the output looks like this:

php > $string = "‘Hello,’ she said.";
php > $result = preg_replace('/[[:^print:]]/', "", $string);
php > echo $result;
?Hello,? she said.

I don’t know why that regex ends up putting ? characters in the resulting string, so at the moment I’m calling this a “possible solution” rather than a solution. Note that if you just echo out the original string, it prints fine:

php > echo $string;
‘Hello,’ she said.

More solutions (Unicode)
As I mentioned, I don’t currently have to concern myself with Unicode characters, so the original ASCII-only solution I showed works fine for me. If you do need to handle Unicode characters, this Stack Overflow page shows a possible solution.
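For the Unicode case, one possible approach is to add PCRE's u modifier to the 'print' pattern. This is a sketch that the original text did not test. Without u, the pattern is applied to individual bytes, so a three-byte UTF-8 character such as ‘ can be only partly removed. That is one plausible cause of the ? characters seen in the [:^print:] example. With u, PHP treats the pattern and subject as UTF-8 and enables Unicode properties for classes such as [:print:], so ‘ and ’ should count as printable while control characters are still removed. The helper name strip_non_printable_utf8 is hypothetical. Note that preg_replace returns null when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper: remove non-printable characters from a UTF-8 string.
// The u modifier makes PCRE work on UTF-8 characters instead of bytes, and
// in PHP it also enables Unicode properties, so [:print:] accepts printable
// non-ASCII characters such as ‘ and ’.
function strip_non_printable_utf8(string $s): ?string
{
    // Returns null if $s is not valid UTF-8 (see preg_last_error()).
    return preg_replace('/[[:^print:]]/u', '', $s);
}

// \u{2018} and \u{2019} are the curly quotes ‘ and ’ (PHP 7+ escape syntax).
$string = "\u{2018}Hello,\u{2019} she said.\n";
echo strip_non_printable_utf8($string), "\n";
```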
More PHP regular expressions
Finally, while I’m in the neighborhood, here’s a list of PHP “range” regular expressions from the php.net regex page. As the “range” name implies, these patterns can be used to match ranges of characters in PHP strings:

[:digit:]  Only the digits 0 to 9.
[:alnum:]  Any alphanumeric character: 0 to 9, A to Z, or a to z.
[:alpha:]  Any alphabetic character, A to Z or a to z.
[:blank:]  Space and tab characters only.
[:xdigit:] Hexadecimal digits: 0 to 9, A to F, or a to f.
[:punct:]  Punctuation symbols such as . , " ' ? ! ; :
[:print:]  Any printable character.
[:space:]  Any whitespace character.
[:graph:]  Any printable character except space.
[:upper:]  Any uppercase letter, A to Z.
[:lower:]  Any lowercase letter, a to z.
[:cntrl:]  Control characters.

As shown in my earlier example, you actually need to use two brackets with these regex patterns when using preg_replace:

$result = preg_replace('/[[:^print:]]/', "", $string);

Summary
In summary, if you wanted to see how to remove non-printable characters from strings in PHP, I hope these examples are helpful.
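One way to check what a few of the classes listed above cover in the ASCII range is to test every byte value from 0 to 127 against a pattern. The following is only an illustration, and the helper name matching_ascii_codes is hypothetical. It should confirm that [:print:] covers exactly codes 32 (space) through 126 (tilde), and that [:cntrl:] covers codes 0-31 plus 127 (DEL).

```php
<?php
// Hypothetical helper: return the ASCII codes (0-127) matched by a pattern.
function matching_ascii_codes(string $pattern): array
{
    $codes = [];
    for ($i = 0; $i < 128; $i++) {
        // Test each single-byte string on its own.
        if (preg_match($pattern, chr($i)) === 1) {
            $codes[] = $i;
        }
    }
    return $codes;
}

$print = matching_ascii_codes('/[[:print:]]/');
$cntrl = matching_ascii_codes('/[[:cntrl:]]/');

// Expected: 95 codes, from 32 to 126
printf("[:print:] matches %d codes, from %d to %d\n", count($print), min($print), max($print));
// Expected: 33 codes (0-31 and 127)
printf("[:cntrl:] matches %d codes\n", count($cntrl));
```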