To highlight words in multi-byte text:
<?php
$s = 'Алабала';
$f = 'а';
echo preg_replace('/('.$f.')/iu', '<b>$1</b>', $s);
?>
(PHP 4 >= 4.2.0, PHP 5, PHP 7)
mb_eregi_replace — Replace regular expression with multibyte support ignoring case
$pattern
, string $replacement
, string $string
, string|null $options
= null
) : string|false|null
Scans string
for matches to
pattern
, then replaces the matched text
with replacement
.
pattern
The regular expression pattern. Multibyte characters may be used. The case will be ignored.
replacement
The replacement text.
string
The searched string.
options
The resultant string or false
on error.
If string
is not valid for the current encoding, null
is returned.
Versiune | Descriere |
---|---|
8.0.0 |
options is nullable now.
|
7.1.0 |
The function checks whether string is valid for the
current encoding.
|
7.1.0 |
The e modifier has been deprecated.
|
Notă:
Codificarea internă a caracterelor sau codificarea caracterelor specificată de funcția mb_regex_encoding() va fi utilizată în calitate de codificare a caracterelor pentru această funcție.
Niciodată nu utilizați modificatorul e
când lucrați cu
date introduse ce nu provin din surse de încredere. Nu va avea loc evadarea
automată (cum este cazul la preg_replace()). Ignorarea
acestui fapt va crea probabil vulnerabilități de executare a codului la
distanță în aplicația dumneavoastră.
To highlight words in multi-byte text:
<?php
$s = 'Алабала';
$f = 'а';
echo preg_replace('/('.$f.')/iu', '<b>$1</b>', $s);
?>
Transliterator for cyrillic-to-latin letters for UTF chars:
<?php
function do_translit($st) {
$replacement = array(
"й"=>"i","ц"=>"c","у"=>"u","к"=>"k","е"=>"e","н"=>"n",
"г"=>"g","ш"=>"sh","щ"=>"sh","з"=>"z","х"=>"x","ъ"=>"\'",
"ф"=>"f","ы"=>"i","в"=>"v","а"=>"a","п"=>"p","р"=>"r",
"о"=>"o","л"=>"l","д"=>"d","ж"=>"zh","э"=>"ie","ё"=>"e",
"я"=>"ya","ч"=>"ch","с"=>"c","м"=>"m","и"=>"i","т"=>"t",
"ь"=>"\'","б"=>"b","ю"=>"yu",
"Й"=>"I","Ц"=>"C","У"=>"U","К"=>"K","Е"=>"E","Н"=>"N",
"Г"=>"G","Ш"=>"SH","Щ"=>"SH","З"=>"Z","Х"=>"X","Ъ"=>"\'",
"Ф"=>"F","Ы"=>"I","В"=>"V","А"=>"A","П"=>"P","Р"=>"R",
"О"=>"O","Л"=>"L","Д"=>"D","Ж"=>"ZH","Э"=>"IE","Ё"=>"E",
"Я"=>"YA","Ч"=>"CH","С"=>"C","М"=>"M","И"=>"I","Т"=>"T",
"Ь"=>"\'","Б"=>"B","Ю"=>"YU",
);
foreach($replacement as $i=>$u) {
$st = mb_eregi_replace($i,$u,$st);
}
return $st;
}
?>
when trying to find a way to strip newline from a multibyte UTF-8 string i got to this function just to discover later that POSIX don't "do" newline so i can't strip them, examples of what i tried are : \r\n , \\r\\n , (\\r\\n) (\\r|\\n)
and got no result
so since i wanted something like mb_nl2br() that's simple i wrote this little recursive function for UTF-8:
<?php
function mb_str_replace($find,$replace,&$str)
{
$i = mb_strpos($str,$find, 0,"UTF-8");
if ($index===false) {return;}
$str = mb_substr($str, 0,$i).$replace.mb_substr($str, $i+mb_strlen($find,"UTF-8"),mb_strlen($str,"UTF-8"));
$this->mb_str_replace($find,$replace,$str);
}
?>
note: moderate unit tesing was done, changed to other encodings