Old link to Oniguruma regex syntax is not working anymore, there is a working one:
https://github.com/geoffgarside/oniguruma/blob/master/Syntax.txt
(PHP 4 >= 4.2.0, PHP 5, PHP 7, PHP 8)
mb_ereg - Regular expression match with multibyte support
Executes the regular expression match with multibyte support.
pattern
The search pattern.
string
The search string.
matches
If matches are found for parenthesized substrings of pattern and the function is called with the third argument matches, the matches will be stored in the elements of the array matches. If no matches are found, matches is set to an empty array.
$matches[1] will contain the substring that starts at the first left parenthesis; $matches[2] will contain the substring that starts at the second, and so on. $matches[0] will contain a copy of the complete string matched.
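A minimal sketch of how the capture groups land in matches. The date string, the pattern, and the variable names are assumptions made for illustration; they are not part of the manual.

```php
<?php
// Illustrative input and pattern (assumptions for this sketch).
$date = '2024-05-17';
$pattern = '([0-9]{4})-([0-9]{2})-([0-9]{2})';

// Compare against false so the check works whether mb_ereg() returns
// true (PHP 8) or an integer (PHP 7 and earlier) on success.
$found = mb_ereg($pattern, $date, $matches) !== false;

if ($found) {
    // $matches[0] is the whole match; [1]..[3] follow the parentheses left to right.
    list($whole, $year, $month, $day) = $matches;
    echo "$whole => year $year, month $month, day $day\n";
}
```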
Returns whether pattern matches string.
Version | Description |
---|---|
8.0.0 | This function now returns true on success. Previously, it returned the byte length of the matched string if a match for pattern was found in string and matches was passed. If the optional parameter matches was not passed or the length of the matched string was 0, this function returned 1. |
7.1.0 | mb_ereg() now sets matches to an empty array if nothing matched. Previously, matches was not modified in that case. |
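Because the success value differs between versions (true, 1, or a byte length), a comparison against false is one way to write a check that behaves the same everywhere. The helper name mb_ereg_matches() below is hypothetical and not part of mbstring; this is only a sketch.

```php
<?php
// Hypothetical helper (not part of mbstring): normalizes mb_ereg()'s
// return value to a strict bool across PHP versions.
function mb_ereg_matches(string $pattern, string $subject): bool
{
    // PHP 8 returns true on a match; older versions return an int
    // (1 or a byte length). A failed match is false in every version.
    return mb_ereg($pattern, $subject) !== false;
}

var_dump(mb_ereg_matches('bcd', 'abcde')); // match
var_dump(mb_ereg_matches('f', 'abcde'));   // no match
```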
Note:
The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.
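A short sketch of setting the regex encoding explicitly before matching and restoring it afterwards. It assumes the source file itself is saved as UTF-8; the sample text and variable names are illustrative.

```php
<?php
// Assumes this source file is saved as UTF-8.
$previous = mb_regex_encoding();   // called without arguments: returns the current setting
mb_regex_encoding('UTF-8');        // encoding used by mb_ereg() from here on

$found = mb_ereg('(日本)(語)', 'これは日本語です', $matches) !== false;

mb_regex_encoding($previous);      // restore whatever was set before

if ($found) {
    echo $matches[0], ' = ', $matches[1], ' + ', $matches[2], "\n";
}
```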
Note that mb_ereg() does not support the \uFFFF Unicode escape syntax; use \x{FFFF} instead:
<?PHP
$text = 'Peter is a boy.'; // english
$text = 'بيتر هو صبي.'; // arabic
//$text = 'פיטר הוא ילד.'; // hebrew
mb_regex_encoding('UTF-8');
if(mb_ereg('[\x{0600}-\x{06FF}]', $text)) // arabic range
//if(mb_ereg('[\x{0590}-\x{05FF}]', $text)) // hebrew range
{
echo "Text has some arabic/hebrew characters.";
}
else
{
echo "Text doesn't have arabic/hebrew characters.";
}
?>
One difference between preg_match() and mb_ereg() concerns captured parenthesized subpatterns:
a group that matches an empty string is stored as "" by preg_match() but as false by mb_ereg().
<?php
preg_match('/(abc)(.*)/', 'abc', $match);
var_dump($match);
mb_ereg('(abc)(.*)', 'abc', $match);
var_dump($match);
?>
array(3) {
[0]=>
string(3) "abc"
[1]=>
string(3) "abc"
[2]=>
string(0) "" // <-- "string"(0) "" : preg_match()
}
array(3) {
[0]=>
string(3) "abc"
[1]=>
string(3) "abc"
[2]=>
bool(false) // <-- "bool"(false) : mb_ereg()
}
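If code expects strings in every slot, one option is to replace the false entries after the call. The helper name normalize_ereg_matches() is hypothetical; this is a sketch, not something mbstring provides.

```php
<?php
// Hypothetical helper: replaces false entries (groups that matched
// nothing or an empty string) with empty strings.
function normalize_ereg_matches(array $matches): array
{
    foreach ($matches as $key => $value) {
        if ($value === false) {
            $matches[$key] = '';
        }
    }
    return $matches;
}

mb_ereg('(abc)(.*)', 'abc', $match);
$match = normalize_ereg_matches($match);
var_dump($match[2]);
```

After normalization the array is closer to what the preg_match() call above produced, although the two functions may still differ in other cases.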
If appending ".*" to the end of the pattern makes mb_ereg() return false
while appending a single "." returns true,
suspect that the string is too long for the pattern matching.
In that case, preg_match() returns true with ".*" appended,
but returns false, as expected, once "$" or "\z" is also appended.
When a pattern contains a named subpattern,
mb_ereg() never captures the unnamed subpatterns.
(This is a restriction of Oniguruma.)
<?php
$str = 'abcdefg';
$patternA = '\A(abcd)(.*)\z'; // both caught [1]abcd [2]efg
$patternB = '\A(abcd)(?<rest>.*)\z'; // non-named 'abcd' never caught
mb_ereg($patternA, $str, $match);
echo '<pre>'.print_r($match, true).'</pre>';
mb_ereg($patternB, $str, $match);
echo '<pre>'.print_r($match, true).'</pre>';
?>
Array
(
[0] => abcdefg
[1] => abcd
[2] => efg
)
Array
(
[0] => abcdefg
[1] => efg
[rest] => efg
)
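Given that restriction, one workaround is to name every group that is needed. The sketch below assumes a PHP version in which mb_ereg() fills in named keys, as in the second output above; the group name head is made up for illustration.

```php
<?php
$str = 'abcdefg';
// Both groups are named here (names chosen for illustration).
$pattern = '\A(?<head>abcd)(?<rest>.*)\z';

if (mb_ereg($pattern, $str, $match) !== false) {
    // Read the captures by name instead of by position.
    echo $match['head'], ' / ', $match['rest'], "\n";
}
```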
<?php
# What mb_ereg() returns & changes $_3rd_argument into
# (Just run this script)
function dump2str($var) {
ob_start();
var_dump($var);
$output = ob_get_contents();
ob_end_clean();
return $output;
}
# (PHP7)empty pattern returns bool(false) with Warning
# (PHP8)empty pattern throws ValueError
$emp_ptn = '';
try{
$emp_ptn.= dump2str(mb_ereg('', 'abcde'));
}catch(Exception | Error $e){
$emp_ptn.= get_class($e).'<br>';
$emp_ptn.= $e->getMessage();
$emp_ptn.= '<pre>'.$e->getTraceAsString().'</pre>';
}
echo
'PHP '.phpversion().'<br><br>'.
'# match<br>'.
dump2str(mb_ereg("bcd", "abcde")).
' : mb_ereg("bcd", "abcde")<br><br>'.
'# match with 3rd argument<br>'.
dump2str(mb_ereg("bcd", "abcde", $_3rd)).
' : mb_ereg("bcd", "abcde", $_3rd) // '.dump2str($_3rd).'<br><br>'.
'# match (0 byte)<br>'.
dump2str(mb_ereg("^", "abcde")).
' : mb_ereg("^", "abcde")<br><br>'.
'# match (0 byte) with 3rd argument<br>'.
dump2str(mb_ereg("^", "abcde", $_3rd)).
' : mb_ereg("^", "abcde", $_3rd) // '.dump2str($_3rd).'<br><br>'.
'# unmatch<br>'.
dump2str(mb_ereg("f", "abcde")).
' : mb_ereg("f", "abcde")<br><br>'.
'# unmatch with 3rd argument<br>'.
dump2str(mb_ereg("f", "abcde", $_3rd)).
' : mb_ereg("f", "abcde", $_3rd) // '.dump2str($_3rd).'<br><br>'.
'# empty pattern<br>'.
$emp_ptn.
' : mb_ereg("", "abcde")<br><br>'.
'# empty pattern with 3rd argument<br>'.
$emp_ptn.
' : mb_ereg("", "abcde", $_3rd) // '.dump2str($_3rd).'<br><br>';
?>
I hope this information is shown somewhere on php.net.
According to "https://github.com/php/php-src/tree/PHP-5.6/ext/mbstring/oniguruma",
the bundled Oniguruma regex library version seems to be:
4.7.1 in PHP 5.3 - 5.4.45,
5.9.2 in PHP 5.5 - 7.1.16,
6.3.0 since PHP 7.2.
mb_ereg() seems unable to use named subpatterns.
preg_match() seems to be a substitute only in UTF-8 encoding.
<?php
$text = 'multi_byte_string';
$pattern = '.*(?<name>string).*'; // "?P" causes "mbregex compile err" in PHP 5.3.5
if(mb_ereg($pattern, $text, $matches)){
echo '<pre>'.print_r($matches, true).'</pre>';
}else{
echo 'no match';
}
?>
This code ignores "?<name>" in $pattern and displays the following:
Array
(
[0] => multi_byte_string
[1] => string
)
Replacing lines 2 & 3 of the code above with
$pattern = '/.*(?<name>string).*/u';
if(preg_match($pattern, $text, $matches)){
displays the following (in UTF-8 encoding):
Array
(
[0] => multi_byte_string
[name] => string
[1] => string
)
<?php
// in PHP_VERSION 7.1
// WITHOUT $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_'); // [5 bytes match]
var_dump($int); // int(1)
$int = mb_ereg('ab', '_ab_'); // [2 bytes match]
var_dump($int); // int(1)
$int = mb_ereg('^', '_ab_'); // [0 bytes match]
var_dump($int); // int(1)
$int = mb_ereg('ab', '__'); // [not match]
var_dump($int); // bool(false)
$int = mb_ereg('', '_ab_'); // [error : empty pattern]
// Warning: mb_ereg(): empty pattern in ...
var_dump($int); // bool(false)
$int = mb_ereg('ab'); // [error : fewer arguments]
// Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int); // bool(false)
// Without 3rd argument, mb_ereg() returns either int(1) or bool(false).
// WITH $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_', $regs);// [5 bytes match]
var_dump($int); // int(5)
var_dump($regs); // array(1) { [0]=> string(5) "abcde" }
$int = mb_ereg('ab', '_ab_', $regs); // [2 bytes match]
var_dump($int); // int(2)
var_dump($regs); // array(1) { [0]=> string(2) "ab" }
$int = mb_ereg('^', '_ab_', $regs); // [0 bytes match]
var_dump($int); // int(1)
var_dump($regs); // array(1) { [0]=> bool(false) }
$int = mb_ereg('ab', '__', $regs); // [not match]
var_dump($int); // bool(false)
var_dump($regs); // array(0) { }
$int = mb_ereg('', '_ab_', $regs); // [error : empty pattern]
// Warning: mb_ereg(): empty pattern in ...
var_dump($int); // bool(false)
var_dump($regs); // array(0) { }
$int = mb_ereg('ab'); // [error : fewer arguments]
// Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int); // bool(false)
var_dump($regs); // array(0) { }
// With 3rd argument, mb_ereg() returns either int(how many bytes matched) or bool(false)
// and 3rd argument is a bit complicated.
?>
Although it is hardly mentioned anywhere, it may be useful to note that mb_ereg() uses the Oniguruma library internally. The syntax for the default mode (ruby) is described here:
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
Hebrew regex tested on PHP 5, Ubuntu 8.04.
Seems to work fine without the mb_regex_encoding lines (commented out).
Didn't seem to work with \uxxxx (also commented out).
<?php
echo "Line ";
//mb_regex_encoding("ISO-8859-8");
//if(mb_ereg(".*([\u05d0-\u05ea]).*", $this->current_line))
if(mb_ereg(".*([א-ת]).*", $this->current_line))
{
echo "has";
}
else
{
echo "doesn't have";
}
echo " Hebrew characters.<br>";
//mb_regex_encoding("UTF-8");
?>