
mb_ereg

(PHP 4 >= 4.2.0, PHP 5, PHP 7, PHP 8)

mb_ereg — Regular expression match with multibyte support

Description

mb_ereg(string $pattern, string $string, array &$matches = null): bool

Executes the regular expression match with multibyte support.

Parameters

pattern

The search pattern.

string

The search string.

matches

If matches are found for parenthesized substrings of pattern and the function is called with the third argument matches, the matches will be stored in the elements of the array matches. If no matches are found, matches is set to an empty array.

$matches[1] will contain the substring which starts at the first left parenthesis; $matches[2] will contain the substring starting at the second, and so on. $matches[0] will contain a copy of the complete string matched.
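As a minimal sketch of how the groups map onto matches (the sample string and pattern are illustrative assumptions, not part of the manual, and the script is assumed to be saved as UTF-8):

```php
<?php
// Assumption: this file and its string literals are UTF-8 encoded.
mb_regex_encoding('UTF-8');

$greeting = 'Grüße, Jürgen';

// Two parenthesized groups separated by a literal comma and space.
if (mb_ereg('(.+), (.+)', $greeting, $matches)) {
    echo $matches[0], "\n"; // the complete matched text
    echo $matches[1], "\n"; // text captured by the first group
    echo $matches[2], "\n"; // text captured by the second group
}
```

The return value is only tested for truthiness here, so the sketch should behave the same whether the function returns an integer or true.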

Return Values

Returns whether pattern matches string.

Changelog

Version	Description
8.0.0	This function returns `true` on success now. Previously, it returned the byte length of the matched string if a match for `pattern` was found in `string` and `matches` was passed. If the optional parameter `matches` was not passed or the length of the matched string was `0`, this function returned `1`.
7.1.0	mb_ereg() now sets `matches` to an empty array if nothing matched. Formerly, `matches` was not modified in that case.
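Code that has to run on both sides of the 8.0.0 change can compare the result against `false` instead of relying on the integer. The helper below is a hypothetical sketch (`regex_matches` is not an mbstring function) and assumes PHP 7.1 or later for the nullable parameter type:

```php
<?php
/**
 * Hypothetical helper (not part of mbstring): always returns a strict bool,
 * whichever PHP version runs it.
 */
function regex_matches(string $pattern, string $string, ?array &$matches = null): bool
{
    // Before 8.0.0 a successful call returns an int (byte length or 1);
    // from 8.0.0 it returns true. A failed match returns false in both.
    return mb_ereg($pattern, $string, $matches) !== false;
}

var_dump(regex_matches('bcd', 'abcde', $m)); // bool(true)
var_dump($m);                                // $m[0] holds "bcd"
var_dump(regex_matches('f', 'abcde', $m));   // bool(false)
var_dump($m);                                // empty array as of 7.1.0
```

Comparing with `!== false` avoids depending on the byte length that older versions returned.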

Notes

Note:
The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.
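The effect of the regex encoding can be seen with a single two-byte character. This is a small sketch that assumes UTF-8 input and PHP 7 or later (for the `\u{...}` string escape); the variable names are illustrative:

```php
<?php
// Assumption: input strings are UTF-8; "\u{00E9}" is PHP 7+ syntax for U+00E9 (é).
$previous = mb_regex_encoding();   // remember the current regex encoding
mb_regex_encoding('UTF-8');

$char = "\u{00E9}";                // one character, two bytes in UTF-8

var_dump(strlen($char));                    // int(2): byte length
var_dump((bool) mb_ereg('\A.\z', $char));   // bool(true): "." consumes the whole character
var_dump((bool) mb_ereg('\A..\z', $char));  // bool(false): there is no second character

mb_regex_encoding($previous);      // restore the earlier setting
```

Saving and restoring the previous value keeps the change from leaking into other code that relies on the regex encoding.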

See Also

mb_regex_encoding() - Set/Get character encoding for multibyte regex
mb_eregi() - Regular expression match ignoring case with multibyte support


User Contributed Notes 11 notes

Anonymous

7 years ago


The old link to the Oniguruma regex syntax no longer works; a working one is:
https://github.com/geoffgarside/oniguruma/blob/master/Syntax.txt

pressler at hotmail dot de

12 years ago


Note that mb_ereg() does not support the \uFFFF Unicode syntax but uses \x{FFFF} instead:

<?php

//$text = 'Peter is a boy.'; // English
$text = 'بيتر هو صبي.'; // Arabic
//$text = 'פיטר הוא ילד.'; // Hebrew

mb_regex_encoding('UTF-8');

if(mb_ereg('[\x{0600}-\x{06FF}]', $text)) // Arabic range
//if(mb_ereg('[\x{0590}-\x{05FF}]', $text)) // Hebrew range
{
    echo "Text has some Arabic/Hebrew characters.";
}
else
{
    echo "Text doesn't have Arabic/Hebrew characters.";
}

?>

Anonymous

3 years ago


One difference between preg_match() and mb_ereg() is how they report
a parenthesized subpattern that captured an empty string.

<?php

preg_match('/(abc)(.*)/', 'abc', $match);
var_dump($match);

mb_ereg('(abc)(.*)', 'abc', $match);
var_dump($match);

?>

array(3) {
  [0]=>
  string(3) "abc"
  [1]=>
  string(3) "abc"
  [2]=>
  string(0) ""       // <-- "string"(0) "" : preg_match()
}

array(3) {
  [0]=>
  string(3) "abc"
  [1]=>
  string(3) "abc"
  [2]=>
  bool(false)       // <-- "bool"(false) : mb_ereg()
}
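If downstream code expects every element to be a string, the `false` entries can be normalized after the call. This is only a sketch of one possible approach; `$normalized` is an illustrative name:

```php
<?php
mb_ereg('(abc)(.*)', 'abc', $match);

// Replace any false entry with '' so every element is a string.
$normalized = array_map(
    function ($group) {
        return $group === false ? '' : $group;
    },
    $match
);

var_dump($normalized[2]); // string(0) ""
```

array_map() keeps the original keys when it is given a single array, so the group indexes stay the same.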

Anonymous

2 years ago


If appending ".*" to the end of a pattern makes mb_ereg() return "false",
whereas appending a single "." returns "true",
suspect that the string is too long for the pattern matching.

In that case preg_match() returns "true" with ".*" appended,
but returns "false", as expected, once "$" or "\z" is added as well.

Anonymous

3 years ago


When a pattern contains a named subpattern,
mb_ereg() never captures the unnamed subpatterns.
(This is an Oniguruma restriction.)

<?php

$str = 'abcdefg';
$patternA = '\A(abcd)(.*)\z';        // both caught [1]abcd [2]efg
$patternB = '\A(abcd)(?<rest>.*)\z'; // non-named 'abcd' never caught

mb_ereg($patternA, $str, $match);
echo '<pre>'.print_r($match, true).'</pre>';

mb_ereg($patternB, $str, $match);
echo '<pre>'.print_r($match, true).'</pre>';
?>

Array
(
    [0] => abcdefg
    [1] => abcd
    [2] => efg
)

Array
(
    [0] => abcdefg
    [1] => efg
    [rest] => efg
)
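One way around this, sketched here on the assumption that the running PHP version fills in named keys as the second output above shows, is to name every group. The group name `head` is an illustrative choice:

```php
<?php
$str = 'abcdefg';

// Both groups are named, so neither depends on positional capture.
$pattern = '\A(?<head>abcd)(?<rest>.*)\z';

if (mb_ereg($pattern, $str, $match)) {
    echo $match['head'], "\n"; // the first group, read by name
    echo $match['rest'], "\n"; // the second group, read by name
}
```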

Anonymous

4 years ago


<?php

# Shows what mb_ereg() returns and what it stores in its third argument
# (just run this script)

function dump2str($var) {
    ob_start();
    var_dump($var);
    $output = ob_get_contents();
    ob_end_clean();
    return $output;
}

# (PHP7)empty pattern returns bool(false) with Warning
# (PHP8)empty pattern throws ValueError
$emp_ptn = '';
try {
    $emp_ptn .= dump2str(mb_ereg('', 'abcde'));
} catch (Exception | Error $e) {
    $emp_ptn .= get_class($e).'<br>';
    $emp_ptn .= $e->getMessage();
    $emp_ptn .= '<pre>'.$e->getTraceAsString().'</pre>';
}

echo
'PHP '.phpversion().'<br><br>'.

'# match<br>'.
dump2str(mb_ereg("bcd", "abcde")).
' : mb_ereg("bcd", "abcde")<br><br>'.

'# match with 3rd argument<br>'.
dump2str(mb_ereg("bcd", "abcde", $_3rd)).
' : mb_ereg("bcd", "abcde", $_3rd)    // '.dump2str($_3rd).'<br><br>'.

'# match (0 byte)<br>'.
dump2str(mb_ereg("^", "abcde")).
' : mb_ereg("^", "abcde")<br><br>'.

'# match (0 byte) with 3rd argument<br>'.
dump2str(mb_ereg("^", "abcde", $_3rd)).
' : mb_ereg("^", "abcde", $_3rd)    // '.dump2str($_3rd).'<br><br>'.

'# unmatch<br>'.
dump2str(mb_ereg("f", "abcde")).
' : mb_ereg("f", "abcde")<br><br>'.

'# unmatch with 3rd argument<br>'.
dump2str(mb_ereg("f", "abcde", $_3rd)).
' : mb_ereg("f", "abcde", $_3rd)    // '.dump2str($_3rd).'<br><br>'.

'# empty pattern<br>'.
$emp_ptn.
' : mb_ereg("", "abcde")<br><br>'.

'# empty pattern with 3rd argument<br>'.
$emp_ptn.
' : mb_ereg("", "abcde", $_3rd)    // '.dump2str($_3rd).'<br><br>';

?>

lastuser at example dot com

6 years ago


I hope this information is shown somewhere on php.net.

According to "https://github.com/php/php-src/tree/PHP-5.6/ext/mbstring/oniguruma",
the bundled Oniguruma regex library version appears to be:
 4.7.1 in PHP 5.3 - 5.4.45,
 5.9.2 in PHP 5.5 - 7.1.16,
 6.3.0 from PHP 7.2 onward.

mb_ereg() seems unable to Use "named sub

9 years ago


mb_ereg() seems unable to use a "named subpattern".
preg_match() seems to be a substitute, but only in UTF-8 encoding.

<?php

$text = 'multi_byte_string';
$pattern = '.*(?<name>string).*';        // "?P" causes "mbregex compile err" in PHP 5.3.5

if(mb_ereg($pattern, $text, $matches)){
    echo '<pre>'.print_r($matches, true).'</pre>';
}else{
    echo 'no match';
}

?>

This code ignores "?<name>" in $pattern and displays the following:

Array
(
    [0] => multi_byte_string
    [1] => string
)

Replacing the $pattern assignment and the mb_ereg() call with

$pattern = '/.*(?<name>string).*/u';
if(preg_match($pattern, $text, $matches)){

displays the following (in UTF-8 encoding):

Array
(
    [0] => multi_byte_string
    [name] => string
    [1] => string
)

Anonymous

5 years ago


<?php

// in PHP_VERSION 7.1

// WITHOUT $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_'); // [5 bytes match]
var_dump($int);                     // int(1)

$int = mb_ereg('ab', '_ab_');       // [2 bytes match]
var_dump($int);                     // int(1)

$int = mb_ereg('^', '_ab_');        // [0 bytes match]
var_dump($int);                     // int(1)

$int = mb_ereg('ab', '__');         // [not match]
var_dump($int);                     // bool(false)

$int = mb_ereg('', '_ab_');         // [error : empty pattern]
                                    // Warning: mb_ereg(): empty pattern in ...
var_dump($int);                     // bool(false)

$int = mb_ereg('ab');               // [error : fewer arguments]
                                    // Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int);                     // bool(false)

                    // Without 3rd argument, mb_ereg() returns either int(1) or bool(false).

// WITH $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_', $regs);// [5 bytes match]
var_dump($int);                           // int(5)
var_dump($regs);                          // array(1) { [0]=> string(5) "abcde" }

$int = mb_ereg('ab', '_ab_', $regs);      // [2 bytes match]
var_dump($int);                           // int(2)
var_dump($regs);                          // array(1) { [0]=> string(2) "ab" }

$int = mb_ereg('^', '_ab_', $regs);       // [0 bytes match]
var_dump($int);                           // int(1)
var_dump($regs);                          // array(1) { [0]=> bool(false) }

$int = mb_ereg('ab', '__', $regs);        // [not match]
var_dump($int);                           // bool(false)
var_dump($regs);                          // array(0) { }

$int = mb_ereg('', '_ab_', $regs);        // [error : empty pattern]
                                          // Warning: mb_ereg(): empty pattern in ...
var_dump($int);                           // bool(false)
var_dump($regs);                          // array(0) { }

$int = mb_ereg('ab');                     // [error : fewer arguments]
                                          // Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int);                           // bool(false)
var_dump($regs);                          // array(0) { }

                    // With 3rd argument, mb_ereg() returns either int(how many bytes matched) or bool(false)
                    // and 3rd argument is a bit complicated.

?>

Riikka K

10 years ago


While hardly mentioned anywhere, it may be useful to note that mb_ereg uses the Oniguruma library internally. The syntax for the default mode (ruby) is described here:

http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt

Jon

15 years ago


Hebrew regex tested on PHP 5, Ubuntu 8.04.

Seems to work fine without the mb_regex_encoding lines (commented out).

Didn't seem to work with \uxxxx (also commented out).



<?php

// Sample input added for illustration; the original snippet read
// $this->current_line inside a class.
$current_line = 'שלום world';

echo "Line ";
//mb_regex_encoding("ISO-8859-8");
//if(mb_ereg(".*([\u05d0-\u05ea]).*", $current_line))
if(mb_ereg(".*([א-ת]).*", $current_line))
{
    echo "has";
}
else
{
    echo "doesn't have";
}
echo " Hebrew characters.<br>";
//mb_regex_encoding("UTF-8");

?>
