
mb_split

(PHP 4 >= 4.2.0, PHP 5, PHP 7)

mb_split — Split multibyte string using regular expression

Description

array mb_split ( string $pattern , string $string [, int $limit = -1 ] )

Splits a multibyte string using the regular expression pattern and returns the result as an array.

Parameters

pattern: The regular expression pattern.
string: The string being split.
limit: If the optional limit parameter is specified, the string is split into at most limit elements.

Return Values

The result as an array.
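
A minimal sketch of the signature above, assuming the script is saved as UTF-8 (the sample strings are invented for illustration). It shows an unbounded split and one capped by limit:

```php
<?php
// Assumption: this file is saved as UTF-8, so the regex encoding is set to match.
mb_regex_encoding('UTF-8');

// No limit: split at every run of whitespace.
$words = mb_split('\s+', 'une  pièce de théâtre');
print_r($words);

// limit = 2: at most two elements are returned.
$pair = mb_split(',', 'a,b,c,d', 2);
print_r($pair);
```

With limit set to 2, the unsplit remainder ('b,c,d') should end up in the last element.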

Notes

Note:
The character encoding specified by mb_regex_encoding() is used as the character encoding for this function by default.
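
One way to act on this note is to set the regex encoding just for the split and restore it afterwards. This is a sketch under the assumption that the script itself is saved as UTF-8; the sample string is illustrative only:

```php
<?php
// Assumption: this file is saved as UTF-8.
$previous = mb_regex_encoding();   // without arguments, returns the current setting
mb_regex_encoding('UTF-8');        // pattern and subject are now read as UTF-8

$parts = mb_split('、', '日、に、本');

mb_regex_encoding($previous);      // restore whatever was configured before
print_r($parts);
```

Restoring the previous value keeps the change from leaking into unrelated code that also relies on the mb_ereg family of functions.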

See Also

mb_regex_encoding() - Set/Get character encoding for multibyte regex
mb_ereg() - Regular expression match with multibyte support


User Contributed Notes 8 notes


Stas Trefilov, Vertilia ¶

9 years ago


A (simpler) way to extract all characters from a UTF-8 string into an array with a single call to a built-in function:

<?php
  $str = 'Ма-
руся';
  // -1 means "no limit"; passing null here is deprecated in newer PHP versions.
  print_r(preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY));
?>

Output:

Array
(
    [0] => М
    [1] => а
    [2] => -
    [3] => 

    [4] => р
    [5] => у
    [6] => с
    [7] => я
)
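
The same call can be wrapped so that malformed input is reported instead of silently producing false. The helper name utf8_chars() is hypothetical and not part of PHP; this is only a sketch of one possible wrapper:

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): split a UTF-8 string into characters.
 */
function utf8_chars(string $str): array
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split() returns false on failure, e.g. for malformed UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $chars;
}

print_r(utf8_chars('Маруся'));

try {
    utf8_chars("\xFF");   // a lone 0xFF byte is never valid UTF-8
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```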


boukeversteegh at gmail dot com ¶

13 years ago


The $pattern argument doesn't use /pattern/ delimiters, unlike other regex functions such as preg_match.

<?php
   # Works. No slashes around the pattern.
   print_r( mb_split("\s", "hello world") );
   # Array (
   #    [0] => hello
   #    [1] => world
   # )

   # Doesn't work: the slashes are treated as literal characters to match.
   print_r( mb_split("/\s/", "hello world") );
   # Array (
   #    [0] => hello world
   # )
?>
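
For comparison, here is a short sketch of the same split written both ways. It assumes UTF-8 input; the variable names are illustrative:

```php
<?php
mb_regex_encoding('UTF-8');

// mb_split(): bare pattern, no delimiters, no modifiers.
$a = mb_split('\s', 'hello world');

// preg_split(): the same pattern needs delimiters; the u modifier enables UTF-8 mode.
$b = preg_split('/\s/u', 'hello world');

var_dump($a === $b);
```

Both calls should produce the same two-element array, so the practical difference for a simple pattern like this one is only in how the pattern is written.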


adjwilli at yahoo dot com ¶

17 years ago


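I figure most people will want a simple way to break up a multibyte string into its individual characters. Here's a function I'm using to do that. Change UTF-8 to your chosen encoding.

<?php
function mbStringToArray ($string) {
    // Start with an empty array so an empty string returns array() rather than an undefined variable.
    $array = array();
    // Pass the same encoding to every mb_* call so length and offsets agree.
    $strlen = mb_strlen($string, "UTF-8");
    while ($strlen) {
        // Take the first character, then drop it from the remaining string.
        $array[] = mb_substr($string,0,1,"UTF-8");
        $string = mb_substr($string,1,$strlen,"UTF-8");
        $strlen = mb_strlen($string, "UTF-8");
    }
    return $array;
}
?>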


boukeversteegh at gmail dot com ¶

14 years ago


In addition to Sezer Yalcin's tip.

This function splits a multibyte string into an array of characters. Comparable to str_split().

<?php
# PHP 7.4 and later ship a built-in mb_str_split(); only define this fallback when it is missing.
if (!function_exists('mb_str_split')) {
    function mb_str_split( $string ) {
        # Split at every position that is not right after the start (^)
        # and not right before the end ($).
        return preg_split('/(?<!^)(?!$)/u', $string );
    }
}

$string   = '火车票';
$charlist = mb_str_split( $string );

print_r( $charlist );
?>

# Prints:
Array
(
    [0] => 火
    [1] => 车
    [2] => 票
)
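
On PHP 7.4 or later, the built-in mb_str_split() covers this use case directly. A brief sketch, assuming UTF-8 input (the sample strings are taken from the notes on this page):

```php
<?php
// Requires PHP 7.4 or later, where mb_str_split() is a built-in function.
$chars = mb_str_split('火车票', 1, 'UTF-8');
print_r($chars);

// The second argument is the chunk length in characters, not bytes.
$pairs = mb_str_split('Hallöle', 2, 'UTF-8');
print_r($pairs);
```

Passing the encoding explicitly avoids depending on the mb_internal_encoding() setting.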


thflori at gmail ¶

7 years ago


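I agree that some people might want an mb_explode('', $string);

This is my solution for it:

<?php

$string = 'Hallöle';

# Note: this assumes a non-empty $string; for '' the call range(0, -1) yields [0, -1].
$array = array_map(function ($i) use ($string) {
    return mb_substr($string, $i, 1);
}, range(0, mb_strlen($string) -1));

# expect() is an assertion helper from a test framework, not a PHP built-in.
expect($array)->toEqual(['H', 'a', 'l', 'l', 'ö', 'l', 'e']);

?>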


gunkan at terra dot es ¶

12 years ago


To split a string like this: "日、に、本、ほん、語、ご" using the "、" delimiter, I used:

     $v = mb_split('、',"日、に、本、ほん、語、ご");

but it didn't work.

The solution was to set the encodings first:

     mb_regex_encoding('UTF-8');
     mb_internal_encoding("UTF-8");
     $v = mb_split('、',"日、に、本、ほん、語、ご");

and now it works:

Array
(
    [0] => 日
    [1] => に
    [2] => 本
    [3] => ほん
    [4] => 語
    [5] => ご
)


qdb at kukmara dot ru ¶

14 years ago


Another way to str_split a multibyte string:
<?php
$s='әӘөүҗңһ';

// UTF-32 stores every code point in exactly 4 bytes, so str_split($temp_s,4) cuts on
// character boundaries. (In UTF-16 these characters take 2 bytes each, so 4-byte
// chunks would hold two characters.)
//$temp_s=iconv('UTF-8','UTF-32BE',$s);
$temp_s=mb_convert_encoding($s,'UTF-32BE','UTF-8');
$temp_a=str_split($temp_s,4);
$temp_a_len=count($temp_a);
for($i=0;$i<$temp_a_len;$i++){
    //$temp_a[$i]=iconv('UTF-32BE','UTF-8',$temp_a[$i]);
    $temp_a[$i]=mb_convert_encoding($temp_a[$i],'UTF-8','UTF-32BE');
}

echo('<pre>');
print_r($temp_a);
echo('</pre>');

// It is also possible to join the pieces while they are still in UTF-32:
define('SLS',mb_convert_encoding('/','UTF-32BE','UTF-8'));
$temp_s=mb_convert_encoding($s,'UTF-32BE','UTF-8');
$temp_a=str_split($temp_s,4);
$temp_s=implode(SLS,$temp_a);
$temp_s=mb_convert_encoding($temp_s,'UTF-8','UTF-32BE');
echo($temp_s);
?>


gert dot matern at web dot de ¶

15 years ago


We are talking about multibyte (e.g. UTF-8) strings here, so preg_split will fail for the following string:

'Weiße Rosen sind nicht grün!'

And because I didn't find a regex to simulate str_split, I optimized the first solution from adjwilli a bit:

<?php
$string = 'Weiße Rosen sind nicht grün!';
$stop   = mb_strlen( $string);
$result = array();

for( $idx = 0; $idx < $stop; $idx++)
{
   $result[] = mb_substr( $string, $idx, 1);
}
?>



Here is an example with adjwilli's function:

<?php
mb_internal_encoding( 'UTF-8');
mb_regex_encoding( 'UTF-8');

function mbStringToArray( $string)
{
  $stop   = mb_strlen( $string);
  $result = array();

  for( $idx = 0; $idx < $stop; $idx++)
  {
     $result[] = mb_substr( $string, $idx, 1);
  }

  return $result;
}

// The second argument (true) belongs to print_r(), so it returns the dump as a string for echo.
echo '<pre>', PHP_EOL,
print_r( mbStringToArray( 'Weiße Rosen sind nicht grün!'), true), PHP_EOL,
'</pre>';
?>

Let me know [by personal email] if someone finds a regex to simulate str_split with mb_split.
