get_html_translation_table

(PHP 4, PHP 5, PHP 7, PHP 8)

get_html_translation_table — Returns the translation table used by htmlspecialchars() and htmlentities()

Description

get_html_translation_table(int $table = HTML_SPECIALCHARS, int $flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, string $encoding = "UTF-8"): array

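get_html_translation_table() returns the translation table that is used internally by htmlspecialchars() and htmlentities().

Note:
Special characters can be encoded in several ways. For example, " can be encoded as &quot;, &#34; or &#x22;. get_html_translation_table() returns only the form used by htmlspecialchars() and htmlentities().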
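
As a small illustration of the note above (a sketch, not part of the reference text): html_entity_decode() accepts all three spellings of the double quote, while the table lists only one of them. The explicit flags and encoding are chosen here so that the result does not depend on the PHP version's defaults.

```php
<?php
// All three spellings decode to the same character.
$decoded = html_entity_decode('&quot; &#34; &#x22;', ENT_QUOTES | ENT_HTML401, 'UTF-8');
var_dump($decoded); // string(5) "" " ""

// The translation table holds a single form for the double quote.
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401, 'UTF-8');
var_dump($table['"']); // string(6) "&quot;"
```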

Parameters

table

Which table to return: either HTML_ENTITIES or HTML_SPECIALCHARS.

flags

A bitmask of one or more of the following flags, which specify which quotes the table will contain as well as which document type the table is for. The default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.

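**Available `flags` constants**
Constant Name	Description
`ENT_COMPAT`	Table will contain entities for double quotes, but not for single quotes.
`ENT_QUOTES`	Table will contain entities for both double and single quotes.
`ENT_NOQUOTES`	Table will contain entities for neither double quotes nor single quotes.
`ENT_SUBSTITUTE`	Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
`ENT_HTML401`	Table for HTML 4.01.
`ENT_XML1`	Table for XML 1.
`ENT_XHTML`	Table for XHTML.
`ENT_HTML5`	Table for HTML 5.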
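
The effect of the quote and document-type flags can be checked directly on the small HTML_SPECIALCHARS table. The following is a minimal sketch; the variable names are illustrative only.

```php
<?php
// Build the HTML_SPECIALCHARS table with different flag combinations.
$noQuotes = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');
$compat   = get_html_translation_table(HTML_SPECIALCHARS, ENT_COMPAT | ENT_HTML401, 'UTF-8');
$quotes   = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$html5    = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// ENT_NOQUOTES: neither quote character is a key in the table.
var_dump(isset($noQuotes['"']), isset($noQuotes["'"])); // bool(false) bool(false)

// ENT_COMPAT: only the double quote is present.
var_dump(isset($compat['"']), isset($compat["'"])); // bool(true) bool(false)

// ENT_QUOTES: the single-quote entity depends on the document type.
var_dump($quotes["'"]); // string(6) "&#039;"
var_dump($html5["'"]);  // string(6) "&apos;"
```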

encoding

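Encoding to use. If omitted, the default value for this argument is UTF-8.

The following character sets are supported:

**Supported charsets**
Charset	Aliases	Description
ISO-8859-1	ISO8859-1	Western European, Latin-1.
ISO-8859-5	ISO8859-5	Little-used Cyrillic charset (Latin/Cyrillic).
ISO-8859-15	ISO8859-15	Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8		ASCII-compatible multi-byte 8-bit Unicode.
cp866	ibm866, 866	DOS-specific Cyrillic charset. This charset is supported in 4.3.2.
cp1251	Windows-1251, win-1251, 1251	Windows-specific Cyrillic charset. This charset is supported in 4.3.2.
cp1252	Windows-1252, 1252	Windows-specific charset for Western European.
KOI8-R	koi8-ru, koi8r	Russian. This charset is supported in 4.3.2.
BIG5	950	Traditional Chinese, mainly used in Taiwan.
GB2312	936	Simplified Chinese, national standard character set.
BIG5-HKSCS		Traditional Chinese, Big5 with Hong Kong extensions.
Shift_JIS	SJIS, 932	Japanese
EUC-JP	EUCJP	Japanese
MacRoman		Charset that was used by Mac OS.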
`''`		An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.

Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.

Return Values

Returns the translation table as an array, with the original characters as keys and entities as values.
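
Because the returned array maps characters to entities, it can be passed straight to strtr(), and array_flip() gives the reverse mapping. The sketch below assumes valid UTF-8 input; the sample string is made up for illustration.

```php
<?php
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401, 'UTF-8');

$input = '<a href="x">Tom & Jerry\'s</a>';

// Encoding with strtr() and the table matches htmlspecialchars() for this input.
$encoded = strtr($input, $table);
var_dump($encoded === htmlspecialchars($input, ENT_QUOTES | ENT_HTML401, 'UTF-8')); // bool(true)

// Flipping the table yields an entity => character map for the reverse direction.
$decoded = strtr($encoded, array_flip($table));
var_dump($decoded === $input); // bool(true)
```

For real decoding work, html_entity_decode() is usually the better choice, since a flipped table does not cover numeric references.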

Changelog

Version	Description
8.1.0	The default value of `flags` changed from `ENT_COMPAT` to `ENT_QUOTES` \| `ENT_SUBSTITUTE` \| `ENT_HTML401`.

Examples

Example #1 Translation table example

<?php
var_dump(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5));
?>

The above example will output something similar to:

array(1510) {
  ["
"]=>
  string(9) "&NewLine;"
  ["!"]=>
  string(6) "&excl;"
  ["""]=>
  string(6) "&quot;"
  ["#"]=>
  string(5) "&num;"
  ["$"]=>
  string(8) "&dollar;"
  ["%"]=>
  string(8) "&percnt;"
  ["&"]=>
  string(5) "&amp;"
  ["'"]=>
  string(6) "&apos;"
  // ...
}

See Also

htmlspecialchars() - Convert special characters to HTML entities
htmlentities() - Convert all applicable characters to HTML entities
html_entity_decode() - Convert HTML entities to their corresponding characters

User Contributed Notes 13 notes

kevin at cwsmailbox dot xom ¶

14 years ago


Be careful using get_html_translation_table() in a loop, as it's very slow.
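
One way to act on this advice is to build the table once and reuse it. The helper below is a hypothetical sketch (the function name and the cache key format are not part of PHP), using a static variable as a per-request cache.

```php
<?php
// Hypothetical helper: builds each table once per request and reuses it afterwards.
function cached_translation_table(int $table = HTML_SPECIALCHARS, int $flags = ENT_QUOTES | ENT_HTML401): array
{
    static $cache = [];
    $key = $table . ':' . $flags;
    if (!isset($cache[$key])) {
        $cache[$key] = get_html_translation_table($table, $flags, 'UTF-8');
    }
    return $cache[$key];
}

$lines = ['a < b', 'Tom & Jerry'];
$escaped = [];
foreach ($lines as $line) {
    // get_html_translation_table() itself only runs on the first iteration.
    $escaped[] = strtr($line, cached_translation_table());
}
print_r($escaped);
```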

michael dot genesis at gmail dot com ¶

13 years ago


The fact that MS Word and some other sources use CP-1252, and that it is so close to Latin-1 ('ISO-8859-1'), causes a lot of confusion. What confused me the most was finding that MySQL uses CP-1252 by default.

You may run into trouble if you find yourself tempted to do something like this:

<?php
    $trans[chr(149)] = '&bull;';    // Bullet
    $trans[chr(150)] = '&ndash;';    // En Dash
    $trans[chr(151)] = '&mdash;';    // Em Dash
    $trans[chr(152)] = '&tilde;';    // Small Tilde
    $trans[chr(153)] = '&trade;';    // Trade Mark Sign
?>

Don't do it. DON'T DO IT!

You can use:

<?php
    $translationTable = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'WINDOWS-1252');
?>

or just convert directly:

<?php
    $output = htmlentities($input, ENT_NOQUOTES, 'WINDOWS-1252');
?>

But your web page is probably encoded in UTF-8, and you probably don't really want CP-1252 text flying around, so fix the character encoding first:

<?php
    $output = mb_convert_encoding($input, 'UTF-8', 'WINDOWS-1252');
    $output = htmlentities($output);
?>

Kenneth Kin Lum ¶

16 years ago


To display the mapping on a web page regardless of the server encoding, you can use:

  echo "<pre>\n";
  echo htmlentities(print_r(get_html_translation_table(HTML_SPECIALCHARS), true));
  echo htmlentities(print_r(get_html_translation_table(HTML_ENTITIES), true));

This is needed because get_html_translation_table() actually returns the special characters in ISO-8859-1 (Latin-1) encoding. To see the tables correctly using

  print_r(get_html_translation_table(HTML_ENTITIES));

your server needs to send an HTTP header declaring ISO-8859-1, unless you set it with header() or manually set the browser's encoding to ISO-8859-1. You also need to view the source of the page to see the mapping (although the English version of IE 7 outputs the page source as ISO-8859-1 anyway).

dirk at hartmann dot net ¶

23 years ago


get_html_translation_table() only works with the first 256 code positions. For higher positions, for example &#1092; (a Cyrillic letter), it shows the same.

iain (duh) workingsoftware.com.au ¶

17 years ago


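I wrote a quick little snippet for converting something like '&middot;' into '&#183;':

$to_convert = '&middot;';
$table = get_html_translation_table(HTML_ENTITIES);
// mb_ord() returns the full code point; ord() would only return the first byte
// of a multibyte character now that the table defaults to UTF-8.
$equiv = '&#' . mb_ord(array_search($to_convert, $table), 'UTF-8') . ';';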

Patrick nospam at nospam mesopia dot com ¶

19 years ago


Not sure what's going on here but I've run into a problem that others might face as well...

<?php
$translations = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES));
?>

returns the single quote ' as being equal to &#39; while

<?php
$translatedString = htmlentities($string, ENT_QUOTES);
?>

returns it as being equal to &#039;

I've had to do a specific string replacement for the time being... Not sure if it's an issue with the function or the array manipulation.

-Pat

Jérôme Jaglale ¶

18 years ago


htmlentities() covers everything htmlspecialchars() does, so here's how to convert a UTF-8 string:
htmlentities($string, ENT_QUOTES, 'UTF-8');

Maurizio Siliani at trident dot it ¶

17 years ago


If you have trouble (like me) getting data from ISO-8859-1 encoded forms where users copy and paste from Word, this routine could be useful.
It adds to the standard get_html_translation_table() output the codes of the characters that MS Word usually substitutes into typed text.
Otherwise those characters would never be displayed correctly in HTML output.

function get_html_translation_table_CP1252() {
    $trans = get_html_translation_table(HTML_ENTITIES);
    $trans[chr(130)] = '&sbquo;';    // Single Low-9 Quotation Mark
    $trans[chr(131)] = '&fnof;';    // Latin Small Letter F With Hook
    $trans[chr(132)] = '&bdquo;';    // Double Low-9 Quotation Mark
    $trans[chr(133)] = '&hellip;';    // Horizontal Ellipsis
    $trans[chr(134)] = '&dagger;';    // Dagger
    $trans[chr(135)] = '&Dagger;';    // Double Dagger
    $trans[chr(136)] = '&circ;';    // Modifier Letter Circumflex Accent
    $trans[chr(137)] = '&permil;';    // Per Mille Sign
    $trans[chr(138)] = '&Scaron;';    // Latin Capital Letter S With Caron
    $trans[chr(139)] = '&lsaquo;';    // Single Left-Pointing Angle Quotation Mark
    $trans[chr(140)] = '&OElig;';    // Latin Capital Ligature OE
    $trans[chr(145)] = '&lsquo;';    // Left Single Quotation Mark
    $trans[chr(146)] = '&rsquo;';    // Right Single Quotation Mark
    $trans[chr(147)] = '&ldquo;';    // Left Double Quotation Mark
    $trans[chr(148)] = '&rdquo;';    // Right Double Quotation Mark
    $trans[chr(149)] = '&bull;';    // Bullet
    $trans[chr(150)] = '&ndash;';    // En Dash
    $trans[chr(151)] = '&mdash;';    // Em Dash
    $trans[chr(152)] = '&tilde;';    // Small Tilde
    $trans[chr(153)] = '&trade;';    // Trade Mark Sign
    $trans[chr(154)] = '&scaron;';    // Latin Small Letter S With Caron
    $trans[chr(155)] = '&rsaquo;';    // Single Right-Pointing Angle Quotation Mark
    $trans[chr(156)] = '&oelig;';    // Latin Small Ligature OE
    $trans[chr(159)] = '&Yuml;';    // Latin Capital Letter Y With Diaeresis
    ksort($trans);
    return $trans;
}

Alex Minkoff ¶

19 years ago


If you want to display special HTML entities in a web browser, you can use the following code:

<?php
$entities = get_html_translation_table(HTML_ENTITIES);
$new_entities = [];
foreach ($entities as $entity) {
    // Escape each entity so the browser shows its source text, e.g. &amp;amp;
    $new_entities[$entity] = htmlspecialchars($entity);
}
echo "<pre>";
print_r($new_entities);
echo "</pre>";
?>

If you don't, the key name of each element will appear to be the same as the element content itself, making it look mighty stupid. ;)

kumar at chicagomodular.com ¶

22 years ago


Without heavy scientific analysis, this seems to work as a quick fix for making text originating from a Microsoft Word document display as HTML:

<?php
function DoHTMLEntities($string)
{
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);

    // MS Word strangeness..
    // smart single/double quotes:
    $trans_tbl[chr(145)] = '\'';
    $trans_tbl[chr(146)] = '\'';
    $trans_tbl[chr(147)] = '&quot;';
    $trans_tbl[chr(148)] = '&quot;';

    // Acute 'e'
    $trans_tbl[chr(142)] = '&eacute;';

    return strtr($string, $trans_tbl);
}
?>

robertn972 at gmail dot com ¶

16 years ago


I found this useful for converting Latin characters:

<?php
function convertLatin1ToHtml($str) {
    $allEntities = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES);
    $specialEntities = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES);
    // Drop &, < and > so existing markup in $str is left untouched.
    $noTags = array_diff($allEntities, $specialEntities);
    $str = strtr($str, $noTags);
    return $str;
}
?>

kevin_bro at hostedstuff dot com ¶

22 years ago


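Alan's version didn't seem to work right. If you're having the same problem, consider using this slightly modified version instead:

function unhtmlentities ($string)  {
   $trans_tbl = get_html_translation_table (HTML_ENTITIES);
   $trans_tbl = array_flip ($trans_tbl);
   $ret = strtr ($string, $trans_tbl);
   // The /e modifier was removed in PHP 7, so use a callback instead.
   // mb_chr() is used rather than chr() so code points above 255 decode to valid UTF-8.
   return preg_replace_callback('/&#(\d+);/', function ($m) {
      return mb_chr((int) $m[1], 'UTF-8');
   }, $ret);
}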

alan at akbkhome dot com ¶

22 years ago


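If you want to decode all those &#123; symbols as well:

function unhtmlentities ($string)  {
    $trans_tbl = get_html_translation_table (HTML_ENTITIES);
    $trans_tbl = array_flip ($trans_tbl);
    $ret = strtr ($string, $trans_tbl);
    // The /e modifier was removed in PHP 7, so use a callback instead.
    // mb_chr() is used rather than chr() so code points above 255 decode to valid UTF-8.
    return preg_replace_callback('/\&\#([0-9]+)\;/', function ($m) {
        return mb_chr((int) $m[1], 'UTF-8');
    }, $ret);
}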
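
For comparison, the built-in html_entity_decode() listed under See Also handles named, decimal and hexadecimal references in a single call. A minimal sketch follows; the sample string is made up for illustration.

```php
<?php
// Named (&eacute;), decimal (&#123;) and hexadecimal (&#x7D;) references in one pass.
$decoded = html_entity_decode('&eacute; &#123; &#x7D;', ENT_QUOTES | ENT_HTML401, 'UTF-8');
var_dump($decoded === "\u{E9} { }"); // bool(true)
```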
