You can also use this function to repair xml, for example if stray ampersands etc are breaking it:
<?php
$xml = tidy_repair_string($xml, array(
'output-xml' => true,
'input-xml' => true
));
?>
(PHP 5, PHP 7, PECL tidy >= 0.7.0)
tidy::repairString -- tidy_repair_string — Repair a string using an optionally provided configuration file
객체 기반 형식
절차식 형식
Repairs the given string.
data
The data to be repaired.
config
The config config
can be passed either as an
array or as a string. If a string is passed, it is interpreted as the
name of the configuration file, otherwise, it is interpreted as the
options themselves.
Check » http://tidy.sourceforge.net/docs/quickref.html for an explanation about each option.
encoding
The encoding
parameter sets the encoding for
input/output documents. The possible values for encoding are:
ascii, latin0, latin1,
raw, utf8, iso2022,
mac, win1252, ibm858,
utf16, utf16le, utf16be,
big5, and shiftjis.
Returns the repaired string.
Example #1 tidy::repairString() example
<?php
ob_start();
?>
<html>
<head>
<title>test</title>
</head>
<body>
<p>error</i>
</body>
</html>
<?php
$buffer = ob_get_clean();
$tidy = new tidy();
$clean = $tidy->repairString($buffer);
echo $clean;
?>
위 예제의 출력:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN"> <html> <head> <title>test</title> </head> <body> <p>error</p> </body> </html>
Note: The optional parameters
config
andencoding
were added in Tidy 2.0.
You can also use this function to repair xml, for example if stray ampersands etc are breaking it:
<?php
$xml = tidy_repair_string($xml, array(
'output-xml' => true,
'input-xml' => true
));
?>
Using tidy is very simple to fix a broken ods/odt document
I wrote the following code to be run from command line
<?php
$zip = new ZipArchive();
if ($zip->open($argv[1])) {
$fp = $zip->getStream('content.xml'); //file inside archive
if(!$fp)
die("Error: can't get stream to document file");
$stat = $zip->statName('content.xml');
$buf = ""; //file buffer
ob_start(); //to capture CRC error message
while (!feof($fp)) {
$buf .= fread($fp, 2048);
}
$s = ob_get_contents();
ob_end_clean();
fclose($fp);
$zip->close();
$config = array(
'indent' => true,
'clean' => true,
'input-xml' => true,
'output-xml' => true,
'wrap' => false
);
$tidy = new Tidy();
$xml = $tidy->repairstring($buf, $config);
$array=split("\n",$xml);
$file=tempnam("/tmp","xml");
$fp=fopen($file,"rw+");
foreach ($array as $key=>$value) {
fwrite($fp,trim($value),strlen(trim($value)));
if ($key==0) {
fwrite($fp,"\n");
}
}
fclose($fp);
if ($zip->open($argv[1]) === TRUE) {
$zip->deleteName('content.xml');
$zip->addFile($file, 'content.xml');
$zip->close();
echo 'recovery complete';
} else {
echo 'recovery failed';
}
unlink($file);
}
?>
save it to a file called fixdoc and invoke as:
php fixdoc yourbrokendoc
for your safety, please work on a copy of your doc.
The docs referenced at http://tidy.sourceforge.net/docs/quickref.html above state that the configuration option 'sort-attributes' is an enumeration of 'none' and 'alpha', thereby specifying that strings of either form are the acceptable values. This may not be the case, however - on my system, the option was not honored until I set it to true. This may also be the case with other options, so experiment a bit. The output of tidy::getConfig() may be useful in this regard.