
Extracting data from a Google Chrome bookmarks export with PHP

Author: Internet

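I want to save my Google Chrome bookmarks to a database, so the first step is to use PHP to read the .html file that Chrome exports and put the data into variables. I am hoping for something that can process data like the sample below and extract the URL, ADD_DATE, ICON, and the link text into their own variables.

I know I need to use some regular expressions for this. Can anyone help? Thanks, I will put up a bounty as soon as time allows.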

<A HREF="http://snipt.net/public/tag/css"
 ADD_DATE="1271801059" 
 ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAACtklEQVQ4jXWSS2gTURSGvzszyaSxpsS2vhe2WosgilgVHyDqzo2iIoog+EIKCiIuFNTGjUoVBLWCiKArFcSFi7hQFLT4Qqp10SK11mKbgk3SmjSdJDNzj4s+0Fb/zTmL/3z8596jmKDElxcVYTuwxS3+Gu7O9DysqzvsTvT8KfVnP9DdvBfRZ3w3N197DqGAepV2AyePPuj9FDKNGUZBG68/dzo/Hjcm/gL0dcQrS4KRO9pzNvt+EdvUDOVdWr6lSKSdYUeFr39NhuNdP7N2KvNrZti21brF856eO7AloQAGul40iHgx3ysQsoNXP3Znih/avp6YX2lSXWESDRvprFe2fNHqfd8BdsduViQzxQ19mcxLAwAxporWKKXwXIyQJWxdMZu1i2YTjUTxsKeV2dlLsVjMALgXO5yMRqYMhE1zpjW6SBalQBSuXziyoNzC9UPk3QJaRsFa7QjOil5YWX/15Yqa6VYinc3m0vl2C0BEJxUKQQCh6Gu074MIIoIWjWhh55LipkiopDGpnVzT8UN5AGskgDRjmL74YooWEI2IIGhAA4IWQWD55prc1uo1R26P/YIBEK3e2KoM+5HCGB8ADTJSR2CC1oInXqz92anyvwAAnngNygrmRDQylmC8CogQDviIl5v7NrXg9CRAxbz17UpZTUqZiOjRNUYAQVMzNeDQ0muyL76Jg893Hdt+Y2jJ+BuMqeANXw5YJXs8d2iOiGAqTant0tVf5Mr7Wu53rsOX6ZSEvZ62nqyeeMoAJDuf1nvO4A2bQTLOMHdbolxrXUV/fiGEKFRFBm5VlfZffH66tvefgI6OuF0u7pt4a2pZ47vFfE4thWCQytLck9qy/nPNZ6veTZyZpPP3m7cF6n8K+0VKjxba6xp6d/3POynBmJaed07afs4s+tmmT7Gqwf/5fgMaeWl1u/QPfAAAAABJRU5ErkJggg=="
>Snipt - public - css | Share and store code or command snippets.</A>

Update

I like user yc's suggestion of not using regular expressions:

$s = '<A HREF="http://snipt.net/public/tag/css"
 ADD_DATE="1271801059" 
 ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAACtklEQVQ4jXWSS2gTURSGvzszyaSxpsS2vhe2WosgilgVHyDqzo2iIoog+EIKCiIuFNTGjUoVBLWCiKArFcSFi7hQFLT4Qqp10SK11mKbgk3SmjSdJDNzj4s+0Fb/zTmL/3z8596jmKDElxcVYTuwxS3+Gu7O9DysqzvsTvT8KfVnP9DdvBfRZ3w3N197DqGAepV2AyePPuj9FDKNGUZBG68/dzo/Hjcm/gL0dcQrS4KRO9pzNvt+EdvUDOVdWr6lSKSdYUeFr39NhuNdP7N2KvNrZti21brF856eO7AloQAGul40iHgx3ysQsoNXP3Znih/avp6YX2lSXWESDRvprFe2fNHqfd8BdsduViQzxQ19mcxLAwAxporWKKXwXIyQJWxdMZu1i2YTjUTxsKeV2dlLsVjMALgXO5yMRqYMhE1zpjW6SBalQBSuXziyoNzC9UPk3QJaRsFa7QjOil5YWX/15Yqa6VYinc3m0vl2C0BEJxUKQQCh6Gu074MIIoIWjWhh55LipkiopDGpnVzT8UN5AGskgDRjmL74YooWEI2IIGhAA4IWQWD55prc1uo1R26P/YIBEK3e2KoM+5HCGB8ADTJSR2CC1oInXqz92anyvwAAnngNygrmRDQylmC8CogQDviIl5v7NrXg9CRAxbz17UpZTUqZiOjRNUYAQVMzNeDQ0muyL76Jg893Hdt+Y2jJ+BuMqeANXw5YJXs8d2iOiGAqTant0tVf5Mr7Wu53rsOX6ZSEvZ62nqyeeMoAJDuf1nvO4A2bQTLOMHdbolxrXUV/fiGEKFRFBm5VlfZffH66tvefgI6OuF0u7pt4a2pZ47vFfE4thWCQytLck9qy/nPNZ6veTZyZpPP3m7cF6n8K+0VKjxba6xp6d/3POynBmJaed07afs4s+tmmT7Gqwf/5fgMaeWl1u/QPfAAAAABJRU5ErkJggg=="
>Snipt - public - css | Share and store code or command snippets.</A>';


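// Parse the single <A> element held in $s (the original referenced an undefined $s2).
$bookmarks = simplexml_load_string($s);
echo $bookmarks['HREF']; // URL
echo '<br>';
echo $bookmarks[0]; // Name (the link text)
echo '<br>';
echo $bookmarks['ICON']; // Icon
echo '<br>';
echo $bookmarks['ADD_DATE']; // Add_Date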

But I have not figured out how to make it work with multiple links in an HTML page or string.

Then I found PHP's DOMDocument class, and it seems to work like this:

$html = '<DT><A HREF="http://stackapps.com/questions/518/stacktack-a-javascript-widget-you-can-stick-anywhere" ADD_DATE="1301274664" ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAACY0lEQVQ4jX2SS0jVQRTGv//MnJnxGjctpbCFIrgO2rRr06KiRdtKEYLwUj4gohdBZlFUEmmp0N8WIZXrIMiF25Zu27hQIaKiuHaze/93ni3ykY/bWR3O/M43zDdfghp14fpEo67HZwCx+ONby8uR28s7cayWQH0DP6G01lrruqb9LcdrcclaMzA0doqIHnESrxQnmfBkgHGeTwD4EEoxhKfWWuOc7/LGXH0y2Pd2XaCn5znl2zBPUraSJAghwAUHYxwAEGOAcw7eeVhjYapmqbQYO9K0YBkApGnBMqJJIoKSEkorKK2hlCwpJUtKa2itIZUEEUGQmEzTgt3kAZHgggSEJBBRJjjvz4Jpz4Jp5yT6BVG2ugxOgq97cO7G/ea9u5uO5nK5CaVVo9IanLG+S10nx/81a3R6ptd7N5ZVMmSVbLlcznpXvhdnWUN9wxIJMc0Za2SMgyVJyVTd1Fa3s3J1KkmSEmMcgvMGKfjrXOOuxe3fmGybbD9PNjhWWf7dZl3odCEshxAQI/JSqe6tezqnumNEPvgA50PRene2uFJsXb/v5sjULaXkkNYagihjgl8pV8vTAJBTuTPBh2Fnra5mGYyxg3cHuu4AgFgT8NZ5zzmssYiAFjE+qxPqHgAE5/PeOVhj4Z2DrTq/6cV/g5TMSyVbSQoIQRtBSoAY1oLkVoNkl34uhI40LVgOAHNz78LhI8cWInAohjCKGD947w+GELT3DtbYkvN+2JjqrLXugDfm8vjjix//6/m1By9OM+LTABCs73x4/fybnTix0xAAvn75NLOneV8FQFL5Fd/X4v4ArZQWGyLoDDcAAAAASUVORK5CYII=">app - StackTack, a JavaScript widget you can stick anywhere - Stack Apps</A>

            <DT><A HREF="https://chrome.google.com/extensions/detail/paoeolblihedcagbofkkkecjilmpehmo" ADD_DATE="1301275461" ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAADSklEQVQ4jW2TX2gbBQDGf5e7NGmTlOZ/xKVtbOfWrjWUNa6dOB9ay4aMqvgwEPRFRYTBGEPwwRcnfVB8ERQn+CBS6XSoVAWZImMWOttJtcZ2bdpm69qkjfnTXq7J3SV358uEyvzePvh9fC/fJ/A/OptIPPH6yMjzmq4PIgh2h9O5tZTPf33uypWJZC63vZ8V9ptwRzj0WvvhCy9EDrxUmJnxWsUilmUhOJ1IPp9VdjqTP+Ryb4+tr39xX+vQqWPdE9Pjy99cOG9lgiHrjiham0e6rc1HE9ZGLGYtgbUA1prTab3n8bzxb04ECAaD7g8n35z0NAd62FA5fPsuW0+d4nqsk6Q/TPlQB13xOL70bWyKwqDbPeR7su3Pq0v5RRFg9NzJs9E+x4t2a49SYxMlVxsflyDde4Z08DG+N0PoITcBu4F5J0tDRSMx8HBivLjzmRgOh13D50ferRibDzrRiQRb+fTaCguto4Tb++jv9qO5Olis5BG9axSG27jhCrArNHkdsZ1ZKXikNS43GO3ynkLE4aRumeQNGaVSIyfryB4RRQW5mCXQXufMcxmKp7P8nowSmbGdkMpq+YGNvVKT3aaTrZSJtuzQdTDC4i+XmZNcpIoRasVlfDevEjvmg/IWLUKGwa46yXkrIlk1dK1u1/NmzeW3a7QUFxgZOUEm+zPJ6TF2qh58lswrz/TQfzSDJWdQa40oio2KqqnS1tpmqjOnbZf8dm9KqSAKINqnOf10goOODR55qJGe7iY6W1dg5ztqpoSqSqiawPyt8i3RqOqy5LZ3O45G+zdKOUwRcms1WtLw8rMCfX2r+BzXYfcnTMNA0+wolWbWs5py8VJhzAboqW9/+9y5aeRw+5hf3WPIauXV0QLeph+hMAXKEqYpoGoOdmUPRt3kky/lr0ol/hAB0M2/95bzZufjvQNii6/heKiZ/tANqK6AUcUwLGp1G1W1Gb0mcWmiMPX+ePkikBbvLbKmFspr8uz6brg9FF0NBANoMQ55y0i2EjZTQNcl/kohv/XB7uRHl+V3gFnA/M+ZgCBw3N97YDg8EI93tYX8cVuqKuVuKlNz3P11Tp0pyFwDFoH6fW+8JwmIAFHAA9SAKpAHtoHyfvgfh8p7963YqU4AAAAASUVORK5CYII=">StackStalker - Google Chrome extension gallery</A>

            <DT><A HREF="http://stackapps.com/questions/319/phpstack-a-php-wrapper-to-the-se-api" ADD_DATE="1301276371" ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAACY0lEQVQ4jX2SS0jVQRTGv//MnJnxGjctpbCFIrgO2rRr06KiRdtKEYLwUj4gohdBZlFUEmmp0N8WIZXrIMiF25Zu27hQIaKiuHaze/93ni3ykY/bWR3O/M43zDdfghp14fpEo67HZwCx+ONby8uR28s7cayWQH0DP6G01lrruqb9LcdrcclaMzA0doqIHnESrxQnmfBkgHGeTwD4EEoxhKfWWuOc7/LGXH0y2Pd2XaCn5znl2zBPUraSJAghwAUHYxwAEGOAcw7eeVhjYapmqbQYO9K0YBkApGnBMqJJIoKSEkorKK2hlCwpJUtKa2itIZUEEUGQmEzTgt3kAZHgggSEJBBRJjjvz4Jpz4Jp5yT6BVG2ugxOgq97cO7G/ea9u5uO5nK5CaVVo9IanLG+S10nx/81a3R6ptd7N5ZVMmSVbLlcznpXvhdnWUN9wxIJMc0Za2SMgyVJyVTd1Fa3s3J1KkmSEmMcgvMGKfjrXOOuxe3fmGybbD9PNjhWWf7dZl3odCEshxAQI/JSqe6tezqnumNEPvgA50PRene2uFJsXb/v5sjULaXkkNYagihjgl8pV8vTAJBTuTPBh2Fnra5mGYyxg3cHuu4AgFgT8NZ5zzmssYiAFjE+qxPqHgAE5/PeOVhj4Z2DrTq/6cV/g5TMSyVbSQoIQRtBSoAY1oLkVoNkl34uhI40LVgOAHNz78LhI8cWInAohjCKGD947w+GELT3DtbYkvN+2JjqrLXugDfm8vjjix//6/m1By9OM+LTABCs73x4/fybnTix0xAAvn75NLOneV8FQFL5Fd/X4v4ArZQWGyLoDDcAAAAASUVORK5CYII=">library - PHPstack - A PHP wrapper to the SE API - Stack Apps</A>
';

$dom = new DOMDocument;
$dom->loadHTML($html);
// The HTML parser lowercases tag and attribute names, so look them up in lowercase.
foreach ($dom->getElementsByTagName('a') as $node)
{
  echo 'Title = ' . $node->nodeValue . '<br>';
  echo 'URL = ' . $node->getAttribute("href") . '<br>';
  echo 'Icon = ' . $node->getAttribute("icon") . '<br>';
  echo 'Date Added = ' . $node->getAttribute("add_date") . '<br>';
  echo '<br>';
}
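
The same loop can also collect the values into an array, which is closer to the stated goal of saving the bookmarks to a database later. The following is only a minimal sketch: extractBookmarks() is a hypothetical helper name, the sample links are made up, and it assumes the input is a string containing one or more <A> elements as in the export.

```php
<?php
/**
 * Collect every <A> element of a bookmarks export into an array.
 * extractBookmarks() is a hypothetical helper, not a built-in PHP function.
 */
function extractBookmarks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    // Export files are not strictly valid HTML, so keep parser
    // warnings internal instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bookmarks = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        $bookmarks[] = [
            'title'    => trim($node->nodeValue),
            'url'      => $node->getAttribute('href'),
            'icon'     => $node->getAttribute('icon'), // '' when the attribute is missing
            'add_date' => (int) $node->getAttribute('add_date'), // Unix timestamp
        ];
    }
    return $bookmarks;
}

// Made-up sample data in the same shape as the export.
$sample = '<DT><A HREF="http://example.com/a" ADD_DATE="1271801059" ICON="data:image/png;base64,AAAA">First</A>
<DT><A HREF="http://example.com/b" ADD_DATE="1301274664">Second</A>';

$bookmarks = extractBookmarks($sample);
foreach ($bookmarks as $bookmark) {
    echo $bookmark['title'], ' => ', $bookmark['url'],
        ' (added ', gmdate('Y-m-d', $bookmark['add_date']), ")\n";
}
```

Each array element then holds one bookmark, so a later database insert can loop over the result without touching the DOM again.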

Solution:

Do not use regex, because HTML (even when it is generated by Chrome) is not a regular language.
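
To illustrate how fragile a pattern can be, here is a small sketch with made-up sample strings. The regex is a deliberately naive assumption about attribute order, not something taken from the question; the DOMDocument lookup reads the attribute by name and is unaffected by the reordering.

```php
<?php
// Naive pattern: assumes HREF is immediately followed by ADD_DATE.
$pattern = '/<A HREF="([^"]*)" ADD_DATE="([^"]*)"/i';

$asExported = '<A HREF="http://example.com/" ADD_DATE="1271801059">Example</A>';
$reordered  = '<A ADD_DATE="1271801059" HREF="http://example.com/">Example</A>';

$matchesExported  = preg_match($pattern, $asExported); // 1: the pattern matches
$matchesReordered = preg_match($pattern, $reordered);  // 0: same data, no match

// A parser looks attributes up by name, so their order does not matter.
$dom = new DOMDocument();
$dom->loadHTML($reordered);
$link = $dom->getElementsByTagName('a')->item(0);
$href = $link->getAttribute('href');

echo "regex on exported: $matchesExported, regex on reordered: $matchesReordered\n";
echo "parser on reordered: $href\n";
```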

Use an XML parser instead, such as SimpleXML.

If the string above is $s,

$bookmarks = simplexml_load_string($s);

echo $bookmarks["HREF"]; //URL
echo $bookmarks[0]; //Name

Dumping $bookmarks with var_dump() shows the structure:

object(SimpleXMLElement)#1 (2) {
  ["@attributes"]=>
  array(3) {
    ["HREF"]=>
    string(31) "http://snipt.net/public/tag/css"
    ["ADD_DATE"]=>
    string(10) "1271801059"
    ["ICON"]=>
    string(1026) "data:image/png;base64,iVBh….="
  }
  [0]=>
  string(64) "Snipt - public - css | Share and store code or command snippets."
}
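
simplexml_load_string() expects well-formed XML, and a string with several links and unclosed <DT> tags is not well-formed. One possible way to keep the SimpleXML syntax for multiple links is to let DOMDocument parse the HTML first and then hand the result to simplexml_import_dom(). This is a sketch under that assumption, using made-up sample links; note that the HTML parser lowercases the names, so the attributes are read as 'href' and 'add_date'.

```php
<?php
// Made-up sample with several links and unclosed <DT> tags.
$html = '<DT><A HREF="http://example.com/a" ADD_DATE="1271801059">First</A>
<DT><A HREF="http://example.com/b" ADD_DATE="1301274664">Second</A>';

// Parse leniently as HTML, keeping parser warnings internal.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Expose the parsed document through the SimpleXML API.
$root = simplexml_import_dom($dom);

$rows = [];
foreach ($root->xpath('//a') as $a) {
    $rows[] = [
        'title'    => (string) $a,             // link text
        'url'      => (string) $a['href'],     // attribute names are lowercase here
        'add_date' => (string) $a['add_date'],
    ];
}

print_r($rows);
```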

Tags: bookmarks, php, regex
Source: https://codeday.me/bug/20191208/2092150.html