zrr

Scraping Weibo Hot Search with a PHP Crawler

14313-rra7omsm2i.png

First, inspecting the page shows that the Weibo hot search list sits mainly inside a <table> element.

27954-51jv5dg7cxi.png
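
Since the data lives in a table, one way to confirm the row and cell layout before writing any regular expressions is to load a fragment into PHP's built-in DOMDocument and list the cells. This is only a sketch of an alternative, DOM-based approach, not the method used in the rest of this post. The sample HTML and the function name tableRows are made up for illustration, and the real markup on s.weibo.com may differ.

```php
<?php
// Hypothetical HTML fragment standing in for the hot search page;
// the real markup on s.weibo.com may differ.
$sample = <<<HTML
<table>
  <tr><th>Rank</th><th>Keyword</th><th>Tag</th></tr>
  <tr><td>1</td><td><a href="#">Example topic</a> <span>12345</span></td><td>hot</td></tr>
  <tr><td>2</td><td><a href="#">Another topic</a> <span>678</span></td><td></td></tr>
</table>
HTML;

// Hypothetical helper: returns every table row that has <td> cells
// as an array of trimmed cell strings.
function tableRows(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    // The XML declaration prefix makes loadHTML treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $rows = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//table//tr[td]') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            // Collapse internal whitespace so each cell is one clean string.
            $cells[] = trim(preg_replace('/\s+/u', ' ', $td->textContent));
        }
        $rows[] = $cells;
    }
    return $rows;
}

$rows = tableRows($sample);
print_r($rows);
```

The header row is skipped because it contains only <th> cells, so each remaining entry in $rows corresponds to one table row with data.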

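// Fetch the HTML of a page by URL, e.g. https://s.weibo.com/top/summary
function getUrlContent($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1 )");
    // Only the body is parsed below, so leave the response headers out of the output
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $output = curl_exec($ch);
    curl_close($ch);
    // curl_exec() returns false on failure; hand back an empty string in that case
    return $output === false ? '' : $output;
}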

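// Turn the first <table> in $html into an array of rows, each an array of cell strings
function getTable($html) {
    $td_array = [];
    // Return an empty result if the page contains no bare <table> tag
    if (!preg_match("/<table>[\s\S]*?<\/table>/i", $html, $match)) {
        return $td_array;
    }
    $table = $match[0];
    // Drop the opening tags and turn the closing tags into split markers
    $table = preg_replace("'<table[^>]*?>'si", "", $table);
    $table = preg_replace("'<tr[^>]*?>'si", "", $table);
    $table = preg_replace("'<td[^>]*?>'si", "", $table);
    $table = str_replace("</tr>", "{tr}", $table);
    $table = str_replace("</td>", "{td}", $table);
    // Strip the remaining HTML tags
    $table = preg_replace("'<[/!]*?[^<>]*?>'si", "", $table);
    // The original line here, preg_replace("'([rn])[s]+'", "", $table), matched the
    // literal letters r or n followed by s (probably a whitespace pattern that lost
    // its backslashes) and deleted them from the text, so it is left out.
    // Collapse each run of spaces into a single # marker
    $table = str_replace(" ", "|", $table);
    $table = preg_replace("'[|]+'", "#", $table);
    $rows = explode('{tr}', $table);
    array_pop($rows); // discard the fragment after the last {tr}
    foreach ($rows as $tr) {
        // Add further replacements here as needed
        $tr = str_replace("\n\n", "", $tr);
        $td = explode('{td}', $tr);
        array_pop($td); // discard the fragment after the last {td}
        $td_array[] = $td;
    }
    return $td_array;
}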

$html = getUrlContent("https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6");
$table = getTable($html);
// Keep six rows starting at index 2
$table = array_slice($table, 2, 6);
// Note: the "- 1" bound means the last of the sliced rows is not printed
for ($i = 0; $i < count($table) - 1; $i++) {
    $str = (string)$table[$i][1];
    $login = (string)$table[$i][2];
    $login = str_replace("#", "", $login);
    $str = explode('#', $str);
    // The second-to-last piece is the heat value; the pieces before it form the title
    $hot = $str[count($str) - 2];
    $title = '';
    for ($j = 0; $j < count($str) - 2; $j++) {
        $title .= $str[$j];
    }
    echo($login . " " . $title . " " . $hot);
    echo('<hr>');
}

Printing $table:

30612-h6dnatt0hn5.png
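Here title, hot, and login hold, in that order, the hot search title, its heat value, and the tag shown next to it (热, 爆, or 荐, roughly "hot", "viral", and "recommended").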
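
To make the splitting step easier to follow, here is a minimal sketch of the same idea in isolation: the second-to-last "#"-separated piece is taken as the heat value and everything before it is joined into the title. The helper name splitCell and the sample cell string are hypothetical; the string only imitates the shape that getTable() is assumed to produce, and the real cell contents depend on Weibo's markup.

```php
<?php
// Hypothetical helper mirroring the loop in the script: the second-to-last
// "#"-separated piece is the heat value, the pieces before it form the title.
function splitCell(string $cell): array
{
    $parts = explode('#', $cell);
    if (count($parts) < 2) {
        // No marker found: treat the whole cell as the title.
        return ['title' => trim($cell), 'hot' => ''];
    }
    $hot = trim($parts[count($parts) - 2]);
    // Join every piece before the heat value, as the original loop does.
    $title = trim(implode('', array_slice($parts, 0, -2)));
    return ['title' => $title, 'hot' => $hot];
}

// Made-up cell value in the shape getTable() is assumed to produce.
$result = splitCell("#Example#12345#");
echo $result['title'] . ' ' . $result['hot'] . "\n";
```

Because spaces inside a title are also turned into # markers, a multi-word title is joined back together without spaces under this scheme.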

Let's take a look at the finished result.

8:45 PM · February 6, 2021