
A Simple Web Page Scraper

Author: Internet



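This only works against websites that do not have much protection in place. The program below uses hao123 (www.hao123.com) as its target.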


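  • A few simple ways to resolve a domain name
  • Sending a request to a website, then receiving and printing the response
  • Practice with lambda expressions, Socket programming, and the decorator pattern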
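The request the program sends over the socket is nothing more than lines of text. The sketch below assembles that text offline, with no network access, so the CRLF line endings and the terminating blank line are easy to inspect. It assumes a hypothetical helper class `RawHttpRequest` with a `buildGetRequest(host, path)` method; neither is part of the original program. It also uses `Connection: close` instead of `Keep-Alive`, on the assumption that the caller wants to read until the server closes the connection.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the original post): builds the same kind of raw
// HTTP/1.1 GET request that the program writes to the socket by hand.
final class RawHttpRequest {

    // Builds a GET request for the given host and path.
    // Every line ends with CRLF ("\r\n"), and an empty line terminates the headers.
    static String buildGetRequest(String host, String path) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1\r\n"); // request line
        sb.append("Host: ").append(host).append("\r\n");        // Host header is required in HTTP/1.1
        sb.append("Connection: close\r\n");                       // ask the server to close after replying
        sb.append("\r\n");                                        // blank line ends the header section
        return sb.toString();
    }

    public static void main(String[] args) {
        String request = buildGetRequest("www.hao123.com", "/index.html");
        // Make the CRLF terminators visible by printing them as literal "\r\n"
        System.out.println(request.replace("\r\n", "\\r\\n\n"));
        // These are the bytes that would be written to the socket's OutputStream
        byte[] bytes = request.getBytes(StandardCharsets.UTF_8);
        System.out.println("Request size: " + bytes.length + " bytes");
    }
}
```

Keeping the request construction in a small pure function like this makes it testable without a server; the socket code then only has to write the returned string.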

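package GetHTMLContent;

import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class GetHtmlContentAPP {
    public static void main(String[] args) throws IOException {
        System.out.println("Resolving domain name...");
        // Resolve the domain name with InetAddress:
        // InetAddress.getByName("www.hao123.com") looks up the IP address for hao123's host name
        InetAddress inetAddress = InetAddress.getByName("www.hao123.com");
        System.out.println("Site address: " + inetAddress);
        System.out.println("Trying to connect to the host...");
        // Create an unconnected Socket; try-with-resources closes it (and its streams) when done
        try (Socket s = new Socket()) {
            // The connection target is identified by IP address and port number (80 = plain HTTP)
            SocketAddress sa = new InetSocketAddress(inetAddress, 80);
            // Connect to sa, allowing up to 10000 milliseconds for the connection to be established
            s.connect(sa, 10000);
            System.out.println("Connected to the host, sending a simulated HTTP request...");

            PrintWriter printWriter = new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8));

            StringBuffer stringBuffer = new StringBuffer();

            // Standard HTTP/1.1 request line and headers; the empty line ends the header section.
            // "Connection: close" (instead of Keep-Alive) makes the server close the connection
            // after responding, so the read loop below ends instead of blocking on an idle connection.
            stringBuffer.append("GET /index.html HTTP/1.1\r\n");
            stringBuffer.append("Host: www.hao123.com\r\n");
            stringBuffer.append("Connection: close\r\n");
            stringBuffer.append("\r\n");
            printWriter.write(stringBuffer.toString());
            printWriter.flush();

            System.out.println("Request sent, reading the home page content...");

            // Decorator pattern: BufferedReader wraps InputStreamReader, which wraps the raw InputStream
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
            // Print every line of the response (status line, headers, body) via a method reference
            bufferedReader.lines().forEach(System.out::println);
        }
    }
}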
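The program only prints the raw response line by line. To pick the response apart instead, here is a minimal sketch that reads through the same decorator stack (`BufferedReader` over `InputStreamReader` over an `InputStream`) and splits the status line, headers, and body. The `SimpleHttpResponse` class is hypothetical, and a canned, made-up response stands in for the socket so the sketch runs offline. It does not handle chunked transfer encoding or binary bodies; if a server replies with `Transfer-Encoding: chunked`, the chunk-size lines would end up in the body.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical holder for a parsed response (not from the original post).
final class SimpleHttpResponse {
    final String statusLine;
    final Map<String, String> headers;
    final String body;

    SimpleHttpResponse(String statusLine, Map<String, String> headers, String body) {
        this.statusLine = statusLine;
        this.headers = headers;
        this.body = body;
    }

    // Reads a complete response from any InputStream (a socket stream, or a byte array here).
    static SimpleHttpResponse parse(InputStream in) throws IOException {
        // Decorator pattern: line reading on top of char decoding on top of raw bytes
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String statusLine = reader.readLine();
        if (statusLine == null) {
            throw new IOException("empty response");
        }
        // Header names are case-insensitive in HTTP, so use a case-insensitive map
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String line;
        // Header lines continue until the first empty line
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        // Everything after the empty line is the body; join its lines back together
        String body = reader.lines().collect(Collectors.joining("\n"));
        return new SimpleHttpResponse(statusLine, headers, body);
    }

    // Extracts the numeric status code, e.g. 200 from "HTTP/1.1 200 OK"
    int statusCode() {
        return Integer.parseInt(statusLine.split(" ")[1]);
    }

    public static void main(String[] args) throws IOException {
        // Made-up response so the sketch runs without network access
        String raw = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/html; charset=UTF-8\r\n"
                + "Connection: close\r\n"
                + "\r\n"
                + "<html>\r\n<body>hello</body>\r\n</html>\r\n";
        SimpleHttpResponse resp = parse(new ByteArrayInputStream(raw.getBytes(StandardCharsets.UTF_8)));
        System.out.println("Status: " + resp.statusCode());
        // A lambda prints each header
        resp.headers.forEach((name, value) -> System.out.println(name + " = " + value));
        System.out.println(resp.body);
    }
}
```

To use this with a live connection, `s.getInputStream()` could be passed to `parse` instead of the `ByteArrayInputStream`. This assumes the request was sent with `Connection: close`, so that reading the body ends when the server closes the connection.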

Source: https://blog.csdn.net/LWC1436756712/article/details/114697114