A Simple Web Page Scraper
Author: Internet
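This only works on websites with little security protection; this program uses hao123. It speaks plain HTTP on port 80, so a site that serves its pages only over HTTPS may answer with a redirect instead of the page content.
- A few simple ways to resolve a domain name
- Sending a request to a website, then receiving and printing the response
- Practice with lambda expressions, Socket programming, and the decorator pattern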
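The decorator part can be tried out on its own, without a network. The sketch below stacks the same java.io wrappers the program uses (raw InputStream, then InputStreamReader to turn bytes into characters, then BufferedReader for buffering and line splitting) and prints each line with a lambda. The class name DecoratorDemo and the hard-coded response text are hypothetical; the response simply stands in for the socket's input stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class DecoratorDemo {
    // Each wrapper decorates the one inside it:
    // InputStream (bytes) -> InputStreamReader (chars, UTF-8) -> BufferedReader (buffering + lines())
    public static List<String> readAllLines(InputStream raw) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.UTF_8))) {
            // lines() is a lazy Stream<String>; collecting it reads until end of stream
            return reader.lines().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical response text standing in for s.getInputStream()
        String fakeResponse = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>\r\n";
        InputStream raw = new ByteArrayInputStream(fakeResponse.getBytes(StandardCharsets.UTF_8));
        List<String> lines = readAllLines(raw);
        // Lambda form; System.out::println in the program is the method-reference equivalent
        lines.forEach(line -> System.out.println("> " + line));
    }
}
```

Because every layer only wraps the one below it, swapping ByteArrayInputStream for a socket stream changes nothing else in the chain.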
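package GetHTMLContent;

import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class GetHtmlContentAPP {
    public static void main(String[] args) throws IOException {
        System.out.println("Resolving domain name......");
        // InetAddress performs DNS resolution:
        // InetAddress.getByName("www.hao123.com") looks up the IP address behind hao123's host name
        InetAddress inetAddress = InetAddress.getByName("www.hao123.com");
        System.out.println("Site address: " + inetAddress);
        System.out.println("Trying to connect to the host......");
        // Create an unconnected Socket; try-with-resources closes it (and its streams) at the end
        try (Socket s = new Socket()) {
            // The connection target is identified by IP address and port (80 = plain HTTP)
            SocketAddress sa = new InetSocketAddress(inetAddress, 80);
            // Connect to sa, allowing up to 10000 ms for the connection to be established
            s.connect(sa, 10000);
            // Also bound each read, so a silent server cannot block the program forever
            s.setSoTimeout(10000);
            System.out.println("Connected to the host, sending a simulated HTTP request......");
            PrintWriter printWriter = new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8));
            StringBuilder request = new StringBuilder();
            // A minimal HTTP/1.1 request: request line, headers, then an empty line; every line ends in CRLF
            request.append("GET /index.html HTTP/1.1\r\n");
            request.append("Host: www.hao123.com\r\n");
            // "close" instead of Keep-Alive: the server closes the connection after the response,
            // which lets lines() below reach end of stream so the program can exit
            request.append("Connection: close\r\n");
            request.append("\r\n");
            printWriter.write(request.toString());
            printWriter.flush();
            System.out.println("Request sent, reading the home page......");
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
            // Print the raw response (status line, headers and body) line by line
            bufferedReader.lines().forEach(System.out::println);
        }
    }
}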
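The program prints the raw response unchanged, so the status line and headers appear mixed in with the HTML. A possible next step is to separate them. Here is a minimal sketch, assuming the response has already been read into a list of lines (for example with bufferedReader.lines().collect(Collectors.toList())); the class HttpResponseHead and its fields are hypothetical names for illustration, not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: splits raw response lines into status code, headers and body start
public class HttpResponseHead {
    public final int statusCode;
    public final Map<String, String> headers; // names lower-cased
    public final int bodyStart;               // index of the first body line, or lines.size()

    private HttpResponseHead(int statusCode, Map<String, String> headers, int bodyStart) {
        this.statusCode = statusCode;
        this.headers = headers;
        this.bodyStart = bodyStart;
    }

    public static HttpResponseHead parse(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("empty response");
        }
        // Status line: HTTP-version, status code, reason phrase, separated by spaces
        String[] parts = lines.get(0).split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("not an HTTP status line: " + lines.get(0));
        }
        int code = Integer.parseInt(parts[1]);

        Map<String, String> headers = new LinkedHashMap<>();
        int i = 1;
        // Header lines run until the first empty line
        while (i < lines.size() && !lines.get(i).isEmpty()) {
            String line = lines.get(i);
            int colon = line.indexOf(':');
            if (colon > 0) {
                // Header names are case-insensitive, so store them lower-cased;
                // a repeated header (e.g. Set-Cookie) keeps only its last value in this sketch
                headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        line.substring(colon + 1).trim());
            }
            i++;
        }
        // Skip the empty separator line if there was one
        int bodyStart = Math.min(i + 1, lines.size());
        return new HttpResponseHead(code, headers, bodyStart);
    }

    public static void main(String[] args) {
        // Made-up sample lines, not a real hao123 response
        List<String> lines = List.of(
                "HTTP/1.1 200 OK",
                "Content-Type: text/html; charset=UTF-8",
                "",
                "<html></html>");
        HttpResponseHead head = parse(lines);
        System.out.println("status = " + head.statusCode);
        System.out.println("content-type = " + head.headers.get("content-type"));
        System.out.println("body = " + lines.subList(head.bodyStart, lines.size()));
    }
}
```

Two caveats apply to the program as written. The charset parameter of Content-Type is where a server declares the body's encoding, whereas the program always decodes as UTF-8. And an HTTP/1.1 server may send the body with Transfer-Encoding: chunked, in which case the printed body also contains hexadecimal chunk-size lines; this sketch does not decode them.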
Source: https://blog.csdn.net/LWC1436756712/article/details/114697114