How to open a web page's source code (two ways to fetch page source with a Java crawler (clean version))

Method 1: URL

package InternetTest;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class a44 {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.baidu.com");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5 * 1000); // connect timeout in milliseconds

        // Read the whole response body into memory; the stream is closed automatically.
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        try (InputStream inStream = conn.getInputStream()) {
            byte[] buffer = new byte[1024];
            int len;
            while ((len = inStream.read(buffer)) != -1) {
                outStream.write(buffer, 0, len);
            }
        }

        // Decode as UTF-8 explicitly instead of relying on the platform default charset.
        String htmlSource = new String(outStream.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(htmlSource);
    }
}
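Decoding the response bytes with the wrong charset produces garbled text, which is a common problem when fetching Chinese pages. The following is a minimal sketch, assuming the server declares its charset in the Content-Type response header, of how the charset could be chosen before building the string. The class name `CharsetSketch` and the helpers `charsetOf` and `decode` are hypothetical and are not part of the original example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {
    // Hypothetical helper: returns the charset named in a Content-Type header value,
    // falling back to UTF-8 (an assumption) when the header is missing,
    // has no charset parameter, or names a charset this JVM does not know.
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive match on the "charset=" parameter name
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes raw response bytes using the charset declared in the header value.
    public static String decode(byte[] data, String contentType) {
        return new String(data, charsetOf(contentType));
    }

    public static void main(String[] args) {
        byte[] body = "<html>ok</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(body, "text/html; charset=utf-8"));
    }
}
```

With `HttpURLConnection`, the header value can be read with `conn.getContentType()` and passed to `decode` together with the bytes collected from the input stream. Pages that declare their charset only in an HTML `<meta>` tag are not covered by this sketch.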

Method 2: HttpClient


package InternetTest;

import org.apache.http.HttpEntity;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.utils.HttpClientUtils;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class a45 {
    public static void main(String[] args) throws Exception {
        String url1 = "http://www.baidu.com";
        CloseableHttpClient closeableHttpClient = HttpClients.createDefault();
        CloseableHttpResponse closeableHttpResponse = null;
        try {
            HttpGet request = new HttpGet(url1);
            closeableHttpResponse = closeableHttpClient.execute(request);
            if (closeableHttpResponse.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
                // 200 OK: print the page source
                HttpEntity httpEntity = closeableHttpResponse.getEntity();
                String html = EntityUtils.toString(httpEntity, "utf-8");
                System.out.println(html);
            } else {
                // Any other status: print the response body returned by the server
                System.out.println(EntityUtils.toString(closeableHttpResponse.getEntity(), "utf-8"));
            }
        } finally {
            // Release the response and the client even if the request fails
            HttpClientUtils.closeQuietly(closeableHttpResponse);
            HttpClientUtils.closeQuietly(closeableHttpClient);
        }
    }
}
