java - Why does HtmlUnit always show the host page no matter what URL I type in (crawlable GWT app)?
Author: Internet
Here is the full code:
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URLDecoder;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebClientOptions;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class CrawlServlet implements Filter {

    // Rebuilds the full request URL, including the query string if there is one.
    public static String getFullURL(HttpServletRequest request) {
        StringBuffer requestURL = request.getRequestURL();
        String queryString = request.getQueryString();
        if (queryString == null) {
            return requestURL.toString();
        } else {
            return requestURL.append('?').append(queryString).toString();
        }
    }

    @Override
    public void destroy() {
        // TODO Auto-generated method stub
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        HttpServletRequest httpRequest = (HttpServletRequest) request;
        String fullURLQueryString = getFullURL(httpRequest);
        System.out.println(fullURLQueryString + " what wrong");
        if ((fullURLQueryString != null) && (fullURLQueryString.contains("_escaped_fragment_"))) {
            // remember to unescape any %XX characters
            fullURLQueryString = URLDecoder.decode(fullURLQueryString, "UTF-8");
            // rewrite the URL back to the original #! version
            String url_with_hash_fragment = fullURLQueryString.replace("?_escaped_fragment_=", "#!");
            final WebClient webClient = new WebClient();
            WebClientOptions options = webClient.getOptions();
            options.setCssEnabled(false);
            options.setThrowExceptionOnScriptError(false);
            options.setThrowExceptionOnFailingStatusCode(false);
            options.setJavaScriptEnabled(false);
            HtmlPage page = webClient.getPage(url_with_hash_fragment);
            // important! Give the headless browser enough time to execute JavaScript
            // The exact time to wait may depend on your application.
            webClient.waitForBackgroundJavaScript(20000);
            // return the snapshot
            //String originalHtml=page.getWebResponse().getContentAsString();
            //System.out.println(originalHtml+" +++++++++");
            System.out.println(page.asXml() + " +++++++++");
            PrintWriter out = response.getWriter();
            out.println(page.asXml());
            //out.println(originalHtml);
        } else {
            try {
                // not an _escaped_fragment_ URL, so move up the chain of servlet (filters)
                chain.doFilter(request, response);
            } catch (ServletException e) {
                System.err.println("Servlet exception caught: " + e);
                e.printStackTrace();
            }
        }
    }

    @Override
    public void init(FilterConfig arg0) throws ServletException {
        // TODO Auto-generated method stub
    }
}
After opening the URL "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article", it shows the HTML of the host page, as follows:
<html>
<head>
<meta name="fragment" content="!">
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<!-- -->
<!--
Consider inlining CSS to reduce the number of requested files
-->
<!-- -->
<link type="text/css" rel="stylesheet" href="MyProject.css"/>
<!-- -->
<!-- Any title is fine -->
<!-- -->
<title>MyProject</title>
<!-- -->
<!-- This script loads your compiled module. -->
<!-- If you add any GWT meta tags, they must -->
<!-- be added before this line. -->
<!-- -->
<script type="text/javascript" language="javascript" ></script>
<!-- -->
<!-- The body can have arbitrary html, or -->
<!-- you can leave the body empty if you want -->
<!-- to create a completely dynamic UI. -->
<!-- -->
</head>
<body>
<div id="loading">
Loading
<br/>
<img src="../images/loading.gif"/>
</div>
<!-- OPTIONAL: include this if you want history support -->
<iframe src="javascript:''" id="__gwt_historyFrame" tabindex="-1" style="position: absolute; width: 0;height: 0; border:0;"></iframe>
<!--
RECOMMENDED if your web app will not function without JavaScript enabled
-->
<noscript>
<div style="width: 22em; position: absolute; left: 50%; margin-left: -11em; color: red; background-color: white; border: 1px solid red; padding: 4px; font-family: sans-serif;">
Your web browser must have JavaScript enabled in order for this application to display correctly.
</div>
</noscript>
</body>
</html>
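On the other hand, "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997#!article" works fine and shows the article without any problem.
I also compiled the whole project and ran it under Tomcat 7, but I have the same problem. It always shows the HTML of the host page.
Note: the article page is a presenter nested inside another presenter. I don't think that is the main cause, though, because not even the header page is shown.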
Solution:
First, try using &_escaped_fragment_=article instead of ?_escaped_fragment_=article. Because gwt.codesvr is already in the query string, having two ? characters may confuse URL parameter parsing.
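To illustrate why a second ? can matter, here is a minimal sketch of how a query string is commonly split into parameters. The class QueryParamDemo and its parse helper are hypothetical names made up for this illustration, not part of GWT, HtmlUnit, or the servlet API, and a real container may behave differently in detail. The sketch only assumes the usual convention that & separates parameters and the first = separates a name from its value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryParamDemo {

    // Hypothetical helper: split the raw query string on '&',
    // then split each pair on its first '='.
    public static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // A second '?' is not a separator, so it ends up inside the gwt.codesvr value.
        System.out.println(parse("gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article"));
        // prints {gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article}

        // With '&', _escaped_fragment_ is a parameter of its own.
        System.out.println(parse("gwt.codesvr=127.0.0.1:9997&_escaped_fragment_=article"));
        // prints {gwt.codesvr=127.0.0.1:9997, _escaped_fragment_=article}
    }
}
```

Under these assumptions, only the & form exposes _escaped_fragment_ as a separate parameter.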
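Second, you need to make sure the filter handles the case where the gwt.codesvr parameter is present. It looks like your filter assumes _escaped_fragment_ is the first parameter, i.e. that it starts with ?. I believe the example here does work both ways.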
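One way the filter could accept both forms is sketched below. EscapedFragmentRewriter and toHashBangUrl are hypothetical names, and this is not the linked example's code. The sketch assumes that _escaped_fragment_ is the last parameter in the URL and that only its value needs percent-decoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class EscapedFragmentRewriter {

    private static final String PARAM = "_escaped_fragment_=";

    // Hypothetical helper: rewrites "...?_escaped_fragment_=x" or
    // "...?a=b&_escaped_fragment_=x" back to the "#!x" form.
    // Returns the URL unchanged when the parameter is not present.
    public static String toHashBangUrl(String url) {
        int i = url.indexOf(PARAM);
        if (i <= 0) {
            return url;
        }
        char separator = url.charAt(i - 1);
        if (separator != '?' && separator != '&') {
            return url;
        }
        // Keep everything before the separator, e.g. "...?gwt.codesvr=127.0.0.1:9997".
        String before = url.substring(0, i - 1);
        String rawFragment = url.substring(i + PARAM.length());
        try {
            // Unescape %XX sequences in the fragment value only.
            return before + "#!" + URLDecoder.decode(rawFragment, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHashBangUrl(
            "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997&_escaped_fragment_=article"));
        System.out.println(toHashBangUrl(
            "http://127.0.0.1:8888/Myproject.html?_escaped_fragment_=article"));
    }
}
```

In the question's filter, a call like this could stand in for the replace("?_escaped_fragment_=", "#!") line. Separately, that filter calls options.setJavaScriptEnabled(false). With JavaScript disabled, HtmlUnit cannot run the GWT module, so that setting may also be worth re-enabling when testing the rewritten URL.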
Tags: java, gwt, htmlunit, gwtp Source: https://codeday.me/bug/20191009/1882887.html