java - Why does HTMLUnit always show the HostPage no matter what URL I type in (crawlable GWT app)?

Here is the complete code:

public class CrawlServlet implements Filter{
 public static String getFullURL(HttpServletRequest request) {
    StringBuffer requestURL = request.getRequestURL();
    String queryString = request.getQueryString();


    if (queryString == null) {
        return requestURL.toString();
    } else {
        return requestURL.append('?').append(queryString).toString();
    }
 }

 @Override
 public void destroy() {
 // TODO Auto-generated method stub

 }

 @Override
 public void doFilter(ServletRequest request, ServletResponse response,
 FilterChain chain) throws IOException, ServletException {

 HttpServletRequest httpRequest = (HttpServletRequest) request;
 String fullURLQueryString = getFullURL(httpRequest);
 System.out.println(fullURLQueryString+" what wrong");

 if ((fullURLQueryString != null) && (fullURLQueryString.contains("_escaped_fragment_"))) {
     // remember to unescape any %XX characters
     fullURLQueryString=URLDecoder.decode(fullURLQueryString,"UTF-8");
     // rewrite the URL back to the original #! version
         String url_with_hash_fragment=fullURLQueryString.replace("?_escaped_fragment_=", "#!");


         final WebClient webClient = new WebClient();

         WebClientOptions options = webClient.getOptions();
         options.setCssEnabled(false);
         options.setThrowExceptionOnScriptError(false);
         options.setThrowExceptionOnFailingStatusCode(false);
         options.setJavaScriptEnabled(false);
         HtmlPage page = webClient.getPage(url_with_hash_fragment);

         // important!  Give the headless browser enough time to execute JavaScript
         // The exact time to wait may depend on your application.

         webClient.waitForBackgroundJavaScript(20000);

         // return the snapshot
         //String originalHtml=page.getWebResponse().getContentAsString();
         //System.out.println(originalHtml+" +++++++++");
         System.out.println(page.asXml()+" +++++++++");

         PrintWriter out = response.getWriter();
         out.println(page.asXml());
         //out.println(originalHtml);
     } else {
      try {
        // not an _escaped_fragment_ URL, so move up the chain of servlet (filters)
        chain.doFilter(request, response);
      } catch (ServletException e) {
        System.err.println("Servlet exception caught: " + e);
        e.printStackTrace();
      }
    }

 }


 @Override
 public void init(FilterConfig arg0) throws ServletException {
 // TODO Auto-generated method stub

 }


}

After opening the URL "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article", it shows the host page HTML, as follows:

<html>

<head>
<meta name="fragment" content="!">
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<!-- -->
<!--
 Consider inlining CSS to reduce the number of requested files 
-->
<!-- -->
<link type="text/css" rel="stylesheet" href="MyProject.css"/>
<!-- -->
<!-- Any title is fine -->
<!-- -->
<title>MyProject</title>
<!-- -->
<!-- This script loads your compiled module. -->
<!-- If you add any GWT meta tags, they must -->
<!-- be added before this line. -->
<!-- -->
<script type="text/javascript" language="javascript" ></script>
<!-- -->
<!-- The body can have arbitrary html, or -->
<!-- you can leave the body empty if you want -->
<!-- to create a completely dynamic UI. -->
<!-- -->
</head>
<body>

<div id="loading">
Loading
<br/>
<img src="../images/loading.gif"/>
</div>
<!-- OPTIONAL: include this if you want history support -->
<iframe src="javascript:''" id="__gwt_historyFrame" tabindex="-1" style="position: absolute; width: 0;height: 0; border:0;"></iframe>
<!--
 RECOMMENDED if your web app will not function without JavaScript enabled 
-->
<noscript>

<div style="width: 22em; position: absolute; left: 50%; margin-left: -11em; color: red; background-color: white; border: 1px solid red; padding: 4px; font-family: sans-serif;">
Your web browser must have JavaScript enabled in order for this application to display correctly.
</div>
</noscript>
</body>
</html>

On the other hand, "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997#!article" works fine and displays the article without any problem.

I also compiled the whole project and ran it under Tomcat 7, but I get the same problem: it always shows the host page's HTML.

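Note: the article page is a nested presenter that sits inside a parent presenter. However, I don't think that is the main cause, because not even the header page is displayed.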

Solution:

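First, try using &_escaped_fragment_=article instead of ?_escaped_fragment_=article. The gwt.codesvr parameter already starts the query string, so having two ? characters may confuse URL parameter parsing.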
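To illustrate why the second ? matters, here is a minimal sketch in plain Java. The class QueryParseDemo and its helpers parseQuery and queryOf are hypothetical names, not part of the question's code, and the sketch assumes that everything after the first ? is treated as the query string and that parameters are split on & only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryParseDemo {

    // Splits a raw query string on '&', then on the first '=' of each pair.
    // This only approximates what a typical parameter parser does.
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(name, value);
        }
        return params;
    }

    // Assumption: everything after the first '?' is the query string.
    static String queryOf(String url) {
        int q = url.indexOf('?');
        return q < 0 ? "" : url.substring(q + 1);
    }

    public static void main(String[] args) {
        String base = "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997";
        // Second '?': the escaped fragment is swallowed into the gwt.codesvr value.
        System.out.println(parseQuery(queryOf(base + "?_escaped_fragment_=article")));
        // '&': _escaped_fragment_ is seen as its own parameter.
        System.out.println(parseQuery(queryOf(base + "&_escaped_fragment_=article")));
    }
}
```

Under these assumptions, the URL with two ? characters yields a single gwt.codesvr parameter whose value contains "?_escaped_fragment_=article", while the & form yields two separate parameters.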

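Second, you need to make sure your filter handles the case where the gwt.codesvr parameter is present. It looks like your filter assumes _escaped_fragment_ is the first parameter, i.e. that it begins with ?. I believe the example here works both ways.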
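One way the filter's rewrite step could accept both forms is sketched below. This is only an illustration, not the code from the referenced example: the class EscapedFragmentUrls and the method toHashBangUrl are hypothetical names, and the sketch assumes that _escaped_fragment_ is the last parameter in the URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class EscapedFragmentUrls {

    private static final String MARKER = "_escaped_fragment_=";

    // Rewrites ".../page?a=b&_escaped_fragment_=x" to ".../page?a=b#!x"
    // and ".../page?_escaped_fragment_=x" to ".../page#!x".
    // Assumption: _escaped_fragment_ is the last parameter in the URL.
    static String toHashBangUrl(String fullUrl) {
        int idx = fullUrl.indexOf(MARKER);
        if (idx <= 0) {
            return fullUrl; // not an _escaped_fragment_ URL
        }
        char separator = fullUrl.charAt(idx - 1);
        if (separator != '?' && separator != '&') {
            return fullUrl; // marker is part of some other name or value
        }
        // Keep everything before the separator, including earlier parameters.
        String prefix = fullUrl.substring(0, idx - 1);
        String escaped = fullUrl.substring(idx + MARKER.length());
        try {
            // Unescape any %XX characters in the fragment only.
            return prefix + "#!" + URLDecoder.decode(escaped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHashBangUrl(
            "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997&_escaped_fragment_=article"));
    }
}
```

In doFilter, the result of toHashBangUrl(getFullURL(httpRequest)) could then be passed to webClient.getPage in place of the string produced by the hard-coded replace("?_escaped_fragment_=", "#!") call.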

Tags: java, gwt, htmlunit, gwtp
Source: https://codeday.me/bug/20191009/1882887.html