Download source of webpage to file with Javascript and grunt


I have a problem with my task. I wrote an app in GruntJS, and I need to download the source of a web page with Grunt.

For example I have a page: example.com/index.html.

I would like to pass the URL to a Grunt task, like this: src: "example.com/index.html".

Then I need that source saved to a file, e.g. source.txt.

How can I do this?

1 Answer

#1



There are a couple of approaches to this.

The first is a raw http.get from the Node.js API, as mentioned in the comments. This gets you the raw source as served on the initial load of the page. The problem comes when the site makes extensive use of JavaScript to build further HTML after AJAX requests.
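A minimal sketch of this first approach, assuming the URL includes a scheme (http.get needs "http://example.com/index.html", not "example.com/index.html") and the server answers with a plain 200 rather than a redirect. The names downloadSource, registerDownloadTask, the task name download-source, and the downloadSource config key are made up for illustration; they are not part of Grunt or Node.js.

```javascript
// Sketch only: fetch the raw HTML of a page and write it to a file.
const http = require('http');
const https = require('https');
const fs = require('fs');

// downloadSource is a hypothetical helper name.
function downloadSource(url, destPath, callback) {
  // Pick the client module that matches the URL scheme.
  const client = url.startsWith('https:') ? https : http;

  client.get(url, (res) => {
    if (res.statusCode !== 200) {
      res.resume(); // discard the body so the socket is released
      callback(new Error('Request failed with status ' + res.statusCode));
      return;
    }
    // Collect the body as raw buffers and write them out once complete.
    const chunks = [];
    res.on('data', (chunk) => chunks.push(chunk));
    res.on('end', () => {
      fs.writeFile(destPath, Buffer.concat(chunks), callback);
    });
  }).on('error', callback);
}

// Hypothetical wiring into a Gruntfile; call it with the grunt object.
// Assumes a config entry such as:
//   downloadSource: { src: 'http://example.com/index.html', dest: 'source.txt' }
function registerDownloadTask(grunt) {
  grunt.registerTask('download-source', 'Save page source to a file', function () {
    const done = this.async(); // tell Grunt the task is asynchronous
    const options = grunt.config('downloadSource') || {};
    // done(false) marks the task as failed; done(true) marks success.
    downloadSource(options.src, options.dest, (err) => done(!err));
  });
}
```

With that in place, running grunt download-source would write whatever the server returned for the initial request, without executing any of the page's scripts.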

The second approach is to use an actual browser engine to load the site and execute whatever JavaScript and further HTML building runs on page load. The most common engine for this is PhantomJS, and it's wrapped in a Grunt library called grunt-lib-phantomjs.

Fortunately, someone has provided another layer on top of that to do almost exactly what you're asking for: https://github.com/cburgdorf/grunt-html-snapshot

The example config from the link above:

grunt.initConfig({
    htmlSnapshot: {
        all: {
          options: {
            //that's the path where the snapshots should be placed
            //it's empty by default which means they will go into the directory
            //where your Gruntfile.js is placed
            snapshotPath: 'snapshots/',
            //This should be either the base path to your index.html file
            //or your base URL. Currently the task does not use its own
            //webserver. So if your site needs a webserver to be fully
            //functional configure it here.
            sitePath: 'http://localhost:8888/my-website/',
            //you can choose a prefix for your snapshots
            //by default it's 'snapshot_'
            fileNamePrefix: 'sp_',
            //by default the task waits 500ms before fetching the html.
            //this is to give the page enough time to assemble itself.
            //if your page needs more time, tweak here.
            msWaitForPages: 1000,
            //if you would rather not keep the script tags in the html snapshots
            //set `removeScripts` to true. It's false by default
            removeScripts: true,
            //here goes the list of all urls that should be fetched
            urls: [
              '',
              '#!/en-gb/showcase'
            ]
          }
        }
    }
});