How to pull a data counter from a website to use in another HTML project as a JS variable


Hi, I am trying to find a way to pull a value that is displayed on a website that is not my own into my own site, so I can use it there as a JavaScript variable. Ideally, I would like to display this value and have it update whenever it changes in the other website's system.

I have been searching for a few days now and can't find a clear explanation of how I could accomplish this. To be more specific, my school keeps track of parking data, including how many spaces are available, and I would like to use these numbers in the HTML project I'm working on.

I would greatly appreciate it if someone could explain to me how to make these two numbers accessible as JavaScript variables in my project using whatever means necessary.

3 Answers

#1


1  

In your particular case you could use web scraping, but you don't need it. As JasonK mentioned in the comments, you can use the same API call that the page itself is using:

https://www.jmu.edu/cgi-bin/parking_get_sign_data.cgi?date=1441292695108

Now, you cannot call that API from your website's client-side JavaScript because of the same-origin policy, but you can create a small service to fetch the data for you. In Node.js it could look like this, and you can easily implement the same function in PHP:

var request = require("request");
var http    = require('http');

var server  = http.createServer(onRequest);

server.listen(3000);


//----------------------------------------------------
function onRequest(req, res){

    var parkingUrl = 'https://www.jmu.edu/cgi-bin/parking_get_sign_data.cgi?date=' + (new Date()).getTime();

    request(parkingUrl, function (error, response, body) {

        var data   = error;
        var status = 404;

        if(!error){
            status = 200;
            data = {
                championStatus : getStatus(body, '2'), 
                warsawStatus   : getStatus(body, '10')
            };
        }

        res.writeHead(status, { 'Content-Type': 'application/json', "Access-Control-Allow-Origin":"*" });
        res.write(JSON.stringify(data));
        res.end();
    });
}


//----------------------------------------------------
function getStatus(ss, si){
    var status = ss;

    status = status.split("<SignId>"+si+"</SignId>"); 
    status = status[1];
    status = status.split("<Display>"); 
    status = status[1];
    status = status.split("</Display>"); 
    status = status[0];
    status = status.replace(' ','');
    if(isNaN(status)){
        // do nothing 
    } else {
        status = parseInt(status);
    }

    if( status == 'Errors'){status = '';}
    else if(status != 'FULL' && isNaN(status)){status = 'Unavailable';}
    else if(status != '' && status != 'FULL'  && status != 'OPEN'){
        if(status == '   1'){status = status + ' space available'; }
        else{status = status + ' spaces available'; }
    } 
    return status;
}

The getStatus function is taken straight from the https://www.jmu.edu/parking/ website. I would rather use xml2js or a similar module to parse the response and the data.
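
As an illustration of a more defensive parser, here is a minimal sketch that avoids the chain of split calls and needs no external module. It assumes the response consists of repeated SignId/Display pairs in that order, which is inferred only from how getStatus splits the string. The name parseSigns and the sample XML are hypothetical, not taken from the school's site.

```javascript
// Hypothetical helper: collects every <SignId>/<Display> pair into an object.
// Assumes each <SignId> is followed by its own <Display>, as getStatus implies.
function parseSigns(xml) {
    var signs = {};
    var pattern = /<SignId>\s*(\d+)\s*<\/SignId>[\s\S]*?<Display>([\s\S]*?)<\/Display>/g;
    var match;
    while ((match = pattern.exec(xml)) !== null) {
        var text = match[2].trim();
        // Numeric displays become numbers; words such as FULL or OPEN stay strings.
        signs[match[1]] = /^\d+$/.test(text) ? parseInt(text, 10) : text;
    }
    return signs;
}

// Made-up sample data in the assumed shape, not a real response.
var sample =
    "<Signs>" +
    "<Sign><SignId>2</SignId><Display> 134</Display></Sign>" +
    "<Sign><SignId>10</SignId><Display>FULL</Display></Sign>" +
    "</Signs>";

var signs = parseSigns(sample);
console.log(signs["2"], signs["10"], signs["99"]);
```

With this shape, a sign that is missing from the response simply comes back as undefined, whereas the split chain in getStatus would throw when it tries to split an undefined value.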

From your website you can now get the status like this:

function httpGetAsync(url, callback)
{
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.onreadystatechange = function() { 
        if (xmlHttp.readyState == 4 && xmlHttp.status == 200){
            callback(xmlHttp.responseText);
        }  
    }
    xmlHttp.open("GET", url, true); // true for asynchronous
    xmlHttp.send(null);
}

httpGetAsync("http://localhost:3000/", function(res){
    var data = JSON.parse(res);
    console.log(data);
});

Don't forget to change localhost:3000 to your server address, adjust the Access-Control-Allow-Origin header to limit who can use your service, and add some error handling.
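
Since the question also asks for the value to update over time, here is a rough sketch of client-side polling with basic error handling. It assumes the service returns the championStatus and warsawStatus fields from the Node.js example; the names parseParkingResponse and startPolling, the URL, and the 60 second interval are placeholders, not part of the original answer.

```javascript
// Hypothetical helper: validates the JSON text returned by the proxy service.
function parseParkingResponse(text) {
    var data = JSON.parse(text); // throws a SyntaxError on malformed JSON
    if (data === null || typeof data !== "object" ||
        !("championStatus" in data) || !("warsawStatus" in data)) {
        throw new Error("Unexpected response shape");
    }
    return data;
}

// Polls the service and reports both results and failures through callbacks.
// serviceUrl and intervalMs are placeholders to adapt to your setup.
function startPolling(serviceUrl, intervalMs, onData, onError) {
    function tick() {
        var xhr = new XMLHttpRequest();
        xhr.onload = function () {
            if (xhr.status !== 200) {
                onError(new Error("HTTP " + xhr.status));
                return;
            }
            try {
                onData(parseParkingResponse(xhr.responseText));
            } catch (err) {
                onError(err);
            }
        };
        xhr.onerror = function () { onError(new Error("Network error")); };
        xhr.open("GET", serviceUrl, true);
        xhr.send(null);
    }
    tick(); // fetch once right away, then repeat
    return setInterval(tick, intervalMs); // pass the id to clearInterval to stop
}

// Usage in a page (commented out so loading the snippet has no side effects):
// startPolling("http://localhost:3000/", 60000,
//     function (data) { console.log(data.championStatus, data.warsawStatus); },
//     function (err) { console.error(err.message); });
```

Keeping the validation in its own function means the same check can be reused if the request code is later switched to fetch or another library.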

#2


2  

Pasting previous comment as an answer to have more space:

A possible way would be to do some web page scraping.

Every X amount of time you grab a copy of the page you are interested in, and then you can just scan the page source for the value you want, with a regular expression for example. Then you can return that value after scanning and put that into a variable.
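
The scanning step described above could be sketched like this. The markup, the element id spaces-left, and the function name extractCount are all made up for illustration; a real page would need its own pattern, and fetching the page source is left out here.

```javascript
// Hypothetical helper: returns the first number captured by the pattern, or null.
function extractCount(pageSource, pattern) {
    var match = pattern.exec(pageSource);
    if (match === null) {
        return null; // the page layout changed or the value is missing
    }
    var value = parseInt(match[1], 10);
    return isNaN(value) ? null : value;
}

// Made-up markup standing in for a downloaded copy of the page.
var pageSource = '<div class="lot"><span id="spaces-left">  57 </span></div>';
// Pattern with one capture group around the digits we want.
var spacesPattern = /<span id="spaces-left">\s*(\d+)\s*<\/span>/;

var spacesLeft = extractCount(pageSource, spacesPattern);
console.log(spacesLeft);
```

Returning null when nothing matches lets the caller keep showing the last known value instead of failing when the page layout changes.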

It's not the most efficient route (ideally they would provide an API, but I think this would be overkill for their use case) but it can work.

For example, a quick Google search for "web page scraper" gives:

You can either use one of those (or similar, I really haven't used those particular ones) or you can build your own, but the concept is the same:

Get the web page source code, discard anything that you don't need or alternatively extract only what you want and that's it.

#3


1  

Unless you have a way to communicate with the school server and get that data, you're probably stuck with scraping. If you look at the code of the school website, you see that the number of free spaces is obtained by calling a CGI script and parsing its output.

If you have access to this cgi script, you can just use that script to get your values, and parse it as described in the function getStatus from the source code of the school site.

If you don't have access to the cgi, you can try doing an ajax call to this website and check if the node containing the numbers is available for you to select from the DOM.

If you can't access the DOM of the website and/or if accessing it is too slow, load the site with ajax, but instead of text/html, ask for text/plain so you just get a long string containing the website. Then you can scrape this string with a regular expression to get your value.

If all of this fails, load the site into a hidden iframe to ensure that the script that inserts the parking lot numbers is run. Then continue as normal by selecting the correct node from this iframe.

These are all client-side options. There are probably more options server-side (like easier interaction with the school's CGI), but the general principles remain: either use their own API (the CGI script), scrape the website itself, or run a regex over a text representation of the fully loaded website.

