How to pull data counter from a website to use in another HTML project as a JS variable


Hi, I am trying to find a way to pull a value that is displayed on a website that is not my own into my own site, so I can use it there as a JavaScript variable. Ideally, I would like to display this value and have it update whenever it changes in the other website's system.

I have been searching for a few days now and can't find a clear explanation of how I could accomplish this. To be more specific, my school keeps track of parking data, including how many spaces are available, and I would like to use these numbers in the HTML project I'm working on.

I would greatly appreciate it if someone could explain to me how to make these two numbers accessible as JavaScript variables in my project using whatever means necessary.

3 answers

#1


1  

In your particular case, web scraping would work, but you don't need it. As JasonK mentioned in the comments, you can use the same API call that the page itself uses:

https://www.jmu.edu/cgi-bin/parking_get_sign_data.cgi?date=1441292695108

You cannot call that API directly from your website because of the same-origin policy, but you can create a small service that fetches the data for you. In Node.js it could look like this, and you could just as easily implement the same thing in PHP:

var request = require("request");
var http    = require('http');

var server  = http.createServer(onRequest);

server.listen(3000);


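//----------------------------------------------------
// Handles every incoming request: fetches the parking feed and
// answers with the two statuses as JSON.
function onRequest(req, res){

    // Same call as the URL above, with the current time in milliseconds.
    var parkingUrl = 'https://www.jmu.edu/cgi-bin/parking_get_sign_data.cgi?date=' + (new Date()).getTime();

    request(parkingUrl, function (error, response, body) {

        // An Error object serializes to "{}", so send its message instead.
        var data   = { error: String(error) };
        var status = 404;

        if(!error){
            status = 200;
            data = {
                championStatus : getStatus(body, '2'),
                warsawStatus   : getStatus(body, '10')
            };
        }

        // "*" lets any site call this service; see the note at the end.
        res.writeHead(status, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' });
        res.write(JSON.stringify(data));
        res.end();
    });
}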


//----------------------------------------------------
function getStatus(ss, si){
    var status = ss;

    status = status.split("<SignId>"+si+"</SignId>"); 
    status = status[1];
    status = status.split("<Display>"); 
    status = status[1];
    status = status.split("</Display>"); 
    status = status[0];
    status = status.replace(' ','');
    if(isNaN(status)){
        // do nothing 
    } else {
        status = parseInt(status);
    }

    if( status == 'Errors'){status = '';}
    else if(status != 'FULL' && isNaN(status)){status = 'Unavailable';}
    else if(status != '' && status != 'FULL'  && status != 'OPEN'){
        if(status == '   1'){status = status + ' space available'; }
        else{status = status + ' spaces available'; }
    } 
    return status;
}

The getStatus function is taken straight from the https://www.jmu.edu/parking/ website. I would rather use xml2js or a similar module to parse the response.
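
As a rough illustration of a more structured parse, the sketch below collects every sign's display text into a lookup object with one regular expression instead of chained split calls. It assumes each <SignId> is followed by its own <Display> element, as the getStatus code implies. The sample XML and the parseSigns name are made up for illustration and are not the real feed.

```javascript
// Hypothetical helper: maps each SignId to its trimmed Display text.
// Assumes every <SignId> is followed by the matching <Display> element.
function parseSigns(xml) {
  var signs = {};
  var pattern = /<SignId>\s*(\d+)\s*<\/SignId>[\s\S]*?<Display>([\s\S]*?)<\/Display>/g;
  var match;
  while ((match = pattern.exec(xml)) !== null) {
    signs[match[1]] = match[2].trim();
  }
  return signs;
}

// Made-up sample in the shape the getStatus code expects.
var sampleXml =
  '<Signs>' +
  '<Sign><SignId>2</SignId><Display> 143</Display></Sign>' +
  '<Sign><SignId>10</SignId><Display>FULL</Display></Sign>' +
  '</Signs>';

var signs = parseSigns(sampleXml);
console.log(signs['2'], signs['10']);
```

With a lookup object like this, the service could read signs['2'] and signs['10'] directly instead of calling getStatus twice on the raw body.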

From your website you can now get the status like this:

function httpGetAsync(url, callback)
{
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.onreadystatechange = function() { 
        if (xmlHttp.readyState == 4 && xmlHttp.status == 200){
            callback(xmlHttp.responseText);
        }  
    }
    xmlHttp.open("GET", url, true); // true for asynchronous
    xmlHttp.send(null);
}

httpGetAsync("http://localhost:3000/", function(res){
    var data = JSON.parse(res);
    console.log(data);
});

Don't forget to change localhost:3000 to your server address, adjust the Access-Control-Allow-Origin header to limit who can use your service and add some error handling.
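
To keep the values current on your page, you could poll the service on a timer. The sketch below is one possible way to do that with basic error handling. It assumes the service returns the JSON shape produced above; the names parking, applyStatus, startPolling and loadFromService are illustrative choices, not part of the original code.

```javascript
// Latest known values, usable as ordinary JavaScript variables.
var parking = { championStatus: null, warsawStatus: null };

// Copies a response from the service into `parking`.
function applyStatus(data) {
  parking.championStatus = data.championStatus;
  parking.warsawStatus = data.warsawStatus;
  return parking;
}

// Hypothetical helper: calls loadStatus() now and then every intervalMs
// milliseconds. loadStatus must return a Promise that resolves to the data.
function startPolling(loadStatus, intervalMs, onUpdate) {
  function tick() {
    return loadStatus()
      .then(function (data) {
        applyStatus(data);
        if (onUpdate) { onUpdate(parking); }
      })
      .catch(function (err) {
        // Keep the previous values if a request fails.
        console.error('Could not refresh parking data:', err);
      });
  }
  tick(); // load once immediately
  return setInterval(tick, intervalMs); // pass to clearInterval() to stop
}

// In the browser, loadStatus could wrap the service shown above
// (assumes the localhost address used earlier; change it to your server).
function loadFromService() {
  return fetch('http://localhost:3000/').then(function (res) {
    if (!res.ok) { throw new Error('HTTP ' + res.status); }
    return res.json();
  });
}
```

Calling startPolling(loadFromService, 60000, render) would refresh the values once a minute, where render is whatever function of yours writes parking.championStatus and parking.warsawStatus into the page.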

#2


2  

Pasting previous comment as an answer to have more space:

A possible way would be to do some web page scraping.

At a regular interval, you grab a copy of the page you are interested in and scan its source for the value you want, for example with a regular expression. You then return that value and store it in a variable.
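
A minimal sketch of the scanning step, assuming the page source has already been downloaded as a string. The markup in samplePage, the element id spaces-left and the extractCount name are hypothetical; the pattern would need adjusting to the real page.

```javascript
// Hypothetical helper: returns the number inside the element with the given
// id, or null when the page does not contain it. The id is inserted into the
// pattern as-is, so it must not contain regex special characters.
function extractCount(pageSource, elementId) {
  var pattern = new RegExp('id="' + elementId + '"[^>]*>\\s*(\\d+)\\s*<');
  var match = pattern.exec(pageSource);
  return match ? parseInt(match[1], 10) : null;
}

// Made-up page source for illustration.
var samplePage = '<div>Spaces: <span id="spaces-left" class="count"> 57 </span></div>';

var spacesLeft = extractCount(samplePage, 'spaces-left');
console.log(spacesLeft);
```

Returning null for a missing element lets the caller keep the last known value when the page layout changes, rather than showing a wrong number.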

It's not the most efficient route (ideally they would provide an API, but I think this would be overkill for their use case) but it can work.

For example, a quick Google search for "web page scraper" turns up several ready-made tools.

You can either use one of those (or similar, I really haven't used those particular ones) or you can build your own, but the concept is the same:

Get the web page source code, then either discard everything you don't need or extract only what you want. That's it.

#3


1  

Unless you have a way to communicate with the school server and get that data, you're probably stuck with scraping. If you look at the code of the school website, you see that the number of free spaces is generated by calling a cgi script and parsing its response.

If you have access to this cgi script, you can just use that script to get your values, and parse it as described in the function getStatus from the source code of the school site.

If you don't have access to the cgi, you can try doing an ajax call to this website and check if the node containing the numbers is available for you to select from the DOM.

If you can't access the DOM of the website and/or if accessing it is too slow, load the site with ajax, but instead of text/html, ask for text/plain so you just get a long string containing the website. Then you can scrape this string with a regular expression to get your value.

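If all of this fails, load the site into a hidden iframe to ensure that the script that inserts the parking lot numbers runs, then select the correct node from inside the iframe. Be aware that the same-origin policy stops your script from reading the DOM of an iframe served from a different domain, so this only works when both pages share an origin.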

These are all client-side options. There are probably more options server-side (like easier interaction with the school's cgi), but the general principles remain: either use their own API (the cgi script), scrape the website itself, or run a regex over a text representation of the fully loaded website.
