Perl variable not assigned in foreach: scope issues


I am trying to normalize some scores from a .txt file by dividing each score for each possible sense (e.g. take#v#2; referred to as $tokpossense in my code) by the sum of all scores for a wordtype (e.g. take#v; referred to as $tokpos). The difficulty is in grouping the wordtypes together when processing each line of the file, so that the normalized scores are printed upon finding a new wordtype/$tokpos. I used two hashes and an if block to achieve this.

Currently, the problem seems to be that $tokpos is undefined as a key in $SumHash{$tokpos} at line 20, resulting in a division by zero. However, I believe $tokpos is properly defined within the scope of this block. What exactly is the problem, and how would I best solve it? I would also gladly hear alternative approaches to this problem.

Here's an example inputfile:

i#CL take#v#17 my#CL checks#n#1 to#CL the#CL bank#n#2 .#IT 
Context: i#CL <target>take#v</target> my#CL checks#n to#CL the#CL bank#n
  Scores for take#v
    take#v#1: 17
    take#v#10: 158
    take#v#17: 174
  Winning score: 174
Context: i#CL take#v my#CL <target>checks#n</target> to#CL the#CL bank#n .#IT
  Scores for checks#n
    check#n#1: 198
    check#n#2: 117
    check#n#3: 42
  Winning score: 198
Context: take#v my#CL checks#n to#CL the#CL <target>bank#n</target> .#IT
  Scores for bank#n
    bank#n#1: 81
    bank#n#2: 202
    bank#n#3: 68
    bank#n#4: 37
  Winning score: 202

My erroneous Code:

@files = @ARGV;
foreach $file(@files){
    open(IN, $file);
    @lines=<IN>;
    foreach (@lines){
        chomp;
        #store tokpossense (eg. "take#v#1") and rawscore (eg. 4)
        if (($tokpossense,$rawscore)= /^\s{4}(.+): (\d+)/) {
            #split tokpossense for recombination
            ($tok,$pos,$sensenr)=split(/#/,$tokpossense);
            #tokpos (eg. take#v) will be a unique identifier when calculating normalized score
            $tokpos="$tok\#$pos";
            #block for when new tokpos(word) is found in inputfile
            if (defined($prevtokpos) and
                ($tokpos ne $prevtokpos)) {
                    # normalize hash: THE PROBLEM LIES IN $SumHash{$tokpos} which is returned as zero > WHY?
                    foreach (keys %ScoreHash) {
                        $normscore=$ScoreHash{$_}/$SumHash{$tokpos};
                        #print the results to a file
                        print "$_\t$ScoreHash{$_}\t$normscore\n";
                    }
                    #empty hashes
                    undef %ScoreHash;
                    undef %SumHash;
            }
            #prevtokpos is assigned to tokpos for condition above
            $prevtokpos = $tokpos;
            #store the sum of scores for a tokpos identifier for normalization
            $SumHash{$tokpos}+=$rawscore;
            #store the scores for a tokpossense identifier for normalization
            $ScoreHash{$tokpossense}=$rawscore;
        }
        #skip the irrelevant lines of inputfile
        else {next;}
    }
}

Extra info: I am doing Word Sense Disambiguation using Pedersen's Wordnet WSD tool which uses Wordnet::Similarity::AllWords. The output file is generated by this package and the found scores have to be normalized for implementation in our toolset.

2 answers

#1


You don't assign anything to $tokpos. The assignment is part of a comment; syntax highlighting in your editor should have told you. The strict pragma would have told you, too.
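
As a rough illustration of the point about strict, the sketch below compiles a small snippet in which the only assignment to $tokpos sits inside a comment. The snippet text and the names `$snippet`, `$ok`, and `$error` are made up for this demonstration and are not taken from the question's script.

```perl
use strict;
use warnings;

# A snippet in which the only assignment to $tokpos is hidden inside a
# comment, mirroring the mistake described above (illustrative text only).
my $snippet = <<'END';
use strict;
my ($tok, $pos) = ('take', 'v');
# build the identifier  $tokpos = "$tok#$pos";
print "$tokpos\n";
END

# String eval compiles the snippet; under strict the undeclared variable
# is a compile-time error, so eval returns false and sets $@.
my $ok    = eval "$snippet; 1";
my $error = $ok ? '' : $@;
print "strict refused to compile it:\n$error" unless $ok;
```

Without `use strict`, the same snippet would compile and print an empty line, which is how the bug goes unnoticed.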

Also, you should probably use $prevtokpos in the division: $tokpos is the new value that you haven't met before. To get the output for the last token, you have to process it outside the loop, as there's no $tokpos to replace it. To avoid code repetition, use a subroutine to do that:

#!/usr/bin/perl
use warnings;
use strict;

my %SumHash;
my %ScoreHash;

sub output {
    my $token = shift;
    for (keys %ScoreHash) {
        my $normscore = $ScoreHash{$_} / $SumHash{$token};
        print "$_\t$ScoreHash{$_}\t$normscore\n";
    }
    undef %ScoreHash;
    undef %SumHash;
}

my $prevtokpos;
while (<DATA>){
    chomp;
    if (my ($tokpossense,$rawscore) = /^\s{4}(.+): (\d+)/) {
        my ($tok, $pos, $sensenr) = split /#/, $tokpossense;
        my $tokpos = "$tok\#$pos";
        if (defined $prevtokpos && $tokpos ne $prevtokpos) {
            output($prevtokpos);
        }

        $prevtokpos = $tokpos;
        $SumHash{$tokpos} += $rawscore;
        $ScoreHash{$tokpossense} = $rawscore;
    }
}
output($prevtokpos);

__DATA__
i#CL take#v#17 my#CL checks#n#1 to#CL the#CL bank#n#2 .#IT 
Context: i#CL <target>take#v</target> my#CL checks#n to#CL the#CL bank#n
  Scores for take#v
    take#v#1: 17
    take#v#10: 158
    take#v#17: 174
  Winning score: 174
Context: i#CL take#v my#CL <target>checks#n</target> to#CL the#CL bank#n .#IT
  Scores for checks#n
    check#n#1: 198
    check#n#2: 117
    check#n#3: 42
  Winning score: 198
Context: take#v my#CL checks#n to#CL the#CL <target>bank#n</target> .#IT
  Scores for bank#n
    bank#n#1: 81
    bank#n#2: 202
    bank#n#3: 68
    bank#n#4: 37
  Winning score: 202
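
To see why the division has to use $prevtokpos, here is a minimal sketch of the state at the moment the word type changes from take#v to check#n. The hash contents are hard-coded from the sample input, and `$new_sum` and `$prev_sum` are names introduced only for this illustration.

```perl
use strict;
use warnings;

# State just before the first flush: only the previous word type has a sum.
my %SumHash    = ('take#v' => 17 + 158 + 174);
my $prevtokpos = 'take#v';
my $tokpos     = 'check#n';    # the word type that was just encountered

# The new key has not been stored yet, so looking it up yields undef,
# which would behave as 0 in a division.
my $new_sum  = $SumHash{$tokpos};
my $prev_sum = $SumHash{$prevtokpos};

printf "sum for %s: %s\n", $tokpos, defined $new_sum ? $new_sum : 'undef';
printf "sum for %s: %s\n", $prevtokpos, $prev_sum;
printf "17 / %d = %.4f\n", $prev_sum, 17 / $prev_sum;
```

So even with $tokpos correctly assigned, $SumHash{$tokpos} is still empty when the flush happens; the completed total lives under the previous key.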

#2


You're confusing yourself by trying to print the results as soon as $tokpos changes. For one thing, it's the values for $prevtokpos that are complete, but you're trying to output the data for $tokpos. For another, you're never going to display the last block of data, because you require a change in $tokpos to trigger the output.

It's far easier to accumulate all the data for a given file and then print it when the end of file is reached. This program works by keeping the three values $tokpos, $sense, and $rawscore for each line of the output in array @results, together with the total score for each value of $tokpos in %totals. Then it's simply a matter of dumping the contents of @results with an extra column that divides each value by the corresponding total.

use strict;
use warnings;
use 5.014; # For non-destructive substitution

for my $file ( @ARGV ) {

    open my $fh, '<', $file or die $!;

    my (@results, %totals);

    while ( <$fh> ) {
        chomp;
        next unless my ($tokpos, $sense, $rawscore) = / ^ \s{4} ( [^#]+ \# [^#]+ ) \# (\d+) : \s+ (\d+)  /x;
        push @results, [ $tokpos, $sense, $rawscore ];
        $totals{$tokpos} += $rawscore;
    }

    print "** $file **\n";
    for my $item ( @results ) {
        my ($tokpos, $sense, $rawscore) = @$item;
        # The regex drops the '#' between $tokpos and $sense, so put it back here
        printf "%s\t%s\t%6.4f\n", "$tokpos#$sense", $rawscore, $rawscore / $totals{$tokpos};
    }
    print "\n";
}

output

** tokpos.txt **
take#v#1  17  0.0487
take#v#10 158 0.4527
take#v#17 174 0.4986
check#n#1 198 0.5546
check#n#2 117 0.3277
check#n#3 42  0.1176
bank#n#1  81  0.2088
bank#n#2  202 0.5206
bank#n#3  68  0.1753
bank#n#4  37  0.0954
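
A quick way to sanity-check output like this is to confirm that the normalized scores of one word type add up to 1. The sketch below does that for take#v; `normalize` is a hypothetical helper written for this check, not part of either answer.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a hash ref of raw scores for one word type
# and returns a hash ref of normalized scores.
sub normalize {
    my ($scores) = @_;
    my $total = 0;
    $total += $_ for values %$scores;

    my %norm;
    return \%norm unless $total;    # avoid dividing by zero on empty or all-zero input
    $norm{$_} = $scores->{$_} / $total for keys %$scores;
    return \%norm;
}

# Raw scores for take#v, copied from the sample input.
my $norm = normalize({ 'take#v#1' => 17, 'take#v#10' => 158, 'take#v#17' => 174 });

my $sum = 0;
$sum += $_ for values %$norm;

printf "%s\t%.4f\n", $_, $norm->{$_} for sort keys %$norm;
printf "sum of normalized scores: %.4f\n", $sum;
```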
