How to get file size in ANSI C without fseek and ftell?


While looking for ways to find the size of a file given a FILE*, I came across this article advising against it. Instead, it seems to encourage using file descriptors and fstat.

However, I was under the impression that fstat, open, and file descriptors in general are not as portable (after a bit of searching, I found something to this effect).

Is there a way to get the size of a file in ANSI C while keeping in line with the warnings in the article?

7 answers

#1


13  

In standard C, the fseek/ftell dance is pretty much the only game in town. Anything else you'd do depends at least in some way on the specific environment your program runs in. Unfortunately, that dance also has its problems, as described in the articles you've linked.

I guess you could always read everything out of the file until EOF and keep a running count along the way, with fread() for example.
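
A minimal sketch of that read-and-count approach, using only standard C. The function name count_bytes_by_reading and the 4096-byte buffer size are illustrative choices, not something from the answer above. It assumes the stream is positioned at the start and was opened in binary mode; on a text stream the count reflects translated characters, not bytes on disk.

```c
#include <stdio.h>

/* Count the bytes remaining in a stream by reading until EOF.
 * Returns 0 and stores the count in *out on success, -1 on a read error.
 * The stream is left at EOF; call rewind() before reading the data again. */
int count_bytes_by_reading(FILE *fp, unsigned long *out)
{
    unsigned char buf[4096];   /* arbitrary chunk size */
    unsigned long total = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (unsigned long)n;

    if (ferror(fp))            /* distinguish a read error from plain EOF */
        return -1;

    *out = total;
    return 0;
}
```

The obvious cost is that the whole file is read once just to learn its length, so this is mainly attractive when the data has to be read anyway.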

#2


6  

The article claims fseek(stream, 0, SEEK_END) is undefined behaviour by citing an out-of-context footnote.

The footnote appears in text dealing with wide-oriented streams, that is, streams on which the first operation performed is a wide-character operation.

This undefined behaviour stems from the combination of two paragraphs. First §7.19.2/5 says that:

— Binary wide-oriented streams have the file-positioning restrictions ascribed to both text and binary streams.

And the restrictions for file-positioning with text streams (§7.19.9.2/4) are:

For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.

This makes fseek(stream, 0, SEEK_END) undefined behaviour for wide-oriented streams. There is no rule like §7.19.2/5 for byte-oriented streams.
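
If a program wants to know which case it is in, the orientation of a stream can be queried with fwide from <wchar.h>. Note that fwide was added by Amendment 1 (C95) and is not in the original C89 library. The helper name stream_is_wide_oriented is made up for this sketch.

```c
#include <stdio.h>
#include <wchar.h>

/* Returns 1 if the stream is wide-oriented, 0 if it is byte-oriented
 * or has no orientation yet. Passing 0 as the mode only queries the
 * orientation; it does not change it. */
int stream_is_wide_oriented(FILE *fp)
{
    return fwide(fp, 0) > 0;
}
```

A caller could use this to refuse the SEEK_END trick on wide-oriented streams and allow it on everything else.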

Furthermore, when the standard says:

A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.

That doesn't mean it is undefined behaviour to do so; if the stream supports it, it's fine.

Apparently this exists to allow binary files to have coarse size granularity, i.e. for the size to be a number of disk sectors rather than a number of bytes, which lets an unspecified number of zeros magically appear at the end of binary files. SEEK_END cannot be meaningfully supported in this case. Other examples include pipes or infinite files like /dev/zero. However, the C standard provides no way to distinguish between such cases, so you're stuck with system-dependent calls if you want to take that into account.

#3


3  

Use fstat. It requires a file descriptor, which you can get from the FILE* with fileno. The size is then within your grasp, along with other details.

i.e.

fstat(fileno(filePointer), &buf);

where filePointer is the FILE * and buf is a struct stat:

struct stat {
    dev_t     st_dev;     /* ID of device containing file */
    ino_t     st_ino;     /* inode number */
    mode_t    st_mode;    /* protection */
    nlink_t   st_nlink;   /* number of hard links */
    uid_t     st_uid;     /* user ID of owner */
    gid_t     st_gid;     /* group ID of owner */
    dev_t     st_rdev;    /* device ID (if special file) */
    off_t     st_size;    /* total size, in bytes */
    blksize_t st_blksize; /* blocksize for file system I/O */
    blkcnt_t  st_blocks;  /* number of 512B blocks allocated */
    time_t    st_atime;   /* time of last access */
    time_t    st_mtime;   /* time of last modification */
    time_t    st_ctime;   /* time of last status change */
};
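
Put together, the call might be wrapped as below. This is a sketch for POSIX systems only, since neither fileno nor fstat is part of ISO C. The function name file_size_fstat and the decision to reject anything that is not a regular file are assumptions of this example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* POSIX only. Stores the size of the file behind fp in *out.
 * Returns 0 on success, -1 if the descriptor or the stat call fails,
 * or if fp does not refer to a regular file (pipe, terminal, ...).
 * Data still sitting in the stdio buffer is not counted, so call
 * fflush(fp) first on a stream that has been written to. */
int file_size_fstat(FILE *fp, off_t *out)
{
    struct stat st;
    int fd = fileno(fp);

    if (fd == -1 || fstat(fd, &st) != 0)
        return -1;
    if (!S_ISREG(st.st_mode))
        return -1;

    *out = st.st_size;
    return 0;
}
```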

#4


2  

Different operating systems provide different APIs for this. For example, on Windows we have:

GetFileAttributesEx() (or GetFileSizeEx() on an open handle; plain GetFileAttributes() returns only attribute flags, not the size)

On the Mac we have:

[[[NSFileManager defaultManager] attributesOfItemAtPath:someFilePath error:nil] fileSize];

But the only raw method is via fread and fseek; see: How can I get a file's size in C?

#5


2  

You can't always avoid writing platform-specific code, especially when you have to deal with things that are a function of the platform. File sizes are a function of the file system, so as a rule I'd use the native filesystem API to get that information rather than the fseek/ftell dance. I'd create my own generic wrapper around it, so as not to pollute application logic with platform-specific details and to make the code easier to port.
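
One possible shape for such a wrapper is sketched below. The name portable_file_size, its signature, and the choice of stat on POSIX and GetFileAttributesExA on Windows are assumptions made for illustration; the answer does not prescribe any of them. It also uses unsigned long long, which needs C99 or a compiler extension.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/types.h>
#include <sys/stat.h>
#endif

/* Hypothetical wrapper: stores the size in bytes of the file at `path`
 * in *out. Returns 0 on success, -1 on failure. Application code calls
 * only this function and never sees the platform-specific details. */
int portable_file_size(const char *path, unsigned long long *out)
{
#ifdef _WIN32
    WIN32_FILE_ATTRIBUTE_DATA info;

    if (!GetFileAttributesExA(path, GetFileExInfoStandard, &info))
        return -1;
    /* The size is reported as two 32-bit halves. */
    *out = ((unsigned long long)info.nFileSizeHigh << 32) | info.nFileSizeLow;
    return 0;
#else
    struct stat st;

    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    *out = (unsigned long long)st.st_size;
    return 0;
#endif
}
```

Porting to another platform then means adding one more branch here rather than touching every caller.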

#6


2  

The executive summary is that you must use fseek/ftell because there is no alternative, not even an implementation-specific one, that is better.

The underlying issue is that the "size" of a file in bytes is not always the same as the length of the data in the file and that, in some circumstances, the length of the data is not available.

A POSIX example is what happens when you write data to a device; the operating system only knows the size of the device. Once the data has been written and the FILE* closed, there is no record of the length of the data written. If the device is opened for reading, the fseek/ftell approach will either fail or give you the size of the whole device.

When the ANSI C committee was sitting at the end of the 1980s, a number of operating systems that the members remembered simply did not store the length of the data in a file; rather, they stored the disk blocks of the file and assumed that something in the data terminated it. The 'text' stream represents this. Opening a 'binary' stream on those files shows not only the magic terminator byte, but also any bytes beyond it that were never written but happen to be in the same disk block.

Consequently, the C90 standard was written so that it is valid to use the fseek trick; the result is a conforming program, but the output may not be what you expect. The behavior of that program is not 'undefined' in the C90 sense, and it is not 'implementation-defined' (because on UN*X it varies with the file). Neither is it 'invalid'. Rather, you get a number you can't completely rely on or, depending on the parameters to fseek, -1 and an errno.

In practice, if the trick succeeds you get a number that includes at least all the data, which is probably what you want, and if the trick fails it is almost certainly someone else's fault.
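
For reference, the trick with every return value checked might look like the sketch below. The function name file_size_fseek is illustrative. It assumes a binary stream whose size fits in a long, and, as discussed above, the number it returns is an upper bound on the data rather than a guarantee.

```c
#include <stdio.h>

/* Returns the end-of-file offset reported by ftell(), or -1L if any
 * step fails (for example on a pipe). The original file position is
 * restored before returning. */
long file_size_fseek(FILE *fp)
{
    long start, end;

    start = ftell(fp);                    /* remember where we were */
    if (start == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)     /* may fail on unseekable streams */
        return -1L;
    end = ftell(fp);
    if (fseek(fp, start, SEEK_SET) != 0)  /* go back */
        return -1L;
    return end;
}
```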

John Bowler

#7


-2  

The article has a small problem of logic.

It (correctly) identifies that a certain usage of C functions has behavior which is not defined by ISO C. But then, to avoid this undefined behavior, the article proposes a solution: replace that usage with platform-specific functions. Unfortunately, the use of platform-specific functions is also undefined according to ISO C. Therefore, the advice does not solve the problem of undefined behavior.

The quote in my copy of the 1999 standard confirms that the alleged behavior is indeed undefined:

A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END. [ISO 9899:1999 7.19.9.2 paragraph 3]

But undefined behavior does not mean "bad behavior"; it is simply behavior for which the ISO C standard gives no definition. Not all undefined behaviors are the same.

Some undefined behaviors are areas in the language where meaningful extensions can be provided. The platform fills the gap by defining a behavior.

Providing a working fseek which can seek from SEEK_END is an example of an extension in place of undefined behavior. It is possible to confirm whether or not a given platform supports fseek from SEEK_END, and if it does, then it is fine to use it.

Providing a separate function like lseek is also an extension in place of undefined behavior (the undefined behavior of calling a function which is not in ISO C and not defined in the C program). It is fine to use that, if available.

Note that those platforms which have functions like the POSIX lseek will also likely have an ISO C fseek which works from SEEK_END. Also note that on platforms where fseek on a binary file cannot seek from SEEK_END, the likely reason is that this is impossible to do (no API can be provided to do it and that is why the C library function fseek is not able to support it).

So, if fseek does provide the desired behavior on the given platform, then nothing has to be done to the program; it is a waste of effort to change it to use that platform's special function. On the other hand, if fseek does not provide the behavior, then likely nothing does, anyway.

Note that even including a nonstandard header which is not part of the program is undefined behavior (by omission of any definition of the behavior). For instance, if the following appears in a C program:

#include <unistd.h>

the behavior is not defined after that. [See References below.] The behavior of the preprocessing directive #include itself is defined, of course. But this creates two possibilities. Either the header <unistd.h> does not exist, in which case a diagnostic is required, or the header does exist. In the latter case, the contents are not known (as far as ISO C is concerned; no such header is documented for the Library), so the include directive brings in an unknown chunk of code, incorporating it into the translation unit. It is impossible to define the behavior of an unknown chunk of code.

#include <platform-specific-header.h> is one of the escape hatches in the language for doing anything whatsoever on a given platform.

In point form:

  1. Undefined behavior is not inherently "bad" and not inherently a security flaw (though of course it can be, e.g. buffer overruns linked to the undefined behaviors in the area of pointer arithmetic and dereferencing).
  2. Replacing one undefined behavior with another, only for the purpose of avoiding undefined behavior, is pointless.
  3. Undefined behavior is just a special term used in ISO C to denote things that are outside of the scope of ISO C's definition. It does not mean "not defined by anyone in the world" and doesn't imply something is defective.
  4. Relying on some undefined behaviors is necessary for making most real-world, useful programs, because many extensions are provided through undefined behavior, including platform-specific headers and functions.
  5. Undefined behavior can be supplanted by definitions of behavior from outside of ISO C. For instance, the POSIX.1 (IEEE 1003.1) series of standards defines the behavior of including <unistd.h>. An undefined ISO C program can be a well-defined POSIX C program.
  6. Some problems cannot be solved in C without relying on some kind of undefined behavior. An example of this is a program that wants to seek so many bytes backwards from the end of a file.
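
The last point can be made concrete with a short sketch that reads the final n bytes of a binary stream. The function name read_tail is made up for the example. Whether the SEEK_END call works is up to the platform, which is exactly the reliance on behavior outside ISO C described above, so the return value of fseek is checked rather than assumed.

```c
#include <stdio.h>

/* Read the last n bytes of a binary stream into buf.
 * Returns the number of bytes read, or 0 if n is not positive or the
 * platform cannot seek relative to the end of this stream (this also
 * covers files shorter than n bytes on systems that reject the seek). */
size_t read_tail(FILE *fp, void *buf, long n)
{
    if (n <= 0 || fseek(fp, -n, SEEK_END) != 0)
        return 0;
    return fread(buf, 1, (size_t)n, fp);
}
```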

References:
