Reading from Files
Character Input Functions
The fgetc function
#include <stdio.h>
int fgetc(FILE *stream);
The fgetc function obtains the next character (if present) as an unsigned char converted to an int, from the input stream pointed to by stream, and advances the associated file position indicator for the stream (if defined).
The fgetc function returns the next character from the input stream pointed to by stream. If the stream is at end-of-file, the end-of-file indicator for the stream is set and fgetc returns EOF (EOF is a negative value defined in <stdio.h>, usually (-1)). If a read error occurs, the error indicator for the stream is set and fgetc returns EOF.
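Because EOF is returned both at end-of-file and on a read error, a reading loop usually stores the result in an int and checks ferror afterwards. The following is a minimal sketch of that pattern; the function name count_chars is hypothetical and not part of the library.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters remaining on stream.
   Returns the count, or -1 if a read error occurred. */
long count_chars(FILE *stream)
{
    long count = 0;
    int c;  /* int, not char, so that EOF stays distinct from every character */

    while ((c = fgetc(stream)) != EOF)
        count++;

    /* EOF is returned for both end-of-file and errors; ferror tells them apart. */
    return ferror(stream) ? -1L : count;
}
```

Storing the result in a char instead of an int would make it impossible to tell EOF apart from a valid character value on some implementations.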
The fgets function
#include <stdio.h>
char *fgets(char *s, int n, FILE *stream);
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a newline character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
Warning: some systems terminate lines in text files with \r\n. When such a file is read on a system that does not translate this sequence, fgets stores both characters, so s ends with \r\n: the \n is retained as described above, and the \r immediately precedes it. These spurious line-ending characters should be removed from s before the string is used for anything else.
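One common way to deal with line terminators is to cut the string at the first \r or \n after fgets returns. The sketch below assumes that neither character can appear inside a line; read_line is a hypothetical name chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (capacity n) and strips a
   trailing "\n" or "\r\n". Returns buf, or NULL on end-of-file with nothing
   read or on a read error. */
char *read_line(char *buf, int n, FILE *stream)
{
    if (fgets(buf, n, stream) == NULL)
        return NULL;
    buf[strcspn(buf, "\r\n")] = '\0';  /* cut at the first CR or LF, if any */
    return buf;
}
```

If a line is longer than n - 1 characters, fgets returns only its first part and the remainder is delivered by the next call; this sketch does not try to detect that case.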
The getc function
#include <stdio.h>
int getc(FILE *stream);
The getc function is equivalent to fgetc, except that it may be implemented as a macro. If it is implemented as a macro, the stream argument may be evaluated more than once, so the argument should never be an expression with side effects (i.e. have an assignment, increment, or decrement operators, or be a function call).
The getc function returns the next character from the input stream pointed to by stream. If the stream is at end-of-file, the end-of-file indicator for the stream is set and getc returns EOF (EOF is a negative value defined in <stdio.h>, usually (-1)). If a read error occurs, the error indicator for the stream is set and getc returns EOF.
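The restriction on side effects matters when the stream is selected by an expression such as streams[i++]. A simple way to stay safe, sketched below, is to copy the stream pointer into a local variable first. The function first_chars and its parameters are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads one character from each of n streams into out[].
   The stream is copied to a local variable before calling getc, so the
   argument passed to getc has no side effects even if getc is a macro. */
void first_chars(FILE *streams[], size_t n, int out[])
{
    size_t i = 0;

    while (i < n) {
        FILE *f = streams[i];  /* not getc(streams[i++]): i++ could run twice */
        out[i] = getc(f);
        i++;
    }
}
```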
The getchar function
#include <stdio.h>
int getchar(void);
The getchar function is equivalent to getc with the argument stdin.
The getchar function returns the next character from the input stream pointed to by stdin. If stdin is at end-of-file, the end-of-file indicator for stdin is set and getchar returns EOF (EOF is a negative value defined in <stdio.h>, usually (-1)). If a read error occurs, the error indicator for stdin is set and getchar returns EOF.
The gets function
#include <stdio.h>
char *gets(char *s);
The gets function reads characters from the input stream pointed to by stdin into the array pointed to by s until end-of-file is encountered or a newline character is read. Any newline character is discarded, and a null character is written immediately after the last character read into the array.
The gets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
This function and its description are included here only for completeness. Most C programmers nowadays shy away from using gets, as there is no way for the function to know how big the buffer is that the programmer wants to read into. Commandment #5 of Henry Spencer's The Ten Commandments for C Programmers (Annotated Edition) reads, "Thou shalt check the array bounds of all strings (indeed, all arrays), for surely where thou typest foo someone someday shall type supercalifragilisticexpialidocious." It mentions gets in the annotation: "As demonstrated by the deeds of the Great Worm, a consequence of this commandment is that robust production software should never make use of gets(), for it is truly a tool of the Devil. Thy interfaces should always inform thy servants of the bounds of thy arrays, and servants who spurn such advice or quietly fail to follow it should be dispatched forthwith to the Land Of Rm, where they can do no further harm to thee."
The ungetc function
#include <stdio.h>
int ungetc(int c, FILE *stream);
The ungetc function pushes the character specified by c (converted to an unsigned char) back onto the input stream pointed to by stream. The pushed-back characters will be returned by subsequent reads on that stream in the reverse order of their pushing. A successful intervening call (with the stream pointed to by stream) to a file-positioning function (fseek, fsetpos, or rewind) discards any pushed-back characters for the stream. The external storage corresponding to the stream is unchanged.
One character of pushback is guaranteed. If the ungetc function is called too many times on the same stream without an intervening read or file positioning operation on that stream, the operation may fail.
If the value of c equals that of the macro EOF, the operation fails and the input stream is unchanged.
A successful call to the ungetc function clears the end-of-file indicator for the stream. The value of the file position indicator for the stream after reading or discarding all pushed-back characters shall be the same as it was before the characters were pushed back. For a text stream, the value of its file position indicator after a successful call to the ungetc function is unspecified until all pushed-back characters are read or discarded. For a binary stream, its file position indicator is decremented by each successful call to the ungetc function; if its value was zero before a call, it is indeterminate after the call.
The ungetc function returns the character pushed back after conversion, or EOF if the operation fails.
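A typical use of ungetc is to look at the next character without consuming it. The following sketch relies only on the single character of pushback that is guaranteed; peek_char is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: returns the next character on stream without
   consuming it, or EOF at end-of-file or on a read error. */
int peek_char(FILE *stream)
{
    int c = fgetc(stream);

    if (c != EOF)
        ungetc(c, stream);  /* one character of pushback is guaranteed */
    return c;
}
```

EOF is deliberately never passed to ungetc here, since pushing back EOF fails and leaves the stream unchanged.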
Direct input function: the fread function
#include <stdio.h>
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
The fread function reads, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream. The file position indicator for the stream (if defined) is advanced by the number of characters successfully read. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate.
The fread function returns the number of elements successfully read, which may be less than nmemb if a read error or end-of-file is encountered. If size or nmemb is zero, fread returns zero and the contents of the array and the state of the stream remain unchanged.
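Since a short count can mean either end-of-file or a read error, callers usually compare the return value with nmemb and then consult ferror. A minimal sketch, assuming the stream holds ints written by the same program on the same machine; read_ints and its error parameter are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to max ints from a binary stream into out.
   Returns the number of complete elements read; *error is set to 1 if a
   short count was caused by a read error rather than end-of-file. */
size_t read_ints(FILE *stream, int *out, size_t max, int *error)
{
    size_t got = fread(out, sizeof out[0], max, stream);

    *error = (got < max && ferror(stream)) ? 1 : 0;
    return got;
}
```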
Formatted input functions: the scanf family of functions
#include <stdio.h>
int fscanf(FILE *stream, const char *format, ...);
int scanf(const char *format, ...);
int sscanf(const char *s, const char *format, ...);
The fscanf function reads input from the stream pointed to by stream, under control of the string pointed to by format that specifies the admissible sequences and how they are to be converted for assignment, using subsequent arguments as pointers to the objects to receive
converted input. If there are insufficient arguments for the format, the behavior is undefined. If the format is exhausted while arguments remain, the excess arguments are evaluated (as always) but are otherwise ignored.
The format shall be a multibyte character sequence, beginning and ending in its initial shift state.
The format is composed of zero or more directives: one or more whitespace characters; an ordinary multibyte character (neither % nor a whitespace character); or a conversion specification. Each conversion specification is introduced by the character %. After the %, the following appear in sequence:
• An optional assignment-suppressing character *.
• An optional nonzero decimal integer that specifies the maximum field width.
• An optional h, l (ell) or L indicating the size of the receiving object. The conversion specifiers d, i, and n shall be preceded by h if the corresponding argument is a pointer to short int rather than a pointer to int, or by l if it is a pointer to long int. Similarly, the conversion specifiers o, u, and x shall be preceded by h if the corresponding argument is a pointer to unsigned short int rather than unsigned int, or by l if it is a pointer to unsigned long int. Finally, the conversion specifiers e, f, and g shall be preceded by l if the corresponding argument is a pointer to double rather than a pointer to float, or by L if it is a pointer to long double. If an h, l, or L appears with any other format specifier, the behavior is undefined.
• A character that specifies the type of conversion to be applied. The valid conversion specifiers are described below.
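The parts of a conversion specification listed above can be combined in one format. The following sketch uses sscanf so that it needs no input file; parse_fields is a hypothetical function, and the input layout is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: parses "<digits> <short> <double>", keeping only the
   first three digits of the first number and discarding the rest of it.
   Returns the number of items assigned (3 on success). */
int parse_fields(const char *s, int *a, short *h, double *d)
{
    /* %3d : field width 3, at most three characters are converted
       %*d : assignment suppressed, the value is read but not stored
       %hd : h because the argument points to short int
       %lf : l because the argument points to double, not float */
    return sscanf(s, "%3d%*d %hd %lf", a, h, d);
}
```

For the input "12345 7 2.5" this stores 123, 7 and 2.5, and the suppressed conversion of "45" does not count towards the return value.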
The fscanf function executes each directive of the format in turn. If a directive fails, as detailed below, the fscanf function returns. Failures are described as input failures (due to the unavailability of input characters) or matching failures (due to inappropriate input).
A directive composed of whitespace character(s) is executed by reading input up to the first non-whitespace character (which remains unread) or until no more characters remain unread.
A directive that is an ordinary multibyte character is executed by reading the next characters of the stream. If one of the characters differs from one comprising the directive, the directive fails, and the differing and subsequent characters remain unread.
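The difference between whitespace directives and ordinary-character directives shows up when parsing delimited input. In the sketch below, each space in the format accepts any amount of whitespace (including none), while each - must match exactly. The function parse_date and the date layout are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: parses "year-month-day", allowing optional whitespace
   around each '-'. Returns the number of items assigned (3 on success). */
int parse_date(const char *s, int *y, int *m, int *d)
{
    /* ' ' : whitespace directive, skips zero or more whitespace characters
       '-' : ordinary character, must match the next input character */
    return sscanf(s, "%d - %d - %d", y, m, d);
}
```

With the input "2024/01/15" the first - fails to match the /, so only the year is assigned and the function returns 1.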
A directive that is a conversion specification defines a set of matching input sequences, as described below for each specifier. A conversion specification is executed in the following steps:
Input whitespace characters (as specified by the isspace function) are skipped, unless the
specification includes a [, c, or n specifier. (The whitespace characters are not counted against the specified field width.)
An input item is read from the stream, unless the specification includes an n specifier. An input item is defined as the longest matching sequence of input characters, unless that exceeds a specified field width, in which case it is the initial subsequence of that length in the sequence. The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure, unless an error prevented input from the stream, in which case it is an input failure.
Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails; this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the space provided, the behavior is undefined.
The following conversion specifiers are valid:
d
Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtol function with the value 10 for the base argument. The corresponding argument shall be a pointer to integer.
i
Matches an optionally signed integer, whose format is the same as expected for the subject sequence of the strtol function with the value 0 for the base argument. The corresponding argument shall be a pointer to integer.
o
Matches an optionally signed octal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 8 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
u
Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
x
Matches an optionally signed hexadecimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 16 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
e, f, g
Matches an optionally signed floating-point number, whose format is the same as expected for the subject string of the strtod function. The corresponding argument shall be a pointer to floating.
s
Matches a sequence of non-whitespace characters. (No special provisions are made for multibyte characters.) The corresponding argument shall be a pointer to the initial character of an array large enough to accept the sequence and a terminating null character, which will be added automatically.
[
Matches a nonempty sequence of characters (no special provisions are made for multibyte characters) from a set of expected characters (the scanset). The corresponding argument shall be a pointer to the initial character of an array large enough to accept the sequence and a terminating null character, which will be added automatically. The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket (]). The characters between the brackets (the scanlist) comprise the scanset, unless the character after the left bracket is a circumflex (^), in which case the scanset contains all the characters that do not appear in the scanlist between the circumflex and the right bracket. If the conversion specifier begins with [] or [^], the right-bracket character is in the scanlist and the next right-bracket character is the matching right bracket that ends the specification; otherwise, the first right-bracket character is the one that ends the specification. If a - character is in the scanlist and is not the first, nor the second where the first character is a ^, nor the last character, the behavior is implementation-defined.
c
Matches a sequence of characters (no special provisions are made for multibyte characters) of the number specified by the field width (1 if no field width is present in the directive). The corresponding argument shall be a pointer to the initial character of an array large enough to accept the sequence. No null character is added.
p
Matches an implementation-defined set of sequences, which should be the same as the set of sequences that may be produced by the %p conversion of the fprintf function. The corresponding argument shall be a pointer to void. The interpretation of the input then is implementation-defined. If the input item is a value converted earlier during the same program execution, the pointer that results shall compare equal to that value; otherwise the behavior of the %p conversion is undefined.
n
No input is consumed. The corresponding argument shall be a pointer to integer into which is to be written the number of characters read from the input stream so far by this call to the fscanf function. Execution of a %n directive does not increment the assignment count returned at the completion of execution of the fscanf function.
%
Matches a single %; no conversion or assignment occurs. The complete conversion specification shall be %%.
If a conversion specification is invalid, the behavior is undefined.
The conversion specifiers E, G, and X are also valid and behave the same as, respectively, e, g, and x.
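Several of the specifiers described above are easiest to understand side by side. The sketch below combines a negated scanset, %i with its automatic base detection, and %c with a field width. The function parse_record and the comma-separated record layout are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: parses "name,value,tag" where name has at most 15
   characters, value is written in C integer notation, and tag has exactly
   three characters. Returns the number of items assigned (3 on success). */
int parse_record(const char *s, char name[16], int *value, char tag[4])
{
    /* %15[^,] : up to 15 characters that are not a comma, null-terminated
       %i      : base chosen from the prefix (0x for hex, 0 for octal)
       %3c     : exactly three characters, no null character is added */
    int n = sscanf(s, "%15[^,],%i,%3c", name, value, tag);

    if (n == 3)
        tag[3] = '\0';  /* %c does not terminate the array, so do it here */
    return n;
}
```

The field width on the scanset is what keeps a long name from overflowing the array; without it the conversion would be as unsafe as gets.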
If end-of-file is encountered during input, conversion is terminated. If end-of-file occurs before any characters matching the current directive have been read (other than leading white space, where permitted), execution of the current directive terminates with an input failure; otherwise, unless execution of the current directive is terminated with a matching failure, execution of the following directive (if any) is terminated with an input failure.
If conversion terminates on a conflicting input character, the offending input character is left unread in the input stream. Trailing white space (including newline characters) is left unread unless matched by a directive. The success of literal matches and suppressed assignments is not directly determinable other than via the %n directive.
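One use of the %n directive is to check that a conversion consumed the whole input, which the return value alone cannot show. The following is a sketch using sscanf; parse_int_strict is a hypothetical name, and the counter starts at -1 so that the function reports failure on any implementation where the %n directive is not reached.

```c
#include <stdio.h>

/* Hypothetical helper: converts s to an int and succeeds only if nothing
   follows the number. Returns 1 on success, 0 otherwise. */
int parse_int_strict(const char *s, int *out)
{
    int used = -1;  /* stays -1 if the %n directive is never executed */

    if (sscanf(s, "%d%n", out, &used) != 1 || used < 0)
        return 0;
    return s[used] == '\0';  /* any leftover character means trailing junk */
}
```

Note that for an input such as "42abc" the value 42 has already been stored in *out by the time the function returns 0.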
The fscanf function returns the value of the macro EOF if an input failure occurs before any conversion. Otherwise, the fscanf function returns the number of input items assigned, which can be fewer than provided for, or even zero, in the event of an early matching failure.
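The three kinds of return value (a count, zero, and EOF) can be told apart in a reading loop. The sketch below sums whitespace-separated integers from a stream; sum_ints is a hypothetical function, and treating a matching failure as an error is a choice made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: adds up all integers on stream and stores the total
   in *sum. Returns how many integers were read, or -1 if the input contained
   something that is not an integer or a read error occurred. */
long sum_ints(FILE *stream, long *sum)
{
    long value, count = 0;
    int rc;

    *sum = 0;
    while ((rc = fscanf(stream, "%ld", &value)) == 1) {
        *sum += value;
        count++;
    }
    /* rc == EOF : input failure (end-of-file or read error)
       rc == 0   : matching failure, the offending character is still unread */
    if (rc == EOF && !ferror(stream))
        return count;
    return -1;
}
```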
The scanf function is equivalent to fscanf with the argument stdin interposed before the arguments to scanf. Its return value is similar to that of fscanf.
The sscanf function is equivalent to fscanf, except that the argument s specifies a string from which the input is to be obtained, rather than from a stream. Reaching the end of the string is equivalent to encountering end-of-file for the fscanf function. If copying takes place between objects that overlap, the behavior is undefined.