Port of Israel Ekpo's CSV parser library
Dependents: parser_sample, IoTGateway_Basic
csv_parser Class Reference
The csv_parser object.
#include <csv_parser.h>
Public Member Functions
csv_parser ()
    Class constructor.
~csv_parser ()
    Class destructor.
bool init (FILE *input_file_pointer)
    Initializes the parser from a FILE pointer to a stream that is already open for reading.
bool init (const char *input_filename)
    Initializes the parser from the path to a CSV file.
void set_enclosed_char (char fields_enclosed_by, enclosure_type_t enclosure_mode)
    Defines the field enclosure character used in the text file.
void set_field_term_char (char fields_terminated_by)
    Defines the field delimiter character used in the text file.
void set_line_term_char (char lines_terminated_by)
    Defines the record terminator character used in the text file.
bool has_more_rows (void)
    Returns whether there is still more data.
void set_skip_lines (unsigned int lines_to_skip)
    Defines the number of records to discard.
csv_row get_row (void)
    Returns the current row from the CSV file.
unsigned int get_record_count (void)
    Returns the number of times the csv_parser::get_row() method has been invoked.
void reset_record_count (void)
    Resets the record_count internal attribute to zero.
Protected Attributes
char enclosed_char
    The enclosure character.
char escaped_char
    The escape character.
char field_term_char
    The field terminator.
char line_term_char
    The record terminator.
unsigned int enclosed_length
    Length of the enclosure character.
unsigned int escaped_length
    Length of the escape character.
unsigned int field_term_length
    Length of the field terminator.
unsigned int line_term_length
    Length of the record terminator.
unsigned int ignore_num_lines
    Number of records to discard.
unsigned int record_count
    Number of times the get_row() method has been called.
FILE *input_fp
    The CSV file pointer.
char *input_filename
    Buffer holding the input file name.
enclosure_type_t enclosure_type
    Mode in which the CSV file will be parsed.
bool more_rows
    Whether there are still more records to parse.
Detailed Description
The csv_parser object.
Used to parse text files to extract records and fields.
We make the following assumptions:
- The record terminator is only one character in length.
- The field terminator is only one character in length.
- The fields are enclosed by single characters, if any.
- The parser can handle documents where fields are always enclosed, not enclosed at all, or optionally enclosed.
- When all fields are strictly enclosed, any enclosure character inside a field is assumed to be escaped by a backslash placed in front of it.
CSV files can be parsed in three modes:
- (a) No enclosures (ENCLOSURE_NONE).
- (b) Fields always enclosed (ENCLOSURE_REQUIRED).
- (c) Fields optionally enclosed (ENCLOSURE_OPTIONAL).
In mode (c), where the enclosure character is optional, a field is assumed to be enclosed if an enclosure character appears at either its beginning or its end.
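The optional-enclosure rule can be sketched as follows. This is a minimal illustration, not the library's code: `looks_enclosed()` and `strip_enclosure()` are hypothetical helpers, and the sketch assumes the field has already been split out of its record.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of csv_parser): applies the documented
// optional-enclosure rule to a field already split out of its record.
// The field counts as enclosed if the enclosure character appears at
// its beginning OR at its end.
bool looks_enclosed(const std::string& field, char enclosure) {
    if (field.empty()) {
        return false;
    }
    return field.front() == enclosure || field.back() == enclosure;
}

// Hypothetical helper: removes one enclosure character from each end
// where one is present; fields that are not enclosed are returned as-is.
std::string strip_enclosure(const std::string& field, char enclosure) {
    if (!looks_enclosed(field, enclosure)) {
        return field;
    }
    std::size_t begin = (field.front() == enclosure) ? 1 : 0;
    std::size_t end = field.size();
    if (end > begin && field.back() == enclosure) {
        --end;
    }
    return field.substr(begin, end - begin);
}
```

For example, `strip_enclosure("\"Lagos\"", '"')` yields `Lagos`, while an unenclosed field such as `Lagos` passes through unchanged.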
The csv_parser::init() method can accept a character array as the path to the CSV file. Since it is overloaded, it can also accept a FILE pointer to a stream that is already open for reading.
The set_enclosed_char() method accepts the field enclosure character as its first parameter and the enclosure mode as its second parameter, which controls how the text file is parsed.
- See also:
- csv_parser::set_enclosed_char()
- enclosure_type_t
Definition at line 177 of file csv_parser.h.
Constructor & Destructor Documentation
csv_parser ()
Class constructor.
This is the default constructor.
All the internal attributes are initialized here:
- The enclosure character is initialized to the null character (0x00).
- The escape character is initialized to the backslash character (0x5C).
- The field delimiter character is initialized to a comma (0x2C).
- The record delimiter character is initialized to a newline character (0x0A).
- The lengths of the four characters above are initialized to 0, 1, 1, and 1 respectively.
- The number of records to ignore is set to zero.
- The more_rows internal attribute is set to false.
- The pointer to the CSV input file is initialized to NULL.
- The pointer to the buffer for the file name is also initialized to NULL.
Definition at line 200 of file csv_parser.h.
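As a quick reference, the defaults listed above can be written as C++ default member initializers. The `parser_defaults` struct is hypothetical; the real class keeps these values in protected members set by its constructor. The comments spell out which character each hex code stands for.

```cpp
#include <cstdio>

// Hypothetical mirror of the documented constructor defaults
// (not the actual csv_parser class).
struct parser_defaults {
    char enclosed_char = 0x00;           // null character: no enclosure
    char escaped_char = 0x5C;            // backslash '\\'
    char field_term_char = 0x2C;         // comma ','
    char line_term_char = 0x0A;          // newline '\n'
    unsigned int enclosed_length = 0;
    unsigned int escaped_length = 1;
    unsigned int field_term_length = 1;
    unsigned int line_term_length = 1;
    unsigned int ignore_num_lines = 0;   // no records skipped
    bool more_rows = false;
    std::FILE* input_fp = nullptr;       // no input file yet
    char* input_filename = nullptr;      // no file name buffer yet
};
```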
~csv_parser ()
Class destructor.
The destructor closes the file pointer to the input CSV file and deallocates the buffer holding the input file name.
Definition at line 219 of file csv_parser.h.
Member Function Documentation
unsigned int get_record_count (void)
Returns the number of times the csv_parser::get_row() method has been invoked.
- See also:
- csv_parser::reset_record_count()
- Returns:
- unsigned int The number of times the csv_parser::get_row() method has been invoked.
Definition at line 338 of file csv_parser.h.
csv_row get_row (void)
Returns the current row from the CSV file.
The row is returned as a vector of string objects.
This method should be called only when csv_parser::has_more_rows() returns true.
- See also:
- csv_parser::has_more_rows()
- csv_parser::get_record_count()
- csv_parser::reset_record_count()
- csv_parser::more_rows
- Returns:
- csv_row A vector of strings, one per field of the current record.
Definition at line 114 of file csv_parser.cpp.
bool has_more_rows (void)
Returns whether there is still more data.
This method returns a boolean value indicating whether or not there are still more records to be extracted from the file currently being parsed.
Call this method to check whether there are more rows to retrieve before invoking csv_parser::get_row().
- Returns:
- bool Returns true if there are still more rows and false if there are not.
Definition at line 294 of file csv_parser.h.
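The calling contract of has_more_rows(), get_row(), and get_record_count() can be illustrated with a small stand-in. `row_source` and `read_all()` are hypothetical: they read from a `std::istream` instead of a `FILE*` and split on a comma without any enclosure handling. Only the loop shape (test first, then fetch) and the counting behavior reflect the documented API.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in following the same calling contract as csv_parser.
class row_source {
public:
    explicit row_source(std::istream& in) : in_(in) { advance(); }

    // True while a record has been read ahead and not yet returned.
    bool has_more_rows() const { return more_rows_; }

    // Splits the pending record on ',' and counts the call.
    std::vector<std::string> get_row() {
        std::vector<std::string> row;
        std::stringstream fields(pending_);
        std::string field;
        while (std::getline(fields, field, ',')) {
            row.push_back(field);
        }
        ++record_count_;
        advance();
        return row;
    }

    unsigned int get_record_count() const { return record_count_; }
    void reset_record_count() { record_count_ = 0; }

private:
    // Reads the next record ahead so has_more_rows() can answer.
    void advance() { more_rows_ = static_cast<bool>(std::getline(in_, pending_)); }

    std::istream& in_;
    std::string pending_;
    bool more_rows_ = false;
    unsigned int record_count_ = 0;
};

// Drains every row in the documented order: check, then fetch.
std::vector<std::vector<std::string>> read_all(row_source& src) {
    std::vector<std::vector<std::string>> rows;
    while (src.has_more_rows()) {
        rows.push_back(src.get_row());
    }
    return rows;
}
```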
bool init (const char *input_filename)
This is an overloaded member function, provided for convenience. It differs from the other init() overload only in the argument it accepts.
- This init method accepts a character array as the path to the CSV file.
- It sets the value of the csv_parser::input_filename property.
- It then opens the file and stores the resulting FILE pointer in the csv_parser::input_fp property.
- Parameters:
- [in] input_filename Path to the CSV file.
- Returns:
- bool Returns true on success and false on error.
Definition at line 44 of file csv_parser.cpp.
bool init (FILE *input_file_pointer)
This is an overloaded member function, provided for convenience. It differs from the other init() overload only in the argument it accepts.
This init method accepts a pointer to a CSV file that has already been opened for reading.
It also resets the file pointer to the beginning of the stream.
- Parameters:
- [in] input_file_pointer Pointer to the CSV file, open for reading.
- Returns:
- bool Returns true on success and false on error.
Definition at line 23 of file csv_parser.cpp.
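The rewind step can be sketched with the C standard library. `prepare_stream()` is a hypothetical helper, and returning false for a null pointer is an assumption of this sketch rather than documented behavior. `std::rewind()` moves the position indicator to the start of the stream and also clears its end-of-file and error indicators.

```cpp
#include <cstdio>

// Hypothetical helper: prepares an already-open stream for parsing by
// moving it back to the beginning, as init(FILE*) is documented to do.
bool prepare_stream(std::FILE* fp) {
    if (fp == nullptr) {
        return false;  // assumption: a null stream is treated as an error
    }
    std::rewind(fp);   // seek to offset 0 and clear EOF/error flags
    return true;
}
```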
void reset_record_count (void)
Resets the record_count internal attribute to zero.
This may be used if the object is reused multiple times.
- Returns:
- void
Definition at line 352 of file csv_parser.h.
void set_enclosed_char (char fields_enclosed_by, enclosure_type_t enclosure_mode)
Defines the field enclosure character used in the text file.
Setting this to the null character (0x00) means that the enclosure character is optional.
If the enclosure is optional, the same line/record may contain both enclosed and unenclosed fields.
- Parameters:
- [in] fields_enclosed_by The character used to enclose the fields.
- [in] enclosure_mode How the CSV file should be parsed.
- Returns:
- void
Definition at line 86 of file csv_parser.cpp.
void set_field_term_char (char fields_terminated_by)
Defines the field delimiter character used in the text file.
- Parameters:
- [in] fields_terminated_by The field delimiter character.
- Returns:
- void
Definition at line 96 of file csv_parser.cpp.
void set_line_term_char (char lines_terminated_by)
Defines the record terminator character used in the text file.
- Parameters:
- [in] lines_terminated_by The record terminator character.
- Returns:
- void
Definition at line 105 of file csv_parser.cpp.
void set_skip_lines (unsigned int lines_to_skip)
Defines the number of records to discard.
The number of records specified will be discarded during the parsing process.
- See also:
- csv_parser::_skip_lines()
- csv_parser::get_row()
- csv_parser::has_more_rows()
- Parameters:
- [in] lines_to_skip How many records should be skipped.
- Returns:
- void
Definition at line 311 of file csv_parser.h.
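A minimal sketch of discarding a fixed number of records, such as a header row, before parsing begins. `skip_records()` is hypothetical and works on a `std::istream`; it assumes a single-character record terminator, matching the assumptions listed in the detailed description.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, used in the usage example
#include <string>

// Hypothetical helper: discards up to lines_to_skip records from `in`.
// Returns how many records were actually discarded, which is fewer
// than requested if the input runs out first.
unsigned int skip_records(std::istream& in, unsigned int lines_to_skip,
                          char line_term = '\n') {
    std::string discarded;
    unsigned int skipped = 0;
    while (skipped < lines_to_skip && std::getline(in, discarded, line_term)) {
        ++skipped;
    }
    return skipped;
}
```

For example, calling `skip_records(in, 1)` on a stream holding `id,name\n1,alice\n` drops the header so the next read returns `1,alice`.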
Field Documentation
enclosed_char [protected]
The enclosure character.
If it is used for a field, both ends of that field are assumed to be wrapped by it.
This is the single character used in the document to wrap the fields.
- See also:
- csv_parser::_get_fields_without_enclosure()
- csv_parser::_get_fields_with_enclosure()
- csv_parser::_get_fields_with_optional_enclosure()
Definition at line 432 of file csv_parser.h.
enclosed_length [protected]
Enclosure length.
This is the length of the enclosure character.
Definition at line 489 of file csv_parser.h.
enclosure_type [protected]
Mode in which the CSV file will be parsed.
The possible values are:
- ENCLOSURE_NONE (1) means the CSV file does not use any enclosure characters for the fields.
- ENCLOSURE_REQUIRED (2) means the CSV file requires enclosure characters for all the fields.
- ENCLOSURE_OPTIONAL (3) means the use of enclosure characters for the fields is optional.
- See also:
- csv_parser::get_row()
- csv_parser::_read_single_line()
- csv_parser::_get_fields_without_enclosure()
- csv_parser::_get_fields_with_enclosure()
- csv_parser::_get_fields_with_optional_enclosure()
Definition at line 577 of file csv_parser.h.
escaped_char [protected]
The escape character.
For now, the only valid escape character is the backslash character (0x5C).
This is only important when the enclosure character is required or optional.
This is the backslash character used to escape enclosure characters found within the fields.
- See also:
- csv_parser::_get_fields_with_enclosure()
- csv_parser::_get_fields_with_optional_enclosure()
Definition at line 449 of file csv_parser.h.
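The escaping convention can be sketched as a small unescaping pass over an enclosed field. `unescape_enclosure()` is hypothetical; in this sketch a backslash that is not followed by the enclosure character is kept as-is, which may differ from how the library treats it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turns each escape + enclosure pair (for example
// \" when the enclosure is '"') into a literal enclosure character.
std::string unescape_enclosure(const std::string& field, char enclosure,
                               char escape = '\\') {
    std::string out;
    out.reserve(field.size());
    for (std::size_t i = 0; i < field.size(); ++i) {
        if (field[i] == escape && i + 1 < field.size() &&
            field[i + 1] == enclosure) {
            out += enclosure;  // emit the escaped enclosure character
            ++i;               // and skip past it
        } else {
            out += field[i];   // any other character is copied unchanged
        }
    }
    return out;
}
```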
escaped_length [protected]
The length of the escape character.
It is currently unused and may be used in future versions of the object.
Definition at line 502 of file csv_parser.h.
field_term_char [protected]
The field terminator.
This is the single character used to separate fields within a record, marking the end of each column in the text file.
Common choices include the comma, the tab, and the semicolon.
Definition at line 462 of file csv_parser.h.
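With no enclosures (ENCLOSURE_NONE), splitting a record reduces to cutting at every occurrence of the field terminator. `split_fields()` is a hypothetical sketch; keeping a trailing empty field is a choice made here, not documented library behavior.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of splitting one record on a single-character
// field terminator. Every terminator ends a field, so "a,b," yields
// three fields, the last one empty.
std::vector<std::string> split_fields(const std::string& record, char field_term) {
    std::vector<std::string> fields;
    std::string current;
    for (char c : record) {
        if (c == field_term) {
            fields.push_back(current);  // terminator closes the current field
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);  // the last field has no trailing terminator
    return fields;
}
```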
field_term_length [protected]
Length of the field terminator.
It is currently unused; it will be used in future versions of the object.
Definition at line 511 of file csv_parser.h.
ignore_num_lines [protected]
Number of records to discard.
This variable controls how many records in the file are skipped before parsing begins.
- See also:
- csv_parser::_skip_lines()
- csv_parser::set_skip_lines()
Definition at line 532 of file csv_parser.h.
input_filename [protected]
Buffer to input file name.
This buffer is used to store the name of the file that is being parsed.
Definition at line 558 of file csv_parser.h.
input_fp [protected]
The CSV File Pointer.
This is the pointer to the CSV input file.
Definition at line 549 of file csv_parser.h.
line_term_char [protected]
The record terminator.
This is the single character used to mark the end of a record in the text file.
The newline character is the most common choice, but other characters can be used as well.
- See also:
- csv_parser::get_row()
Definition at line 477 of file csv_parser.h.
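Using a record terminator other than the newline can be sketched with the delimiter argument of `std::getline()`. `split_records()` is hypothetical; it treats the terminator as a single character, as the parser's assumptions require.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: splits text into records ended by line_term,
// e.g. '|' instead of '\n'. A final record without a terminator is
// still returned; a trailing terminator does not add an empty record.
std::vector<std::string> split_records(const std::string& text, char line_term) {
    std::istringstream in(text);
    std::vector<std::string> records;
    std::string record;
    while (std::getline(in, record, line_term)) {
        records.push_back(record);
    }
    return records;
}
```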
line_term_length [protected]
Length of the record terminator.
It is currently unused; it will be used in future versions of the object.
Definition at line 520 of file csv_parser.h.
more_rows [protected]
There are still more records to parse.
This boolean property is an internal indicator of whether there are still records in the file to be parsed.
- See also:
- csv_parser::has_more_rows()
Definition at line 588 of file csv_parser.h.
record_count [protected]
Number of times the get_row() method has been called.
- See also:
- csv_parser::get_row()
Definition at line 540 of file csv_parser.h.
Generated on Fri Jul 22 2022 13:54:56 by Doxygen 1.7.2