Port of Israel Ekpo's CSV parser library

Dependents:   parser_sample IoTGateway_Basic

csv_parser Class Reference

The csv_parser object.

#include <csv_parser.h>

Public Member Functions

 csv_parser ()
 Class constructor.
 ~csv_parser ()
 Class destructor.
bool init (FILE *input_file_pointer)
bool init (const char *input_filename)
void set_enclosed_char (char fields_enclosed_by, enclosure_type_t enclosure_mode)
 Defines the Field Enclosure character used in the Text File.
void set_field_term_char (char fields_terminated_by)
 Defines the Field Delimiter character used in the text file.
void set_line_term_char (char lines_terminated_by)
 Defines the Record Terminator character used in the text file.
bool has_more_rows (void)
 Returns whether there is still more data.
void set_skip_lines (unsigned int lines_to_skip)
 Defines the number of records to discard.
csv_row get_row (void)
 Return the current row from the CSV file.
unsigned int get_record_count (void)
 Returns the number of times the csv_parser::get_row() method has been invoked.
void reset_record_count (void)
 Resets the record_count internal attribute to zero.

Protected Attributes

char enclosed_char
 The enclosure character.
char escaped_char
 The escape character.
char field_term_char
 The field terminator.
char line_term_char
 The record terminator.
unsigned int enclosed_length
 Enclosure length.
unsigned int escaped_length
 The length of the escape character.
unsigned int field_term_length
 Length of the field terminator.
unsigned int line_term_length
 Length of the record terminator.
unsigned int ignore_num_lines
 Number of records to discard.
unsigned int record_count
 Number of times the get_row() method has been called.
FILE * input_fp
 The CSV File Pointer.
char * input_filename
 Buffer to input file name.
enclosure_type_t enclosure_type
 Mode in which the CSV file will be parsed.
bool more_rows
 There are still more records to parse.

Detailed Description

The csv_parser object.

Used to parse text files to extract records and fields.

The parser makes the following assumptions:

  • The record terminator is only one character in length.
  • The field terminator is only one character in length.
  • The fields are enclosed by single characters, if any.
  • The parser can handle documents where fields are always enclosed, not enclosed at all or optionally enclosed.
  • When fields are strictly all enclosed, there is an assumption that any enclosure characters within the field are escaped by placing a backslash in front of the enclosure character.
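The last assumption can be illustrated with a minimal sketch. This is not the library's own implementation: split_enclosed is a hypothetical helper, and it assumes the record is passed without its terminator and that a backslash is only special when it directly precedes the enclosure character.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of csv_parser: splits one record (without its
// line terminator) in which every field is wrapped in `enclosure`.
std::vector<std::string> split_enclosed(const std::string& record,
                                        char field_term,
                                        char enclosure)
{
    assert(field_term != enclosure);  // the two roles must be distinct

    const char escape = '\\';         // 0x5C, the only escape character
    std::vector<std::string> fields;
    std::string current;
    bool inside = false;              // true while between two enclosures

    for (std::size_t i = 0; i < record.size(); ++i) {
        const char c = record[i];
        if (inside && c == escape && i + 1 < record.size()
                && record[i + 1] == enclosure) {
            current += enclosure;     // escaped enclosure: keep it as data
            ++i;                      // and skip the character it escaped
        } else if (c == enclosure) {
            inside = !inside;         // opening or closing enclosure
            if (!inside) {            // a field has just been closed
                fields.push_back(current);
                current.clear();
            }
        } else if (inside) {
            current += c;             // field terminators inside a field are data
        }
        // Outside an enclosure only field terminators are expected; they are
        // skipped because the closing enclosure already ended the field.
    }
    return fields;
}
```

Under these assumptions, the record "a","b,c","d\"e" yields three fields, and the comma inside the second field is kept as data.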

The CSV files can be parsed in 3 modes.

  • (a) No enclosures
  • (b) Fields always enclosed.
  • (c) Fields optionally enclosed.

For option (c), where the enclosure is optional, a field is assumed to be enclosed if an enclosure character is found at either its beginning or its end.
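The rule for option (c) might be sketched as follows. Both functions are hypothetical helpers written for illustration; they operate on a single field that has already been separated from its record, which is an assumption about how the check is applied.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: a field counts as enclosed when the enclosure
// character appears at its beginning or at its end.
bool looks_enclosed(const std::string& field, char enclosure)
{
    assert(enclosure != '\0');        // a NUL enclosure means "none configured"
    if (field.empty()) {
        return false;
    }
    return field.front() == enclosure || field.back() == enclosure;
}

// Hypothetical helper: removes one enclosure character from each end of the
// field where one is present, and returns other fields unchanged.
std::string strip_enclosure(const std::string& field, char enclosure)
{
    if (!looks_enclosed(field, enclosure)) {
        return field;
    }
    std::size_t begin = 0;
    std::size_t end = field.size();
    if (field.front() == enclosure) {
        ++begin;
    }
    if (end > begin && field.back() == enclosure) {
        --end;
    }
    return field.substr(begin, end - begin);
}
```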

The csv_parser::init() method can accept a character array as the path to the CSV file. Since it is overloaded, it can also accept a FILE pointer to a stream that is already open for reading.

The set_enclosed_char() method accepts the field enclosure character as the first parameter and the enclosure mode as the second parameter, which controls how the text file is going to be parsed.
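A typical setup sequence could look like the sketch below. It is written as a template so that it compiles without csv_parser.h: Parser stands in for csv_parser and EnclosureMode for enclosure_type_t. The helper name open_comma_separated, the choice of a double quote as enclosure, and the order of the calls are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical helper, not part of the library. Parser is expected to expose
// the csv_parser member functions documented on this page.
template <typename Parser, typename EnclosureMode>
bool open_comma_separated(Parser& parser,
                          const char* path,
                          EnclosureMode mode,
                          unsigned int header_records)
{
    assert(path != nullptr);

    parser.set_skip_lines(header_records);  // e.g. 1 to drop a header record
    if (!parser.init(path)) {               // init() returns false on error
        return false;
    }
    parser.set_enclosed_char('"', mode);    // enclosure character and mode
    parser.set_field_term_char(',');        // single-character field delimiter
    parser.set_line_term_char('\n');        // single-character record terminator
    return true;
}
```

With the real class this might be called as open_comma_separated(parser, "data.csv", ENCLOSURE_OPTIONAL, 1U), where parser is a csv_parser instance and "data.csv" is a placeholder path.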

See also:
csv_parser::set_enclosed_char()
enclosure_type_t
Author:
Israel Ekpo <israel.ekpo@israelekpo.com>

Definition at line 177 of file csv_parser.h.


Constructor & Destructor Documentation

csv_parser (  )

Class constructor.

This is the default constructor.

All the internal attributes are initialized here:

  • The enclosure character is initialized to the NUL character 0x00.
  • The escape character is initialized to the backslash character 0x5C.
  • The field delimiter character is initialized to a comma 0x2C.
  • The record delimiter character is initialized to a new line character 0x0A.
  • The lengths of the four characters above are initialized to 0, 1, 1 and 1 respectively.
  • The number of records to ignore is set to zero.
  • The more_rows internal attribute is set to false.
  • The pointer to the CSV input file is initialized to NULL.
  • The pointer to the buffer for the file name is also initialized to NULL.

Definition at line 200 of file csv_parser.h.

~csv_parser (  )

Class destructor.

The destructor closes the file pointer to the input CSV file and deallocates the buffer that holds the input file name.

See also:
csv_parser::input_fp
csv_parser::input_filename

Definition at line 219 of file csv_parser.h.


Member Function Documentation

unsigned int get_record_count ( void   )

Returns the number of times the csv_parser::get_row() method has been invoked.

See also:
csv_parser::reset_record_count()
Returns:
unsigned int The number of times the csv_parser::get_row() method has been invoked.

Definition at line 338 of file csv_parser.h.

csv_row get_row ( void   )

Return the current row from the CSV file.

The row is returned as a vector of string objects.

This method should be called only if csv_parser::has_more_rows() returns true.

See also:
csv_parser::has_more_rows()
csv_parser::get_record_count()
csv_parser::reset_record_count()
csv_parser::more_rows
Returns:
csv_row A vector type containing an array of strings

Definition at line 114 of file csv_parser.cpp.

bool has_more_rows ( void   )

Returns whether there is still more data.

This method returns a boolean value indicating whether or not there are still more records to be extracted in the current file being parsed.

Call this method to see if there are more rows to retrieve before invoking csv_parser::get_row()
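The check-then-read pattern described above can be sketched as a loop. The helper read_rows and its max_rows cap are hypothetical; the cap is an assumption added to bound memory use on small targets. Parser stands in for csv_parser, and the row type is assumed to be a vector of strings, as stated for csv_row.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the library: collects rows until the
// parser reports no more data or `max_rows` rows have been read.
template <typename Parser>
std::vector<std::vector<std::string> > read_rows(Parser& parser,
                                                 std::size_t max_rows)
{
    assert(max_rows > 0);

    std::vector<std::vector<std::string> > rows;
    // has_more_rows() is always checked before get_row() is invoked.
    while (rows.size() < max_rows && parser.has_more_rows()) {
        rows.push_back(parser.get_row());
    }
    return rows;
}
```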

See also:
csv_parser::get_row()
csv_parser::more_rows
Returns:
bool Returns true if there are still more rows and false if there are not.

Definition at line 294 of file csv_parser.h.

bool init ( const char *  input_filename )

This overload accepts the path to the CSV file as a character array.

Parameters:
[in] input_filename  The path to the CSV file.
Returns:
bool Returns true on success and false on error.

Definition at line 44 of file csv_parser.cpp.

bool init ( FILE *  input_file_pointer )

This is an overloaded member function, provided for convenience. It differs from the init() overload above only in the argument it accepts.

This init method accepts a pointer to a CSV file that has already been opened for reading.

It also resets the file pointer to the beginning of the stream.

Parameters:
[in] input_file_pointer
Returns:
bool Returns true on success and false on error.

Definition at line 23 of file csv_parser.cpp.

void reset_record_count ( void   )

Resets the record_count internal attribute to zero.

This may be used if the object is reused multiple times.

See also:
csv_parser::record_count
csv_parser::get_record_count()
Returns:
void

Definition at line 352 of file csv_parser.h.

void set_enclosed_char ( char  fields_enclosed_by,
enclosure_type_t  enclosure_mode 
)

Defines the Field Enclosure character used in the Text File.

Setting this to NULL means that the enclosure character is optional.

If the enclosure is optional, there could be fields that are enclosed, and fields that are not enclosed within the same line/record.

Parameters:
[in] fields_enclosed_by  The character used to enclose the fields.
[in] enclosure_mode  How the CSV file should be parsed.
Returns:
void

Definition at line 86 of file csv_parser.cpp.

void set_field_term_char ( char  fields_terminated_by )

Defines the Field Delimiter character used in the text file.

Parameters:
[in] fields_terminated_by
Returns:
void

Definition at line 96 of file csv_parser.cpp.

void set_line_term_char ( char  lines_terminated_by )

Defines the Record Terminator character used in the text file.

Parameters:
[in] lines_terminated_by
Returns:
void

Definition at line 105 of file csv_parser.cpp.

void set_skip_lines ( unsigned int  lines_to_skip )

Defines the number of records to discard.

The number of records specified will be discarded during the parsing process.
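One way to discard leading records from an open stream is sketched below. This is not the library's private _skip_lines() method, whose implementation is not shown on this page; skip_records is a hypothetical stand-in that relies only on the documented single-character record terminator.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: consumes up to `records_to_skip` records from an open
// stream and returns how many complete records were actually discarded.
unsigned int skip_records(std::FILE* fp,
                          unsigned int records_to_skip,
                          char line_term)
{
    assert(fp != nullptr);

    unsigned int skipped = 0U;
    while (skipped < records_to_skip) {
        const int c = std::fgetc(fp);
        if (c == EOF) {
            break;                  // ran out of data before skipping them all
        }
        if (c == static_cast<unsigned char>(line_term)) {
            ++skipped;              // one full record has been consumed
        }
    }
    return skipped;
}
```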

See also:
csv_parser::_skip_lines()
csv_parser::get_row()
csv_parser::has_more_rows()
Parameters:
[in] lines_to_skip  How many records should be skipped.
Returns:
void

Definition at line 311 of file csv_parser.h.


Field Documentation

enclosed_char [protected]

The enclosure character.

If it is used for a field, both ends of that field are assumed to be wrapped.

This is the single character used in the document to wrap the fields.

See also:
csv_parser::_get_fields_without_enclosure()
csv_parser::_get_fields_with_enclosure()
csv_parser::_get_fields_with_optional_enclosure()

Definition at line 432 of file csv_parser.h.

enclosed_length [protected]

Enclosure length.

This is the length of the enclosure character

See also:
csv_parser::csv_parser()
csv_parser::set_enclosed_char()

Definition at line 489 of file csv_parser.h.

enclosure_type [protected]

Mode in which the CSV file will be parsed.

The possible values are:

  • ENCLOSURE_NONE (1) means the CSV file does not use any enclosure characters for the fields.
  • ENCLOSURE_REQUIRED (2) means the CSV file requires enclosure characters for all the fields.
  • ENCLOSURE_OPTIONAL (3) means the use of enclosure characters for the fields is optional.
See also:
csv_parser::get_row()
csv_parser::_read_single_line()
csv_parser::_get_fields_without_enclosure()
csv_parser::_get_fields_with_enclosure()
csv_parser::_get_fields_with_optional_enclosure()

Definition at line 577 of file csv_parser.h.

escaped_char [protected]

The escape character.

For now, the only valid escape character is the backslash character 0x5C.

It matters only when the enclosure character is required or optional, where it is used to escape enclosure characters found within the fields.

See also:
csv_parser::_get_fields_with_enclosure()
csv_parser::_get_fields_with_optional_enclosure()

Definition at line 449 of file csv_parser.h.

escaped_length [protected]

The length of the escape character.

This is currently not used.

It may be used in future versions of the object.

Definition at line 502 of file csv_parser.h.

field_term_char [protected]

The field terminator.

This is the single character used to separate fields within a record, marking the end of each column in the text file.

Common choices include the comma, the tab and the semicolon.

Definition at line 462 of file csv_parser.h.

field_term_length [protected]

Length of the field terminator.

For now this is not being used. It will be used in future versions of the object.

Definition at line 511 of file csv_parser.h.

ignore_num_lines [protected]

Number of records to discard.

This variable controls how many records in the file are skipped before parsing begins.

See also:
csv_parser::_skip_lines()
csv_parser::set_skip_lines()

Definition at line 532 of file csv_parser.h.

input_filename [protected]

Buffer to input file name.

This buffer is used to store the name of the file that is being parsed

Definition at line 558 of file csv_parser.h.

input_fp [protected]

The CSV File Pointer.

This is the pointer to the CSV file

Definition at line 549 of file csv_parser.h.

line_term_char [protected]

The record terminator.

This is the single character used to mark the end of a record in the text file.

The most popular one is the new line character; however, it is possible to use others as well.

See also:
csv_parser::get_row()

Definition at line 477 of file csv_parser.h.

line_term_length [protected]

Length of the record terminator.

For now this is not being used. It will be used in future versions of the object.

Definition at line 520 of file csv_parser.h.

more_rows [protected]

There are still more records to parse.

This boolean property is an internal indicator of whether there are still records in the file to be parsed.

See also:
csv_parser::has_more_rows()

Definition at line 588 of file csv_parser.h.

record_count [protected]

Number of times the get_row() method has been called.

See also:
csv_parser::get_row()

Definition at line 540 of file csv_parser.h.