C++ - Tokenize sentence into words, considering special characters
I have a function that receives a sentence and tokenizes it into words, based on the space " ". Now I want to improve the function so that it also eliminates special characters, for example:
I am a boy. => {I, am, a, boy}, no period after "boy"
I said :"are you ok?" => {I, said, are, you, ok}, no question mark or quotation marks
The original function is here. How can I improve it?
#include <string>
#include <vector>
using namespace std;

void tokenize(const string& str, vector<string>& tokens, const string& delimiters = " ")
{
    // Skip any delimiters at the start, then find the end of the first token.
    string::size_type lastpos = str.find_first_not_of(delimiters, 0);
    string::size_type pos = str.find_first_of(delimiters, lastpos);

    while (string::npos != pos || string::npos != lastpos)
    {
        // Store the token, then move on to the start and end of the next one.
        tokens.push_back(str.substr(lastpos, pos - lastpos));
        lastpos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, lastpos);
    }
}
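One possible approach, as a minimal sketch: since the function already takes a `delimiters` string, the punctuation to drop can simply be treated as extra delimiters. The helper name `tokenize_words` and the particular delimiter set below are assumptions made for illustration; they do not come from the original post, and the set would need adjusting to whatever counts as a "special character" in your input.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): same scanning idea as
// tokenize(), but whitespace AND common punctuation act as delimiters, so
// they never end up inside a token. The default delimiter set is an assumption.
std::vector<std::string> tokenize_words(const std::string& str,
                                        const std::string& delimiters = " \t\n.,;:!?\"")
{
    std::vector<std::string> tokens;

    // Start of the first token: first character that is not a delimiter.
    std::string::size_type lastpos = str.find_first_not_of(delimiters, 0);

    while (lastpos != std::string::npos)
    {
        // End of the current token: next delimiter, or npos at end of string.
        std::string::size_type pos = str.find_first_of(delimiters, lastpos);
        tokens.push_back(str.substr(lastpos, pos - lastpos));

        // Skip the whole run of delimiters to reach the next token, if any.
        lastpos = str.find_first_not_of(delimiters, pos);
    }
    return tokens;
}
```

With this sketch, `tokenize_words("I said :\"are you ok?\"")` yields the tokens I, said, are, you, ok. The same effect should be possible with the original function by calling it with a longer delimiter string, e.g. `tokenize(str, tokens, " .,;:!?\"")`. Note that the apostrophe and hyphen are deliberately left out of the set here so that words like "don't" stay in one piece; add them if you want those split as well.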