java - Extracting Double Byte Characters/substring from a UTF-8 formatted String -


I'm trying to extract emojis and other special characters from strings for further processing (e.g. a string that contains '😅' as one of its characters).

But neither String.charAt(i) nor String.substring(i, i+1) works for me. The original string is formatted in UTF-8, which means the escaped form of the above emoji is encoded as '\ud83d\ude05'. That's why I receive '?' (\ud83d) and '?' (\ude05) instead of the emoji for this position, so it takes up two positions when iterating over the string.
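To illustrate the behaviour described above: a Java String is stored as a sequence of UTF-16 code units, and '😅' (U+1F605) lies outside the Basic Multilingual Plane, so it occupies two chars (a surrogate pair). The following is only a minimal sketch; the class name SurrogateDemo and its helper methods are made up for illustration.

```java
class SurrogateDemo {
    // Number of UTF-16 code units, which is what length() and charAt() operate on.
    static int codeUnitCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE05b"; // "a😅b"

        System.out.println(codeUnitCount(s));   // 4 code units
        System.out.println(codePointCount(s));  // 3 code points

        // charAt(1) and charAt(2) each return only one half of the emoji.
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(2)));  // true

        // codePointAt(1) combines both halves into the full code point.
        System.out.println(Integer.toHexString(s.codePointAt(1)));  // 1f605
    }
}
```

Printing a lone surrogate typically shows up as '?', because half a pair is not a valid character on its own.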

Does anyone have a solution to this problem?

Thanks to John Kugelman for the help. My solution now looks like this:

for (int codePoint : codePoints(string)) {
    char[] chars = Character.toChars(codePoint);
    System.out.println(codePoint + " : " + String.copyValueOf(chars));
}

The codePoints(String string) method looks like this:

// requires: import java.util.Iterator;
private static Iterable<Integer> codePoints(final String string) {
    return new Iterable<Integer>() {
        public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                // Index of the next UTF-16 code unit to read.
                int nextIndex = 0;

                public boolean hasNext() {
                    return nextIndex < string.length();
                }

                public Integer next() {
                    // Read a full code point, then skip one or two chars.
                    int result = string.codePointAt(nextIndex);
                    nextIndex += Character.charCount(result);
                    return result;
                }

                public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }
    };
}
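On Java 8 or later, the hand-written iterator above can probably be replaced by the built-in String.codePoints() method, which returns an IntStream of code points. The following is only a sketch of that alternative, not part of the original solution; the class name CodePointSplitter and the method split are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class CodePointSplitter {
    // Splits a string into one String per Unicode code point,
    // so a surrogate pair such as an emoji stays together.
    static List<String> split(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (String part : split("a\uD83D\uDE05b")) {
            // Print each piece with its code point in hex.
            System.out.println(Integer.toHexString(part.codePointAt(0)) + " : " + part);
        }
    }
}
```

Note that this splits by code point only. Emojis built from several code points (for example with skin-tone modifiers or zero-width joiners) would still be broken into multiple pieces.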
