Java - Extracting Double Byte Characters/substring from a UTF-8 formatted String
I'm trying to extract emojis and other special characters from strings for further processing (e.g. a string contains '😅' as one of its characters).
But neither string.charAt(i)
nor string.substring(i, i+1)
work for me. The original string is formatted in UTF-8, which means that the escaped form of the above emoji is encoded as '\ud83d\ude05'. That's why I receive '?' (\ud83d) and '?' (\ude05) instead for this position, causing it to occupy two positions when iterating over the string.
Does anyone have a solution to this problem?
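To make the symptom concrete, here is a small sketch, assuming the string really contains U+1F605 as described above. A Java String is a sequence of UTF-16 code units, so a character outside the Basic Multilingual Plane occupies two char values (a surrogate pair), and charAt(i) or a one-char substring returns only half of it. The class name SurrogatePairDemo and the helper isLoneSurrogateAt are made up for illustration.

```java
class SurrogatePairDemo {

    // Hypothetical helper: true if charAt(i) alone is only half of a surrogate pair.
    static boolean isLoneSurrogateAt(String s, int i) {
        return Character.isSurrogate(s.charAt(i));
    }

    public static void main(String[] args) {
        // 'a', U+1F605 (written as its surrogate pair), 'b'
        String s = "a\uD83D\uDE05b";

        System.out.println(s.length());                         // 4 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));    // 3 code points
        System.out.println(isLoneSurrogateAt(s, 0));            // false ('a')
        System.out.println(isLoneSurrogateAt(s, 1));            // true (high surrogate)
        System.out.println(isLoneSurrogateAt(s, 2));            // true (low surrogate)
        System.out.println(Integer.toHexString(s.codePointAt(1))); // 1f605
    }
}
```

Printing either half on its own is what typically shows up as '?', since a lone surrogate is not a valid character by itself.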
Thanks to John Kugelman for the help. My solution now looks like this:
for (int codePoint : codePoints(string)) {
    char[] chars = Character.toChars(codePoint);
    System.out.println(codePoint + " : " + String.copyValueOf(chars));
}
with the codePoints(String string) method looking like this:
// Requires: import java.util.Iterator;
private static Iterable<Integer> codePoints(final String string) {
    return new Iterable<Integer>() {
        public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                int nextIndex = 0;

                public boolean hasNext() {
                    return nextIndex < string.length();
                }

                public Integer next() {
                    // Read the whole code point, then advance by one or two chars.
                    int result = string.codePointAt(nextIndex);
                    nextIndex += Character.charCount(result);
                    return result;
                }

                public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }
    };
}
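For reference, on Java 8 or later the same iteration can probably be written without a hand-rolled iterator, since String.codePoints() returns an IntStream of code points. This is only a sketch of an alternative, not the approach from the solution above; CodePointSplit and toCodePointStrings are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

class CodePointSplit {

    // Splits a string into one substring per Unicode code point,
    // so a surrogate pair stays together as a single element.
    static List<String> toCodePointStrings(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE05b"; // 'a', U+1F605, 'b'
        List<String> parts = toCodePointStrings(s);

        System.out.println(parts.size());                      // 3
        System.out.println(parts.contains("\uD83D\uDE05"));    // true
    }
}
```

Note that a code point is still not always one user-perceived character: emoji built from several code points (for example with skin tone modifiers or zero-width joiners) would be split into their parts by this approach.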