Emoticons (emoji) are the pictographic symbols commonly used on mobile phones, but they aren’t recognised on every platform.
For example, when converting tweets addressed to @delta (Delta Airlines) to lower case, I got the following error:
Error in tolower(text) : invalid input [email protected]: First time I've seen a foot-rest in first class! Oh @Delta, how I love thee \ud83d\ude0a✈\ud83d\udc78 http://t.co/noKI9CiM' in 'utf8towcs'
When I found the original tweet, it looked like this.
The two Unicode characters that weren’t recognised were \ud83d\ude0a (SMILING FACE WITH SMILING EYES, U+1F60A) and \ud83d\udc78 (PRINCESS, U+1F478).
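Escapes like \ud83d\ude0a are UTF-16 surrogate pairs: a high surrogate (D800 to DBFF) followed by a low surrogate (DC00 to DFFF) that together encode one code point above U+FFFF. A minimal R sketch of the standard decoding formula, using a hypothetical helper `surrogate_to_codepoint()`, recovers both code points from the escapes in the error message:

```r
# Hypothetical helper: combine a UTF-16 high/low surrogate pair into
# the single Unicode code point it encodes.
surrogate_to_codepoint <- function(high, low) {
  stopifnot(high >= 0xD800, high <= 0xDBFF, low >= 0xDC00, low <= 0xDFFF)
  0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
}

# The two escapes from the @delta error message
smile    <- surrogate_to_codepoint(0xD83D, 0xDE0A)
princess <- surrogate_to_codepoint(0xD83D, 0xDC78)

sprintf("U+%X", c(smile, princess))
# [1] "U+1F60A" "U+1F478"

# Build the actual characters (how they display depends on locale and font)
intToUtf8(c(smile, princess), multiple = TRUE)
```

Both code points lie above U+FFFF, outside the Basic Multilingual Plane, which is why each one needs two escapes rather than one.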
Gaston Sanchez has posted a solution to this problem on his blog, Data Analysis Visually Enforced: any string that can’t be converted is returned as NA. I’ve used the code and it works well. When I have time, I’ll extend it to replace only the offending characters instead of returning NA for the entire string.
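The NA-on-failure idea can be sketched in R with tryCatch(). This is my own minimal sketch, not a copy of Gaston Sanchez’s code, and the names `safe_tolower()` and `clean_tolower()` are hypothetical. The second function is a rough attempt at the planned extension: it first strips byte sequences that are not valid UTF-8 using iconv(..., sub = ""), on the assumption that those bytes are what trips utf8towcs, which the error message alone doesn’t confirm.

```r
# Lower-case each string separately; if tolower() throws for one element
# (e.g. "invalid input ... in 'utf8towcs'"), that element becomes NA and
# the rest of the vector is still converted.
safe_tolower <- function(x) {
  vapply(x, function(s) {
    tryCatch(tolower(s), error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical extension: instead of losing the whole string, first drop
# any bytes that are not valid UTF-8 (assumed to be the culprits), then
# lower-case what remains.
clean_tolower <- function(x) {
  cleaned <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "")
  safe_tolower(cleaned)
}

tweets <- c("Oh @Delta, how I love thee", "First time in FIRST CLASS!")
safe_tolower(tweets)
# [1] "oh @delta, how i love thee" "first time in first class!"
```

Looping with vapply() rather than calling tolower() on the whole vector means one bad tweet costs only its own entry, not the whole batch.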