Sunday, August 2, 2009

In C, how can I convert a non-UTF-8 character string containing Chinese characters into UTF-8?

I have a string of data that is usually 7-bit ASCII (US-ASCII) but occasionally contains bytes with the high bit set, which encode multibyte Chinese (standard) characters. It could actually be any language, but it is most commonly Chinese. I need to put this data into a database that uses UTF-8 natively. I am programming in C++. Would you use iconv to convert this to UTF-8? Or would you roll your own code? What is the most likely encoding I'm dealing with here?
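For reference, here is roughly what the iconv route looks like. This is a minimal sketch, not a definitive implementation, and it rests on assumptions the question does not settle: a POSIX system whose iconv knows the Chinese encodings (glibc's does), and input that is GB18030, the superset of GBK/GB2312 used for simplified Chinese. For traditional Chinese from Taiwan or Hong Kong, "BIG5" would be the encoding name to try instead. In both encodings the plain ASCII bytes pass through unchanged. The function name `to_utf8` is hypothetical.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated string from from_enc
 * (e.g. "GB18030" or "BIG5", an assumption about your data) to UTF-8.
 * Returns a malloc'd UTF-8 string the caller must free(), or NULL on
 * failure (bad input bytes, unknown encoding, or out of memory). */
char *to_utf8(const char *in, const char *from_enc)
{
    iconv_t cd = iconv_open("UTF-8", from_enc);
    if (cd == (iconv_t)-1)
        return NULL;                 /* encoding not supported */

    size_t cap = strlen(in) * 2 + 16;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)in;       /* iconv() takes char **, not const */
    size_t in_left = strlen(in);
    char *out_ptr = out;
    size_t out_left = cap - 1;       /* reserve one byte for the NUL */

    while (in_left > 0) {
        if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) != (size_t)-1)
            break;                   /* all input converted */
        if (errno != E2BIG) {        /* EILSEQ/EINVAL: invalid input */
            int saved = errno;
            free(out);
            iconv_close(cd);
            errno = saved;
            return NULL;
        }
        /* Output buffer full: double it and resume where iconv stopped. */
        size_t used = (size_t)(out_ptr - out);
        char *bigger = realloc(out, cap * 2);
        if (bigger == NULL) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
        out = bigger;
        cap *= 2;
        out_ptr = out + used;
        out_left = cap - 1 - used;
    }

    *out_ptr = '\0';
    iconv_close(cd);
    return out;
}
```

For example, `to_utf8("abc \xD6\xD0", "GB18030")` should return the UTF-8 string "abc 中": the ASCII bytes are copied as-is, and the two GBK bytes D6 D0 become the three UTF-8 bytes E4 B8 AD.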

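I would "roll my own code" out of not knowing a better way, though only part of the job is as simple as it looks. For the 7-bit ASCII bytes it is trivial: ASCII is a subset of UTF-8, so any byte below 0x80 can be copied through unchanged. The Chinese characters are different. Their Unicode code points are not a constant offset from the original byte values; in legacy encodings such as GBK/GB2312 or Big5, each two-byte sequence has to be looked up in a mapping table, which is exactly what iconv already provides. Good luck!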
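The part that genuinely is easy to write yourself is the final step: turning a Unicode code point into UTF-8 bytes. The sketch below does only that. It assumes you already have the code point, which for GBK or Big5 input means a lookup table that is not shown here; `utf8_encode` is a hypothetical name. It also shows why a constant offset cannot work: ASCII comes out unchanged, while U+4E2D (中, GBK bytes D6 D0) becomes E4 B8 AD.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into out
 * (room for 4 bytes). Returns the number of bytes written, or 0 if cp
 * is not a valid Unicode scalar value (a surrogate or above U+10FFFF).
 * Getting cp from GBK/Big5 bytes needs a mapping table, not shown. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* UTF-16 surrogates are invalid */
        return 0;
    if (cp < 0x10000) {                   /* three bytes: most CJK lands here */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* four bytes: supplementary planes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

The length of the output depends on the size of the code point, which is why "expanding the datatype size" alone is not enough.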

