UTF-8 in IRC

I’ve read this post by Q-FUNK.

Some thoughts about that:

What if someone recycled Gaim’s encoding detection bits into an IRC server patch that does the reverse action: order anyone whose client outputs non-UTF encoded 8-bit to reconfigure their character-set to UTF-8?

That’s the easy part. iconv does charset conversion, and when the text does not fit the input or output charset, it either exits with an error or converts every unsuitable character into a question mark. So, I guess, you can simply use something like this:

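// iconv() returns false on an invalid sequence; compare with !== so "" and "0" are not misread as failures
if ( @iconv("utf-8", "utf-8", $string1) !== false ) { return "it's UTF-8"; }
else { return "it's not UTF-8"; }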

At least that’s how the detection would look in PHP. In other languages, it’s no harder.
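As an illustration of the "other languages" remark, here is a minimal sketch of the same check in Python. It uses a strict UTF-8 decode instead of iconv, which is a swapped-in technique and not what Gaim or any IRC server does; the function name `is_utf8` and the sample word are made up for the example.

```python
def is_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        # A strict decode raises on any malformed sequence.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The same Lithuanian word, once as UTF-8 and once in a one-byte Baltic charset.
print(is_utf8("ačiū".encode("utf-8")))
print(is_utf8("ačiū".encode("iso-8859-13")))
```

Note that plain ASCII text passes this check too, since ASCII is a subset of UTF-8, so the test only catches messages that actually contain 8-bit characters in another encoding.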

But this approach has a few disadvantages:

  • That would require more resources from the server. Here in Lithuania we have an IRC network of 13 servers, with a maximum of 20599 users. That makes most of those servers capable of handling approx. 2000 users simultaneously. I guess it would take quite a bit more resources for iconv to validate each and every message that gets passed to the server.
  • As long as mIRC (and most other IRC clients out there) doesn’t have UTF-8 support, people won’t take you seriously with such messages. They will simply run alternative networks with alternative servers and still use their one-byte charsets. You can’t force them to change.
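The resource worry in the first point can at least be put to a rough test. The sketch below times a strict UTF-8 check on one sample message in Python; it measures Python's decoder, not iconv inside a real IRC daemon, so treat the number only as an order-of-magnitude hint. The helper names, the channel name and the sample text are all hypothetical.

```python
import timeit


def validate(raw: bytes) -> bool:
    """Strict UTF-8 check, standing in for the per-message validation."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def seconds_per_message(sample: bytes, runs: int = 100_000) -> float:
    """Average wall-clock time of one validation over `runs` repetitions."""
    total = timeit.timeit(lambda: validate(sample), number=runs)
    return total / runs


# Hypothetical message; real traffic will vary in length and content.
sample = "PRIVMSG #lietuva :labas vakaras, kaip sekasi?".encode("utf-8")
print(f"{seconds_per_message(sample):.2e} s per message")
```

Multiplying the result by the expected number of messages per second on one server gives a first estimate of the extra CPU time the check would need.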

IMHO, what we can do is bug Khaled to implement I18N support. I’ve just searched through mIRC’s support forums and came across this thread. What’s interesting in it is the last message. Here’s an excerpt:

I’m also all for mirc supporting UTF-8 – I think it’s vital to the progress of IRC to use UTF-8, rather than continue to squabble over a pile of incompatible character sets which have been relatively obsoleted by Unicode.

It’s got to happen sometime, and hopefully mirc will recognise the CHARSET token on the 005 numeric (see http://www.irc.org/tech_docs/005.html).

Yes, the IRC protocol does know something about multiple languages. I think that is the key. If Khaled added support for that, and IRC daemons announced the charset in which they output their messages, and IRC clients then used the same charset for their own messages, and that charset were UTF-8, most, if not all, of the problems would go away. So, is that the light at the end of the tunnel?
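A client following this idea would have to read the CHARSET token out of the 005 numeric and then encode its own messages accordingly. Here is a minimal sketch in Python, assuming the usual `KEY=value` token layout of a 005 line. The server name, nick, token values, function names and the latin-1 fallback are all invented for the example, and whether clients should treat CHARSET this way is exactly the open question.

```python
def parse_isupport(line: str) -> dict:
    """Extract KEY=value tokens from a 005 line; return {} for other lines."""
    # Drop the trailing human-readable part (" :are supported by this server").
    head, _, _ = line.partition(" :")
    parts = head.split()
    # Skip the optional ":server" prefix.
    if parts and parts[0].startswith(":"):
        parts = parts[1:]
    if len(parts) < 2 or parts[0] != "005":
        return {}
    tokens = {}
    # parts[1] is the target nick; everything after it is a token.
    for item in parts[2:]:
        key, _, value = item.partition("=")
        tokens[key] = value
    return tokens


def outgoing_encoding(tokens: dict, default: str = "latin-1") -> str:
    """Use the announced charset if there is one, else a local default."""
    return tokens.get("CHARSET") or default


# Hypothetical server greeting.
line = ":irc.example.net 005 nick CHARSET=UTF-8 NICKLEN=30 :are supported by this server"
encoding = outgoing_encoding(parse_isupport(line))
print(encoding)
print("PRIVMSG #test :ačiū".encode(encoding))
```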

4 comments

  1. I’ve read the thread. That’s where I took the last excerpt from. However, as far as I understand, 005 CHARSET is meant to announce the charset in which server messages are output, not the charset in which people are REQUIRED to chat. It would be good if an IRC client used that charset, though…

  2. It would seem that this is referring to draft-brocklesby-irc-isupport-03 (‘IRC RPL_ISUPPORT Numeric Definition’), which appears not to have made it (it expired July 5th, 2004). I could be wrong there though.
