[Clam-devel] Addressing foreign text encodings: a call for testing

David García Garzón dgarcia at iua.upf.edu
Mon Jul 7 02:25:49 PDT 2008


That highlights the problem I expected, but a crash was definitely not the
symptom I expected. When reporting crashes it is important to paste, at least,
the console output. It could be the error we expected, but it could be a
different one.

If I load the XML you sent I get not a crash but an error message stating
that:
An occurred while loading the network file.
XML Parser Errors: Fatal Error at file CLAMParser, line 1, col 39: An 
exception occurred! Type:UTFDataFormatException, Message:invalid byte 2 (�) 
of a 2-byte sequence

This is not a crash but an expected error message. A crash is never good
behaviour; an error message is.
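
The "invalid byte 2 of a 2-byte sequence" message is what a UTF-8 decoder
typically reports when a lead byte is followed by something that is not a
continuation byte, which is what happens when text saved in a local 8-bit
encoding is read as UTF-8. A minimal sketch of that structural check follows.
It is not the Xerces implementation, firstInvalidUtf8Byte is a hypothetical
name, and overlong forms and surrogates are deliberately not checked:

```cpp
#include <cstddef>
#include <string>

// Returns the index of the first byte that breaks the UTF-8 lead/continuation
// structure, or std::string::npos if the text is structurally valid.
// If the text ends in the middle of a sequence, text.size() is returned.
// Simplified sketch: overlong encodings and surrogates are not rejected.
std::size_t firstInvalidUtf8Byte(const std::string& text)
{
    std::size_t i = 0;
    while (i < text.size())
    {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        std::size_t length = 0;
        if (lead < 0x80) length = 1;                 // plain ASCII
        else if ((lead & 0xE0) == 0xC0) length = 2;  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) length = 3;  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) length = 4;  // 11110xxx
        else return i;                               // stray continuation or invalid lead

        for (std::size_t k = 1; k < length; ++k)
        {
            if (i + k >= text.size()) return text.size(); // truncated sequence
            const unsigned char next = static_cast<unsigned char>(text[i + k]);
            if ((next & 0xC0) != 0x80) return i + k;      // not 10xxxxxx
        }
        i += length;
    }
    return std::string::npos;
}
```

For instance, "caf\xC3\xA9" (UTF-8) passes, while "caf\xE9.wav" (the same
word in Latin-1) fails at the byte following 0xE9, because 0xE9 looks like the
start of a multi-byte sequence and '.' is not a continuation byte.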

From the XML you sent it is clear that the encoding conversion has been
missed. So we have to deal with three formats: the Qt internal encoding, the
local 8-bit encoding and UTF-8.

We have two options:
- Using UTF-8 as the CLAM internal encoding. In that case we should convert
to local 8 bits whenever we pass a string to the C standard library (I
guess that includes filenames used in Xerces, libxml++, libmad...).
- Using the local encoding and doing the conversion whenever we store it in
XML.

The latter is simpler to implement, as we would only need to modify the XML
formatting, but my limited experience with Unicode tells me that using UTF-8
as the lingua franca inside the application is a good option, as I think (not
sure) that other 8-bit encodings might not be C standard library safe (for
example, they might use the 0 byte for something other than marking the end of
a string).
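
On the 0 byte concern, here is a small sketch of the one property that is
certain: UTF-8 never produces a 0 byte for any code point other than U+0000,
so it survives C string functions. UTF-16 is used below only as a contrast; it
is not an 8-bit encoding, and the sketch does not establish whether any given
local 8-bit code page has the problem. The helper names are hypothetical:

```cpp
#include <string>

// Encodes one Unicode code point as UTF-8.
// Assumes cp <= 0x10FFFF and not a surrogate; no validation is done.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encodes one code point below 0x10000 as a single little-endian UTF-16 unit
// (two bytes, low byte first). Used here only for contrast with UTF-8.
std::string encodeUtf16LE(char16_t unit)
{
    std::string out;
    out += static_cast<char>(unit & 0xFF);
    out += static_cast<char>(unit >> 8);
    return out;
}

// True if the byte string contains a 0 byte, which C string functions
// such as strlen or fopen would treat as the end of the string.
bool containsNul(const std::string& bytes)
{
    return bytes.find('\0') != std::string::npos;
}
```

With these helpers, containsNul(encodeUtf8(0x670B)) is false (U+670B is the
first character of the test filename), while containsNul(encodeUtf16LE(u'A'))
is true, since every ASCII character carries a 0 high byte in UTF-16.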

Any opinions? Any Unicode experiences?



On Monday 07 July 2008, JunJun wrote:
> - Recompile the last svn revision of the NetworkEditor
>    QTDIR=D:/qt/4.3.3/ scons clam_prefix=d:/mingw/local prefix=d:/mingw/local external_dll_path=d:/mingw/local/bin
>    An unexpected error is shown: "can not locate the program input point
>    _ZN4...SsEE on the clam_core.dll". I just fixed it by copying
>    NetworkEditor.exe to the path /mingw/local/lib
>
> - Open it and drop a MonoAudioFileReader into the canvas
> - Configure it to take a file which has some special characters in your
> language
>   ../朋克punk.wav
> - If the processing is still in red after clicking ok, you got the bug,
> report
>   It's not in red after clicking "ok".
> - Open the configuration dialog again if the special symbol is being
> displayed wrongly, report
>   It displays just fine.
> - Accept the configuration again, if now red, report
>   Still no problem.
> - Save the network and load it again, if now red, report
>   When I load it again, the NetworkEditor crashes!!
> - Configure the processing, if the symbol now looks bad, report
>   TBD...
> - In any case, send me the network file so i can check the file encoding.
> If you open it with an encoding aware editor, the symbols should look well
> in utf8 mode.
>   No, I think it doesn't look well in utf8 mode.



-- 
David García Garzón
(Work) dgarcia at iua dot upf anotherdot es
http://www.iua.upf.edu/~dgarcia