<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Text Encoding in .NET Forum</title>
    <link>https://forums.autodesk.com/t5/net-forum/text-encoding/m-p/5213599#M43018</link>
    <pubDate>Thu, 14 Aug 2014 09:30:41 GMT</pubDate>
    <dc:creator>alex_b</dc:creator>
    <dc:date>2014-08-14T09:30:41Z</dc:date>
    <item>
      <title>Text Encoding</title>
      <link>https://forums.autodesk.com/t5/net-forum/text-encoding/m-p/5203295#M43014</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have some legacy drawings with texts that use a legacy .shx font with extended codes for the Hebrew alphabet.&lt;/P&gt;&lt;P&gt;The codes run from 0x80 through 0x9A and cover the 27 Hebrew letters (22 letters plus 5 final forms).&lt;/P&gt;&lt;P&gt;The texts display fine in R2004 and above.&lt;/P&gt;&lt;P&gt;However, if I extract the DBText.TextString of the text object, I get gibberish.&lt;/P&gt;&lt;P&gt;I have tried everything to convert the string, without success.&lt;/P&gt;&lt;P&gt;I have a Lisp function that performs the conversion by adding 0x60 to each character value in that range. It works fine in Lisp, but the same logic fails in C#.&lt;/P&gt;&lt;P&gt;Am I missing something?&lt;/P&gt;&lt;P&gt;BTW, if I change the text's font to a Windows font, it displays the same gibberish. Only after running the Lisp function mentioned above does it display correctly, and then, of course, the extracted string is OK too.&lt;/P&gt;&lt;P&gt;This is the kind of string DBText.TextString returns for the original text object (it should be Hebrew, but it obviously is not):&lt;/P&gt;&lt;P&gt;316 „ˆ‘…˜‰&amp;#144; ‡” ‰…”‰–&lt;/P&gt;&lt;P&gt;It seems that TextString forces the extended ASCII to Unicode and does a poor job of it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;alex&lt;/P&gt;</description>
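The 0x60 shift in the Lisp routine moves the legacy big-font codes 0x80..0x9A onto the Hebrew letters of the Windows-1255 (and ISO-8859-8) code page, 0xE0 (alef) through 0xFA (tav). In Unicode, the same 27 letters are U+05D0..U+05EA. Below is a minimal C# sketch of that arithmetic. It works on raw code values, not on DBText.TextString, and the class and method names are hypothetical:

```csharp
// Hypothetical helper: the arithmetic behind the Lisp routine's 0x60 shift,
// applied to raw legacy big-font codes (bytes), not to DBText.TextString.
public static class LegacyShxHebrew
{
    const int FirstLegacy = 0x80; // alef in the legacy big font
    const int LastLegacy = 0x9A;  // tav in the legacy big font

    // True for the 27 legacy Hebrew codes 0x80..0x9A.
    public static bool IsLegacyHebrew(byte code) =>
        code >= FirstLegacy ? LastLegacy >= code : false;

    // What the Lisp routine does: 0x80..0x9A becomes 0xE0..0xFA (Windows-1255 Hebrew).
    public static byte ToWindows1255(byte code) =>
        IsLegacyHebrew(code) ? (byte)(code + 0x60) : code;

    // The same letter in Unicode: 0x80 becomes U+05D0 (alef), 0x9A becomes U+05EA (tav).
    // Other codes are passed through unchanged, i.e. read as Latin-1 (an assumption).
    public static char ToUnicode(byte code) =>
        IsLegacyHebrew(code) ? (char)(0x05D0 + (code - FirstLegacy)) : (char)code;
}
```

Codes outside the block, such as the ASCII digits in the "316" prefix of the sample string, are left as they are.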
      <pubDate>Fri, 08 Aug 2014 13:17:42 GMT</pubDate>
      <guid>https://forums.autodesk.com/t5/net-forum/text-encoding/m-p/5203295#M43014</guid>
      <dc:creator>alex_b</dc:creator>
      <dc:date>2014-08-08T13:17:42Z</dc:date>
    </item>
    <item>
      <title>Re: Text Encoding</title>
      <link>https://forums.autodesk.com/t5/net-forum/text-encoding/m-p/5207869#M43015</link>
      <description>&lt;P&gt;Hi Alex,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I guess you need to write your own C# code that loops over each char in the string. If a char's ASCII code is &amp;gt;= 128 (0x80) and &amp;lt;= 154 (0x9A), the code adds 96 (0x60) to it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Big fonts (.shx files) were introduced in legacy AutoCAD versions before Unicode existed, so their encoding does not match Unicode. Characters with codes &amp;gt;= 128 may need to be converted when moving from the .shx font to a .ttf font. See the test code below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;public static void TestConvertShxToUnicode()
{
    string text = "˜‰&amp;#144; ‡"; // sample value returned by DBText.TextString
    int convertCode = 96; // 0x60
    text = ConvertShxToUnicode(text, convertCode);
}

// Adds convertCode to every char whose code lies in [startCode, endCode];
// all other chars are copied unchanged.
private static string ConvertShxToUnicode(string text, int convertCode, int startCode = 128, int endCode = 154)
{
    var result = new System.Text.StringBuilder(text.Length);
    foreach (char c in text)
    {
        char ch = c;
        if (c &amp;gt;= startCode &amp;amp;&amp;amp; c &amp;lt;= endCode)
        {
            ch = (char)(c + convertCode);
        }
        result.Append(ch);
    }
    return result.ToString();
}&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you provide a simple test drawing with Hebrew letters and their .shx font, so I can test it?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
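Here is a likely reason the char-by-char shift fails in C# while the same loop works in Lisp. DBText.TextString returns a .NET (UTF-16) string in which each legacy code 0x80..0x9A already appears as its Windows-1252 character. For example, 0x84 arrives as „ (U+201E), and 0x90 arrives as U+0090, as in the sample string. Shifting char values therefore cannot work: U+201E lies far outside 0x80..0x9A, and adding 0x60 to it gives U+207E, which is not a Hebrew letter.

The minimal sketch below first maps each such character back to its legacy code and only then applies the shift. It assumes the text reached .NET through code page 1252, as the sample string suggests, and the class name is hypothetical:

```csharp
using System.Text;

// Hypothetical helper: undo the Windows-1252 view of each legacy code,
// then apply the same 0x60 shift as the Lisp routine.
public static class TextStringHebrewFix
{
    // Windows-1252 characters for bytes 0x80..0x9A, in byte order (27 entries).
    // 0x81, 0x8D, 0x8F and 0x90 are unassigned in Windows-1252; they are assumed
    // to arrive as U+0081, U+008D, U+008F and U+0090.
    const string Cp1252High =
        "\u20AC\u0081\u201A\u0192\u201E\u2026\u2020\u2021\u02C6" +
        "\u2030\u0160\u2039\u0152\u008D\u017D\u008F\u0090\u2018" +
        "\u2019\u201C\u201D\u2022\u2013\u2014\u02DC\u2122\u0161";

    public static string ToHebrew(string textString)
    {
        var sb = new StringBuilder(textString.Length);
        foreach (char c in textString)
        {
            // offset = legacy code minus 0x80, or -1 if c is not one of those characters
            int offset = Cp1252High.IndexOf(c);
            // Legacy code 0x80 + offset, shifted by 0x60, is Windows-1255 byte
            // 0xE0 + offset, i.e. the Unicode Hebrew letter U+05D0 + offset.
            sb.Append(offset >= 0 ? (char)(0x05D0 + offset) : c);
        }
        return sb.ToString();
    }
}
```

Hebrew letters are not in the lookup string, so a second run leaves already-converted text unchanged. The trade-off is that genuine Windows-1252 punctuation in non-Hebrew texts (for example – or ”) would be converted too.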
      <pubDate>Mon, 11 Aug 2014 20:51:01 GMT</pubDate>
      <guid>https://forums.autodesk.com/t5/net-forum/text-encoding/m-p/5207869#M43015</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-08-11T20:51:01Z</dc:date>
    </item>
    <item>
      <title>Re: Text Encoding</title>
      <link>https://forums.autodesk.com/t5/net-forum/text-encoding/m-p/5209575#M43016</link>
      <description>&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;Hi Khoa,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;This is basically just what i'm doing and it surely doesn't work.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;I'm trying to just add 0x60 to each char between 0x80 and 0x9a.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;Curiously enough, the following lisp code works and the result is as expected:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;(defun heb2win (ent) ;entity ent&lt;BR /&gt;(setq ed (entget ent)&lt;BR /&gt;textval (cdr (assoc 1 ed))&lt;BR /&gt;slen (strlen textval)&lt;BR /&gt;lptr 1&lt;BR /&gt;txtout ""&lt;BR /&gt;)&lt;BR /&gt;(while (&amp;lt;= lptr slen)&lt;BR /&gt;(setq tcr (ascii (substr textval lptr 1)))&lt;BR /&gt;(if (and (&amp;gt;= tcr 128) (&amp;lt;= tcr 154)) (setq tcr (+ tcr 96)))&lt;BR /&gt;(setq txtout (strcat txtout (chr tcr)))&lt;BR /&gt;(setq lptr (+ lptr 1))&lt;BR /&gt;);;while&lt;BR /&gt;(setq ed (entget e)&lt;BR /&gt;ed (subst (cons 1 txtout) (assoc 1 ed) ed)&lt;BR /&gt;)&lt;BR /&gt;(entmod ed)&lt;BR /&gt;(entupd e)&lt;BR /&gt;)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;I think the framework does some behind-the-scenes guesswork based on the system code page maybe, which alters the string returned by DBText.TextString,&amp;nbsp;while lisp returns the raw&amp;nbsp;chars.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;I tried using differrent encodings, and the byte array obtained from the string varies with the encoding, sometimes arratically.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;I attach a sample drawing and the two relevant .shx fonts; 
the third font used is Arial.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;The drawing contains three copies of the same text, each one based on a differrent font.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;As you can see, the original text, based on the old font is legible, the other two are not because of the missing 0x60 shift.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;If you run the lsip routine, it fixes things, while in C# I fail.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;Thank you,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;alex&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 12 Aug 2014 16:14:59 GMT</pubDate>
      <guid>https://forums.autodesk.com/t5/net-forum/text-encoding/m-p/5209575#M43016</guid>
      <dc:creator>alex_b</dc:creator>
      <dc:date>2014-08-12T16:14:59Z</dc:date>
    </item>
    <item>
      <title>Re: Text Encoding</title>
      <link>https://forums.autodesk.com/t5/net-forum/text-encoding/m-p/5210247#M43017</link>
      <description>&lt;P&gt;Here is code that converts your texts from Unicode Latin (Windows-1252) to Hebrew. The .NET Framework handles the conversion well. Credit to StackOverflow, at this &lt;A target="_blank" href="http://stackoverflow.com/questions/17001315/converting-text-to-hebrew-characters-c-sharp-winfrom-encoding"&gt;link&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;using System.Text;
using Autodesk.AutoCAD.ApplicationServices;
using Autodesk.AutoCAD.DatabaseServices;
using Autodesk.AutoCAD.EditorInput;
using Autodesk.AutoCAD.Runtime;

[CommandMethod("ConvertLatinToHebrew")]
public static void ConvertLatinToHebrew()
{
    Document doc = Application.DocumentManager.MdiActiveDocument;
    Editor editor = doc.Editor;
    Database db = doc.Database;
    using (Transaction trans = db.TransactionManager.StartTransaction())
    {
        var peo = new PromptEntityOptions("Select a text: ");
        peo.SetRejectMessage("\nSelect only text");
        peo.AddAllowedClass(typeof(DBText), true);
        PromptEntityResult result = editor.GetEntity(peo);
        if (result.Status == PromptStatus.OK)
        {
            ObjectId id = result.ObjectId;
            var text = (DBText)trans.GetObject(id, OpenMode.ForWrite);
            string value = text.TextString;

            value = ConvertLatinToHebrew(value);

            text.TextString = value;
            text.DowngradeOpen();
        }
        trans.Commit();
    }
}

[CommandMethod("ConvertLatinToHebrewTest")]
public static void ConvertLatinToHebrewTest()
{
    string latinText = "„ˆ‘…˜‰&amp;#144; ‡” ‰…”‰–";
    string hebrewText = ConvertLatinToHebrew(latinText);
    // hebrewText = "הטסורינ חפ יופיצ";
}

public static string ConvertLatinToHebrew(string latinText)
{
    // On .NET Core / .NET 5+ hosts, call
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) once beforehand;
    // .NET Framework has these code pages built in.
    Encoding latinEncoding = Encoding.GetEncoding("Windows-1252");
    Encoding hebrewEncoding = Encoding.GetEncoding(862); // MS-DOS Hebrew: 0x80..0x9A = alef..tav

    // Turn the Windows-1252 characters back into the byte values they came from...
    byte[] latinBytes = latinEncoding.GetBytes(latinText);

    // ...and decode those bytes again with the DOS Hebrew code page.
    string hebrewText = hebrewEncoding.GetString(latinBytes);
    return hebrewText;
}&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried many different ways to convert the text to Hebrew letters and could not find an arithmetic rule. The character codes used in my previous code do not help, because the characters returned by TextString are already extended Unicode characters (for example, „ is U+201E, not 0x84). In any case, .NET makes the conversion much easier.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Aug 2014 20:53:27 GMT</pubDate>
      <guid>https://forums.autodesk.com/t5/net-forum/text-encoding/m-p/5210247#M43017</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-08-12T20:53:27Z</dc:date>
    </item>
    <item>
      <title>Re: Text Encoding</title>
      <link>https://forums.autodesk.com/t5/net-forum/text-encoding/m-p/5213599#M43018</link>
      <description>&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;Hi Khoa,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;Thank you for the code you posted. It works, except that it hard-codes the assumption that the source string is Unicode Latin.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;Among other things, this means that running the code twice in succession on the same text results in gibberish.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;As the program I am writing is not interactive, it blindly processes all texts. Therefore it needs to know beforehand whether a text's encoding has to be converted.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;The Lisp function I posted does just that (it looks for a certain range of chars).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;The problem is that DBText.TextString always returns a Unicode string, seemingly irrespective of the string's encoding in the AutoCAD database. To do the same processing as the Lisp, however, I need the ANSI chars as AutoCAD sees them.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;Do you know of a way to get the string as AutoCAD sees it, rather than converted to Unicode through an encoding I don't control?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;One more problem is changing a text's style from a Unicode font to an ANSI one. Again, the .NET function causes problems, while the Lisp function works.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;I even tried to run the Lisp via P/Invoke, but there were no visible results. How does one debug a Lisp function invoked from .NET? Even (print) statements in the Lisp wouldn't work, or would they?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;alex&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
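You can get the "decide beforehand" behaviour without reading the raw database bytes by putting a guard around the Windows-1252 to code page 862 conversion posted earlier in the thread. The guard skips any text that already contains Hebrew letters (U+05D0..U+05EA). It converts only when the text's Windows-1252 bytes include a code in 0x80..0x9A.

Below is a minimal C# sketch under those assumptions, with hypothetical class and method names. The check is a heuristic: it could misfire on Latin text that legitimately uses characters such as „ or –.

```csharp
using System.Linq;
using System.Text;

// Hypothetical guard around the Windows-1252 to code page 862 conversion,
// so a non-interactive batch run can skip texts that are already Hebrew.
public static class LegacyHebrewGuard
{
    static readonly Encoding Latin;
    static readonly Encoding DosHebrew;

    static LegacyHebrewGuard()
    {
        // Required on .NET Core / .NET 5+; remove when targeting .NET Framework,
        // where these code pages are built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Latin = Encoding.GetEncoding(1252);
        DosHebrew = Encoding.GetEncoding(862); // MS-DOS Hebrew
    }

    // True when value lies in the inclusive range [first, last].
    static bool InRange(int value, int first, int last) =>
        value >= first ? last >= value : false;

    // Heuristic: convert only if the text contains no Hebrew letters yet
    // and its Windows-1252 bytes include a legacy code 0x80..0x9A.
    public static bool NeedsConversion(string text)
    {
        if (text.Any(c => InRange(c, 0x05D0, 0x05EA)))
            return false; // already converted, or genuinely Hebrew
        return Latin.GetBytes(text).Any(b => InRange(b, 0x80, 0x9A));
    }

    public static string ConvertIfNeeded(string text) =>
        NeedsConversion(text) ? DosHebrew.GetString(Latin.GetBytes(text)) : text;
}
```

Because converted text contains Hebrew letters, a second call to ConvertIfNeeded returns it unchanged. This avoids the gibberish from running the conversion twice.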
      <pubDate>Thu, 14 Aug 2014 09:30:41 GMT</pubDate>
      <guid>https://forums.autodesk.com/t5/net-forum/text-encoding/m-p/5213599#M43018</guid>
      <dc:creator>alex_b</dc:creator>
      <dc:date>2014-08-14T09:30:41Z</dc:date>
    </item>
  </channel>
</rss>

