I ran a test of the OCR functionality; see the picture below. I wanted to document how the choice of PDF printer influences how the OCR works. All the uploaded documents were 100% identical, except for one special case that contained no non-English characters to be recognized.
Test:
Conclusion:
The OCR does not seem to have a problem with recognizing non-English characters in drawings printed from AutoCAD or Revit, using Foxit or Bluebeam.
However, with Autodesk's own "DWG to PDF" printer, fields that include non-English characters become unrecognizable.
1. Are there any plans to fix this issue?
2. Has this behavior been documented using other PDF writers, such as Acrobat?
Hello
Thank you for your post.
Would it be possible to get your files so we can take a look please?
We use two methods to get the text.
If the drawing is vector-based, we don't use OCR; we simply extract the text already embedded in the PDF. (You can mimic this yourself on a vector drawing by selecting text and copy-pasting it into another document.)
In that case, language doesn't matter for extraction, because we just take whatever text is there.
If the drawing is raster (a picture), we first need to use OCR to recognize the text and then extract it. Currently, the OCR language is determined by the browser's language setting.
So for non-English text, you need to set your browser language to the language of the text you want to extract.
(We are looking at improving that experience in the future, but no date has been set yet.)
So it could be that your first tests were vector, which is why non-English text worked, but the last test was raster, which made it fail.
Or we may have an issue to address...
Either way please send us the info and we will investigate.
Thanks, Ian Turner
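To make the two extraction paths Ian describes concrete, here is a minimal sketch of that routing logic. It is not Autodesk's implementation: the `Page` model, the `OcrEngine` callable, and the helper names `extract_text` and `ocr_language_from_browser` are all hypothetical, and taking the primary subtag of the browser language (e.g. "fr-CA" to "fr") is an assumption about how the browser setting might be mapped to an OCR language.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical OCR engine signature: (image bytes, language code) -> recognized text.
OcrEngine = Callable[[bytes, str], str]


@dataclass
class Page:
    """Hypothetical page model: a vector page carries an embedded text layer,
    a raster page carries only image data."""
    embedded_text: Optional[str] = None
    image: Optional[bytes] = None


def ocr_language_from_browser(browser_lang: str) -> str:
    """Assumption: use the primary subtag of the browser language tag,
    e.g. 'fr-CA' -> 'fr'."""
    return browser_lang.split("-")[0].lower()


def extract_text(page: Page, browser_lang: str, ocr: OcrEngine) -> str:
    # Vector path: no OCR, return whatever text the PDF already contains.
    # The browser language plays no role here, so if the embedded text
    # layer is already garbled, it is returned garbled.
    if page.embedded_text is not None:
        return page.embedded_text
    # Raster path: run OCR, with the language taken from the browser setting.
    if page.image is not None:
        return ocr(page.image, ocr_language_from_browser(browser_lang))
    # Neither a text layer nor an image: nothing to extract.
    return ""


if __name__ == "__main__":
    # Stub OCR engine that only reports which language it was asked to use.
    stub_ocr: OcrEngine = lambda image, lang: f"<ocr lang={lang}>"
    print(extract_text(Page(embedded_text="ÉTAGE 2"), "en-US", stub_ocr))  # ÉTAGE 2
    print(extract_text(Page(image=b"\x89PNG"), "fr-CA", stub_ocr))  # <ocr lang=fr>
```

The sketch also shows why the browser language cannot explain eij's result: a vector PDF never reaches the OCR branch, so any corruption there must already be present in the embedded text.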
Thank you for the reply.
My browser language was set to English in all cases.
Actually, the browser language didn't seem to matter in my case because the text was recognized from both vector and raster PDFs.
It was only with a PDF file prepared using the "DWG to PDF" printer in AutoCAD that the non-English characters were not recognized, which caused every string where they appeared to come out corrupted. Curiously, that particular PDF file is vector-based.
I'll PM you a zip-file with all the files.
Hi,
Thanks eij for the testing.
I ran into the same issue on multiple occasions with French characters. Could we get an update on that issue please?
I uploaded the same drawing set (all vector-based) multiple times to different test projects to see how the results varied from one upload to another, and the problematic sheets were never the same. I then reprinted the affected PDFs with another PDF printer (CutePDF), and that appeared to work, so eij's theory about the PDF exporter might be a good guess.
Thanks.
Thanks for the follow up,
I sent the files to Ian Turner in November but I haven't heard back from him.
It would be good to get an update from Autodesk on this matter.
Thanks.
Hello Eij
I apologize if I am wrong, but I don't remember receiving any files.
How did you send them originally? Can you please resend them or attach them here?
Thanks, Ian
Hi @Anonymous,
I’m just following up to see if you were able to gather the information @ian_turner had previously requested to help him troubleshoot your issue.
Thank you and have a great day!
Hi all,
I sent all the relevant files to Ian via private message last November, see screenshot below.
Thank you very much for sharing the PDF files with us. The PDF file whose text cannot be extracted includes a font with the encoding "Encoding: Identity-H".
We are investigating how to improve this scenario. In the meantime, until we find a proper solution, please use the "Built-in" encoding.
Reference:
https://forums.adobe.com/thread/758316
Thanks and best regards,
Jason Jiang
Autodesk Document Management team
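For anyone who wants to check their own files before uploading, here is a minimal sketch that flags fonts using Identity-H encoding, which the Autodesk team identified as the trigger in this case. It works on plain Python dicts standing in for a page's /Resources /Font dictionary; reading those dictionaries out of a real PDF (for example with a PDF library such as pypdf, resolving indirect references first) is not shown and is an assumption. The function name `identity_h_fonts` and the font names in the example are hypothetical. It also reports whether a /ToUnicode CMap is present, because per the PDF specification Identity-H maps each 2-byte code directly to a CID rather than to a Unicode value, so extraction relies on that map. Whether a missing or incomplete /ToUnicode map is the actual cause here is not confirmed in this thread.

```python
from typing import Any, Dict, List, Tuple


def identity_h_fonts(font_resources: Dict[str, Dict[str, Any]]) -> List[Tuple[str, str, bool]]:
    """Return (resource name, /BaseFont, has /ToUnicode) for every font in a
    page's /Font resource dictionary that uses /Identity-H encoding.

    Assumption: font_resources mirrors the page's /Resources -> /Font
    dictionary, with indirect references already resolved to plain dicts.
    """
    flagged = []
    for name, font in font_resources.items():
        # Composite (Type0) fonts name their CMap in /Encoding; Identity-H
        # maps each 2-byte code straight to a CID, not to a Unicode value.
        if font.get("/Encoding") == "/Identity-H":
            # Without a /ToUnicode CMap, a text extractor has no reliable
            # way to map those codes back to characters.
            flagged.append((name, str(font.get("/BaseFont", "?")), "/ToUnicode" in font))
    return flagged


if __name__ == "__main__":
    # Hand-written example resources; font names are illustrative only.
    page_fonts = {
        "/F1": {"/Subtype": "/Type0", "/BaseFont": "/ArialUnicodeMS", "/Encoding": "/Identity-H"},
        "/F2": {"/Subtype": "/TrueType", "/BaseFont": "/Arial", "/Encoding": "/WinAnsiEncoding"},
    }
    for name, base_font, has_to_unicode in identity_h_fonts(page_fonts):
        print(name, base_font, "ToUnicode present" if has_to_unicode else "no ToUnicode")
```

A PDF where this flags fonts would be a candidate for reprinting with one of the printers reported to work above, until the platform handles this encoding.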
I'm just checking in to see if you need more help with this. Did the suggestion that @Jason_Kai_Jiang provided work for you?
If so, please click Accept as Solution on the posts that helped you so others in the community can find them easily.
Hi @anil_mistry,
Depends on your definition of solution 😛 To be fair, since your main customers (contractors) are on the receiving end of the design, the proposed workaround is quite cumbersome: they have to ask the professionals to change the way they publish their drawings (which they might not do), or else re-print the drawings on their side to be able to use the platform. I'll let you be the judge.
Also, I can't accept the suggestion as a solution as I'm not the one who submitted the initial question.
Thanks for the support.
Hello,
I agree with @vincent.carignan, the workaround is not really practical.
Besides, I haven't found a way to change the encoding using the DWGtoPDF printer that ships with AutoCAD.
For now, I think the only thing we can do is ask our clients/partners to use one of the PDF printers we've found to work. As @vincent.carignan points out, they might not do that anyway, so this is definitely still an issue.
Thanks
Hi @vincent.carignan and @Anonymous,
Thanks for your comments. We will continue investigating how to resolve the encoding issue without requiring the PDF files to be reprinted.
Thanks and best regards,
Jason Jiang
Autodesk BIM 360 Document Management team