Some imported PDF documents may return garbled text when you view them in the parsing rule editor or process them with existing parsing rules. When you see unreadable gibberish symbols as shown in the screenshot below, you are likely dealing with a corrupted PDF file.

More specifically, your PDF document is probably missing important information about font character mapping. One possible reason is that the document was produced incorrectly. Another common reason is that the character mapping information was deliberately obfuscated as a protection mechanism to prevent readers from copying and pasting the text data. Lastly, it is also possible that Optical Character Recognition (OCR) with low accuracy was applied to your document before it was uploaded to Docparser.

A hardware problem of this kind makes file access more difficult. Your file system details can also end up on a damaged sector. When the file system becomes unreadable, you lose access to your files and corrupt-data errors pop up everywhere. Cause 4: Not clicking the 'Safely Remove' option when ejecting your USB drive.

Either way, it is unfortunately not technically possible to simply 'fix' the document and restore the original text. Luckily, there is a workaround in Docparser that will give you near-perfect results.

This is a font-specific issue. You can check the fonts of both documents (the original and the one we have shared): open the PDF in Reader and go to File > Properties > Fonts. In this case, you will need to ask the PDF creator to embed the font in the file. Option 3 - Print to PDF from PDF software. If your unreadable file is a document made of several other types of files combined into one (like an appellate appendix), this is the easiest fix. However, it relies on you having PDF creation software. Print, and choose 'Adobe PDF,' 'Acrobat Distiller,' or 'Microsoft Print to PDF' (if you have Windows).

To fix unreadable text issues, go to the Preprocessing settings inside of your Document Parser (SETTINGS > PREPROCESSING) and set the option 'Perform OCR' to 'Yes - always perform OCR' as shown in the screenshot below.

Setting this option to 'Yes - always perform OCR' will convert your documents to an image file and then apply Optical Character Recognition (OCR). This means that we create a completely new text document based on the visual appearance of your original file. The new file will contain an image of your original document alongside a new (invisible) text layer with a correct character encoding. Once you enable this option, all newly uploaded documents will be sent to our OCR engine and the text should show up correctly.

PS - If you open your original document in Adobe Reader (or Mac Preview) and attempt to copy and paste the same text, you will probably run into the same issues. If the text does not paste as gibberish, please send your document to our support staff and we'll get back to you with a more detailed analysis.
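If you would rather check the pasted text programmatically than by eye, the copy-and-paste test above can be roughly approximated with a small script. This is only a sketch under stated assumptions, not part of Docparser: the function names `suspicious_ratio` and `looks_garbled` and the 0.2 threshold are hypothetical choices made for illustration. It flags text that contains many control, private-use, unassigned, or replacement characters, which broken character mappings often produce. It will not catch garbling that happens to map to ordinary but wrong letters.

```python
import unicodedata

# Unicode general categories that rarely appear in real document text:
# Cc = control, Co = private use, Cn = unassigned, Cs = surrogate.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}
REPLACEMENT_CHAR = "\ufffd"  # the Unicode replacement character


def suspicious_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that look like mapping debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        if c == REPLACEMENT_CHAR or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    return bad / len(chars)


def looks_garbled(text: str, threshold: float = 0.2) -> bool:
    """Heuristic only: True when the suspicious share exceeds the (assumed) threshold."""
    return suspicious_ratio(text) > threshold


if __name__ == "__main__":
    # Paste the text copied from your PDF viewer between the quotes.
    sample = "Invoice No. 12345 Total: $99.00"
    print(looks_garbled(sample))
```

A result of `True` suggests the document is a good candidate for the 'Yes - always perform OCR' setting. A result of `False` does not prove the text is correct, so a visual check is still worthwhile.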

Ever since this latest Windows update - I have hundreds of PDF files that WILL NOT OPEN. (I have Windows 10 Professional)

In Adobe, I get a message that says it's 'either not a supported file type or because the file has been damaged.' Other programs that read PDFs will give other error messages, but the effect is still the same - the file won't open. ALL OF THESE FILES WORKED AND OPENED THE DAY BEFORE THE UPDATE (to be fair, it's not like I opened up every single file the day before the update, but I never had this problem before and files that I did open up aren't working now), something about the update broke them. The weird thing is that this problem doesn't affect all my PDF files, but many many many of them. I haven't been able to figure out a rhyme or reason yet for which ones will open and which ones won't.

This seems a lot like a problem I experienced after a previous Windows update a year or so ago that would give me the same error message for my Excel files. Back then, I was able to roll back the update and all the files magically worked again. Eventually something was patched on the Windows side and the updates no longer gave me an issue. Apparently, Windows won't let me do the same thing with this update. I'm stuck with it.

So throwing this out to the great hive-mind to see if anyone has any ideas.

