Document AI company Epapyrus announced on the 18th that it has upgraded its sensitive information de-identification module, “BlackMarker”, and now provides it as an add-on feature to its PDF streaming viewer “StreamDocs” and PDF conversion solution “PDF Gateway”.
“BlackMarker” automatically detects and masks sensitive information within a PDF document and deletes the original data, blocking the risk of information leakage. With this upgrade, it automatically recognizes and processes both structured data, such as phone numbers, resident registration numbers, and email addresses, and unstructured personal information, such as names and addresses.
The improved de-identification accuracy comes from an AI de-identification technology that combines rule-based and dictionary-based models and draws on an extensive database of names and addresses. When a user uploads a document, its text is automatically preprocessed, and both structured and unstructured personal information is then de-identified.
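To illustrate how rule-based and dictionary-based detection can be combined, here is a minimal Python sketch. It is not Epapyrus's implementation: the patterns in `RULES`, the entries in `DICTIONARY`, and the `detect` function are all hypothetical, and it works on plain strings rather than PDF content.

```python
import re

# Hypothetical rule-based patterns for structured data (illustrative only).
RULES = {
    "phone": re.compile(r"\b01[016789]-\d{3,4}-\d{4}\b"),
    "rrn": re.compile(r"\b\d{6}-[1-4]\d{6}\b"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}

# Hypothetical dictionary standing in for a database of names and addresses.
DICTIONARY = {
    "name": ["Hong Gildong"],
    "address": ["110 Sejong-daero"],
}


def detect(text):
    """Return (start, end, label) spans found by rules or dictionary, sorted by position."""
    spans = []
    # Rule-based pass: every regex match becomes a labeled span.
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    # Dictionary-based pass: find every occurrence of each known entry.
    for label, entries in DICTIONARY.items():
        for entry in entries:
            start = text.find(entry)
            while start != -1:
                spans.append((start, start + len(entry), label))
                start = text.find(entry, start + 1)
    return sorted(spans)
```

Calling `detect("Hong Gildong, 010-1234-5678, hong@example.com")` yields one span per item: the name from the dictionary pass, and the phone number and email from the rule pass.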
Epapyrus has also strengthened user customization. The mask can be set to replacement text or a special character, and server-based batch processing handles large volumes of documents quickly. The module also reduces the cost burden by achieving AI-level performance without requiring high-performance GPUs.
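The two masking styles and batch handling can be sketched as follows. This is an assumption-laden example, not the product's interface: the `mask` and `mask_batch` functions, their parameters, and the `RRN` pattern are hypothetical, and the code processes plain strings rather than PDF files.

```python
import re

# Illustrative pattern for a resident registration number (hypothetical rule).
RRN = re.compile(r"\b\d{6}-[1-4]\d{6}\b")


def mask(text, pattern, mode="char", fill="*", label="[REDACTED]"):
    """Replace each match with a fixed label (mode="text") or a run of fill characters."""
    def _replace(match):
        if mode == "text":
            return label
        # Keep the original length so the layout of the text is preserved.
        return fill * len(match.group(0))
    return pattern.sub(_replace, text)


def mask_batch(documents, pattern, **options):
    """Apply the same masking settings to every document in a list."""
    return [mask(doc, pattern, **options) for doc in documents]
```

With `mode="char"` the matched value is overwritten character by character, while `mode="text"` swaps it for a single label. Either way the original value no longer appears in the returned string.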
Kim Jeong-ah, Vice President of Epapyrus, explained, “By providing AI-level de-identification accuracy in a general GPU environment, we can achieve both cost savings and information protection.”
In the PDF viewer “StreamDocs”, users upload a document to the viewer and apply de-identification by search, by direct designation, or by automatic recognition. De-identified documents can be shared safely via web links, without downloading.

With the PDF conversion solution “PDF Gateway”, documents in formats such as Hangul (HWP) and MS Office can be converted to PDF while personal information is automatically de-identified in the same step. Through API customization, it can also automatically mask specific data such as institution names, designated nouns, and text matching regular expressions.
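As a rough sketch of what user-defined masking rules might look like, the Python example below registers literal terms and regular expressions and applies them together. The `CustomMasker` class and its methods are hypothetical and do not represent the PDF Gateway API.

```python
import re


class CustomMasker:
    """Hypothetical container for user-defined masking rules."""

    def __init__(self, fill="*"):
        self.fill = fill
        self._parts = []

    def add_term(self, term):
        # Literal terms such as institution names are escaped so they match verbatim.
        self._parts.append(re.escape(term))

    def add_pattern(self, regex):
        # A non-capturing group keeps any alternation inside the rule local to it.
        self._parts.append(f"(?:{regex})")

    def apply(self, text):
        if not self._parts:
            return text
        combined = re.compile("|".join(self._parts))
        return combined.sub(lambda m: self.fill * len(m.group(0)), text)
```

For example, after `add_term("Acme Bank")` and `add_pattern(r"CASE-\d{4}")`, calling `apply` on a sentence masks both the institution name and any matching case number. Joining the rules into one expression means the text is scanned once, though this simple approach assumes the registered patterns do not rely on numbered backreferences.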

Epapyrus said it continues to develop new features and improve the performance of its electronic document solutions in cooperation with its overseas subsidiaries in the United States, Japan, and Europe.