PDF Assistant for Trados

The PDF Assistant for Trados is an Add-In for Trados Studio that supports the conversion of a PDF to a DOCX so it can be successfully translated and delivered as a DOCX target file.

Table of Contents

  • Installation
  • Where is it installed?
  • Working with PDFs
  • Using the "Add-in"
    • Adding your files
    • Selecting your Provider and OCR options
    • Image Selection
    • Summary Stage
    • Preparation
  • DTP the converted files before Translation

Installation

The application is an sdlplugin.  You can install it either by downloading it from the RWS AppStore and double-clicking the sdlplugin file in the usual way, or through the Integrated AppStore in Trados Studio.  For the plugin to work you must have Microsoft Office installed.  Testing was carried out on computers using Office 365 and not on older versions.

Where is it installed?

The plugin is installed into the ribbon in the "Add-Ins" tab and into the "Toolbox" group:

Working with PDFs

The application is designed to support the conversion of PDF files into DOCX so that you can improve the quality of the DOCX prior to translating it in Trados Studio.  We have taken this approach because PDF to DOCX conversion without professional editing software can cause formatting issues, resulting in a document that looks different from the original PDF.

The more common problems that can occur during PDF to DOCX conversion would be things like:

  1. Text and image placement: Sometimes, the text and image placement can become distorted during conversion, causing the final document to look different from the original PDF.

  2. Formatting issues: PDFs often have complex formatting, such as columns, tables, and graphs. These elements can be difficult to convert to DOCX, leading to formatting issues in the final document.

  3. Fonts: If the PDF contains fonts that are not installed on the computer doing the conversion, the text can appear differently in the final document.

  4. Large files: PDF files can be very large, and converting them to DOCX can result in large files that take up a lot of storage space.

  5. Security features: Some PDFs have security features that prevent copying and pasting, which can make it difficult to convert the document to DOCX.

  6. OCR issues: If the PDF contains scanned images or text that was not originally digital, OCR (optical character recognition) software is needed to convert the text.  However, OCR can sometimes produce errors or miss characters, leading to mistakes in the final document.

  7. Unnecessary tags: Any of the above problems can lead to many unnecessary control tags being inserted into the DOCX, which become visible when working with a translation tool.

  8. Poor segmentation: Similarly, any of the above issues can lead to unnecessary hard returns being added to the DOCX, which make translation more difficult than necessary.

  9. Incorrect character display: If the character encoding is incorrect, it can cause characters to be displayed incorrectly in the final document. For example, some characters may appear as question marks or boxes especially with Asian character sets.

  10. Missing characters: In some cases, incorrect encoding can cause certain characters to be missing from the final document.  This can result in text that is difficult to read or understand.

  11. Encoding conflicts: If different parts of the document are encoded in different ways, it can cause conflicts and errors during conversion.  For example, some characters may be encoded in UTF-8 while others use a legacy code page, leading to errors when the document is converted to DOCX or another format.
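The encoding problems in items 9 to 11 can be reproduced in a few lines.  The following is only a sketch using the Python standard library, not anything the "Add-In" runs; the sample strings and the choice of the cp1252 code page are assumptions made for illustration:

```python
# Illustrative only: how encoding mismatches produce the symptoms listed above.
# The sample strings and the cp1252 code page are assumptions for the demo.

korean = "번역"      # two Hangul syllables
accented = "café"

# Incorrect character display: UTF-8 bytes read with the wrong code page.
mojibake = accented.encode("utf-8").decode("cp1252")
print(mojibake)  # the "é" turns into two unrelated characters

# Question marks: the target encoding cannot represent the characters.
replaced = korean.encode("ascii", errors="replace").decode("ascii")
print(replaced)

# Missing characters: unrepresentable characters are silently dropped.
dropped = (accented + " " + korean).encode("ascii", errors="ignore").decode("ascii")
print(repr(dropped))
```

If a converted DOCX shows similar patterns, the cause is more likely to be the encoding or fonts in the source PDF than the translation tool.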

It's important to note that the quality of the conversion largely depends on the quality of the original PDF and the conversion software used.  Some conversion tools may produce better results than others.  This "Add-In" initially makes use of the Microsoft Word desktop API providing simple text conversion and also some OCR capabilities.  Whilst you could simply use Word and avoid the "Add-In" altogether it's worth noting that the plugin does provide more support than Microsoft makes available through Microsoft Word, in particular around OCR capability.
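For readers curious what "simply using Word" looks like when automated, here is a minimal sketch in Python.  It is not the plugin's code.  It assumes Windows, desktop Microsoft Word, and the third-party pywin32 package; the function names `docx_path_for` and `convert_with_word` are hypothetical.  It performs the plain conversion only, with none of the OCR handling the "Add-In" provides, and Word may still show its own PDF conversion prompt:

```python
from pathlib import Path

WD_FORMAT_XML_DOCUMENT = 12  # Word's wdFormatXMLDocument constant (.docx)


def docx_path_for(pdf_path: str) -> str:
    """Return the output path: same folder and name, with a .docx extension."""
    return str(Path(pdf_path).with_suffix(".docx"))


def convert_with_word(pdf_path: str) -> str:
    """Open a PDF in desktop Word and save it as DOCX (Windows + Word only)."""
    # Imported here so the module still loads where pywin32 is not installed.
    import win32com.client

    source = str(Path(pdf_path).resolve())
    target = docx_path_for(source)

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(source, ConfirmConversions=False, ReadOnly=True)
        try:
            doc.SaveAs2(target, FileFormat=WD_FORMAT_XML_DOCUMENT)
        finally:
            doc.Close(False)  # close without saving changes to the source
    finally:
        word.Quit()  # always release the Word process, even after an error
    return target
```

The nested `try`/`finally` blocks matter here: if Word fails partway through a difficult PDF, the sketch still attempts to close the document and quit Word rather than leaving a hidden process behind.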

Using the "Add-in"

Adding your files

The PDF Assistant for Trados is started by clicking on the icon in the ribbon.  This opens up a small wizard where you can add your files:

You can add as many files as you like, in as many languages as you like, but keep in mind the process could take a considerable amount of time and may even run out of memory if you ask for too much.  How many files you can use really depends on the number of pages, number of images in the file, amount of OCR work required etc.  Think about the work you are about to carry out and don't expect miracles!
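One way to avoid asking for too much is to feed the wizard smaller groups of files.  The sketch below, in Python, groups the PDFs in a folder into batches under a size budget.  File size is only a rough stand-in for the real cost (pages, images, OCR work), and the function name `batch_pdfs` and the 50 MB budget are assumptions for illustration:

```python
from pathlib import Path
from typing import Iterable, List


def batch_pdfs(paths: Iterable[Path], max_batch_bytes: int) -> List[List[Path]]:
    """Group PDF files into batches whose combined size stays within a budget.

    A single file larger than the budget still gets a batch of its own.
    """
    batches: List[List[Path]] = []
    current: List[Path] = []
    current_size = 0
    for path in sorted(paths):
        size = path.stat().st_size
        # Start a new batch when adding this file would exceed the budget.
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    pdfs = Path(".").glob("*.pdf")
    for number, batch in enumerate(batch_pdfs(pdfs, 50 * 1024 * 1024), start=1):
        print(f"Batch {number}: {[p.name for p in batch]}")
```

Each batch can then be added to the wizard in its own run.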

The files or folders can be added via drag and drop, or by using the small icons in the wizard.  In this example two PDF files have been added: an English-language text containing two images, one that needs to be OCR'd and one that does not; and a Korean document that is non-readable, so the entire content is one big image in the PDF.

Selecting your Provider and OCR options

This screen allows you to do several things:

  1. Select the PDF Assistant you wish to use.  For now there is only Microsoft Word to select from.
  2. Check the options to specify whether or not you wish to extract text from the images and, if so, which ones you would like to be processed (OCR'd).
    1. Keep in mind that when you OCR the images you will lose any background image that was there and will only have the text that the software was able to extract.

Image Selection

This part of the wizard will extract the images the software was able to identify and allow you to specify which of the images contain translatable text:

In this example I have only selected two images for OCR'ing: the table image in the English file and a small banner in the Korean.  I can then click Next to be presented with the Summary.

Summary Stage

This stage of the wizard displays a summary of the options you have chosen for the conversion:

Preparation

The final stage provides an indication of the progress until the conversion has completed:

It is possible that some PDF files cannot be processed, in which case the Word API may cause the conversion to get stuck or return an error.  If it gets stuck you'll probably need to force Studio, and possibly Microsoft Word as well, to close.  Do this via Task Manager (Ctrl+Shift+Esc) and "End task".

When this occurs you will probably find that Microsoft Word can open the file, but it will remain as an uneditable image.  In this case you are left with these options:

  • use more sophisticated conversion software that can manage the file
  • transcribe the file and recreate it, adding any images by scanning the document and cutting out the images you need
  • try to get a copy of the original source file from before it was converted into a PDF (this would really be preferable to a PDF from the very start!)

PDF translation is not a foolproof business!

DTP the converted files before Translation

Now you can open your converted PDF files as a DOCX in Microsoft Word and improve the quality of the file before you translate it.  This way the target file will probably be ready to go, or at least require minimal editing to accommodate changes required as a result of text expansion/contraction in the target language.

A good tool for tidying up files resulting from a messy PDF conversion is TransTools available here - https://www.translatortools.net/products/transtools
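As a rough illustration of the kind of tidy-up involved, the sketch below joins paragraphs that look like a single sentence split by an unnecessary hard return.  It works on plain text that has already been extracted from the DOCX, and the function name `merge_broken_paragraphs` is hypothetical.  The heuristic relies on letter case and Western punctuation, so it would not help with a Korean file like the one in this example:

```python
import re
from typing import List

# A paragraph ending in one of these is treated as a genuine paragraph end.
SENTENCE_END = re.compile(r"[.!?:;][)\]]*$")


def merge_broken_paragraphs(paragraphs: List[str]) -> List[str]:
    """Join paragraphs that look like one sentence split by a hard return."""
    merged: List[str] = []
    for text in paragraphs:
        text = text.strip()
        if not text:
            continue  # drop empty paragraphs left behind by the conversion
        previous_is_open = bool(merged) and not SENTENCE_END.search(merged[-1])
        # Only join when the previous paragraph has no closing punctuation
        # and this one starts in lower case, i.e. it looks like a continuation.
        if previous_is_open and text[0].islower():
            merged[-1] = f"{merged[-1]} {text}"
        else:
            merged.append(text)
    return merged


sample = [
    "The plugin is installed into",
    "the ribbon in the Add-Ins tab.",
    "Working with PDFs",
    "The application converts PDF files",
]
for paragraph in merge_broken_paragraphs(sample):
    print(paragraph)
```

Headings such as "Working with PDFs" are left alone because the paragraph that follows starts with a capital letter.  Any result from a heuristic like this still needs checking by eye before translation.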

In the example files, the English file contained two images, one that was OCR'd and the other treated as an image.  The result isn't bad (PDF on the left, converted DOCX on the right) and if you were to open this PDF file in Microsoft Word both images would be handled as images, so the "Add-In" does provide considerable value here.  The table needs tidying up but it is editable and could save time when more extensive text is involved:

On the Korean non-readable PDF, some formatting would be required, but it's not too bad.  The image is floating and can be positioned wherever I like, and all the text is available to me for translation.  So with a small amount of DTP work I'll have a file that is easily translatable, and the target file should be good with minimal work required:
