
Pre-translation checklist
Many documents sent for translation are formatting nightmares. There are two main sources of errors. The biggest is writers who don't know how to use their word processing program. For example, a user might create a “table” or column layout on a page by using lots of tab characters and spaces. If the text is translated as-is, the spacing will in all likelihood be ruined. The user may also have broken apart sentences and text strings that should have remained together, which is not obvious while translating in a TM program. For these reasons, it's almost always worth spending some time before translation to convert these sections into a well-structured Word table.
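As a rough illustration of what that conversion involves, here is a minimal Python sketch that splits tab- and space-aligned lines into table cells. It assumes the text has been copied out of the document as plain text, and that a column gap is either a single tab or any run of two or more spaces and tabs. The names `COLUMN_GAP`, `split_into_cells`, and `rows_from_lines` are made up for this example and are not part of Word or any TM program.

```python
import re

# Assumption: a column gap is a single tab, or any run of two or more
# spaces/tabs. A single space is treated as ordinary text.
COLUMN_GAP = re.compile(r"[ \t]{2,}|\t")


def split_into_cells(line: str) -> list[str]:
    """Split one tab/space-aligned line into a list of cell strings."""
    return [cell.strip() for cell in COLUMN_GAP.split(line.strip())]


def rows_from_lines(lines: list[str]) -> list[list[str]]:
    """Turn a block of aligned lines into table rows, skipping blank lines."""
    return [split_into_cells(line) for line in lines if line.strip()]


if __name__ == "__main__":
    fake_table = [
        "Part No.\t\tDescription   Qty",
        "A-100\t\tHex bolt      25",
    ]
    for row in rows_from_lines(fake_table):
        print(row)
```

The rows still have to be pasted into a real Word table by hand or with Word's own "Convert Text to Table" command; the sketch only shows where the cell boundaries would fall.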
The other main source of problems is PDF conversion. Many source documents come to us in PDF format. It's often best to use a program such as FineReader to convert them to a word-processing format, such as Microsoft Word. The process is a great time saver, but it also introduces errors.
Most documents are not plain text, but include formatting such as bold, italic, a table of contents or a page number. Déjà Vu (DV) handles this information by converting it to codes, displayed as {001}. Many of those codes are necessary and unavoidable. But users and PDF converters can each introduce “rogue codes”. Whether intended or not, those codes prevent TM programs from recognizing “exact matches”, or otherwise require more time to handle during translation. It's best to reduce those codes to a minimum. Some of the most common methods are addressed in the list below.
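One way to see where codes pile up is to count them per segment. The following Python sketch assumes the segments are available as plain strings (for example from an exported file) and that every code looks like `{001}`, that is, digits inside braces. The function names and the `max_codes` threshold are hypothetical, chosen only for this illustration.

```python
import re

# Assumption: every code is written as digits in braces, e.g. {001}.
CODE = re.compile(r"\{\d+\}")


def count_codes(segment: str) -> int:
    """Return the number of {NNN} codes in one segment."""
    return len(CODE.findall(segment))


def code_heavy_segments(segments: list[str], max_codes: int = 2) -> list[tuple[int, int]]:
    """Return (segment index, code count) for segments with more than max_codes codes."""
    flagged = []
    for index, segment in enumerate(segments):
        n = count_codes(segment)
        if n > max_codes:
            flagged.append((index, n))
    return flagged


if __name__ == "__main__":
    sample = [
        "Plain sentence with no codes.",
        "{001}Warning:{002} Do not open{003}.{004}",
    ]
    print(code_heavy_segments(sample))
```

A segment that is flagged here is not necessarily wrong, since some codes are unavoidable, but a short sentence with many codes is usually worth a look in the source document.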
Whatever the source, almost every document requires cleanup before I import it into my translation memory program. Here’s a partial checklist:
- Fix irregular character spacing and font sizes. I frequently select the entire document and set the character spacing (scale, spacing, and position) to the normal values. (However, this doesn't always reset the spacing of every character.)
- Look for misuse of multiple spaces and/or tab characters. I always search for all instances of two or more spaces, tabs, or any combination. That helps to find the following:
  - Failure to use Word tables to align text in columns and rows.
  - Positioning text on a line with tabs and/or spaces, instead of using center or right justification or modifying the tab settings.
- Search for all sequences of two (sometimes three) paragraph marks. Users often force a new page by pressing the <Enter> key 20 times. Replace these instances with a manual page break.
- I usually set the paragraph spacing to insert one line before or after each paragraph, using a value of “1 line” instead of a value in pts (points). Then delete most multiple paragraph marks.
- Make sure that headings and subheadings are set to “Keep with next”, so they stay on the same page as the following text. The previous item helps a great deal with this.
- Just a preference really, but turn on “Widow/orphan control”.
- Search for spaces before these marks: . , ; : ? ! (this will vary by language)
- Most of the time, I convert floating objects to inline shapes using Miri Orfek's macro, “Images out/in”. In my opinion they're easier to control that way. Also, too many images can make the import into DV take a long time.
- Convert manually created Tables of Contents to automatic versions.
- Handle Tab characters, as described here.
- I've found that it's best to use only straight quotation marks in my TM databases. This macro converts single and double quotation marks to straight characters (' ") instead of “smart” quotes (‘ ’ “ ”), and changes the Word settings that control them. After translation, the macro can change them back to “smart” quotes. (On my system, the macro is run with Left-Alt-'.)
- ABBYY FineReader has the habit of randomly applying Bold and sometimes Italic formatting to periods (full stops) and spaces.
- A PDF document may have a header or footer on each page. By default, FineReader and most similar programs will simply add these bits of text into the main body of the document. Fix the Word document by moving them into a proper header or footer, and include the codes for page numbers and total pages.
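Several of the searches in the checklist can be expressed as regular expressions. The Python sketch below is only an illustration, not the Word macros mentioned above: it assumes the document text has been saved as plain text with each paragraph mark represented by a newline, and the names `CHECKS`, `count_issues`, and `straighten_quotes` are invented for this example.

```python
import re

# Each pattern mirrors one search from the checklist.
# Assumption: paragraph marks appear as "\n" in the plain-text export.
CHECKS = {
    "multiple spaces/tabs": re.compile(r"[ \t]{2,}"),
    "repeated paragraph marks": re.compile(r"\n{2,}"),
    # The punctuation set will vary by language (French, for example,
    # expects a space before some of these marks).
    "space before punctuation": re.compile(r" +[.,;:?!]"),
}

# Map "smart" quotes to their straight equivalents.
STRAIGHT_QUOTES = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def count_issues(text: str) -> dict[str, int]:
    """Count how many times each checklist pattern occurs in the text."""
    return {name: len(pattern.findall(text)) for name, pattern in CHECKS.items()}


def straighten_quotes(text: str) -> str:
    """Replace smart single and double quotes with straight ones."""
    return text.translate(STRAIGHT_QUOTES)


if __name__ == "__main__":
    sample = "Name:\t\tJohn  Smith\n\n\nHello , world !"
    for name, count in count_issues(sample).items():
        print(f"{name}: {count}")
    print(straighten_quotes("\u201cIt\u2019s fine,\u201d she said."))
```

Unlike the macro described in the list, this sketch does not touch Word's AutoFormat settings and cannot restore smart quotes after translation. It is meant only to show the patterns being searched for.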
After importing into Déjà Vu, if any other pattern of codes becomes apparent, I frequently fix the source document and reimport.
Updated: May 23, 2010
Back to Steven Marzuola's Tips
Home: www.techlanguage.com