
Tab characters in Déjà Vu

Déjà Vu 3.0 and DVX from Atril are excellent tools for translators, but they have some quirks that need workarounds.

The Problem

One quirk is that they do not treat tab characters in Word documents as sentence delimiters. Instead, these characters are converted to formatting codes, which can interfere with translation memory matching. For example, in a document with numbered paragraphs, a segment might look like this in DV:

13.{39}Maintenance and emergency personnel

or sometimes this:

{38}13.{39}Maintenance and emergency personnel

or maybe this:

{38}13.{39}{40}Maintenance and emergency personnel

One customer routinely sends me documents with sentences that are repeated from previous source documents but that might have different paragraph numbering (letters, numbers, bullets, or nothing at all). Sometimes the paragraph numbering is automatic; sometimes it is inserted manually. In this example, I might have one or more of these source sentences in the memory database (MDB):

Maintenance and emergency personnel

{001}III.{002}Maintenance and emergency personnel

{001}-{002}Maintenance and emergency personnel

Sometimes DV will find a fuzzy match but then omit the codes and the preceding character. Other times it won't make a good match at all.

My preferred solution would be for DV to define the tab character – whether it appears in automatic or manual numbering – as a sentence delimiter, and then handle the sentence separately from any leading number/letter/bullet characters.

However, despite requests from several users, Atril has been unable to provide this feature. Evidently this task presents a difficult programming challenge.

The Workarounds

There are several workarounds. One is to replace each tab with another character, such as ~, which can then be defined as a sentence delimiter within DV. Another is to insert a new paragraph (with the Enter key) after each tab. Both of these require only a few seconds of pre-editing and post-editing for each document.
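The first workaround can also be scripted in Word. The following is a minimal sketch, not a tested tool: the procedure names and the choice of ~ as the placeholder are assumptions made for illustration. It refuses to run if the placeholder already occurs in the document, because the reverse step could not tell an original ~ from a converted tab. It only processes the main body text (`ActiveDocument.Content`), not headers, footnotes, or text boxes.

```vba
' Sketch only: procedure names and the "~" placeholder are illustrative.

Private Function ReplaceInBody(ByVal findWhat As String, _
                               ByVal replaceText As String, _
                               ByVal doReplace As Boolean) As Boolean
    ' Searches the main body text. Returns True if findWhat was found.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findWhat
        .Replacement.Text = replaceText
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWildcards = False
        If doReplace Then
            ReplaceInBody = .Execute(Replace:=wdReplaceAll)
        Else
            ReplaceInBody = .Execute
        End If
    End With
End Function

Sub TabsToPlaceholder()
    ' Run before importing the document into DV.
    If ReplaceInBody("~", "", False) Then
        MsgBox "The document already contains ~ ; choose another character."
        Exit Sub
    End If
    ReplaceInBody "^t", "~", True      ' ^t is Word's code for a tab
End Sub

Sub PlaceholderToTabs()
    ' Run on the exported translation.
    ReplaceInBody "~", "^t", True
End Sub
```

The placeholder character still has to be defined as a sentence delimiter in DV for this to have any effect on segmentation.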

However, these methods don't work on tabs that are part of the MS Word “list formatting” feature (automatic paragraph numbering or bullets). The first sentences of such paragraphs are still subject to the same problems described above.

One approach in that case is to convert all the automatically generated numbers and bullets to actual text and tabs. This is what I do most of the time, because the automatically generated features are almost never required in a translation. Note that this is a one-way conversion: there is no simple way to restore the automatic bullet/numbering feature after this step is performed.

The sequence is:

  1. Make sure that there are no sequences of “Tab-Enter” in the document. To do this, replace all “Tab-Enter” with “Enter” (using Search and Replace). Otherwise, step 6 would also remove paragraph breaks that were present in the original.
  2. Convert all automatic bullets and numbering to ordinary text. This requires a VBA (Visual Basic for Applications) command, which can be added as a step in a macro:

    ActiveDocument.ConvertNumbersToText

    After this, the document should look exactly as it did before, with one occasional exception: sometimes new page breaks are added after some of the converted bullet characters.
  3. Replace all Tab characters with the sequence Tab-Enter.
  4. Save the document, and import it into DV. All leading bullets and section numbers will be shown on rows that are separate from the sentences that follow them.
  5. Translate as usual and export. Open the translation in Word.
  6. Replace all Tab-Enter sequences with a single Tab.
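Steps 1 to 3 and step 6 above can be combined into two macros. This is a rough sketch under stated assumptions rather than a finished tool: the procedure names are invented for illustration, and the search covers only the main body text, not headers, footnotes, or text boxes. In Word's Find and Replace, `^t` stands for a tab and `^p` for a paragraph mark (Enter).

```vba
' Sketch only: procedure names are illustrative.

Private Sub ReplaceAllInBody(ByVal findWhat As String, _
                             ByVal replaceText As String)
    ' Plain (non-wildcard) Replace All over the main body text.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findWhat
        .Replacement.Text = replaceText
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Sub PrepareTabsForDV()
    ' Step 1: remove any existing Tab-Enter sequences.
    ReplaceAllInBody "^t^p", "^p"
    ' Step 2: convert automatic bullets and numbering to text (one-way).
    ActiveDocument.ConvertNumbersToText
    ' Step 3: put a paragraph break after every tab.
    ReplaceAllInBody "^t", "^t^p"
End Sub

Sub RestoreTabsAfterDV()
    ' Step 6: run on the exported translation.
    ReplaceAllInBody "^t^p", "^t"
End Sub
```

A single pass of step 1 may leave a Tab-Enter behind where two or more tabs preceded the paragraph mark, so it is worth checking the document (or running the replacement again) before step 2. Because step 2 cannot be undone, it is safer to run `PrepareTabsForDV` on a copy of the source document.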

What if the situation requires that the automatic numbering and bullets be retained in translation?

One possible solution is the macro below (also available here). It examines the first automatically numbered (or bulleted) paragraph in the document to see whether the bullet or paragraph number character is hidden. It then sets *ALL* of the numbered / bulleted paragraphs in the entire document to the opposite setting. This is a “toggle” function: there is no separate macro to reverse its effects, and running the same macro again switches the setting back.

One bug: if the user has changed some of the standard defaults for the automatic bullet / number settings, one or more styles of bullets or numbers may remain hidden after the macro is run on the finished translation. Simply run the macro once or twice more, which should unhide all of those characters.

You can assign this macro to a button on a Word toolbar. Click the button to hide the numbering. Then, when revising the translation after export from DV, click the button again to change back to non-Hidden.


Sub ToggleHideParaNos()
' Revised Mar 1, 2003 by Steven Marzuola
' Toggle Hidden attribute for paragraph numbers (or bullets)
' that are automatically generated by Microsoft Word
' Contains error checking

Dim myPar As Paragraph, a As Long
Dim newValue As Boolean, newValDef As Boolean

newValDef = False
On Error GoTo myError

For Each myPar In ActiveDocument.ListParagraphs
  a = myPar.Range.ListFormat.ListLevelNumber
  If Not newValDef Then
    ' The first list paragraph decides the new setting for all the others
    newValue = Not myPar.Range.ListFormat.ListTemplate.ListLevels(a).Font.Hidden
    newValDef = True
  End If
  myPar.Range.ListFormat.ListTemplate.ListLevels(a).Font.Hidden = newValue

myContinue:

Next
Exit Sub

myError:
  ' Skip this paragraph. Resume re-arms the error handler, so a later
  ' error in another paragraph is also skipped instead of halting the macro.
  Resume myContinue
End Sub
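
Because the macro above is a toggle, its result depends on the current state of the first list paragraph. As an alternative, the following sketch sets the Hidden attribute to an explicit value, so that "show" can be run safely at any time. The procedure names are invented for illustration, it uses the same Word objects as the macro above, and it is not known whether it avoids the bug described earlier.

```vba
' Sketch only: procedure names are illustrative.

Private Sub SetParaNosHidden(ByVal hideThem As Boolean)
    Dim myPar As Paragraph, a As Long

    On Error GoTo skipPar
    For Each myPar In ActiveDocument.ListParagraphs
        a = myPar.Range.ListFormat.ListLevelNumber
        myPar.Range.ListFormat.ListTemplate.ListLevels(a).Font.Hidden = hideThem
nextPar:
    Next
    Exit Sub

skipPar:
    ' Skip paragraphs whose list format cannot be read or changed.
    Resume nextPar
End Sub

Sub HideParaNos()
    ' Run before importing the document into DV.
    SetParaNosHidden True
End Sub

Sub ShowParaNos()
    ' Run on the exported translation.
    SetParaNosHidden False
End Sub
```

The two wrapper procedures take no arguments, so they appear in Word's macro list and can each be assigned to a toolbar button.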

Macro revised: March 1, 2003
Page revised: December 13, 2006
Home: www.techlanguage.com