Workarounds for the DV/DVX number problem

by Steven Marzuola


In an August 2007 posting to Dejavu-L, a discussion group at Yahoo Groups, Marie Gouin expressed the following complaint about DVX:

I'm not one to complain, but I must say that I'm really upset at the way DVX handles numbers. I'm currently translating a cookbook, which could have been done really quickly because of all the repetitions in the ingredients. Unfortunately, thanks to the bug with DVX, this has been a nightmare project. I could give countless examples of the problems I've had, but just a couple should suffice:
   Source                   Target
   1/2 cup (125 ml)         3/4 cup (1/2 ml)
   1/4 tsp (1 ml) salt      1/2 c. à thé (1/4 ml) salt

Here's my response, with some additional information:

Marie, you've come across one of the most persistent and annoying bugs in DV. In response, Atril has usually told its users to omit sentences with numbers from the memory database (MDB). That is simply impractical, and your cookbook is a great example of why.

BTW, it usually occurs when the source sentence has more than one number. There are at least two workarounds for the worst cases.

As a complement to this, I created new TDB's, just for checking purposes, that contain all the one-letter and two-letter "words":

   à è ì ò ù Â Ê Î Ô Û àà àè àì àò àù àÂ àÊ àÎ àÔ àÛ èà èè èì ... etc.

I use that as part of my terminology check in Déjà Vu after translation is complete.
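A list like this does not have to be typed by hand. The following is a minimal Python sketch, not part of Déjà Vu, that generates every one-letter and two-letter combination of the ten accented letters shown above. The function name `nonsense_words` is hypothetical, and treating those ten letters as the complete alphabet is an assumption of this sketch. How the resulting list is loaded into a TDB is left open here.

```python
import unicodedata
from itertools import product

# The ten accented letters from the list above. Normalizing to NFC makes
# sure each one is a single character, however this file was saved.
LETTERS = unicodedata.normalize("NFC", "àèìòùÂÊÎÔÛ")


def nonsense_words(letters=LETTERS):
    """Return every one-letter and two-letter combination of `letters`."""
    singles = list(letters)
    pairs = ["".join(pair) for pair in product(letters, repeat=2)]
    return singles + pairs


words = nonsense_words()
print(len(words))            # 10 single letters + 100 pairs
print(" ".join(words[:20]))  # the start of the list
```

With ten letters the list has 110 entries: the 10 single letters followed by the 100 ordered pairs.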

This terminology check is a very useful part of DV. In particular, it catches the numbers error given in Marie's example. However, as my translation memory databases have grown, it has become impractical to check the translation against the entire MDB and TDB. Atril has suggested using an empty MDB and TDB during this test, to verify that numbers have been translated correctly. I have extended this technique: instead of being empty, my "check" TDB's contain chosen words that should always be translated a certain way, or left untranslated. Examples:

  1. All one-letter and two-letter combinations that are not real words or abbreviations in either language: aa, ab, ae, af, ag, ... b, ba, bc, bd ... These can be present as parts of serial numbers, part numbers, references, etc.
  2. Roman numerals: II, III, IV, V, VII, ... up to XXXIX. Since "I" is a word in English and "vi" is a word in Spanish, these are omitted from the English-Spanish and Spanish-English "check" TDB's.
  3. Numbers: two=dos, three=tres ... up to twenty. Also thirty, forty, fifty up to ninety. Some are omitted because there's no translation that is always correct. For example, "one" in English can be "un", "uno" or "una" in Spanish, and each of those three can be translated as "one", "a" or "an".
  4. The nonsense words mentioned above: à è ì ... àà àè àì ... èà èè èì ... etc.
  5. A handful of other terms and expressions.
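Items 1 and 2 in this list are mechanical enough to generate with a script. What follows is a rough Python sketch under stated assumptions: the function names are hypothetical, and `REAL_WORDS` is a deliberately incomplete stand-in for the real words and abbreviations you would exclude for your own language pair. The Roman numeral exclusions default to "I" and "VI", as described for English and Spanish.

```python
from itertools import product
from string import ascii_lowercase

# Roman forms of the units digit 0-9; index 0 is empty.
UNIT_FORMS = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]


def to_roman(n):
    """Convert an integer from 1 to 39 to a Roman numeral."""
    if not 1 <= n <= 39:
        raise ValueError("this sketch only covers 1 to 39")
    tens, units = divmod(n, 10)
    return "X" * tens + UNIT_FORMS[units]


def roman_entries(exclude=("I", "VI")):
    """Roman numerals I to XXXIX, minus those that are real words."""
    numerals = (to_roman(n) for n in range(1, 40))
    return [r for r in numerals if r not in exclude]


def letter_entries(real_words):
    """All one-letter and two-letter combinations that are not real words."""
    singles = list(ascii_lowercase)
    pairs = ["".join(p) for p in product(ascii_lowercase, repeat=2)]
    return [w for w in singles + pairs if w not in real_words]


# Hypothetical, incomplete exclusion list for English and Spanish.
REAL_WORDS = {"a", "e", "i", "o", "u", "y", "an", "as", "at", "de",
              "el", "en", "la", "no", "of", "on", "to", "un"}

# Each entry is paired with itself: it should be left untranslated.
check_pairs = [(w, w) for w in letter_entries(REAL_WORDS) + roman_entries()]
print(len(check_pairs))
print(roman_entries()[:5])
```

Pairing each entry with itself reflects the idea that serial-number fragments and Roman numerals should come through the translation unchanged. Number words such as two=dos would need real source and target pairs instead.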

If you would like some help setting up a "check TDB" for your language pair, please contact me.
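For a quick spot check of numbers outside Déjà Vu, the same idea can be approximated with a few lines of Python. This is only a sketch under stated assumptions: it takes segment pairs as plain strings, the names `numbers_in` and `numbers_match` are hypothetical, and it compares the numbers as written, so a legitimate change of decimal separator between languages (1.5 versus 1,5) would be flagged as a mismatch. The French segment in the last line is an invented example of a correct translation.

```python
import re
from collections import Counter

# A run of digits, optionally continued by ".", "," or "/" plus more digits,
# so that "1/2", "1.5" and "1,000" are each treated as a single number.
NUMBER = re.compile(r"\d+(?:[.,/]\d+)*")


def numbers_in(segment):
    """Count each number that appears in a segment."""
    return Counter(NUMBER.findall(segment))


def numbers_match(source, target):
    """True if source and target contain the same numbers, in any order."""
    return numbers_in(source) == numbers_in(target)


# The two pairs from Marie's posting are both flagged as mismatches.
print(numbers_match("1/2 cup (125 ml)", "3/4 cup (1/2 ml)"))
print(numbers_match("1/4 tsp (1 ml) salt", "1/2 c. à thé (1/4 ml) salt"))

# A hypothetical correct translation passes.
print(numbers_match("1/2 cup (125 ml)", "1/2 tasse (125 ml)"))
```

Using a counter rather than a set means a number that appears twice in the source must also appear twice in the target.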


Home: www.techlanguage.com