    As Lavid (2005) points out, information has become one of the

    basic elements in our current society, which may be called the Third

    Wave, paraphrasing Alvin Toffler's book (1996). The First Wave is the

    society after the agrarian revolution; the Second Wave is industrial; the

    Third Wave represents the information and knowledge revolution. The new

    millennium's society is an information society, where Information and

    Communication(s) Technology (ICT) is of paramount importance.

    Therefore, the exchange of languages and cultures plays an important

    role in this information society. Consequently, translators and

    interpreters may become fundamental mediators on a global level.

  • M. Cristina Toledo Báez


    In this context, the Internet seems an essential tool, offering

    new modes of communication and spreading scientific knowledge. In

    addition, it facilitates and improves the documentation process. The

    translator, as an information user and an information producer,

    considers the Internet to be a valuable documentation source and a

    useful communication system.

    According to Pinto Molina (2002: 2), the informational

    revolution makes it possible to compile more information in less time

    and, consequently, improve the translator's efficiency. With the

    mushrooming of the quantity of online text information, triggered in

    part by the growth of the World Wide Web, it is especially useful to

    have tools which can help users digest information content.

    Nevertheless, translators have to be extremely skilful during the

    documentation process since they need to be able to distinguish and

    choose only reliable information resources. This is because the

    Internet, although it is a valuable and very useful tool, contains a large

    amount of unreliable information.

    In that regard, an abstract may be quite useful for translators

    since it helps to select the correct information in the documentation

    process. Given that translators normally must meet tight deadlines,

    abstracting articles or electronic resources is an advantageous solution

    and facilitates the translation process. Consequently, automatic

    summarization and extraction, both fields of Computational

    Linguistics, can help humans in general and translators in particular to

    deal with information overload by automatically extracting the gist of


  • Abstract


    This thesis aims to combine both automatic summarization and

    translation in order to test whether automatic summarization as a new

    translation technology could be a useful tool in a translator's



    Our main research hypothesis is that term-based automatic

    summarization as a documentation resource enhances direct and

    inverse translation of specialized texts. However, as Tymoczko (2002:

    16-17) points out, the starting point in Translation Studies is not just a

    hypothesis, and, consequently, we present a tripartite hypothesis:

    I) Research on the combination of automatic summarization

    and Translation and Interpreting needs empirical studies in

    order to test its efficacy.

    II) The translation of specialized text, specifically research

    articles in the legal-technological domain in three

    languages (Spanish, English and French) and in direct and

    inverse combinations, is improved with the help of

    the Term-Based Summariser.

    III) Term-based automatic summarization should be part of an

    innovative translator's workbench.

    The aims listed above are achieved by setting the following list

    of general (1-2) and specific (3-10) goals:

    1. Providing a review of major work in translation

    technologies and in human and automatic summarization.

    2. Emphasising the relevance of documentation as a

    cornerstone in specialized translation.

    3. Building a representative multilingual comparable corpus

    of parallel texts from research articles on electronic

    commerce in three languages (Spanish, English and

    French).

    4. Focusing on the emerging legal-technological discourse

    from the Information Technology Law and Data

    Protection.

    5. Comparing the legal-technological discourse features in

    three languages, i.e., Spanish, English and French.

    6. Studying the research article as a textual genre.

    7. Testing whether the Introduction-Materials and

    Methods-Results-Discussion/Conclusion (IMRD) structure of

    English scientific articles is also valid, on the one

    hand, for articles in Legal Sciences and, on the other hand,

    for articles in the Romance languages Spanish and French.

    8. Establishing evaluation parameters combining both

    analytic and holistic evaluation in order to find objective

    criteria in Translation Studies.

    9. Carrying out experiments with semi-professional

    translators offering quantitative results regarding three

    main criteria: quality criteria, lexical richness criteria and

    number of words criteria.

    10. Analysing translators' impressions and opinions regarding

    the use of the Term-Based Summariser by means of a

    survey and qualitative data.

    All of these goals were achieved in this thesis by means of the

    following materials and methods.


    To confirm the main hypothesis, several materials are used in

    this thesis.

    3.1. Term-Based Summariser

    First of all, the main material is the Term-Based Summariser

    (TBS), a modified version of the Computer-Aided Summarisation

    Tool (CAST) developed by the Research Group in Computational

    Linguistics at the University of Wolverhampton. Words are scored

    by term frequency, and the word is the unit of tokenisation. TBS

    offers two output modes: "only summary", which shows just the

    extract, and "the whole text with highlights", which shows the full

    text with the selected sentences marked in a different colour. A

    compression rate can also be chosen. A stop list is used for each

    language (Spanish, English and French), and TBS displays the top

    50 terms identified by the program together with their raw

    frequency in the text. To keep the TBS interface clear and

    user-friendly, 20 texts in each language were selected and their

    titles are listed as bullet points.
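
    The behaviour described in this subsection can be illustrated with a
    minimal sketch. It is not the TBS or CAST code: the function names, the
    miniature stop list, the naive sentence splitter and the reading of the
    compression rate as a share of sentences are all assumptions made for
    illustration. The sketch scores each sentence by the sum of the raw
    frequencies of its terms, keeps the highest-scoring sentences and offers
    the two output modes mentioned above.

```python
import re
from collections import Counter

# Hypothetical miniature stop list; a real one is language-specific and much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on", "it"}


def tokenize(text):
    """Split text into lowercase word tokens (the word is the token unit)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def top_terms(text, n=50):
    """Return the n most frequent non-stop-list terms with their raw frequency."""
    counts = Counter(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts.most_common(n)


def summarise(text, compression_rate=0.15):
    """'Only summary' mode: the top-scoring sentences, kept in original order."""
    sentences = split_sentences(text)
    freq = dict(top_terms(text, n=None))  # n=None returns every term
    # A sentence's score is the sum of the raw frequencies of its terms.
    scores = [sum(freq.get(t, 0) for t in tokenize(s)) for s in sentences]
    # Assumption: the compression rate is applied to the number of sentences.
    keep = max(1, round(len(sentences) * compression_rate))
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(best)]


def highlight(text, compression_rate=0.15):
    """'Whole text with highlights' mode: selected sentences wrapped in [[ ]]."""
    selected = set(summarise(text, compression_rate))
    return " ".join(f"[[{s}]]" if s in selected else s for s in split_sentences(text))
```

    Summing raw frequencies favours long sentences; a real system might
    normalise by sentence length, which this sketch deliberately leaves out.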

    3.2. Multilingual comparable corpus

    A multilingual comparable corpus was compiled, consisting of

    a collection of parallel texts from research articles on electronic

    commerce in the three languages studied in the thesis (Spanish,

    English and French). The research articles were selected from journals

    in Spanish (Revista de Contratación Electrónica), English (Journal of

    Information, Law and Technology and International Journal of Law

    and Information Technology) and French (Revue des techniques de

    l'information et de la communication, Revue internationale de droit

    économique, etc.) and the distribution of articles was as follows: 150

    articles in Spanish (1,500,281 tokens), 142 articles in English

    (1,226,260 tokens) and 86 articles in French (1,277,841 tokens).

    Initially, the purpose of building the corpus was to implement

    the inverse document frequency for scoring the words, but, once the

    term frequency method was selected, the corpus was used to analyse

    the characteristics of research articles on electronic commerce in the

    three languages. Apart from that, one article in each language was

    selected as a source text for the direct and inverse translations.

    Consequently, source texts all shared the same domain

    (legal-technological discourse) and the same textual genre (research

    articles). Each article was then divided into different paragraphs and

    the same sections were selected from all the articles: on one hand,

    title, keyword and introduction (part 1 for direct translation and part 3

    for inverse translation) and, on the other hand, the section similar to

    the materials and methods one (part 2 for direct translation and part 4

    for inverse translation).

    3.3. Markin and evaluation

    The teaching software Markin provides tools to mark and

    annotate texts. Once our evaluation parameters were established,

    Markin was used to evaluate direct and inverse translations with a set

    of annotations. These evaluation parameters consist of both analytic

    error evaluation as well as holistic and global evaluation. The former

    pays attention to negative aspects such as source text related errors

    (wrong sense, unnecessary addition or inadequate linguistic variation),

    target text related errors (orthography, grammar, terminology or

    textual type) and also to positive aspects such as correct terms. The

    holistic evaluation evaluates the translation as a whole and it has five

    different levels regarding transfer and expression quality. The levels

    range from 1 (very poor translation) to 5 (excellent translation). The

    evaluation of the direct and inverse translations with and without TBS

    constitutes one of the criteria studied in this thesis.

    3.4. WordList in Oxford WordSmith Tools

    WordList in Oxford WordSmith Tools (version 3.00.00) is used

    in this thesis to calculate the lexical richness of translated texts by

    means of the type/token ratio. It also provides other results such as

    number of types, number of tokens and number of bytes. These results

    are also discussed.
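
    As an illustration of the measure, the type/token ratio can be sketched
    as follows. This is not WordSmith's implementation: the tokenisation rule
    (lowercased runs of letters) and the function name are assumptions, and
    WordList may count types and tokens differently.

```python
import re


def type_token_ratio(text):
    """Return (types, tokens, ratio), where ratio = types / tokens * 100."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    types = set(tokens)  # distinct word forms
    ratio = 100 * len(types) / len(tokens) if tokens else 0.0
    return len(types), len(tokens), ratio
```

    For example, a text of 8 running words containing 5 distinct forms has a
    ratio of 62.5. Because the ratio falls as texts get longer, it is only
    comparable across texts of similar length.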

    3.5. ReCor

    ReCor 3.1 is used to assess the representativeness of the corpus

    compiled in this thesis and its results are reflected in graphics. These

    results are also discussed.

    3.6. Electronic survey

    An electronic survey of 33 questions was created in order to

    generate qualitative results regarding the use of the TBS in direct and

    inverse translations. The most important questions concerned the use

    of Term-Based Summariser in both direct and inverse translations.

    Ninety-five out of ninety-five semi-professional translators filled out

    the electronic survey in a classroom environment.

    4. METHODS

    Both the methods and results sections in this abstract are

    divided into three subsections: contrastive analysis of discourse and

    domain, contrastive analysis of genre and textual type and

    experiments with semi-professional translators.

    4.1. Contrastive analysis of discourse and domain:

    legal-technological discourse

    Before translating any text, a thorough analysis of its

    terminology, lexis and textual features must be carried out in order to

    familiarise the translator with the source text. Consequently,

    legal-technological discourse in Spanish (from Spain), English (from

    the United Kingdom) and French (from France) must be approached

    in this thesis. However, before studying the discourse, the legal

    systems of the three countries must be compared because they

    differ: English law belongs to the common law tradition, while

    Spanish and French law belong to the civil law tradition. Once the

    distinction between the two legal systems had been established, the

    compiled corpus served as the starting point for the discourse approach.

    Nevertheless, since most texts in the parallel corpus share the

    same terminological and lexical features because they all belong to the

    same domain, we only analysed the discourse from the three texts

    used as source texts in the translations and the results show that the

    legal-technological discourse has common features in Spanish,

    English and French even though they belong to different legal systems.

    They will be presented in the results section.

    4.2. Contrastive analysis of textual genre: research article

    As mentioned above, the textual genre of the texts analysed is

    the research article. The literature reviewed considers this genre to

    have a very well established structure, particularly in the field of

    Science and Technology, presenting the following sections:

    Introduction, Materials and methods, Results and Discussion (IMRD).

    It is also important to note that English is the language of scientific

    communication in the scientific community, even for non-native

    speakers, and, consequently, the IMRD structure is essential to that


    In the work undertaken in this thesis we aim to test whether

    the IMRD structure is used in Legal Sciences and in Romance

    languages such as Spanish and French. Since all the articles were

    selected from the same journals or very similar ones and they shared

    the same structure and format, we chose the 60 articles from the TBS

    interface, 20 in each language, and we compared them in pairs (first

    Spanish and English, later Spanish and French and finally English and

    French). The results of this alignment will be presented below.

    4.3. Empirical experiments with semi-professional translators

    The experiments were carried out by 96 undergraduate students

    in the 4th year of Translation and Interpreting at the University

    of Málaga. They all had similar grades (60-70 in previous courses)

    but, in order to avoid confounding variables, Socrates/Erasmus students

    were not allowed to take part in the experiments.

    Twenty-seven of ninety-five students study French as their first

    foreign language and sixty-nine study the English language. The

    difference between the two groups is related to the University

    restrictions for the student/language ratio: seventy-three is the

    maximum for English and forty for French. Taking into account these

    data, the sample is quite relevant.

    The experiments were carried out in a 3-hour classroom

    environment and the procedure was the same for four groups:

    1. First, experiments and Term-Based Summariser (TBS) were

    briefly explained in 15 minutes.

    2. Students translated Part 1 (title, keywords and introduction)

    from English or French into Spanish (direct translation)

    with online dictionaries. They were not allowed to use the

    TBS, nor any other parallel texts. Part 1 took 20 minutes.

    3. Students translated Part 2 (materials and methods) from

    English or French into Spanish (direct translation) with the

    TBS as the only terminological and information resource.

    Part 2 took 20 minutes.

    4. Students translated Part 3 (title, keywords and introduction)

    from Spanish into English or French (inverse translation)

    with online dictionaries. They were not allowed to use the

    TBS, nor any other parallel texts. Part 3 took 20 minutes.

    5. Students translated Part 4 (materials and methods) from

    Spanish into English or French (inverse translation) with the

    TBS as the only terminological and information resource.

    Part 4 took 20 minutes.

    6. Students filled out the electronic survey. This final part took

    15 minutes.

    The number of target texts (translations) comprised a subcorpus

    of 379 documents: 137 for English-Spanish translation, 135 for

    Spanish-English translation, 56 for French-Spanish translation and 51

    for Spanish-French translation. There is no sample attrition.

    It is worth describing how translators used TBS as an

    informational and terminological resource. After displaying the list of

    research article titles, each translator chose the most appropriate title

    for the source text. They then summarised the parallel text

    with the TBS, using either the "only summary" or the "whole text with

    highlights" option and setting the compression rate at 10-15%

    because of the length of the articles. Then, they read the result displayed

    and searched for the most suitable terms or phraseological units for

    the translation process. The process was the same in both direct and

    inverse translation.

    5. RESULTS

    5.1. Results of contrastive analysis of legal-technological discourse

    A brief sample of the results after comparing the

    legal-technological discourse in Spanish, English and French is listed

    below.

    |                         | Spanish | English | French |
    |-------------------------|---------|---------|--------|
    | Specific terms          | Fehaciente | Plaintiff | Législateur |
    | Terms in Latin          | Prima facie | Lex fori | Inter alia |
    | Hellenisms              | Sinalagmático | | Politique |
    | Anglicisms              | Marketing | | Common law |
    | Gallicisms              | Promoción | Arbitrage | |
    | Collocations            | Marco contractual | Overriding issue | Charte majeure |
    | Suffixation             | Oferente | Consumer | Prestataire |
    | Passive voice           | Los datos de carácter personal serán cancelados | Consideration must be given to a new means | Cette politique d'harmonisation est basée sur deux idées |
    | Particular use of verbs | Aunque en razón del artículo 1 resultare aplicable la Convención | It is submitted that the alternative requirements | La politique de régulation traditionnelle porte |

    Table 1. Results of contrastive analysis of legal-technological discourse.

    5.2. Results of contrastive analysis of research article as genre

    After comparing the structure of the sections of 20 articles in

    Spanish, 20 in English and 20 in French, the main results below show

    the percentage of articles that follow the different structures described.

    The IMRD structure is not always used because, for instance, neither

    the results section nor the materials and methods section appears in any of the

    articles analysed. However, the research article as a genre shares common

    features in the three languages and in the Legal Sciences, although

    some differences are found, particularly with the English language

    given that Spanish and French, as Romance languages, are more


    |                                                                      | Spanish | English | French |
    |----------------------------------------------------------------------|---------|---------|--------|
    | Title                                                                | 46%: less than 8 words | 52%: less than 8 words | 70.8%: less than 8 words |
    | Introduction-problem-solution                                        | 29.4% | 20% | 26.6% |
    | Presentation of a system or analysis                                 | 17.6% | 20% | 6.6% |
    | Introduction-method-solution                                         | 23.5% | 13.3% | 20% |
    |                                                                      | 11.7% | 20% | 13.3% |
    | CARS structure (Swales, 1990)                                        | | | |
    | OARO structure (Swales, 2004)                                        | 71% | 46% | 76% |
    | Materials and methods                                                | No common structure | No common structure | No common structure |
    | Results                                                              | No common structure | No common structure | No common structure |
    | General results-specific results-conclusions                         | | | |
    | General results-specific results-limitations-conclusions-future work | | | |
    | Context-results-limitation-future work                               | | | |

    Table 2. Results of contrastive analysis of research article as genre.

    5.3. Results of the experiments with semi-professional translators

    In order to test the efficiency of the use of Term-Based

    Summariser for specialized translation, three main criteria were

    analysed: quality criteria, lexical richness and number of words.

    5.3.1. Quality criteria

    Quality criteria are related to the evaluation parameters

    developed in this thesis. Our main interest is to test whether the

    translations with TBS as terminological and informational source have

    a better quality, (i.e., fewer errors) than translations with online

    dictionaries. In order to prove that difference, all the translations were

    evaluated with the software Markin according to analytic and holistic

    evaluation parameters and some of them (50 for each combination)

    were selected to illustrate the main characteristics. The results are

    summed up in the following tables:

    Direct translation

    |                            | Translation without TBS | Translation with TBS |
    |----------------------------|-------------------------|----------------------|
    | Source text related errors | 239                     | 226                  |
    | Target text related errors | 301                     | 285                  |
    | Positive aspects           | 198                     | 202                  |

    Table 3. Direct translation (English-Spanish): results for 50 best translations.

    Direct translation

    |                            | Translation without TBS | Translation with TBS |
    |----------------------------|-------------------------|----------------------|
    | Source text related errors | 215                     | 197                  |
    | Target text related errors | 276                     | 275                  |
    | Positive aspects           | 154                     | 181                  |

    Table 4. Direct translation (French-Spanish): results for 50 translations.

    Inverse translation

    |                            | Translation without TBS | Translation with TBS |
    |----------------------------|-------------------------|----------------------|
    | Source text related errors | 305                     | 297                  |
    | Target text related errors | 318                     | 313                  |
    | Positive aspects           | 103                     | 104                  |

    Table 5. Inverse translation (Spanish-English): results for 50 translations.

    Inverse translation

    |                            | Translation without TBS | Translation with TBS |
    |----------------------------|-------------------------|----------------------|
    | Source text related errors | 297                     | 284                  |
    | Target text related errors | 301                     | 296                  |
    | Positive aspects           | 117                     | 124                  |

    Table 6. Inverse translation (Spanish-French): results for 50 best translations.

    In all the tables the translations with Term-Based Summariser

    have fewer errors than the translations with online dictionaries. The

    difference is higher in direct translation than in inverse translation

    because translators try to focus more on the text itself than on the

    documentation process or on the terminological search.
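
    The comparison can be made explicit with a short calculation over the
    counts in Tables 3 to 6. The sketch below is illustrative only: the
    dictionary layout and function name are assumptions, the direction
    labels follow the heading printed above each table, and no significance
    test is implied.

```python
# Counts copied from Tables 3-6 as (without TBS, with TBS).
TABLES = {
    "Table 3 (direct)": {"source_errors": (239, 226), "target_errors": (301, 285)},
    "Table 4 (direct)": {"source_errors": (215, 197), "target_errors": (276, 275)},
    "Table 5 (inverse)": {"source_errors": (305, 297), "target_errors": (318, 313)},
    "Table 6 (inverse)": {"source_errors": (297, 284), "target_errors": (301, 296)},
}


def error_change(table):
    """Return total errors without TBS, with TBS, and the relative change in %."""
    without = table["source_errors"][0] + table["target_errors"][0]
    with_tbs = table["source_errors"][1] + table["target_errors"][1]
    return without, with_tbs, 100 * (with_tbs - without) / without


for name, table in TABLES.items():
    without, with_tbs, change = error_change(table)
    print(f"{name}: {without} -> {with_tbs} errors ({change:+.2f}%)")
```

    On these counts, total errors are lower with TBS in all four tables, and
    the relative reduction is larger in the two direct-translation tables
    than in the two inverse-translation tables.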

    Regarding the holistic evaluation, there are also some

    differences, with the translations with Term-Based Summariser

    obtaining better results than the translations with online dictionaries. As

    noted earlier, level 1 means the translation is very poor and level 5

    implies that the translation is excellent. The results according to the

    languages are as follows:


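    |                                                    | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
    |----------------------------------------------------|---------|---------|---------|---------|---------|
    | Direct translation (English-Spanish), without TBS  | 11      | 13      | 26      | 10      | 8       |
    | Direct translation (English-Spanish), with TBS     | 7       | 8       | 31      | 10      | 12      |
    | Inverse translation (Spanish-English), without TBS | 13      | 15      | 20      | 12      | 5       |
    | Inverse translation (Spanish-English), with TBS    | 11      | 18      | 27      | 14      | 6       |

    Total number of translations: 277 texts

    Table 7. Results of translations involving English language.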


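    |                                                   | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
    |---------------------------------------------------|---------|---------|---------|---------|---------|
    | Direct translation (French-Spanish), without TBS  | 3       | 6       | 8       | 7       | 4       |
    | Direct translation (French-Spanish), with TBS     | 2       | 4       | 10      | 8       | 5       |
    | Inverse translation (Spanish-French), without TBS | 4       | 6       | 9       | 7       | 2       |
    | Inverse translation (Spanish-French), with TBS    | 2       | 7       | 10      | 7       | 2       |

    Total number of translations: 107 texts

    Table 8. Results of translations involving French language.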

    In both tables, texts translated with TBS reach better levels than

    texts translated with online dictionaries, although inverse translation

    once again presents more homogeneous results in both types of

    translation. The main reason is the difficulty of translating into a

    non-mother tongue.
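
    One way to summarise the holistic results is a mean level per condition.
    The sketch below rests on an assumption that the tables themselves do not
    state: that the five figures in each row of Tables 7 and 8 are the
    numbers of translations at levels 1 to 5, in ascending order. The names
    used are illustrative.

```python
# Rows copied from Tables 7 and 8; assumed to be counts for levels 1..5 in order.
HOLISTIC = {
    "English-Spanish": {"without": [11, 13, 26, 10, 8], "with": [7, 8, 31, 10, 12]},
    "Spanish-English": {"without": [13, 15, 20, 12, 5], "with": [11, 18, 27, 14, 6]},
    "French-Spanish": {"without": [3, 6, 8, 7, 4], "with": [2, 4, 10, 8, 5]},
    "Spanish-French": {"without": [4, 6, 9, 7, 2], "with": [2, 7, 10, 7, 2]},
}


def mean_level(counts):
    """Weighted mean of levels 1..5, weighted by the number of translations."""
    total = sum(counts)
    return sum(level * n for level, n in enumerate(counts, start=1)) / total


for pair, rows in HOLISTIC.items():
    print(f"{pair}: {mean_level(rows['without']):.2f} without TBS, "
          f"{mean_level(rows['with']):.2f} with TBS")
```

    Under that assumption the mean level is higher with TBS in all four
    language combinations, and the gain is smaller in the two inverse
    directions than in the two direct ones.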

    5.3.2. Lexical richness criteria

    Another variable studied in this thesis is the lexical richness of

    translations, since they all have the same number of words (around

    150 words). WordList is the tool used to offer the type/token ratio in

    all the translated texts. The main results classified by translators are

    shown below:



    | Part 1 (without TBS) | Part 2 (with TBS) | Part 3 (without TBS) | Part 4 (with TBS) |
    |----------------------|-------------------|----------------------|-------------------|
    | 48.31 | 50.35 | 54.79 | 46.46 |
    | 49.33 | 61.80 | 59.26 | 46.46 |
    | 53.44 | 62.61 | 59.26 | 70.53 |
    | 55.37 | 51.35 | 72.50 | 70.53 |
    | 52.35 | 58.87 | 57.97 | 57.35 |
    | 53.70 | 62.61 | 51.49 | 60.20 |
    | 46.43 | 70.00 | 75.00 | 68.06 |
    | 55.84 | 63.30 | 65.91 | 68.12 |
    | 48.85 | 61.94 | 71.91 | 80.36 |
    | 43.08 | 72.37 | 70.00 | 59.40 |
    | 64.52 | 61.80 | 59.79 | 59.40 |
    | 49.33 | 62.67 | 53.21 | 64.76 |
    | 48.20 | 58.99 | 75.00 | 52.35 |
    | 50.00 | 60.48 | 75.00 | 56.29 |
    | 53.08 | 69.59 | 59.43 | 57.38 |
    | 50.25 | 56.67 | 62.24 | 56.15 |
    | 53.14 | 72.37 | 57.02 | 56.20 |
    | 64.42 | 62.70 | 73.33 | 63.06 |
    | 45.45 | 58.39 | 55.62 | 60.14 |
    | 45.37 | 64.41 | 56.76 | 58.52 |
    | 46.89 | 57.07 | 54.79 | 63.89 |
    | 48.51 | 60.00 | 64.49 | 55.64 |
    | 57.82 | 61.29 | 51.64 | 52.67 |
    | 46.67 | 49.32 | 57.23 | 64.76 |
    | 50.26 | 64.41 | 55.21 | 64.76 |
    | 53.02 | 63.22 | 53.57 | 65.63 |
    | 44.17 | 58.25 | 58.82 | 46.91 |
    | 53.70 | 58.87 | 79.52 | 56.15 |
    | 47.37 | 65.00 | 56.92 | 64.86 |
    | 45.18 | 62.69 | 57.67 | 66.67 |
    | 48.88 | 55.56 | 52.72 | 57.80 |
    | 45.14 | 65.25 | 61.72 | 59.74 |
    | 53.33 | 61.79 | 51.97 | 55.24 |
    | 47.11 | 60.94 | 59.50 | 45.88 |
    | 45.32 | 61.01 | 57.59 | 54.97 |
    | 49.57 | 61.42 | 50.48 | 56.41 |
    | 52.21 | 74.19 | 59.48 | 65.29 |
    | 49.79 | 59.35 | 60.95 | 50.85 |
    | 52.83 | 63.56 | 89.13 | 63.95 |
    | 51.79 | 65.87 | 48.48 | 56.15 |
    | Type/token ratio rate: 7.19 | Type/token ratio rate: 9.59 | Type/token ratio rate: 7.59 | Type/token ratio rate: 8.49 |

    Table 9. Type/token ratio in English translation.



    | Part 1 (without TBS) | Part 2 (with TBS) | Part 3 (without TBS) | Part 4 (with TBS) |
    |----------------------|-------------------|----------------------|-------------------|
    | 57.14 | 62.28 | 66.20 | 62.50 |
    | 58.96 | 64.93 | 68.09 | 68.38 |
    | 59.63 | 64.20 | 72.73 | 59.52 |
    | 57.63 | 60.67 | 70.27 | 60.47 |
    | 59.06 | 66.67 | 66.99 | 63.64 |
    | 59.06 | 54.93 | 70.45 | 62.81 |
    | 47.83 | 63.57 | 76.19 | 70.89 |
    | 52.07 | 63.24 | 52.86 | 51.69 |
    | 57.36 | 56.68 | 79.55 | 79.6 |
    | 58.78 | 65.17 | 72.73 | 76.12 |
    | 56.35 | 56.28 | 56.64 | 71.59 |
    | 57.14 | 65.63 | 74.68 | 75.9 |
    | 66.67 | 69.74 | 71.83 | 72.4 |
    | 60.69 | 61.59 | 54.91 | 59.50 |
    | 57.58 | 59.75 | 67.01 | 68.7 |
    | 57.35 | 60.74 | 61.2 | 67.83 |
    | 57.61 | 63.78 | 65.8 | 69.74 |
    | 58.55 | 60.8 | 61.9 | 65.00 |
    | 57.56 | 59.57 | 62.3 | 60.98 |
    | Type/token ratio rate: 9.67 | Type/token ratio rate: 10.82 | Type/token ratio rate: 13.89 | Type/token ratio rate: 16.10 |

    Table 10. Type/token ratio in French translation.

    In all the tables the type/token ratio rate is higher in translations

    with TBS than in translations without TBS, even though we find

    differences depending on the language (French has better results than

    English) and on the translation direction (direct translation has better

    results than inverse translation).

    5.3.3. Number of words translated criteria

    The difference in the total number of words translated between

    translations without TBS and translations with TBS is another indicator

    of the benefits of using the TBS. The results are shown below:

    | Translation without TBS | Translation with TBS |
    |-------------------------|----------------------|
    | Part 1 (direct English-Spanish translation): 9548 words | Part 2 (direct English-Spanish translation): 13257 words |
    | Part 3 (inverse Spanish-English translation): 9306 words | Part 4 (inverse Spanish-English translation): 9473 words |
    | Part 1 (direct French-Spanish translation): 4855 words | Part 2 (direct French-Spanish translation): 6966 words |
    | Part 3 (inverse Spanish-French translation): 3898 words | Part 4 (inverse Spanish-French translation): 5522 words |

    Table 11. Number of words translated criteria.

    The results show that the number of words in translations with TBS

    is higher than in translations using online dictionaries. The

    difference in inverse translation is lower than in direct translation,

    particularly in English; consequently, we infer that

    inverse translation from Spanish into French takes less time than

    inverse translation from Spanish into English. However, further studies

    are required.
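
    The size of these differences can be expressed as a relative increase.
    The following sketch uses the word totals from Table 11; the variable and
    function names are illustrative assumptions. Note that the parts
    translated with and without TBS are different passages of the source
    articles, so the totals are not strictly like-for-like.

```python
# Word totals from Table 11 as (without TBS, with TBS).
WORDS = {
    "English-Spanish (direct)": (9548, 13257),
    "Spanish-English (inverse)": (9306, 9473),
    "French-Spanish (direct)": (4855, 6966),
    "Spanish-French (inverse)": (3898, 5522),
}


def relative_increase(without, with_tbs):
    """Percentage increase in words translated when TBS is used."""
    return 100 * (with_tbs - without) / without


for pair, (without, with_tbs) in WORDS.items():
    print(f"{pair}: {relative_increase(without, with_tbs):+.1f}%")
```

    On these totals the increase is about 39% and 43% for the two direct
    combinations, about 42% for Spanish-French and under 2% for
    Spanish-English.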

    5.4. Results from the electronic survey

    A complete piece of research must also offer qualitative results,

    and the 33-question survey used in this thesis provides

    some indications of the translators' impressions. The most relevant

    data concern the use of Term-Based Summariser during the

    documentation and translation process.

    The questions below are answered on a scale from 1 (almost

    nothing) to 5 (very). The most interesting results concerning the

    answers of the 95 semi-professional translators are as follows:

    | Question | 1 | 2 | 3 | 4 | 5 |
    |----------|---|---|---|---|---|
    | 1. How important are titles in Term-Based Summariser? | 2.1% | 17.0% | 38.3% | 35.1% | 6.4% |
    | 2. Did Term-Based Summariser help you to familiarise yourself with the research article structure? | 12.8% | 45.7% | 22.3% | 12.8% | 6.4% |
    | 3. How useful is Term-Based Summariser as provider of parallel texts? | 2.1% | 19.1% | 34.0% | 29.8% | 14.9% |
    | 4. Did you feel comfortable translating with online dictionaries? | 1.1% | 9.6% | 42.6% | 34.0% | 12.8% |
    | 5. Did you feel comfortable translating with Term-Based Summariser? | 6.4% | 24.5% | 30.9% | 30.9% | 7.4% |
    | 6. Did Term-Based Summariser help you in the direct translation process? | 14.9% | 39.4% | 23.4% | 16.0% | 6.4% |
    | 7. Did Term-Based Summariser help you in the inverse translation process? | 16.0% | 37.2% | 16.0% | 21.3% | 9.6% |
    | 8. Is the top 50 terms list useful for translators? | 3.2% | 7.4% | 25.5% | 35.1% | 28.7% |
    | 9. Do you think Term-Based Summariser is useful for the documentation process? | 1.1% | 14.9% | 29.8% | 30.9% | 23.4% |
    | 10. How useful is the option "only the summary"? | 3.2% | 17.0% | 37.2% | 35.1% | 7.4% |
    | 11. How useful is the option "the whole text with highlights"? | 1.1% | 8.5% | 29.8% | 33.0% | 27.7% |
    | 12. Would you include Term-Based Summariser in a translator's workbench? | 5.3% | 12.8% | 26.6% | 33.0% | 22.3% |

    Table 12. Results from the electronic survey.

    The answers reflect the translators' opinions and it is worth

    mentioning that many of them would include a Term-Based

    Summariser in a translator's workbench, and that the top terms list is

    regarded as a very useful terminological tool.
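
    These observations can be checked against the percentages in Table 12.
    The sketch below computes, for a few of the questions, the share of
    answers at levels 4 and 5 and the mean score; the short question labels
    and function names are illustrative assumptions, and the percentages are
    copied from the table as printed.

```python
# Percentages for levels 1..5, copied from Table 12 (selected questions).
SURVEY = {
    "Q6 helped in direct translation": [14.9, 39.4, 23.4, 16.0, 6.4],
    "Q7 helped in inverse translation": [16.0, 37.2, 16.0, 21.3, 9.6],
    "Q8 top 50 terms list useful": [3.2, 7.4, 25.5, 35.1, 28.7],
    "Q9 useful for documentation": [1.1, 14.9, 29.8, 30.9, 23.4],
    "Q12 include in workbench": [5.3, 12.8, 26.6, 33.0, 22.3],
}


def top_two_share(percentages):
    """Share of answers at levels 4 and 5, in percent."""
    return round(percentages[3] + percentages[4], 1)


def mean_score(percentages):
    """Mean answer on the 1-5 scale, normalised by the reported total."""
    total = sum(percentages)
    return sum(score * p for score, p in enumerate(percentages, start=1)) / total


for question, percentages in SURVEY.items():
    print(f"{question}: {top_two_share(percentages)}% at levels 4-5, "
          f"mean {mean_score(percentages):.2f}")
```

    Read this way, a majority of answers fall at levels 4 and 5 for the top
    terms list, the documentation process and the workbench question, whereas
    the questions on direct help in the translation process score lower.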


    6.1. Conclusions

    The main contribution of this thesis is the innovative

    combination of Computational Linguistics and Translation Studies,

    i.e., automatic summarization and specialized translation. We will

    further discuss this now that the 10 goals set out in the aim section

    have been achieved.

    Goal 1 was to review the major work in translation

    technologies and in human and automatic summarization. This goal was

    completed in both Chapters 1 and 2. In Chapter 1 the main translation

    technologies were reviewed, including Natural Language Generation,

    Corpus Linguistics, Machine Translation and Information Retrieval.

    In Chapter 2 the most relevant approaches to human and automatic

    summarization are presented, although the main emphasis is on

    term-based summarisation.

    Goal 2 was to emphasise the relevance of documentation as a

    cornerstone in specialized translation. This goal was completed in

    Chapter 1 where Documentation as Science is approached, in order to

    focus on its importance for Translation Studies.

    Goal 3 was to build a representative multilingual comparable

    corpus of parallel texts from research articles on electronic commerce

    in three languages (Spanish, English and French). This goal was

    completed in Chapter 2, where the process of selection and

    compilation of texts is described as well as the final result. All the

    details of the corpus are specified in Chapter 2.

    Both goals 4 and 5 share some points. Goal 4 was to focus on

    the emerging legal-technological discourse from the Information

    Technology Law and Data Protection. This goal was completed in

    Chapter 3. First, we established the difference between two important

    dichotomies: general/specialized language and word/term. Secondly,

    we studied the new legal-technological discourse explaining its

    innovative terminological appellation, describing its main features

    according to the Information Technology Law and pointing out its

    relation with Data Protection. Goal 5 was to compare the features

    of legal-technological discourse in three languages (Spanish, English

    and French). This goal was also completed in Chapter 3, where a

    contrastive analysis of the legal-technological discourse in the

    source texts is carried out in the three languages. Common features

    are pointed out in order to show how similar the discourse is across

    the three languages.

    Goal 6 was to study the research article as a textual genre. This

    goal was completed in Chapter 4, where the notion of textual genre is

    analysed and then applied to the research article. We distinguish it

    from similar concepts such as text type and register, and present its

    main features and structures. The most common structure, IMRD

    (Introduction, Methods, Results and Discussion), is defined and

    described.

  • Abstract


    Goal 7 was to test whether the IMRD structure of English

    scientific articles also holds for articles on Legal Sciences and for

    articles in the Romance languages Spanish and French. This goal

    was completed in Chapter 4, beginning with a detailed analysis of

    the IMRD structure, followed by an analysis of the texts from the

    comparable corpus that appear in Term-Based Summariser.

    A contrastive analysis in the three languages (Spanish, English and

    French) was then carried out, and we found that the IMRD structure

    is also used in Legal Sciences and in the Romance languages, but

    with some important changes.

    Goal 8 was to establish evaluation parameters combining both

    analytic and holistic evaluation in order to find objective criteria in

    Translation Studies. This goal was completed in Chapter 5, where a

    review of major work on evaluation is provided and our own

    evaluation parameters are then detailed. These parameters encompass

    both analytic (error-based) and holistic (global) evaluation.

    These evaluation parameters have been used for the translation


    Goal 9 was to carry out experiments with semi-professional

    translators. This goal was completed in Chapter 5 with the description

    of the experiments in which 95 semi-professional translators from the

    University of Málaga took part. The final results were 379 pieces of

    translation in both direct and inverse translation with four

    combinations: English-Spanish (direct translation), Spanish-English

    (inverse translation), French-Spanish (direct translation) and

    Spanish-French (inverse translation). The translators translated two

    parts without Term-Based Summariser and two parts with

    Term-Based Summariser and then results were compared with the

    three criteria described in goal 9: quality criteria, lexical richness

    criteria and number of words criteria. The quality criteria are

    concerned with the evaluation parameters and involve evaluating the

    translations produced by the semi-professional translators in terms

    of analytic and holistic evaluation. The teaching software Markin is

    used to evaluate the translations. The lexical richness criteria are

    applied with WordList in WordSmith Tools, which provides information

    about the type/token ratio of a text. The number of words criteria

    compare the translations produced with Term-Based Summariser against

    those produced without it.
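
    As a rough illustration of the lexical richness criterion, the
    type/token ratio is the number of distinct word forms (types) in a
    text divided by its number of running words (tokens). The sketch
    below is not the WordSmith implementation: the tokeniser (lowercased
    runs of word characters) and the function name type_token_ratio are
    assumptions made for this example only, and the sample sentence is
    invented.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct word forms (types) divided by running words (tokens)."""
    # Assumption: a token is a lowercased run of word characters;
    # real corpus tools apply their own tokenisation rules.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Illustrative sentence, not taken from the thesis corpus.
sample = "the contract binds the buyer and the seller"
print(f"{type_token_ratio(sample):.2f}")  # prints 0.75 (6 types / 8 tokens)
```

    Because the ratio tends to fall as a text gets longer, it is best
    compared across translations of similar length.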

    Goal 10 was to analyse translators' impressions and opinions

    regarding the use of Term-Based Summariser by means of a survey.

    This goal is achieved in Chapter 5 with the detailed description of the

    questions in the survey and the comments in regards to the qualitative


    Regarding our triple hypothesis, we have shown, through

    empirical studies and both qualitative and quantitative results, that

    automatic summarization enhances specialized translation in three

    languages (Spanish, English and French) and in both direct and inverse

    combinations, although with better results for direct translation.

    Consequently, we consider that a term-based automatic summarizer

    should be part of an innovative translator's workbench.

    6.2. Future work

    During this research a series of possible future directions have

    emerged. They are briefly discussed in this section.

    The main direction is that the empirical study carried out in this

    thesis with semi-professional translators should be carried out again

    but this time with professional translators. The results would be good

    indicators of the advantages of Term-Based Summariser as a

    terminological and informational resource.

    Another line of research related to this is to learn whether the

    findings of this research are valid for other discourses and for other

    genres. We have focused on a very specific domain

    (legal-technological discourse) and genre (research articles), but it

    would be of particular interest to apply Term-Based Summariser to

    other domains and genres in order to find out whether similar results

    to the ones reported here can be obtained.

    Furthermore, a possible extension of this work is to analyse the

    results with other statistical methods such as Student's t-test or the

    chi-square test. Finally, in the future it would also be interesting to

    repeat the same study with more complex and representative corpora

    in order to extrapolate the results. All these future directions will be

    developed in the current research project Ecosistema: espacio único

    de sistemas de información ontológica y tesauros sobre el medio

    ambiente (FFI2008-06080-C03-03/FILO; 2008-2011), directed by Dr.

    Corpas Pastor and Dr. Faber. The possible merging of terminology,

    ontology, and automatic summarization constitutes a fascinating field

    to be explored.
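
    To make the statistical extension mentioned above more concrete, the
    following is a minimal sketch of Student's t statistic for paired
    samples. It assumes that each translator contributes one score with
    Term-Based Summariser and one without it. The function name
    paired_t_statistic and the scores are hypothetical and are not
    results from this thesis.

```python
import math
import statistics


def paired_t_statistic(with_tool, without_tool):
    """Return Student's t statistic for paired samples of equal length."""
    if len(with_tool) != len(without_tool) or len(with_tool) < 2:
        raise ValueError("need two samples of equal length (at least 2 pairs)")
    # Per-translator difference between the two conditions.
    diffs = [a - b for a, b in zip(with_tool, without_tool)]
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation (n - 1 in the denominator).
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# Hypothetical quality scores for four translators, NOT results from the thesis.
scores_with = [7, 8, 6, 9]
scores_without = [6, 6, 5, 8]
t = paired_t_statistic(scores_with, scores_without)
print(f"t = {t:.2f} with {len(scores_with) - 1} degrees of freedom")
```

    The statistic would then be compared with the t distribution for
    n - 1 degrees of freedom to obtain a p-value, for instance with a
    statistics package such as SciPy (scipy.stats.ttest_rel).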



    We would like to begin this work, which we submit as an academic

    thesis for the degree of Doctor, by first setting out the reasons

    that motivated it and the research framework within which it is

    situated.

    Thanks to a postgraduate grant from the Programa de Formación

    del Profesorado Universitario (FPU)1, awarded by the Ministerio de

    Educación y Ciencia2 in 2006, we joined, as a research member, the

    research group of excellence Lexicografía y Traducción3 (HUM-106)

    and two R&D projects directed, like the research group, by Dr. Gloria

    Corpas Pastor. The first was the national project TURICOR:

    Compilación de un corpus de contratos turísticos (alemán, español,

    inglés, italiano) para la generación textual multilingüe y la

    traducción jurídica4 (Ministerio de Ciencia y Tecnología,

    BFF2003-04616, 2003-2006). The second, which had just been awarded

    at that time, was the project of excellence of the Junta de Andalucía

    La contratación turística electrónica multilingüe como mediación

    intercultural: aspectos legales, traductológicos y terminológicos5

    (Dirección General de Investigación, Tecnología y Empresa, HUM-892,

    2006-2009).

    Within this favourable research framework, and thanks also to a

    stay at Dickinson College (Pennsylvania, United States) as a Spanish

    language assistant, which gave us access to numerous articles, books

    and reference material, in 2006 we submitted our second-year doctoral

    research project6, entitled Aproximación a la generación automática

    multilingüe de resúmenes.

    1 The grant reference is AP2005-2792, and the resolution was published in the Boletín Oficial del Estado of 21 April 2006.

    2 This ministry has had several names since the grant was awarded. It was called the Ministerio de Educación y Ciencia from 2006 until 2008, when responsibility for the training of research grant holders passed to the Ministerio de Ciencia e Innovación. In 2009, however, the Ministerio de Educación is once again responsible for announcing and managing the FPU grants.

    3 The URL of the HUM-106 group, which lists its members, R&D activities, publications and contact details, is the following: . The English version of the same web page is available at . All the URLs cited in this work were operational as of 20 June 2009.

    4 The URL of the Turicor project is the following: .

    5 More information on the project at .

    6 We refer to the doctoral programme Estudios de Traducción: Investigación en Traducción e Interpretación especializadas (2004-2006) of the Department of Translation and Interpreting of the University of Málaga, which was also awarded the quality mention by the Ministerio de Educación y Ciencia.

  • Introduction


    That project, which is the starting point for the present

    research, focused on the study and comparison of several free,

    multilingual online automatic summarization programs, in order to

    show their usefulness in the documentation work of the professional

    translator, both in the semasiological phase, in which the meaning of

    the text is understood, and in the onomasiological phase. As for the

    texts studied, a subcorpus was compiled from the Turicor macrocorpus,

    comprising 22 sets of general cruise conditions in Spanish and 27 in

    English, that is, a series of documents containing the general

    conditions of package travel contracts, specifically for cruises, in

    the two selected languages, English and Spanish. The research project

    allowed us to discover the many facets that automatic summarization,

    as an application of Computational Linguistics, could offer

    Translation Studies, which is why we made that line of research the

    backbone of this doctoral thesis.

    The research seed that was sown within the framework of the

    Turicor project has matured and grown in this doctoral thesis thanks

    to a twofold motivation. The first was our work within the HUM-892

    project, where we joined the Spanish, English and French sections in

    order to search for specific electronic resources on electronic

    contracting and electronic commerce. In this way we became familiar

    with legal and technological discourse and with the many research

    articles on the subject, thereby delimiting the specialized domain

    and the textual genre analysed in this thesis. As regards legal

    discourse, we also drew on the sources and teaching offered in the

    postgraduate course Especialista en traducción jurídica

    inglés-español, organized by the Department of English Philology of

    the University of Alicante, which we took during the 2006-2007

    academic year. That course undoubtedly consolidated our prior

    knowledge of legal translation and allowed us to specialize in this

    type of translation, so that we could approach the doctoral thesis

    with greater efficiency and skill.

    The other motivating factor, crucial both for our research aims

    and for the European Doctorate mention of this thesis, was a

    three-month research stay in 2007 with the Research Group in

    Computational Linguistics, led by Dr. Ruslan Mitkov, which belongs to

    the Research Institute in Information and Language Processing at the

    University of Wolverhampton (United Kingdom). The stay allowed us to

    explore the subject of our research project in greater depth: we

    consulted an extensive bibliography on automatic summarization,

    provided first-hand by the most renowned researchers, and we were

    able to learn about and try out the latest techniques through the

    Computer-Assisted Summarization Tool (CAST), developed by

    Dr. Constantin Orăsan. CAST is the cornerstone of this doctoral

    thesis, since it is the documentation and terminological resource

    that we used and adapted to our research needs.

    With this motivation as a backdrop, we now explain the

    objectives of our research.

    Our starting hypothesis is that automatic summarization, as a

    documentation resource, facilitates the translation of specialized

    texts in both directions (direct and inverse). However, as Tymoczko

    (2002: 16-17) points out, research in Translation Studies does not

    usually start from a single hypothesis but rather from a series of

    hypotheses. Our study is no exception in this respect, and we

    therefore start from a triple working hypothesis7:

    I) Research on the combination of Computational Linguistics

    and Translation Studies requires empirical, generalizable

    studies that prove its effectiveness.

    II) The translation of specialized texts, in this case research

    articles in the legal-technological field, both in English

    and in French and both direct and inverse, is made faster

    by consulting a term-based automatic summarization

    program.

    III) Automatic summarization emerges as an innovative and

    reliable documentation resource that could form part of a

    future translator's workbench.

    7 The DRAE defines a working hypothesis as one that is provisionally established as the basis of an investigation that may confirm or deny its validity.

    The following general (1-2) and specific (3-10) objectives

    derive from our three-part hypothesis:

    1. Survey research on translation technologies in order to

    establish a framework for our research.

    2. Emphasise the importance of documentation as a

    cornerstone of the translation of specialized texts.

    3. Compile a virtual, comparable and representative corpus of

    parallel texts from research articles on electronic

    contracting in three languages (Spanish, English and

    French).

    4. Study how legal discourse and technological discourse

    overlap in the field of electronic contracting and personal

    data protection.

    5. Contrast, by means of a comparable, multilingual and

    representative corpus, the features of legal-technological

    discourse in Spanish, English and French.

    6. Approach the research article as a textual genre in its own

    right, with clear and well-defined features.

    7. Check, by means of a comparable and multilingual corpus,

    whether the typical structure of the research article written

    in English in the sciences extends, on the one hand, to

    Legal Sciences and, on the other, to the Romance languages

    Spanish and French.

    8. Establish our own evaluation template combining analytic

    and holistic evaluation, in order to arrive at clear and

    well-defined criteria.

    9. Carry out a study with a large number of
