Improving Translation of Unknown Proper Names Using a Hybrid Web-based Translation Extraction Method
Department of Computer Science and Information Engineering
National Cheng Kung University, Taiwan, R.O.C.
{foreverdream, jhlin, scottyu}@csie.ncku.edu.tw, whlu@mail.ncku.edu.tw
Abstract
We previously proposed several effective Web-based term translation extraction methods that
explore Web resources to translate Web query terms. However, many unknown proper names in Web
queries remain difficult to translate with those methods. In this paper we therefore propose a new
hybrid translation extraction method, which combines our previous Web-based term translation
extraction method with a new Web-based transliteration method in order to improve translation
of unknown proper names. In addition, to efficiently construct a good-quality transliteration
model, we present a mixed-syllable-mapping transliteration model and a Web-based
semi-supervised learning algorithm that further explores search-result pages to collect large
amounts of English-Chinese transliteration pairs from the Web.
Introduction
In machine translation (MT) (Brown et al. 1993) and cross-language information retrieval (CLIR) (Jaleel
and Larkey 2003; Pirkola et al. 2003), unknown term translation is still problematic and remains to be
solved. Most existing MT and CLIR systems rely mainly on general-purpose
bilingual dictionaries, which usually lack translations of proper names and technical terms and thus
cannot deal with such problems. We have proposed an effective Web-based approach that explores
abundant language-mixed texts on the Web, such as anchor texts and search-result pages, to alleviate the
difficulty of unknown query term translation (Lu et al. 2002, 2004; Cheng et al. 2004). However, because
the approach employs statistical techniques, it still suffers from data sparseness and indirect
association errors when finding translations of low-frequency unknown terms (Melamed 2000).
According to previous research (Davis and Ogden 1998), around 50% of unknown terms are
proper names. To improve translation of unknown proper names, in this paper we propose a hybrid
translation extraction method, which is composed of our previous search-result-based term translation
extraction method (Section 3.2) and a new Web-based transliteration method (Section 3.3).
Transliteration is the process of converting a sequence of substrings or characters in the source
language (e.g., English) into a pronunciation-approximate substring/character sequence in the target
language (e.g., Chinese). Many researchers have proposed phoneme-based mapping techniques for
proper name transliteration (Jung et al. 2000; Knight & Graehl 1998; Lin & Chen 2002; Meng et al.
2001; Virga & Khudanpur 2003), but converting an English word from its phonemic representation to
Chinese Pinyin and then from Pinyin to Chinese characters may cause double errors. Taking this problem
into consideration, we adopt direct orthographical mapping for proper name transliteration
and propose a simple mixed-syllable-mapping transliteration model, which can effectively increase the
number of correct mappings within an English-Chinese transliteration pair whose two sides have
different numbers of transliteration units (syllables), such as “Ericsson” (易利信), which has four
English transliteration units (ending in “sson”) and three Chinese transliteration units (易, 利, 信).
Additionally, to train a good-quality transliteration model, which is used to filter out impossible
transliteration candidates in the process of extracting translations of unknown proper names, we
present a Web-based semi-supervised learning algorithm that collects large amounts of English-Chinese
transliteration pairs from the Web. Experimental results show that our new approach
improves translation of unknown proper names.
Related Work
Parallel-Corpus-based Term Translation Extraction
Term translation extraction is a significant research topic in the field of machine translation. A number
of studies (Gale and Church 1991; Kupiec 1993; Melamed 2000; Smadja et al. 1996) have
used sentence-aligned parallel corpora to extract translations since the advent of statistical translation
models (Brown et al. 1990, 1993). For example, Melamed (2000) proposed statistical translation models
that improve word alignment by taking advantage of pre-existing knowledge and
overcome the problem of indirect association errors, i.e., erroneous translational correspondences arising
from highly co-occurring relevant terms. Although these techniques can achieve high accuracy of
translation extraction, sufficiently large parallel corpora for various subject domains and
language pairs are not always available.
Comparable-Corpus-based Term Translation Extraction
Less attention has been devoted to automatic extraction of term translations from comparable
or even unrelated texts, since such methods encounter more difficulties due to the lack of parallel
correlation between documents or sentence pairs. Rapp (1999) proposed an approach that
utilizes non-parallel corpora based on the assumption that the contexts of a term should be similar to
the contexts of its translation in any language pair. Fung and Yee (1998) proposed a similar approach
that uses a vector-space model and takes a bilingual lexicon (called seed words) as the feature set to estimate
the similarity between a word and its translation candidates. These works are important for automatic
extraction of new terminology and unknown proper names in diverse domains. Although
comparable corpora are easier to obtain, achieving better performance and higher
translation coverage is still a challenging task.
Web-based Term Translation Extraction
The Web is becoming the largest data repository in the world, consisting of huge amounts of
multilingual and wide-scoped hypertext resources. A number of studies have concentrated on the
use of the Web to complement insufficient corpora (Cao & Li 2002; Kilgarriff & Grefenstette 2003). How to
utilize Web resources to benefit translation of unknown terms is worth investigating.
As mentioned above, conventional term translation methods suffer from the lack
of large parallel corpora and the limited translation coverage of comparable corpora in the medical
domain. Thus, we have proposed several Web-based methods to effectively deal with translation of
frequent Web query terms by exploring Web anchor texts and search-result pages. Although the
anchor-text-based approach has proven effective in extracting multilingual translations (Lu et al.
2002, 2004), it requires crawling the Web to gather sufficient training data, as well as more network
bandwidth and storage. To reduce such costs, this paper adopts only the
search-result-based approach to extract translation candidates (described in Section
3.2). However, many proper names are still difficult to translate correctly using the
search-result-based approach. Therefore, in this paper we further explore search results to
collect English-Chinese transliteration pairs, and build a good-quality transliteration model that can be
used to filter out impossible translation candidates and thereby improve translation of unknown proper names.
Proper Name Transliteration
For name transliteration between Latin-alphabet languages and Asian languages with different
writing systems, such as English and Chinese, researchers have proposed phoneme-based mapping
techniques (Jung et al. 2000; Knight & Graehl 1998; Lin & Chen 2002; Meng et al. 2001; Virga &
Khudanpur 2003). Knight and Graehl (1998) used an English-katakana dictionary, katakana-English phoneme
mapping, and the CMU Speech Pronunciation Dictionary to deal with transliteration between English
words and katakana sequences. Lin et al. (2003) proposed a statistical transliteration model and applied
the model to extract proper names and their transliterations from a parallel corpus with high average
precision and recall rates. However, Li et al. (2004) pointed out that the transliteration precision
of phoneme-based approaches can be limited by two main constraints. First, Latin-alphabet
foreign names of different origins, such as French and English, follow different phonic rules
(Pirkola et al. 2003). Second, converting English words to Chinese characters needs two steps: converting
from a phonemic representation to Chinese Pinyin, and from Pinyin to Chinese characters. Two cascaded
conversion steps may cause double errors. Taking this problem into consideration, we adopt
direct orthographical mapping for name transliteration (described in Section 3.3).
Extracting Translation of Unknown Proper Names
Problem and Challenge
Search-result pages are a good resource for extracting translations of frequent unknown query
terms. However, a number of unknown proper names are still not extracted correctly due to
data sparseness. Our idea is therefore to integrate name transliteration techniques into the
process of extracting translations of proper names, filtering out impossible transliteration candidates
to improve the performance of translation extraction. To do so, we first
extract terms from the search-result pages as translation candidates, and then filter out impossible
candidates based on the name transliteration model. It is challenging to build a good-quality
transliteration model without sufficient transliteration pairs for training. We therefore propose a
Web-based semi-supervised learning algorithm to collect large amounts of English-Chinese
transliteration pairs from the Web (see Section 3.3).
Extracting Translation Candidates Using a Search-Result-based Translation Extraction
We have proposed an effective search-result-based method that explores language-mixed search-result pages
and utilizes co-occurrence relations and context information to extract translations of unknown query terms.
In this section, we briefly describe the candidate selection methods used by the search-result-based
method. For more details, please refer to our previous work (Cheng et al. 2004).
(1) Chi-Square Test Method: On the basis of co-occurrence analysis, the chi-square test is used
to estimate the semantic similarity between the source term E and the target candidate C. The similarity
measure is defined as
$$S_{\chi^2}(E, C) = \frac{N \times (a \times d - b \times c)^2}{(a + b) \times (a + c) \times (b + d) \times (c + d)}$$
where a, b, c and d are the numbers of pages retrieved from search engines by submitting Boolean
queries (a: pages containing both E and C; b: pages containing E but not C; c: pages containing C but
not E; d: pages containing neither), and N is the total
number of pages, i.e., N = a + b + c + d.
(2) Context-Vector Analysis Method: Because Chinese-English mixed texts often
appear in Chinese pages, the source term E and the target candidate C may share common
contextual terms in the search-result pages (Fung & Yee 1998; Rapp 1999). The similarity between E
and C is computed based on their context feature vectors in the vector-space model. The
conventional tf-idf weighting scheme is used and defined as
$$w_{t_i} = \frac{f(t_i, p)}{\max_j f(t_j, p)} \times \log\frac{N}{n}$$
where f(ti, p) is the frequency of term ti in search-result page p, N is the total number of Web pages, and
n is the number of the pages containing ti. Finally, we use the cosine measure to estimate the similarity
between E and C:
$$S_{cv}(E, C) = \frac{\sum_i w_{e_i} \times w_{c_i}}{\sqrt{\sum_i w_{e_i}^2} \times \sqrt{\sum_i w_{c_i}^2}}$$
where w_{e_i} and w_{c_i} are the weights of the i-th contextual term in the context vectors of E and C.
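As a rough illustration of the two similarity measures above, the following Python sketch computes a chi-square score from the four page counts and the cosine similarity between two tf-idf context vectors. The function names, the toy inputs, and the max-frequency normalisation of the term frequency are assumptions made for this example; they are not taken from the authors' implementation.

```python
import math


def chi_square_similarity(a, b, c, d):
    """Chi-square score from a 2x2 page-count table.

    a: pages with both E and C, b: E only, c: C only, d: neither.
    """
    n = a + b + c + d
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def tfidf_vector(term_freqs, page_counts, total_pages):
    """Weight each context term by normalised tf times log(N / n).

    Dividing by the maximum frequency is an assumption of this sketch.
    """
    max_freq = max(term_freqs.values())
    return {
        term: (freq / max_freq) * math.log(total_pages / page_counts[term])
        for term, freq in term_freqs.items()
    }


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

With perfectly co-occurring terms (a = d = 10, b = c = 0) the chi-square score equals N, while a table with four equal cells scores zero, which matches the intuition that independent terms are unlikely to be translations of each other.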
Filtering Translation Candidates Using a Web-Based Name Transliteration Method
(1) English Letter Substring Segmentation: Wan and Verspoor (1998) developed a fully
rule-based algorithm to transliterate English proper names into Chinese names. We simplify their
syllabification techniques into a few simple heuristic rules for segmenting an English name into
letter substrings. Each English substring is regarded as a transliteration unit (TU) in this paper and has
at most one corresponding character in the Chinese transliterated name. Initially, we used only five
rules:
1. a, e, i, o, u are vowels, and y is also regarded as a vowel if it appears behind a consonant. All
other letters are consonants.
2. Separate two consecutive vowels except the following cases: ai, au, ee, ea, ie, oa, oo, ou, etc.
3. Separate two consecutive consonants except the following cases: bh, ch, gh, ph, th, wh, ck, cz, etc.
4. l, m, n, r are combined with the left vowel only if they are not followed by a vowel.
5. A consonant and a following vowel are regarded as a TU.
Some names may still be segmented incorrectly, but it is easy to manually add new rules for improving
English letter substring segmentation.
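The segmentation rules above can be sketched in Python as follows. This is one possible reading of the rules, not the authors' implementation: the two-pass structure, the function names, and the decision to use only the exception pairs that are explicitly listed (ignoring whatever "etc." covers) are assumptions of this example.

```python
VOWELS = set("aeiou")
VOWEL_PAIRS = {"ai", "au", "ee", "ea", "ie", "oa", "oo", "ou"}
CONSONANT_PAIRS = {"bh", "ch", "gh", "ph", "th", "wh", "ck", "cz"}
CODA_LETTERS = set("lmnr")


def _atoms(name):
    """Split a name into vowel ("V") and consonant ("C") atoms (rules 1-3)."""
    low = name.lower()
    atoms = []
    i = 0
    while i < len(low):
        pair = low[i:i + 2]
        if pair in VOWEL_PAIRS:
            atoms.append((name[i:i + 2], "V"))
            i += 2
        elif pair in CONSONANT_PAIRS:
            atoms.append((name[i:i + 2], "C"))
            i += 2
        else:
            ch = low[i]
            # y counts as a vowel only when it follows a consonant.
            y_vowel = (ch == "y" and i > 0 and low[i - 1].isalpha()
                       and low[i - 1] not in VOWELS)
            atoms.append((name[i], "V" if ch in VOWELS or y_vowel else "C"))
            i += 1
    return atoms


def segment(name):
    """Group atoms into transliteration units (rules 4-5)."""
    atoms = _atoms(name)
    units = []
    i = 0
    while i < len(atoms):
        text, kind = atoms[i]
        i += 1
        # Rule 5: a consonant plus the following vowel form one unit.
        if kind == "C" and i < len(atoms) and atoms[i][1] == "V":
            text += atoms[i][0]
            kind = "V"
            i += 1
        # Rule 4: attach l, m, n, r to the left vowel unless a vowel follows.
        if kind == "V" and i < len(atoms):
            nxt = atoms[i][0]
            vowel_follows = i + 1 < len(atoms) and atoms[i + 1][1] == "V"
            if nxt.lower() in CODA_LETTERS and not vowel_follows:
                text += nxt
                i += 1
        units.append(text)
    return units
```

Under these assumptions, `segment("Rusedski")` yields the five units Ru, se, d, s, ki, and `segment("Martin")` yields Mar, tin. New exception pairs can be added to the two sets without touching the grouping logic.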
(2) Mixed-Syllable-Mapping Transliteration Model: To avoid the double errors of converting
an English phonemic representation to Chinese Pinyin and then Pinyin to Chinese characters, we
adopt direct orthographical mapping to deal with the alignment between any English name, E =
e1e2…em, and its Chinese transliterated name, C = c1c2…cn. Each English TU ei is mapped to a Chinese
character ci with the probability P(ci | ei). Initially, to efficiently train a Web-based transliteration model
on the transliteration pairs collected from the Web for filtering out impossible transliteration
candidates, we adopt a simple name transliteration model called the forward-syllable-mapping
transliteration model, which computes the forward syllable-mapping score between E and C as an
estimate of P(C | E), using a smoothing weight.
For an English-Chinese transliteration pair with different numbers of transliteration units, such as
“Rusedski” (魯塞斯基), which has five English segmented substrings (“Ru”, “se”, “d”, “s”, “ki”) but four
Chinese characters, we increase the number of correct mappings between English TUs and
Chinese characters with a reverse-syllable-mapping transliteration model, which is used to compute
the reverse syllable-mapping score between E and C.
To cover all possibly correct mappings between English TUs and Chinese transliterated characters for
the distinct types of English-Chinese transliteration pairs with the same or different numbers of
transliteration units, we propose a simple mixed-syllable-mapping transliteration model, which combines the
forward-syllable-mapping and reverse-syllable-mapping transliteration models, to estimate the
mixed syllable-mapping score between E and C.
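To make the idea of combining forward and reverse mappings more concrete, here is a small Python sketch. It does not reproduce the paper's scoring equations. It assumes that the forward model pairs units position by position from the left, that the reverse model pairs them from the right, that P(c | e) is estimated by relative frequency over both kinds of alignment, that unseen pairs receive a small floor probability, and that the mixed score is an equal-weight average of the two directional scores. All function names and parameters are hypothetical.

```python
from collections import Counter, defaultdict


def align(e_units, c_chars, reverse=False):
    """Pair English units with Chinese characters by position.

    Forward alignment starts from the left, reverse from the right;
    surplus units on the longer side are left unpaired.
    """
    if reverse:
        return list(zip(reversed(e_units), reversed(c_chars)))
    return list(zip(e_units, c_chars))


def train(pairs):
    """Estimate P(c | e) from forward and reverse alignments together."""
    counts = defaultdict(Counter)
    for e_units, c_chars in pairs:
        for reverse in (False, True):
            for e, c in align(e_units, c_chars, reverse):
                counts[e][c] += 1
    return {
        e: {c: k / sum(cs.values()) for c, k in cs.items()}
        for e, cs in counts.items()
    }


def direction_score(e_units, c_chars, model, reverse, floor=1e-3):
    """Product of mapping probabilities along one alignment direction."""
    score = 1.0
    for e, c in align(e_units, c_chars, reverse):
        score *= model.get(e, {}).get(c, floor)
    return score


def mixed_score(e_units, c_chars, model, weight=0.5):
    """Weighted average of forward and reverse scores (an assumption)."""
    forward = direction_score(e_units, c_chars, model, reverse=False)
    backward = direction_score(e_units, c_chars, model, reverse=True)
    return weight * forward + (1 - weight) * backward
```

For the pair (Ru, se, d, s, ki) and 魯塞斯基, the forward alignment recovers Ru-魯 and se-塞, while the reverse alignment recovers ki-基 and s-斯, so training on both directions picks up more correct mappings than either direction alone.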
(3) Web-based Semi-Supervised Learning Algorithm: We take advantage of abundant
language-mixed texts on the Web to collect English-Chinese transliteration pairs and then train a
good-quality transliteration model. To this end, we design a semi-supervised learning process of transliteration
mapping. The process is composed of three main stages: extraction of Chinese transliterated names,
extraction of English original names, and learning of transliteration mapping, which are described below.
Extraction of Chinese Transliterated Names: Xiao et al. (2002) proposed a
bootstrapping algorithm that uses only five frequent Chinese transliterated characters as the initial
seed character set, {阿, 爾, 巴, 斯, 基}, to automatically collect over 100,000 Chinese
transliterated names by utilizing search-result pages. Inspired by Xiao et al., we design a
different bootstrapping algorithm, which uses the same seed character set to automatically find
large amounts of Chinese transliterated names in search-result pages. Initially, we select
two frequent Chinese transliterated characters from the seed character set, and then send them
to search engines to get search-result pages. To efficiently extract more Chinese
transliterated names from the search-result pages, we use the CKIP tagger (Ma & Chen 2003),
a representative Chinese POS tagger that can segment Chinese texts into
meaningful words and extract unknown words.
Extraction of English Original Names: We first use the search-result-based translation
extraction method (Section 3.2) to find possible candidates for the English original names, and then
filter out the impossible candidates, namely those that are included in general-purpose bilingual
dictionaries. Finally, to collect high-quality English-Chinese transliteration name pairs,
some manual effort may be needed to verify the transliteration pairs.
Learning of Transliteration Mapping: On the basis of the English letter substring
segmentation rules and the proposed mixed-syllable-mapping transliteration model described
above, we train a Web-based transliteration model on the collected transliteration
pairs.
Web-based Semi-Supervised Learning Algorithm for Collecting English-Chinese Transliteration Pairs and Training a Transliteration Model
Input: Chinese seed character set Cs and a general-purpose bilingual dictionary D
Output: English-Chinese transliteration pair set Vec, and a transliteration model T
1. Extraction of Chinese transliterated names:
1.1. Seed character selection: select two frequent characters from the Chinese seed character set Cs.
1.2. Search-result crawling: send the two selected characters to a search engine to get search-result pages.
1.3. Chinese transliterated name identification: use the CKIP tagger to find
unknown terms in the search-result pages, and then take the unknown terms containing the two Chinese seed characters as potential Chinese transliterated names and add them into Vc.
1.4. Seed character set updating: update Cs by adding the new characters
from the new Chinese transliterated names.
1.5. Repeat step 1 until the desired number of Chinese transliterated names is collected.
2. Extraction of English original names: for each potential Chinese transliterated name in Vc, perform the following sub-steps:
2.1. Potential English name extraction: use the search-result-based translation
extraction method (Section 3.2) to find potential candidates for the English name.
2.2. Candidate filtering: filter out impossible English name candidates that are included in D.
2.3. English name identification: take some manual effort to examine the correct transliteration pairs.
2.4. English-Chinese transliteration pair updating: update Vec by adding the new English-Chinese transliteration pairs.
3. Learning of English-Chinese transliteration mapping: use the proposed mixed-syllable-mapping transliteration model (equation (6)) to train a Web-based transliteration model T based on the extracted English-Chinese transliteration pairs.
Figure 1. Algorithm for collecting transliteration pairs and training a transliteration model.
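The first two stages of the learning process can be sketched in Python as below. The search engine and the tagger are passed in as callables (`search` and `find_unknown_terms`), which are hypothetical stand-ins and not real APIs. The sketch also tries seed pairs in a fixed sorted order rather than by character frequency, and it leaves out the manual verification step, so it should be read as an outline under these assumptions rather than the authors' procedure.

```python
from itertools import combinations


def collect_transliterated_names(seed_chars, search, find_unknown_terms,
                                 target_size, max_rounds=100):
    """Stage 1: bootstrap Chinese transliterated names from seed characters.

    search(a, b) returns pages for two seed characters and
    find_unknown_terms(page) returns candidate terms; both are
    hypothetical callables supplied by the caller.
    """
    seeds = set(seed_chars)
    names = set()
    tried = set()
    for _ in range(max_rounds):
        if len(names) >= target_size:
            break
        untried = [p for p in combinations(sorted(seeds), 2) if p not in tried]
        if not untried:
            break  # every seed pair has already been queried
        a, b = untried[0]
        tried.add((a, b))
        for page in search(a, b):
            for term in find_unknown_terms(page):
                if a in term and b in term:
                    names.add(term)
                    seeds.update(term)  # new characters become seeds
    return names


def filter_english_candidates(candidates, dictionary):
    """Stage 2: drop candidates found in a general-purpose dictionary."""
    return [c for c in candidates if c.lower() not in dictionary]
```

A toy run with two fake pages shows the bootstrapping effect: each newly found name contributes characters that widen the set of seed pairs tried in later rounds.

```python
pages = ["魯塞斯基 來自 加拿大", "阿爾巴尼亞 新聞"]


def fake_search(a, b):
    return [p for p in pages if a in p and b in p]


names = collect_transliterated_names("阿爾巴斯基", fake_search, str.split,
                                     target_size=2)
```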
The Proposed Approach to Translation Extraction
Currently, for some unknown proper names, it is still difficult to extract translations effectively using
our previous search-result-based translation extraction method. Therefore, we combine a new
Web-based transliteration method with our previous search-result-based translation extraction
method.
(1) Linear Combination Method: Intuitively, a simple method is to directly combine the above
three methods: the chi-square test method, the context-vector analysis method, and the
Web-based transliteration method. Considering the large differences in the ranges of similarity
values among these methods, we use a linear combination of inverse ranks to compute the
similarity:
$$S_{all}(E, C) = \sum_m \alpha_m \times \frac{1}{R_m(E, C)}$$
where αm is an assigned weight for each similarity measure Sm, and Rm(E, C) represents the similarity
rank of each target candidate C with respect to its source term E, assigned from 1 to k
(the number of candidates) in decreasing order of the similarity measure Sm(E, C).
Note that this linear combination method is used only as a baseline for comparison with our proposed
hybrid translation extraction method in the following experiments (Section 4.2).
(2) Hybrid Method: For some unknown proper names, the simple linear combination method might
not yield good improvements when the respective methods cannot find the
correct transliteration candidates. Therefore, we propose a new hybrid translation extraction method in
order to obtain better performance. First, we use the search-result-based translation extraction method
described above to extract the k (k = 20) terms with the highest similarity scores as transliteration candidates.
Second, impossible candidates that are included in general-purpose bilingual dictionaries are filtered out,
and each of the remaining transliteration candidates is then ranked according to its transliteration mapping score
with the test proper name, which is computed using the Web-based transliteration model (Equation
(6)).
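The two ranking strategies described above can be contrasted with a short Python sketch. The function names, the dictionary-based data layout, and the toy scores used in the usage note are assumptions for illustration; the transliteration scorer is passed in as a callable so that any model can be plugged in.

```python
def inverse_ranks(scores):
    """Map each candidate to 1 / rank, with rank 1 for the highest score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {cand: 1.0 / (pos + 1) for pos, cand in enumerate(ranked)}


def linear_combination(measures, weights):
    """Baseline: weighted sum of inverse ranks over several measures.

    measures is a list of dicts (candidate -> similarity), one per
    measure, and weights holds one weight per measure.
    """
    total = {}
    for weight, scores in zip(weights, measures):
        for cand, inv in inverse_ranks(scores).items():
            total[cand] = total.get(cand, 0.0) + weight * inv
    return sorted(total, key=total.get, reverse=True)


def hybrid_rank(search_scores, translit_score, dictionary, k=20):
    """Hybrid: keep the top-k search candidates, drop dictionary words,
    then re-rank what remains by transliteration score."""
    top_k = sorted(search_scores, key=search_scores.get, reverse=True)[:k]
    kept = [cand for cand in top_k if cand not in dictionary]
    return sorted(kept, key=translit_score, reverse=True)
```

With made-up scores in which a frequent but unrelated term such as 華納 tops the search-based ranking while 麥可傑克森 has the highest transliteration score, the linear combination can still leave 華納 first, whereas the hybrid method promotes 麥可傑克森 because the final ordering depends only on the transliteration score of the surviving candidates.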
Experimental Results and Analysis
We conducted the following experiments to examine the performance of the proposed hybrid translation
extraction method and compare it with the simple linear combination method. The experiments focus
on the effectiveness of translating unknown proper names
using the proposed mixed-syllable-mapping transliteration model and hybrid translation extraction
method.
Collected data: Initially, our proposed Web-based semi-supervised learning algorithm was employed
to efficiently collect about 11,000 English-Chinese transliteration pairs for training a transliteration
model.
Test set: We constructed one test set of unknown English query terms, the NTCIR proper name set,
which contains 22 unknown transliterated names from a total of 100 NTCIR2 and NTCIR3 title queries
that contain 175 and 183 unique query terms, respectively (Chen & Chen 2001).
Evaluation Metric: The average top-n inclusion rate was adopted as the metric for the extraction of
translation equivalents. For a set of terms to be translated, its top-n inclusion rate is defined as the
percentage of the terms whose translations can be found in the first n extracted translations (Cheng et
al. 2004).
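The top-n inclusion rate defined above is straightforward to compute. The following Python sketch assumes that the extracted translations are stored as a ranked list per source term and that the reference translations are stored as a set per source term; these data structures and the function name are choices made for this example.

```python
def top_n_inclusion_rate(extracted, gold, n):
    """Fraction of source terms with a correct translation in the top n.

    extracted: dict mapping a source term to its ranked candidate list.
    gold: dict mapping a source term to the set of correct translations.
    """
    if not extracted:
        return 0.0
    hits = sum(
        1
        for term, candidates in extracted.items()
        if any(cand in gold.get(term, set()) for cand in candidates[:n])
    )
    return hits / len(extracted)
```

For instance, if one of two test terms has its correct translation at rank 2 and the other has none, the top-1 rate is 0.0 and the top-2 rate is 0.5.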
Table 1. Comparison of translation results between the forward-syllable-mapping model and the
mixed-syllable-mapping model.
Table 2. Comparison of translation results between the different translation methods.
Table 3. Effective results of translation extraction using the hybrid translation extraction method
(underlined terms indicate correct translation).
麥可布雷,麥克傑克森,蜜可艾爾,施文彬,華納
麥可傑克森,麥可布雷,麥克傑
麥可布雷,麥可傑克森,
Mixed-Syllable-Mapping Transliteration Model vs. Forward-Syllable-Mapping Transliteration Model
To test the effectiveness of the mixed-syllable-mapping transliteration model, we carried out a
comparative experiment with different rankings. The results are shown in Table 1. The
mixed-syllable-mapping transliteration model effectively improves the top-n inclusion rate. For
translation extraction of the NTCIR proper names, the mixed-syllable-mapping transliteration model
achieves 27% and 45% top-1 inclusion rates with the name transliteration method and the hybrid
translation method, respectively. The reason is that, for many English-Chinese transliteration
pairs with different numbers of TUs, the reverse-syllable-mapping transliteration model helps in learning
correct mappings between English substrings and Chinese characters. Additionally, the model has the
same assisting effect on many partially matching transliteration pairs collected by our proposed
Web-based semi-supervised learning algorithm. As a result, a
better rank of the correct translation can be obtained by using the mixed-syllable-mapping
transliteration model.
Hybrid Translation Extraction Method vs. Linear Combination Method
To determine the effectiveness of the proposed hybrid translation extraction method compared with
other methods, we also ran several comparative experiments with different rankings. The results are
shown in Table 2. For the NTCIR test set, the hybrid translation extraction method made a
great improvement over the search-result-based translation extraction method, the name
transliteration method, and the linear combination method. The hybrid translation extraction method with
the mixed-syllable-mapping transliteration model achieves a 45% top-1 inclusion rate. The main reason is
that most of the incorrect translation candidates extracted by the search-result-based translation
extraction method can be filtered out by the Web-based transliteration method. For example,
a correct translation can
be ranked first, up from the fourth rank obtained using only the search-result-based translation extraction
method. However, the simple linear combination method does not seem effective in improving translation
performance, since the name transliteration method is still limited in generating correct transliterated
candidates even though it can generate many pronunciation-proximate candidates.
Discussions
Our proposed mixed-syllable-mapping model and hybrid translation extraction method are effective in
improving performance in extracting translations of unknown proper names. However, the hybrid
translation extraction method sometimes does not perform as well as the linear combination method. An example,
“Viagra” (威而剛), is shown in Table 4. Currently, our Web-based semi-supervised learning
algorithm is limited by insufficient transliteration training data, because our collected transliteration pairs
still need to be examined with large amounts of manual labor. In the future, we will develop an
unsupervised learning algorithm to automatically collect many more English-Chinese
transliteration pairs from the Web for training a good-quality transliteration model. Besides, there
are still a number of cases that are difficult to deal with using the simple
mixed-syllable-mapping transliteration model and that need to be further investigated in the future.
Table 4. Ineffective results of translation extraction using the hybrid translation extraction method
(underlined terms indicate correct translation).
偉哥,食品藥物,威而剛,藥物管理局,藥物
薇阿格拉,薇亞格拉,薇艾格拉,薇阿葛拉,薇亞葛拉
偉哥,食品藥物,薇阿格拉,威而剛,藥物管理局
萬艾可,藥物管理,食品管理,輝瑞,威而剛
Conclusions
We have presented a new hybrid translation extraction method that improves extraction
of translations of unknown proper names by effectively combining a previous search-result-based
translation extraction method with our proposed Web-based name transliteration method. Additionally,
our proposed simple mixed-syllable-mapping transliteration model and Web-based semi-supervised
learning algorithm are effective for collecting English-Chinese transliteration pairs and then training a
transliteration model that filters out incorrect transliteration candidates in the process of extracting
translations of unknown proper names.
References
N. A. Jaleel and L. S. Larkey. 2003. Statistical transliteration for English-Arabic cross language
information retrieval. CIKM 2003: 139-146.
P. F. Brown, J. Cocke, S. A. D. Pietra, V. J. D. Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer, and P. S.
Roossin. 1990. A statistical approach to machine translation. Computational Linguistics,
P. F. Brown, S. A. D. Pietra, V. J. D. Pietra and R. L. Mercer. 1993. The Mathematics of Machine
Translation. Computational Linguistics, 19(2): 263-312.
Y.-B. Cao and H. Li. 2002. Base noun phrase translation using Web data and the EM algorithm. In
Proc. of COLING 2002: 127-133.
K.-H. Chen and H.-H. Chen. 2001. The Chinese Text Retrieval Tasks of NTCIR Workshop 2. In Proc. of the Second NTCIR Workshop Meeting on Evaluation of Chinese & Japanese Text Retrieval and
P.-J. Cheng, Y.-C. Pan, W.-H. Lu, L.-F. Chien. 2004. Creating Multilingual Translation Lexicons with
Regional Variations Using Web Corpora. In Proc. of ACL 2004: 535-542.
M. W. Davis and W. C. Ogden. 1998. Free Resources and Advanced Alignment for Cross-Language
Text Retrieval. In Proc. of the Sixth Text Retrieval Conference (TREC6): 385-394.
P. Fung and L.-Y. Yee. 1998. An IR approach for translating new words from nonparallel, comparable
texts. In Proc. of ACL 1998: 414-420.
W. A. Gale and K. W. Church. 1991. Identifying Word Correspondences in Parallel Texts. In Proc. of
DARPA Speech and Natural Language Workshop.
W. Gao, K.-F. Wong and W. Lam. 2004. Phoneme-based Transliteration of Foreign Names for OOV
Problem. In Proc. of IJCNLP 2004: 274-381.
J. Halpern. 2000. Lexicon-based orthographic disambiguation in CJK intelligent information retrieval.
In Proc. of Workshop on Asian Language Resources and International Standardization.
S. Y. Jung, S. L. Hong and E. Paek. 2000. An English to Korean Transliteration Model of Extended
Markov Window. In Proc. of COLING 2000.
A. Kilgarriff and G. Grefenstette. 2003. Introduction to the special issue on the web as corpus. Computational Linguistics 29(3): 333-348.
K. Knight and J. Graehl. 1998. Machine Transliteration, Computational Linguistics 24(4): 599-612.
J. M. Kupiec. 1993. An algorithm for finding noun phrase correspondences in bilingual corpora. In
H. Li, M. Zhang and J. Su. 2004. A Joint Source-Channel Model for Machine Transliteration. In Proc.
T. Lin, C.-C. Wu, J.-S. Chang. 2003. Word-Transliteration Alignment, In Proc. of ROCLING XV, 1-16.
W.-H. Lin and H.-H. Chen. 2002. Backward machine transliteration by learning phonetic similarity. In
Proc. of CONLL 2002: 139-145.
W.-H. Lu, L.-F. Chien and H.-J. Lee. 2002. Translation of Web Queries using Anchor Text Mining.
ACM Transactions on Asian Language Information Processing (TALIP), 159-172.
W.-H. Lu, L.-F. Chien and H.-J. Lee. 2004. Anchor Text Mining for Translation of Web Queries: A
Transitive Translation Approach. ACM Transactions on Information Systems 22(2): 242-269.
W.-Y. Ma and K.-J. Chen. 2003. A Bottom-up Merging Algorithm for Chinese Unknown Word
Extraction, In Proc. of ACL workshop on Chinese Language Processing 2003: 31-38.
I. D. Melamed. 2000. Models of translational equivalence among words. Computational Linguistics,
H. Meng, W.-K. Lo, B. Chen and K. Tang. 2001. Generate Phonetic Cognates to Handle Name Entities in English-Chinese Cross-Language Spoken Document Retrieval. ASRU 2001.
A. Pirkola, J. Toivonen, H. Keskustalo, K. Visala and K. Jarvelin. 2003. Fuzzy Translation of
Cross-Lingual Spelling Variants, In Proc. of SIGIR 2003: 345-352.
Y. Qu and G. Grefenstette. 2004. Finding Ideographic Representations of Japanese Names Written in
Latin Script via Language Identification and Corpus Validation. In Proc. of ACL 2004: 184-191.
R. Rapp. 1999. Automatic identification of word translations from unrelated English and German
corpora, In Proc. of ACL 1999: 519-526.
R. Schwartz and Y.-L. Chow. 1990. The N-best algorithm: An efficient and exact procedure for finding
the N most likely sentence hypothesis. In Proc. of ICASSP 1990: 81-84.
F. Smadja, K. McKeown, and V. Hatzivassiloglou. 1996. Translating collocations for bilingual
lexicons: a statistical approach. Computational Linguistics, 22(1):1-38.
P. Virga and S. Khudanpur. 2003. Transliteration of Proper Names in Cross-Lingual Information
Retrieval. ACL 2003 workshop MLNER.
S. Wan and C. M. Verspoor. 1998. Automatic English-Chinese name transliteration for development of
multilingual resources. In Proc. of ACL 1998: 1352-1357.
J. Xiao, J. Liu and T.-S. Chua. 2002. "Extracting pronunciation-translated names from Chinese texts
using bootstrapping approach", the 1st SIGHAN workshop on Chinese Language Processing , Taipei,