Research 2002

Faculty of Engineering, Built Environment and Information Technology
School of Information Technology
Department of Information Science

Selected Highlights from Research Findings

Cross Language Information Retrieval (CLIR) makes it possible to retrieve information written in one language through queries formulated in another. Together with the Department of Information Science of the University of Tampere, Finland, the Department studied the accessibility of information written in isiZulu through English queries. The findings showed, in broad terms, that mechanically matching the inflected word forms in running text against the normalised word forms in the indexes works quite well, provided that metric similarity measures are used; approximate string matching with skip-grams in particular yielded good results. However, overall retrieval performance was lower than expected, mainly because the vocabularies of isiZulu and English are so disparate that translating from English to isiZulu leads to paraphrasing. Suggested solutions include using metadata (specifically the Dublin Core metadata element set) in conjunction with CLIR to facilitate access on more than one level. To this end we have suggested extensions to the existing element set, specifically to facilitate access to databases of indigenous knowledge. The problems experienced related mainly to a lack of resources: we had no suitable electronic bilingual dictionaries and no usable parsers for the South African indigenous languages. We also found that the CLEF (Cross Language Evaluation Forum) database, a European collection used as the test-bed, contained very Eurocentric data, which compounded the problem of non-comparable vocabularies.
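To illustrate the kind of skip-gram matching mentioned above, the following is a minimal sketch and not the procedure used in the study. It assumes that skip-grams are character pairs formed with zero or one skipped character, pooled into a single set, and that two word forms are compared with the Jaccard coefficient. The function names and the sample word forms are hypothetical and chosen only for illustration.

```python
def skip_grams(word, skips=(0, 1)):
    """Return the set of character pairs of `word`, formed by skipping
    k characters between the two members for each k in `skips`.
    Assumption: pairs from all skip distances are pooled in one set."""
    word = word.lower()
    grams = set()
    for k in skips:
        for i in range(len(word) - k - 1):
            grams.add(word[i] + word[i + k + 1])
    return grams


def similarity(a, b):
    """Jaccard coefficient of the skip-gram sets of two word forms."""
    grams_a, grams_b = skip_grams(a), skip_grams(b)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def best_match(word_form, index_terms):
    """Return the (index term, score) pair that is most similar to
    the inflected word form found in running text."""
    scored = [(term, similarity(word_form, term)) for term in index_terms]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical index of normalised word forms.
    index = ["umuntu", "incwadi", "isikole"]
    print(best_match("abantu", index))
```

Here the form "abantu" shares no prefix with "umuntu", yet the two still overlap in the skip-grams drawn from the common stem, which is why this style of matching can tolerate inflection without a parser.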
Contact person: Ms E Cosijn.

Publishing trends database. The lack of statistical information on the number and categories of books published in South Africa provided a research opportunity for the Programme in Publishing Studies. A research project was undertaken to provide comparative data on the number of books published during the 1990s. The following issues were addressed: finding a suitable primary source containing the basic data needed to establish statistical trends; finding ways of extracting data from the primary source in order to compile cumulative statistics; and creating an electronic database that could be used for additional data extraction and updated to ensure reliable data in the future. The electronic version of the South African National Bibliography was used as the primary source. Retrieved data were converted into ASCII text and imported into a searchable MS Access database according to specified fields (including title, publication date, publisher and language). This database is currently the most sensible and reliable source available for book production statistics for the 1990s. It serves as a prototype that will be updated and retrospectively supplemented.
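The import step described above can be sketched roughly as follows. This is an illustration under stated assumptions and not the project's actual procedure: SQLite stands in for MS Access, the ASCII export is assumed to be tab-delimited with one record per line, and the table name, column names and sample records are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited ASCII export: title, year, publisher, language.
SAMPLE = (
    "Title A\t1991\tPublisher X\tEnglish\n"
    "Title B\t1991\tPublisher Y\tAfrikaans\n"
    "Title C\t1995\tPublisher X\tisiZulu\n"
    "Title D\t1998\tPublisher Z\tEnglish\n"
)


def load_records(conn, text):
    """Create the (assumed) books table and import the delimited records.
    Returns the number of records imported."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "title TEXT, pub_year INTEGER, publisher TEXT, language TEXT)"
    )
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = [(r[0], int(r[1]), r[2], r[3]) for r in reader if r]
    conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def titles_per_year(conn):
    """Cumulative statistic: number of titles per publication year."""
    query = (
        "SELECT pub_year, COUNT(*) FROM books "
        "GROUP BY pub_year ORDER BY pub_year"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_records(connection, SAMPLE)
    for year, count in titles_per_year(connection):
        print(year, count)
```

Keeping each field in its own column is what allows further breakdowns, for example by publisher or language, to be extracted later with the same kind of grouped query.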
Contact person: Dr FCJ Galloway.

 
