The linguistic engine EXTRAKT is integrated into the search portal LexxiNet (formerly ExtraktSearch) and ensures optimal search results.

LexxiNet (German)
 

The Linguistic Engine EXTRAKT

The linguistic engine EXTRAKT is the main product of TEXTEC.

It performs a morpho-syntactic analysis of most European languages. Beyond analysis, EXTRAKT provides a whole series of further linguistic functions.

The system is built on dictionaries whose size ranges from approx. 500,000 to 1.8 million entries. Domain-specific dictionaries are also available.

A basic form is a word form that stands for all the different forms of the same word. For example, the German word Haus is the basic form for Haus, Hause, Hauses, Häuser and Häusern.

In English, the basic form "go" stands for all forms of this verb, such as go, goes, went and gone.
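The idea of a basic form can be illustrated with a small lookup table. The following is only a minimal sketch in Python: the table `FULL_FORMS` and the functions `basic_form` and `to_basic_forms` are hypothetical names chosen for illustration and are not part of EXTRAKT. The entries are the examples given above.

```python
from typing import List, Optional

# Toy full-form table: word form -> basic form.
# The structure is an assumption; the entries are the examples from the text.
FULL_FORMS = {
    "Haus": "Haus", "Hause": "Haus", "Hauses": "Haus",
    "Häuser": "Haus", "Häusern": "Haus",
    "go": "go", "goes": "go", "went": "go", "gone": "go",
}


def basic_form(word_form: str) -> Optional[str]:
    """Return the basic form of a word form, or None if the form is unknown."""
    return FULL_FORMS.get(word_form)


def to_basic_forms(text: str) -> List[str]:
    """Replace every known token by its basic form; keep unknown tokens as they are."""
    return [basic_form(token) or token for token in text.split()]
```

With such a table, a search for "Häusern" and a search for "Hauses" both end up at the same basic form, which is what makes the different forms of a word findable together.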

 

A dictionary of about 1.5 million entries was created for the German language. In addition, special dictionaries are available: dictionaries containing "ersatz" representations of specific characters (Haeuser instead of Häuser), a dictionary with split German compounds (approx. 1,500,000 entries), a French dictionary with accentless entries (such as methode for méthode), and others.
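How an "ersatz" or accentless spelling relates to the regular spelling can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of how the EXTRAKT dictionaries were built: the function names are hypothetical, and the German replacement table (ä to ae, ö to oe, ü to ue, ß to ss) is the common convention rather than something stated in the text.

```python
import unicodedata
from typing import Dict, Iterable

# Common German replacement spellings (assumed convention).
GERMAN_ERSATZ = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def german_ersatz(word: str) -> str:
    """Spell a German word without umlauts and ß, e.g. Häuser -> Haeuser."""
    return word.translate(GERMAN_ERSATZ)


def strip_accents(word: str) -> str:
    """Drop combining accents after Unicode decomposition, e.g. méthode -> methode."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ersatz_index(words: Iterable[str]) -> Dict[str, str]:
    """Map each ersatz spelling back to the regular dictionary entry."""
    return {german_ersatz(word): word for word in words}
```

A lookup table of this kind lets a query typed without special characters still reach the regular entry.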

 


 

For German, there is a dictionary of 120,000 synonyms, a dictionary of related word forms (derivations) with 150,000 entries, and a thesaurus.

 

There are 36,000 English and 25,000 French synonyms available.

 

The lemmata can be translated into other languages, which makes multilingual (cross-lingual) search possible.
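Cross-lingual search via translated lemmata can be pictured as a query expansion step. The sketch below is a simplified assumption of how this might look: the two-entry table `BILINGUAL_DE_EN` and the function `cross_lingual_terms` are hypothetical and only serve as an illustration.

```python
from typing import Dict, List

# Toy German-to-English lemma table (illustrative entries only).
BILINGUAL_DE_EN: Dict[str, List[str]] = {
    "Haus": ["house"],
    "gehen": ["go"],
}


def cross_lingual_terms(lemmata: List[str],
                        bilingual: Dict[str, List[str]]) -> List[str]:
    """Return each lemma followed by its translations, if any are known."""
    terms: List[str] = []
    for lemma in lemmata:
        terms.append(lemma)
        terms.extend(bilingual.get(lemma, []))
    return terms
```

A German query reduced to the lemma Haus would then also match documents indexed under the English lemma house.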

 

At present, EXTRAKT has bilingual dictionaries containing 50,000 to 170,000 entries. These dictionaries are constantly updated.

 

Customers can create their own private dictionaries and add them to the system.

 

EXTRAKT includes the generation function GENERATE, which generates all forms of a given basic word form. For example, when the word "Hauses" (genitive singular) is entered, GENERATE produces all inflected forms (Hause, Hauses, Häuser and Häusern) from the basic form Haus.
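The behaviour described for GENERATE can be approximated by two lookups in the same table: first from the entered form to its basic form, then from the basic form to all forms that share it. The following Python sketch rests on that assumption. The table `HAUS_FORMS` and the function `generate_forms` are hypothetical and do not reflect the actual EXTRAKT interface. Unlike the example in the text, the result here also includes the basic form itself.

```python
from typing import Dict, List

# Toy table: word form -> basic form (entries taken from the example above).
HAUS_FORMS: Dict[str, str] = {
    "Haus": "Haus", "Hause": "Haus", "Hauses": "Haus",
    "Häuser": "Haus", "Häusern": "Haus",
}


def generate_forms(word_form: str,
                   full_forms: Dict[str, str] = HAUS_FORMS) -> List[str]:
    """Return all forms that share the basic form of word_form.

    An unknown word form yields an empty list.
    """
    lemma = full_forms.get(word_form)
    if lemma is None:
        return []
    # Same table, read in the other direction: basic form -> all its forms.
    return [form for form, base in full_forms.items() if base == lemma]
```

Because one table serves both directions, any language that has an analysis dictionary can in principle also be used for generation.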

 

GENERATE is available in all languages covered by our system because the monolingual dictionaries are used for both analysis and generation.

 

EXTRAKT has a client/server architecture but is also provided as a DLL.

EXTRAKT is available as a simple C++ DLL, as a TCP/IP server (EXTRAKT-Server) and as a COM/DCOM server. The C++ DLL can be integrated directly into client programs. Any client system can communicate with the EXTRAKT-Server using a simple protocol in which requests are formulated as strings. In most installations, the server version is chosen.
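A string-based request to a TCP/IP server could be sent along the following lines. Everything specific in this Python sketch is an assumption, because the actual protocol is not documented here: the request layout ("function name, space, argument"), the newline framing, the UTF-8 encoding, and the host and port passed by the caller are all hypothetical.

```python
import socket


def build_request(function: str, argument: str) -> bytes:
    """Encode a request as one newline-terminated UTF-8 line (assumed framing)."""
    return f"{function} {argument}\n".encode("utf-8")


def send_request(host: str, port: int, function: str, argument: str,
                 timeout: float = 5.0) -> str:
    """Send one request line and return the first line of the reply.

    host and port must point to a running server; the reply format
    (a single text line) is assumed, not taken from the product.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(function, argument))
        with conn.makefile("r", encoding="utf-8", newline="\n") as reader:
            return reader.readline().rstrip("\n")
```

A call such as `send_request("localhost", 4000, "GENERATE", "Hauses")` would only work against a server that actually speaks this assumed line format; the port number is a placeholder.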

 

Furthermore, TEXTEC offers special interface modules for different platforms to make communication with the server (and the integration of the EXTRAKT functions) as easy as possible.

 

As a COM/DCOM component, EXTRAKT can be easily integrated into local and distributed systems.


The INDEX and GENERATE functions are also available in Java.

The newest version of EXTRAKT is 12.0 b (Sept. 2021).

© 1995-2024 TEXTEC Software Dr. Erwin Stegentritt