Irion uses various technologies for the development of its products, of which language technology is the most important.
In addition to language technology, use is made of learning algorithms (machine learning), information extraction and the semantic web.
Computers can be made a lot smarter when they are programmed to deal with natural language. The creation of computer programs that deal with natural language is called language technology, or computational linguistics.
Language technology makes use of dictionaries, grammars, formal semantics, language heuristics, dialogue patterns, and the like: in fact, all the types of linguistic knowledge that people use when they speak or write to each other and that can be formalised. Language technology is extremely complicated, because natural language is only partly regular and structured, and has many sorts of vagueness, irregularity and ambiguity.
A good example of this complexity is the current status of automatic translation. Despite promises made as far back as the 1950s, automatic translation is still cumbersome. Irion’s ambitions are more modest than automatic translation. We focus strongly on the use of language technology to improve knowledge and information management.
Text Mining / Information Extraction
Text Mining / Information Extraction is one of the most important technologies Irion uses to build solutions for customers. Using language technology, pattern recognition and case-based learning, specific information is derived automatically from large text corpora and stored in databases or other data repositories. Preferably we do this in a semantic way, so that Linked Open Data protocols and standards can be applied. Important examples of Irion’s Text Mining / Information Extraction projects are: BizTriggers, Hotfrog and Achmea.
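To make the idea of pattern-based extraction concrete, here is a minimal Python sketch. It is not Irion's implementation: the `APPOINTMENT` pattern, the `Triple` type, the predicate names `worksFor` and `hasRole`, and the sample sentence are all assumptions chosen for illustration. It shows how one textual pattern can turn a sentence into subject-predicate-object records of the kind that could later be stored in a database or published as linked data.

```python
import re
from typing import List, NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object record (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical pattern: "<Company> appoints <First Last> as <role>"
APPOINTMENT = re.compile(
    r"(?P<company>[A-Z]\w+) appoints "
    r"(?P<person>[A-Z]\w+ [A-Z]\w+) as "
    r"(?P<role>[A-Za-z ]+?)(?=[.,]|$)"
)


def extract_triples(text: str) -> List[Triple]:
    """Return one worksFor and one hasRole triple per matched appointment."""
    triples = []
    for match in APPOINTMENT.finditer(text):
        person = match["person"]
        triples.append(Triple(person, "worksFor", match["company"]))
        triples.append(Triple(person, "hasRole", match["role"].strip()))
    return triples


sample = "Acme appoints Jane Doe as chief financial officer. Shares rose."
triples = extract_triples(sample)
for triple in triples:
    print(triple)
```

A single regular expression is of course far cruder than the combination of language technology and learning described above; the sketch only illustrates the step from free text to structured records.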
Irion makes use of several machine learning techniques, such as supervised learning, unsupervised learning, and transduction. These techniques are usually combined with natural language technology, and with formal concept analysis, information extraction and simplex rule-based deduction to achieve the envisaged goals in solutions for customers. Supervised learning in particular is used for our classification system, but here it is combined with a special brand of computational linguistics, namely annotated and normalised corpus statistics, and with domain knowledge from thesauruses, other controlled vocabularies, and semantic networks.
Irion developed its own special brand of combined technologies to achieve the maximum performance of the classification system, which not only outperforms most classification systems in the world when applied in out-of-laboratory, ‘real-life’ situations, but also requires significantly less effort to build solutions.
A spectacular recent example of this is the IPTC classifier for both Italian and Spanish, which we developed with our partner LexisNexis in just three months.
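As a rough illustration of how supervised learning can be combined with thesaurus knowledge, the Python sketch below normalises tokens through a tiny thesaurus before training a multinomial Naive Bayes classifier with add-one smoothing. It is a toy under stated assumptions, not Irion's classification system: Naive Bayes is a stand-in technique, and the `THESAURUS` entries, the `NaiveBayes` class, the labels and the training sentences are all invented for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical thesaurus: maps surface forms to a preferred term.
THESAURUS = {
    "equities": "stock", "shares": "stock", "stocks": "stock",
    "footballer": "player", "striker": "player",
}


def normalise(text):
    """Lower-case, split on whitespace, and map tokens to preferred terms."""
    return [THESAURUS.get(token, token) for token in text.lower().split()]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        tokens = normalise(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = normalise(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


classifier = NaiveBayes()
classifier.train("shares fall on market", "finance")
classifier.train("bank buys equities", "finance")
classifier.train("striker scores goal", "sport")
classifier.train("team signs footballer", "sport")

print(classifier.classify("stocks rise"))
```

The word "stocks" never occurs in the training sentences, yet it still contributes evidence because the thesaurus maps it to the same preferred term as "shares" and "equities". That is the sense in which controlled vocabularies can reduce the amount of training material a classifier needs.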