Advas – A Python Search Engine Module Dipl.-Inf. Frank Hofmann Potsdam 11. Oktober 2007 Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 1 / 15 Contents 1 Project Overview 2 Functional Overview 3 Package Details 4 Future 5 References Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 2 / 15 Project Overview AdvaS in Five Minutes (1) a library to build a search engine abbreviation for Advanced Search written in Python available as an additional module project website advas.sourceforge.net Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 3 / 15 Project Overview AdvaS in Five Minutes (2) Goals: free library to extend an CMS or AI system to study and and understand information retrieval/search to learn how does an search engine works to understand why you do not find the data you are looking for to improve formulating your search queries to improve and test enhanced search algorithms Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 4 / 15 Project Overview Project History development started in 2002 as a student’s project since then: several improvements released as rpm packages (Fedora, Mandrake) since 2006: available as Debian package (for Etch) test application with GTK+ Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 5 / 15 Functional Overview Statistical Algorithms term frequency (tf) term frequency with stop list inverse document frequency (idf) retrieval status value (rsv) language detection by keywords k-nearest neighbour algorithm (kNN) Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 6 / 15 Functional Overview Linguistic Algorithms stemming algorithms table lookup stemmer n-gram stemmer successor variety stemmer using peak-and-plateau method synonym detection with the use of the OpenThesaurus (plain text version) Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 7 / 15 Functional Overview Sound-like Methods soundex metaphone NYSIIS algorithm caverphone algorithm (version 2.0) Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 8 / 15 Functional Overview Additional Methods a simple descriptor-based ranking algorithm text search algorithms (Knuth-Morris-Pratt) Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 9 / 15 Package Details Requirements technical requirements python 2.4 other facts enthusiasm idea or plan which advas components to combine data to play around with Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 10 / 15 Package Details Package Content AdvaS library Documentation (plain text, HTML in German and English) Examples demo application using pyGTK Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 11 / 15 Future Wanted code review code improvements extending the demo application testing documentation: translation into other languages Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 12 / 15 References Literature (selection) G. G. Chowdhury: Introduction to Modern Information Retrieval Facet Publishing, London, 2004, 474 S., ISBN 1-8560-4480-7 David A. Grossman/Ophir Frieder: Information Retrieval: Algorithms and Heuristics (The Information Retrieval Series), Springer, 2006, 356 S., ISBN 978-1402030048 Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 13 / 15 References Links Semantic Web School – Zentrum für Wissenstransfer http://www.semantic-web.at Semantic Pool http://www.semanticpool.de SearchEngine Watch http://www.searchenginewatch.com Search Theory and Strategies http://wiki.apache.org/nutch/Search Theory Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 14 / 15 References The End Thank you for your attention :-) Kontakt: Dipl.-Inf. Frank Hofmann Hofmann EDV – Linux, Layout und Satz Dortustr. 53 14467 Potsdam / Germany Email <[email protected]> web www.efho.de Dipl.-Inf. Frank Hofmann (Potsdam) Advas – A Python Search Engine Module 11. Oktober 2007 15 / 15