Tuesday, September 2, 2008

concept for a popular book on human language technology

Filipino Guide to Language Technology


Book Concept

The target audience is the undergraduate IT student looking to do a school project related to human language. Start with a chapter on applying NLTK to Cebuano, including the Wolff dictionary. The secondary target is the language specialist who is also an IT power user.

Put up draft chapters on Linguistic Exploration, or later on a dedicated website.

...

Outline

  1. Introduction: what HLT is, the diversity of Philippine languages, the language situation in the Philippines, the disciplines of informatics and linguistics
  2. Using HLT on the Web: identifying the gap for Philippine languages, with a focus on three problems that motivate the running examples
  3. Using NLTK
  4. Concepts about Language: Phonemes and Morphemes
  5. Concepts about Software: data representation, XML
    1. e.g. TEI dictionaries, Wolff dictionary
  6. Concepts about Language: syntax at the level of parse trees
    1. using NLTK for a grammar of Filipino
  7. Modeling syntax
    1. FSM, push-down automata, recursive descent parser
    2. formal grammars of Filipino and Cebuano
  8. Corpus building
    1. Using NLTK, building a database/Web repository
  9. Using semantics
    1. Semantic Web
    2. Word senses in a TEI dictionary
  10. Modifying NLTK
    1. Python or Java
  11. Typed Feature Structures
    1. HPSG models of Filipino and Cebuano
    2. Using LKB
  12. Semantics of Words
    1. FrameNet
  13. Modifying NLTK for word semantics of a Philippine language
  14. Semantics of Clauses
    1. MRS
  15. Discourse
    1. DRT
    2. Speech Acts
  16. Comparing Languages
  17. Extended example
  18. Cognitive Modeling
  19. Open projects
    1. Collaborating with language specialists
    2. Publishing on the Web
    3. Open Source

Biblio

  • NLTK book
  • SIL software
  • HPSG
    • Kim & Sells 2008
    • Sag, Wasow and Bender
    • LKB
  • ....

