Sunday, September 7, 2008

Introducing HPSG, part 2 (originally in Filipino)

3 Valency (combination) and grammatical rules

In this chapter we discuss the interaction between grammatical rules and information about combination, or valency. The concept of "valency" comes from chemistry. An atom can form chemical bonds with other atoms to build molecules that are more or less stable. What matters for stability is which valence shell (or electron shell) a given kind of atom has. A stable chemical bond with another atom can form only when the electron shell is filled by the shared electrons. The valency number of the atoms of an element is the number of hydrogen atoms that can bond with that element. In the chemical compound H2O, oxygen has valency 2. Elements can be divided into valency classes. Elements with a particular valency were Mendeleev's basis for organizing information about the elements into the columns of the Periodic Table.

(The chemical meaning, referring to the "combining power of an element", is recorded from 1884, from German Valenz.[1])

Tesnière (1959) applied the concept of valency to linguistics: a head needs a specific number of arguments in order to form a stable bond (or phrase structure). Words with the same valency, that is, the same number and type of arguments, form a valency class (for example intransitive, transitive and ditransitive verbs, which combine with 1, 2 or 3 arguments). The members of a valency class behave alike with respect to the combinations, or phrases, that they form. Figure 3.1 shows examples from chemistry and linguistics.

Figure 3.1 The combination of oxygen and hydrogen, and the combination of a verb with its arguments.
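The idea of a valency class can be sketched in a few lines of Python. This is only an illustration: the toy lexicon, the English verbs, and the bare "NP" category labels are assumptions made for the example and do not come from the text.

```python
from collections import defaultdict

# Hypothetical toy lexicon: each verb lists the categories of the
# arguments it needs (including the subject).
LEXICON = {
    "sleep": ("NP",),
    "laugh": ("NP",),
    "see": ("NP", "NP"),
    "love": ("NP", "NP"),
    "give": ("NP", "NP", "NP"),
}

def valency_classes(lexicon):
    """Group words that require the same number and type of arguments."""
    classes = defaultdict(list)
    for word, args in lexicon.items():
        classes[args].append(word)
    return {args: sorted(words) for args, words in classes.items()}

classes = valency_classes(LEXICON)
# Print the classes from the smallest valency to the largest.
for args, words in sorted(classes.items(), key=lambda kv: len(kv[0])):
    print(len(args), args, words)
```

Words that end up under the same key are the ones expected to behave alike when they combine with their arguments.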

The following section shows how valency information is represented, so that instead of many specific phrase structure rules we can use just a few more general schemata (or rule skeletons) to license syntactic structure.
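As a rough sketch of what "a few general schemata" could look like, the Python below uses a single head-argument rule that works for any head, because the information about what the head needs is stored on the word itself. The `Sign` class, its field names, and the example words are hypothetical, and the sketch ignores word order, agreement and semantics.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sign:
    phon: tuple     # the words collected so far (order is not modeled)
    cat: str        # category label, e.g. "V" or "NP"
    valence: tuple  # categories of the arguments still needed

def head_argument(head, arg):
    """One general schema: a head combines with the next argument on its
    valence list. Returns the new phrase, or None if not licensed."""
    if not head.valence:               # head needs nothing more
        return None
    if head.valence[0] != arg.cat:     # wrong kind of argument
        return None
    if arg.valence:                    # argument itself is incomplete
        return None
    return Sign(head.phon + arg.phon, head.cat, head.valence[1:])

def is_saturated(sign):
    """A sign is saturated when all of its arguments have been found."""
    return not sign.valence

# Hypothetical lexical entries.
kim = Sign(("Kim",), "NP", ())
book = Sign(("a book",), "NP", ())
reads = Sign(("reads",), "V", ("NP", "NP"))

vp = head_argument(reads, book)  # still needs one NP
s = head_argument(vp, kim)       # needs nothing more
print(vp.valence, is_saturated(s))
```

The same function handles intransitive, transitive and ditransitive verbs; only the `valence` tuple in the lexical entry differs, which is the point of moving valency information out of the rules and into the lexicon.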

3.1  The representation of valency information

Introducing HPSG, part 1

HPSG Introduction

http://wwwhomes.uni-bielefeld.de/rvogel/ws0607/synfolien/hpsg-lehrbuch.pdf

adaptation using http://lingro.com and http://dict.tu-chemnitz.de
[and maybe Google Translate]

1.10 Classification of HPSG

HPSG has the following properties. It is a lexicalist theory, meaning that the essential elements of linguistic relationships are found in the descriptions of words [but also constructions]. HPSG is sign-based in the sense of Saussure (1916), meaning that the signifier and signified of linguistic signs are always represented together. Typed feature structures like (64) are used to model all relevant information. Lexicon entries, phrases and principles are all described with the same formal mechanism. Generalizations about word classes and rule schemas are captured through an inheritance hierarchy. HPSG is a monostratal theory, meaning that phonology, syntax and semantics are described in a single structure [that is, although there are different levels of information, each type of information is represented only once; in contrast, transformational grammar theories represent the same information several times, before and after each transformation or movement]. There are no separate levels of description, unlike in, for example, GB theory. (64) shows an extract of the representation of the word Grammatik.

[Figure: feature description (64) for the word Grammatik]

You can see from this example that the feature description contains information about the phonology, syntactic category and semantic contribution of the word Grammatik. The meaning of the italicized words and numbered boxes, why the information is structured the way it is, and how such a lexicon entry works with rule schemas will all be explained in the following chapters.
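To give a feel for how phonology, category and meaning can sit together in one structure, here is a minimal Python sketch using nested dictionaries. The attribute names and values are illustrative assumptions; they are not a reproduction of (64), and real HPSG feature structures are typed and more deeply nested.

```python
# Illustrative only: the attribute names (PHON, CAT, CONT) and their
# values are assumptions, not a copy of the description in (64).
grammatik = {
    "PHON": ["Grammatik"],          # how the word sounds
    "CAT": {"HEAD": "noun"},        # syntactic category
    "CONT": {"RELN": "grammatik"},  # semantic contribution
}

def value_at(description, path):
    """Follow a sequence of attribute names into a nested description."""
    for attribute in path:
        description = description[attribute]
    return description

print(value_at(grammatik, ["CAT", "HEAD"]))
```

All three kinds of information live in the same object, which is the monostratal idea in miniature.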

1.11  The underlying data

...

2. The Formalism

This chapter introduces feature structures, which are used to model all linguistic objects. [...] Feature structures are complex entities that model all properties of a linguistic object. A linguist usually works with a feature description, which shows only an extract of the feature structure. The difference between model and description will be made precise in Section 2.7. Another term for feature description is attribute-value matrix. [...] I try to limit the formal part of this book to what is absolutely necessary. Interested readers should consult Shieber (1986), Pollard and Sag (1987), Johnson (1988), Carpenter (1992), King (1994) and Richter (2004). Shieber's book is a readable introduction to unification grammars. The work of King and Richter lays out the important foundations of HPSG, which will remain inaccessible to the non-mathematician. The important thing is to know that this work exists, and that the theory rests on a firm foundation.
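Since unification is the core operation in the unification grammars mentioned above, a small Python sketch may help. It treats feature descriptions as nested dictionaries with atomic string values, which is a strong simplification: there are no types and no structure sharing (the numbered boxes), and the attribute names are made up for the example.

```python
def unify(a, b):
    """Combine two feature descriptions into one that carries the
    information of both. Returns None if the two are incompatible."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs are not modified
        for attribute, b_value in b.items():
            if attribute in result:
                merged = unify(result[attribute], b_value)
                if merged is None:
                    return None  # clash somewhere below this attribute
                result[attribute] = merged
            else:
                result[attribute] = b_value
        return result
    # Atomic values unify only if they are identical.
    return a if a == b else None

# Hypothetical descriptions: compatible information is merged ...
noun = {"CAT": {"HEAD": "noun"}}
nominative = {"CAT": {"CASE": "nom"}}
print(unify(noun, nominative))

# ... while conflicting information makes unification fail.
verb = {"CAT": {"HEAD": "verb"}}
print(unify(noun, verb))
```

A fuller treatment would add types and reentrancy; NLTK's feature structure classes are one place to look for a more complete implementation.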

Friday, September 5, 2008

Corpus Resources

The Arts & Humanities Data Service in the UK seems to have lost its funding, but the resources it gathered together continue to be available from the Oxford Text Archive. Two data sets that caught my eye, because they have word frequencies, are:

  • CUVPlus, which contains 72,000 words from the Oxford Advanced Learner's Dictionary
  • Japanese Kanji

Tuesday, September 2, 2008

concept for a popular book on human language technology

Filipino Guide to Language Technology


Book Concept

The target audience is an undergraduate IT student looking to do a school project related to human language. Start with a chapter on applying NLTK to Cebuano, including the Wolff dictionary. The secondary target is an IT power-user language specialist.

Put up draft chapters on Linguistic Exploration, or later a dedicated website.

...

Outline

  1. Introductory material: what HLT is, the diversity of Philippine languages, the language situation in the Philippines, the disciplines of informatics and linguistics
  2. Using HLT on the Web, identifying the gap for Philippine languages, focus on three problems motivating running examples
  3. Using NLTK
  4. Concepts about Language: Phonemes and Morphemes
  5. Concepts about Software: data representation, XML
    1. e.g. TEI dictionaries, Wolff dictionary
  6. Lang: Syntax at level of parse trees
    1. using NLTK for a grammar of Filipino
  7. Modeling syntax
    1. FSM, push-down automata, recursive descent parser
    2. formal grammars of Filipino and Cebuano
  8. Corpus building
    1. Using NLTK, building a database/Web repository
  9. Using semantics
    1. Semantic Web
    2. Word senses in a TEI dictionary
  10. Modifying NLTK
    1. Python or Java
  11. Typed Feature Structures
    1. HPSG models of Filipino and Cebuano
    2. Using LKB
  12. Semantics of Words
    1. FrameNet
  13. Modifying NLTK for word semantics of a Philippine language
  14. Semantics of Clauses
    1. MRS
  15. Discourse
    1. DRT
    2. Speech Acts
  16. Comparing Languages
  17. Extended example
  18. Cognitive Modeling
  19. Open projects
    1. Collaborating with language specialists
    2. Publishing on the Web
    3. Open Source

Biblio

  • NLTK book
  • SIL software
  • HPSG
    • Kim & Sells 2008
    • Sag, Wasow and Bender
    • LKB
  • ....