Title

Ancient Greek and Latin Dependency Treebank 2.0

Author
김창성
Date posted
2016.09.22
Content

Posted: 20 Sep 2016 12:03 PM PDT

 

Responsible for the project
Giuseppe G. A. Celano (celano at informatik.uni-leipzig.de) & Gregory Crane (crane at informatik.uni-leipzig.de)


Advisory board
Joakim Nivre
Jonathan Robie


Treebanking is the syntactic annotation of texts. It belongs to a relatively new field of research that explores the potential of linguistic annotation for a wide variety of purposes, ranging from natural language processing tasks, such as machine translation or summarization, to linguistic research, where the computational treatment of data has significantly affected both methods and results.

Continuing the pioneering work of the Perseus Project, where the first texts were treebanked (Ancient Greek and Latin Dependency Treebank 1.0), the Humboldt Chair for Digital Humanities promotes the building of the Ancient Greek and Latin Dependency Treebank 2.0 within the project "Treebanking: building a linguistic corpus for Ancient Greek and Latin", started in 2015.

The aim of the project is twofold: (1) to produce new treebanked data following a new specification and (2) to develop annotation and conversion tools, so that annotation can be as automatic as possible and data can be converted into different formats. Conversion is particularly relevant because the newly produced data will also be released as part of the Universal Dependencies project.
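To give a rough idea of what such a format conversion involves, here is a minimal sketch that writes one annotated sentence in the ten-column CoNLL-U layout used by Universal Dependencies. It is not the project's converter: the function name `to_conllu`, the tuple layout, the Latin example sentence, the PRED label for the root, and the three-entry label mapping are all assumptions made for illustration, and a real conversion needs far more than a label lookup.

```python
# Hypothetical, heavily simplified label mapping (an assumption, not the
# project's conversion table).
LABEL_MAP = {"PRED": "root", "SBJ": "nsubj", "OBJ": "obj"}


def to_conllu(tokens):
    """Serialize (id, form, head, relation) tuples as one CoNLL-U sentence."""
    lines = []
    for tok_id, form, head, relation in tokens:
        # Fall back to the generic UD relation "dep" for unmapped labels.
        deprel = LABEL_MAP.get(relation, "dep")
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        # Fields this sketch does not model are left as "_".
        columns = [str(tok_id), form, "_", "_", "_", "_",
                   str(head), deprel, "_", "_"]
        lines.append("\t".join(columns))
    # A CoNLL-U sentence is terminated by a blank line.
    return "\n".join(lines) + "\n\n"


# Illustrative sentence: "puella rosam amat" (the girl loves the rose).
sentence = [(1, "puella", 3, "SBJ"), (2, "rosam", 3, "OBJ"), (3, "amat", 0, "PRED")]
conllu_text = to_conllu(sentence)
```

Printing `conllu_text` shows one tab-separated line per word, with the head index in the seventh column and the converted relation in the eighth.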

Annotation can currently be performed online through the Perseids platform. Users are granted free access to Arethusa, a new annotation environment that supports three layers of linguistic annotation: the morphological layer, the syntactic layer, and the advanced syntax (or semantic) layer.

The Morpheus PoS tagger allows semi-automatic annotation of morphology. The annotator is provided with candidate morphological analyses for each word and can choose one of them or, if the right one is missing, add a new one.
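The choose-or-add workflow can be pictured with a small data model. This is only a sketch under stated assumptions: the classes `Analysis` and `Token` and their methods are hypothetical names, not part of Morpheus or Arethusa, and the candidate list is typed in by hand rather than produced by a tagger.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Analysis:
    """One candidate morphological analysis (hypothetical structure)."""
    lemma: str
    pos: str
    features: str


@dataclass
class Token:
    form: str
    candidates: List[Analysis] = field(default_factory=list)
    chosen: Optional[Analysis] = None

    def choose(self, index: int) -> None:
        """Accept one of the proposed analyses."""
        self.chosen = self.candidates[index]

    def add_and_choose(self, analysis: Analysis) -> None:
        """Supply an analysis by hand when the right one is missing."""
        self.candidates.append(analysis)
        self.chosen = analysis


# Latin "amor" is ambiguous between a noun and a passive verb form.
token = Token("amor", candidates=[
    Analysis("amor", "noun", "nominative singular masculine"),
    Analysis("amo", "verb", "1st singular present indicative passive"),
])
token.choose(0)  # the annotator picks the noun reading
```

Keeping the rejected candidates on the token, rather than discarding them, is a design choice of this sketch; it makes the annotator's decision reversible.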

The syntactic annotation consists of building syntactic trees according to a dependency grammar model and assigning a grammatical relation label, such as SBJ or OBJ, to each node of a tree on the basis of its relationship with its governor node. The currently implemented model builds on the one developed for the Prague Dependency Treebank 2.0.
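A dependency tree of this kind can be represented as a flat list of nodes, each pointing to its governor and carrying a relation label. The sketch below is an illustration, not the treebank's actual data format: the `Node` class, the helper functions, the convention that head 0 marks the root, the Latin example, and the PRED label on the root are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: int
    form: str
    head: int       # id of the governor node; 0 means "no governor" (root)
    relation: str   # grammatical relation to the governor, e.g. SBJ or OBJ


def dependents(nodes, head_id):
    """Return the nodes directly governed by the node with id head_id."""
    return [n for n in nodes if n.head == head_id]


def is_tree(nodes):
    """Check that every node reaches the root without a cycle or a missing head."""
    by_id = {n.id: n for n in nodes}
    for n in nodes:
        seen = set()
        current = n
        while current.head != 0:
            if current.id in seen or current.head not in by_id:
                return False
            seen.add(current.id)
            current = by_id[current.head]
    return True


# "puella rosam amat": the verb governs its subject and its object.
tree = [
    Node(1, "puella", 3, "SBJ"),
    Node(2, "rosam", 3, "OBJ"),
    Node(3, "amat", 0, "PRED"),
]
```

Here `dependents(tree, 3)` returns the nodes for "puella" and "rosam", and `is_tree(tree)` confirms that every word is attached to the root.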

Ancient Greek can also be annotated for semantics. The advanced syntax (or semantic) layer allows annotation of the categories identified in Smyth’s grammar (where the term “syntax” is used in a broader sense that also covers semantic roles). Starting from the morphosyntactic annotation of a word, the annotator is algorithmically guided toward the relevant semantic role (e.g., genitive > genitive proper > genitive of possession).
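The guided path in the example (genitive > genitive proper > genitive of possession) behaves like a walk down a category tree, where each choice narrows the options offered next. The following sketch shows that idea only. The `TAXONOMY` dictionary and both functions are hypothetical, and apart from the three categories named above the entries are illustrative placeholders that have not been checked against the project's guidelines.

```python
# Hypothetical fragment of a category tree; only "genitive",
# "genitive proper" and "genitive of possession" come from the text above.
TAXONOMY = {
    "genitive": {
        "genitive proper": {
            "genitive of possession": {},
            "partitive genitive": {},      # illustrative placeholder
        },
        "ablatival genitive": {},          # illustrative placeholder
    },
}


def options(path):
    """List the categories the annotator may pick after following path."""
    node = TAXONOMY
    for step in path:
        node = node[step]
    return sorted(node)


def is_final(path):
    """A path is final when no narrower category remains."""
    return not options(path)


# The morphological annotation (case = genitive) fixes the entry point.
first_choices = options(["genitive"])
second_choices = options(["genitive", "genitive proper"])
```

In this sketch, `first_choices` holds the categories directly under "genitive", and `is_final` becomes true once the annotator reaches a category such as "genitive of possession".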

Currently, a selection of Aesop’s fables, passages from the Bibliotheca (Pseudo-Apollodorus), and the fables of Phaedrus are being annotated. The creation of the corpus is documented on GitHub:

guidelines 2.0 for the annotation

inter-coder agreement for the Greek and Latin texts (work in progress)

repository for the treebank, both AGDT 1.0 and 2.0 (work in progress)

Annotation platform:

Arethusa through Perseids Platform

A few videos on how to use Arethusa to annotate

