INTRODUCTION TO ARTIFICIAL INTELLIGENCE - clic

January 9, 2018 | Author: Anonymous | Category: Social Science, Psychology, Cognitive Psychology



INTRODUCTION TO ARTIFICIAL INTELLIGENCE Truc-Vien T. Nguyen

Lab: Named Entity Recognition

Download
• Slides: http://sites.google.com/site/trucviennguyen/Lab NER -- Vien.pdf
• Software: http://sites.google.com/site/trucviennguyen/Teaching/AI/SSHSecureShellClient-3.2.9.rar

Natural Language Processing (NLP)
• Main purpose of NLP
  – Build systems able to analyze, understand, and generate the languages that humans use naturally
• Tasks involved
  – Automatic Summarization
  – Information Extraction
  – Speech Recognition
  – Machine Translation
  – …

Information Extraction (1)
[Figure: news articles (News 1, News 2, News 3), each mapped to a form (Form 1, Form 2, Form 3) with WHO, WHAT, and WHEN fields]

Mapping of texts into a fixed structure representing the key information

Information Extraction (2)
Sam Brown retired as executive vice president of the famous hot dog manufacturer, Hupplewhite Inc. He will be succeeded by Harry Jones.

EVENT: leave job
  Person: Sam Brown
  Position: executive vice president
  Company: Hupplewhite Inc.

EVENT: start job
  Person: Harry Jones
  Position: executive vice president
  Company: Hupplewhite Inc.
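The "fixed structure" in this example can be written down as a small record type. The sketch below is only an illustration: the class name `JobEvent` and its field names are my own choices, and the two records are filled in by hand from the slide rather than extracted automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobEvent:
    """One filled template: a fixed set of slots taken from free text.

    The class and field names are hypothetical; they mirror the slot
    names shown on the slide (EVENT, Person, Position, Company).
    """
    event: str
    person: str
    position: str
    company: str


# The two templates from the Sam Brown example, filled in by hand.
events = [
    JobEvent("leave job", "Sam Brown", "executive vice president", "Hupplewhite Inc."),
    JobEvent("start job", "Harry Jones", "executive vice president", "Hupplewhite Inc."),
]

for e in events:
    print(f"{e.event}: {e.person}, {e.position} at {e.company}")
```

Note that the position and company of the second event are not stated in the second sentence; they are carried over from the first, which is part of what makes the task harder than simple pattern matching.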

Entity and Relation
• Entity
  – An object in the world
  – Ex. President Bush was in Washington today
  – Entity types: Person, Organization, Location, GPE (geo-political entity)
• Relation
  – A relationship between two entities
  – Ex. LocatedIn(“Bush”, “Washington”)
  – Relation types: LocatedIn, Family, Employment
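One possible way to hold entities and relations in a program is sketched below. The type names `Entity` and `Relation` and the `render` helper are hypothetical, and tagging "Washington" as GPE is my assumption; the slide does not say which type it receives.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """An object in the world, as mentioned in the text."""
    text: str
    label: str  # e.g. Person, Organization, Location, GPE


class Relation(NamedTuple):
    """A named relationship between two entities."""
    name: str
    arg1: Entity
    arg2: Entity


def render(rel: Relation) -> str:
    """Format a relation the way the slide writes it."""
    return f'{rel.name}("{rel.arg1.text}", "{rel.arg2.text}")'


# "President Bush was in Washington today"
bush = Entity("Bush", "Person")
washington = Entity("Washington", "GPE")  # assumption: type not given on the slide
located_in = Relation("LocatedIn", bush, washington)

print(render(located_in))
```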

Named Entity Recognition
• Named Entity Recognition
  – Subtask of information extraction
  – Locate and classify elements in text into predefined categories: names of persons, organizations, locations, time expressions, etc.
• Example
  – James Clarke (Person), director of ABC company (Organization)

CoNLL2003 shared task (1)
• English and German languages
• 4 types of NEs:
  – LOC: Location
  – MISC: Names of miscellaneous entities
  – ORG: Organization
  – PER: Person
• Training set for developing the system
• Test data for the final evaluation

CoNLL2003 shared task (2)
• Data
  – Columns separated by a single space
  – One word per line
  – An empty line after each sentence
  – Tags in IOB format
• An example (columns: word, POS tag, chunk tag, NE tag)

Milan NNP B-NP I-ORG
's POS B-NP O
player NN I-NP O
George NNP I-NP I-PER
Weah NNP I-NP I-PER
meet VBP B-VP O
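The format above is simple enough to read with a few lines of Python. The following is a minimal sketch, not part of the lab software: the function names `read_sentences` and `entities` are my own, and the grouping rule assumes the tagging convention visible in the example, where an entity can begin with an I- tag and a B- tag is needed only to separate adjacent entities of the same type.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str, str, str]  # (word, POS tag, chunk tag, NE tag)

SAMPLE = """\
Milan NNP B-NP I-ORG
's POS B-NP O
player NN I-NP O
George NNP I-NP I-PER
Weah NNP I-NP I-PER
meet VBP B-VP O
"""


def read_sentences(text: str) -> List[List[Token]]:
    """Split CoNLL-style text into sentences; a blank line ends a sentence."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, chunk, ne = line.split()
        current.append((word, pos, chunk, ne))
    if current:
        sentences.append(current)
    return sentences


def entities(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Group consecutive NE tags into (entity text, type) pairs.

    A B- tag always opens a new entity; an I- tag opens one only when the
    previous token was not inside an entity of the same type.
    """
    found: List[Tuple[str, str]] = []
    words: List[str] = []
    kind: Optional[str] = None
    for word, _pos, _chunk, ne in sentence:
        prefix, _, label = ne.partition("-")
        starts_new = prefix == "B" or (prefix == "I" and label != kind)
        if ne == "O" or starts_new:
            if words:
                found.append((" ".join(words), kind))
            words, kind = [], None
        if ne != "O":
            words.append(word)
            kind = label
    if words:
        found.append((" ".join(words), kind))
    return found


for sent in read_sentences(SAMPLE):
    print(entities(sent))
```

On the example sentence this groups "Milan" as one ORG entity and "George Weah" as one PER entity.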

CoNLL2003 shared task (3)

English     precision   recall    F
[FIJZ03]    88.99%      88.54%    88.76%
[CN03]      88.12%      88.51%    88.31%
[KSNM03]    85.93%      86.21%    86.07%
[ZJ03]      86.13%      84.88%    85.50%
---------------------------------------------------
[Ham03]     69.09%      53.26%    60.15%
baseline    71.91%      50.90%    59.61%
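The F column is the harmonic mean of precision and recall (F with β = 1). As a quick check, the sketch below recomputes it from the precision and recall values in the table; the function name `f_measure` is my own.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure; with beta = 1 it is the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) in percent, copied from the table above.
rows = {
    "[FIJZ03]": (88.99, 88.54),
    "[CN03]": (88.12, 88.51),
    "[Ham03]": (69.09, 53.26),
    "baseline": (71.91, 50.90),
}

for name, (p, r) in rows.items():
    print(f"{name}: F = {f_measure(p, r):.2f}%")
```

Because the harmonic mean is pulled toward the smaller value, the baseline's low recall keeps its F score below 60% despite a precision above 70%.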

Dataset
• Italian NER: Evalita 2009 (PER/ORG/LOC/GPE)
  – Development set: 223,706 tokens
  – Test set: 90,556 tokens
• English NER: CoNLL 2003 (PER/ORG/LOC/MISC)
  – Training set: 203,621 tokens
  – Development set: 51,362 tokens
  – Test set: 46,435 tokens
• Mention Detection: ACE 2005
  – 599 documents

CRF++ (1)
• Can redefine feature sets
• Written in C++ with STL
• Fast training based on LBFGS for large scale
• Less memory usage both in training and testing
• Encoding/decoding in practical time
• Available as open source software
  http://crfpp.googlecode.com/svn/trunk/doc/index.html

CRF++ (2)
• Uses Conditional Random Fields (CRFs)
• CRF methodology: use statistically correlated features and train them discriminatively
• Simple, customizable, and open source implementation
• For segmenting/labeling sequential data
• Can define
  – unigram/bigram features
  – relative positions (window size)
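To make "relative positions" concrete, the sketch below builds unigram features for one token by looking at cells addressed as (row offset, column), which is the idea behind the `%x[row,col]` macros in a CRF++ template. This is plain Python for illustration only, not CRF++ itself: the function name `window_features`, the feature string layout, and the `_PAD_` placeholder for positions outside the sentence are all my own choices.

```python
from typing import List, Sequence, Tuple

PAD = "_PAD_"  # hypothetical placeholder for positions outside the sentence


def window_features(
    rows: Sequence[Sequence[str]],
    i: int,
    offsets: Sequence[Tuple[int, int]],
) -> List[str]:
    """Unigram features for token i: one feature per (row offset, column) pair.

    Row offset 0 is the current token, -1 the previous one, +1 the next one;
    column 0 is the word and column 1 the POS tag.
    """
    feats = []
    for rel, col in offsets:
        j = i + rel
        value = rows[j][col] if 0 <= j < len(rows) else PAD
        feats.append(f"x[{rel},{col}]={value}")
    return feats


# Word and POS columns of "He reckons the current account".
rows = [
    ("He", "PRP"),
    ("reckons", "VBZ"),
    ("the", "DT"),
    ("current", "JJ"),
    ("account", "NN"),
]

# A window of one word on each side, plus the current POS tag.
window = [(-1, 0), (0, 0), (1, 0), (0, 1)]
print(window_features(rows, 2, window))
```

Widening the window means adding more offsets such as (-2, 0) and (2, 0); each new offset gives the model more context at the cost of more features.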

Template basic
• An example (columns: word, POS tag, chunk tag):

He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP

../corpus/zzz

• Evaluation
  perl ../eval/conlleval.pl ../corpus/zzz > ../corpus/ttt
• See the results
  cat ../corpus/ttt
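To help read the scores in the results file, here is a rough sketch of entity-level scoring, assuming an entity counts as correct only when both its boundaries and its type match the gold annotation exactly (to my knowledge this is how the CoNLL evaluation counts). It is not a replacement for conlleval.pl; the `Span` layout and the `score` function are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (first token index, last token index, entity type)


def score(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching entity spans."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Gold: "Milan" is ORG, "George Weah" is PER.
gold = {(0, 0, "ORG"), (3, 4, "PER")}
# Hypothetical system output: right span for "Milan" but the wrong type.
predicted = {(0, 0, "LOC"), (3, 4, "PER")}

p, r, f = score(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

A wrong type on an otherwise correct span costs both precision and recall, since the prediction counts as one spurious entity and one missed entity.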

THANKS
• I used material from
  – Text Processing II: Bernardo Magnini
  – Lab Text Processing II: Roberto Zanoli
