Displayed Bias as a Reflection of Both Speaker and Intended

January 6, 2018 | Author: Anonymous | Category: Social Science


Carolyn Penstein Rosé
Language Technologies Institute and Human-Computer Interaction Institute

New York Times Article
What strikes you about the agent’s style of speaking?

June 24, 2010
Computers Learn to Listen, and Some Talk Back
By STEVE LOHR and JOHN MARKOFF

“Hi, thanks for coming,” the medical assistant says, greeting a mother with her 5-year-old son. “Are you here for your child or yourself?”

The boy, the mother replies. He has diarrhea.

“Oh no, sorry to hear that,” she says, looking down at the boy.

The assistant asks the mother about other symptoms, including fever (“slight”) and abdominal pain (“He hasn’t been complaining”).

She turns again to the boy. “Has your tummy been hurting?” Yes, he replies.

After a few more questions, the assistant declares herself “not that concerned at this point.” She schedules an appointment with a doctor in a couple of days. The mother leads her son from the room, holding his hand. But he keeps looking back at the assistant, fascinated, as if reluctant to leave.

Maybe that is because the assistant is the disembodied likeness of a woman’s face on a computer screen — a no-frills avatar. Her words of sympathy are jerky, flat and mechanical. But she has the right stuff — the ability to understand speech, recognize pediatric conditions and reason according to simple rules — to make an initial diagnosis of a childhood ailment and its seriousness. And to win the trust of a little boy.

Not all so rosy…

Are we missing something? Sociolinguists and discourse analysts have been studying social aspects of language since the 1920s and 1930s!

Ask yourself this:

Where do I sound like I’m from? Actually from California, but picked up some accent from my dad from New York... Did you notice the a in Carolyn? But not the back-open r. And if you heard me say “daughter…” But how often do I say that in class?

Note that context is everything. a in sat doesn’t have the same significance as a in Carolyn.

What information are we throwing away or ignoring that would allow us to distinguish meaningful variation from meaningless variation?

What will you get out of this class?
• Learn to read the primary literature in sociolinguistics, discourse analysis, and pragmatics
• Get a more intimate familiarity with the state of the art in language processing applied to analysis of social media, especially conversation and narrative
• Explore what insights these fields of linguistics can contribute to language technologies
• Explore what language technologies might be able to do to advance these fields of linguistics
• Get hands-on experience working on both

Please Introduce Yourself…
• What experience do you have with discourse analysis?
• What do you most want to get out of this class?

Discourse and Identity
• Identity is reflected in the way we present ourselves in conversational interactions
• Reflects who we are, how we think, and where we belong
• Also reflects how we think of our audience
• Examples
  • Regional dialect: shows my identification with where I am from, but also shows I am comfortable letting you identify me that way
  • Jargon and technical terms: shows my identification with a work community, but also shows I expect you to be able to relate to that part of my life
  • Level of formality: shows where we stand in relation to one another
  • Explicitness in reference: shows whether I am treating you like an insider or an outsider

Discourse and Identity
• Discourse is text above the clause level (Martin & Rose, 2007)
• A Discourse is an ongoing conversation [type]
• Socialization is the process of joining a Discourse (Lave & Wenger, 1991; Sfard, 2010)
• We join Discourses that match our core identity (de Fina, Schiffrin, & Bamberg, 2006)
• In moving from the periphery to the core of a Discourse community, we sound more and more like the community (Arguello et al., 2006)
• A discourse is one instance of it [token]
• All discourses contain echoes of previous discourses (Bakhtin, 1983)

Lakoff & Johnson, 1980
Lave & Wenger, 1991

Metaphors Structure our Experience
• We describe arguments using terms related to war
• Using a typical war ‘script’ to structure a story about an argument
• We orient towards arguments as though they were wars
• Our conversational partner is our opponent
• We may feel that we won or lost
• We may feel wounded as a result

Discourses, Frames, and Metaphors
• Frame: A portion of a discourse belonging to a distinct Discourse
• Metaphor: One linguistic device that can be used to define a set of discourse practices that constitute a frame
• Topic models: a technical approach that makes sense for identifying frames within a discourse
• A discourse could be drawn from a mixture of
• Within the same conversation, we may wear a variety of “hats”
• E.g., the same discourse with a co-worker may contain exchanges pertaining to our relationship as colleagues and others to our relationship as friends
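The idea that topic models can pick out frames within a discourse can be sketched in a few lines. What follows is a minimal illustration, not the course's method: a small collapsed Gibbs sampler for latent Dirichlet allocation (LDA, one common topic model) in plain Python. The function name `lda_gibbs`, the hyperparameter values, and the toy co-worker exchanges are all assumptions invented for this example.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns one list of topic proportions per document; each list sums to 1.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                       # tokens assigned to topic k overall
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take the token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token (unnormalized).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed estimate of each document's topic mixture.
    mixtures = []
    for d, doc in enumerate(docs):
        denom = len(doc) + alpha * num_topics
        mixtures.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return mixtures


# Toy data (invented): two exchanges with one "hat" each, and one that mixes both.
toy_docs = [
    "deadline report meeting budget report deadline".split(),
    "weekend movie dinner party movie weekend".split(),
    "meeting budget deadline dinner movie weekend".split(),
]
toy_mixtures = lda_gibbs(toy_docs)
for doc, mixture in zip(toy_docs, toy_mixtures):
    print(" ".join(doc), "->", [round(p, 2) for p in mixture])
```

On a corpus this small the proportions are only suggestive. A real experiment would use a tested library implementation and one of the course corpora, and would check by hand whether the induced topics correspond to anything a discourse analyst would call a frame.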


Discussion Questions
• What other stories/movies/genres does this remind you of?
• What is the message being communicated about Hummers?
• What is communicated about the company that makes them?
• What is communicated about the assumed audience?
• What are other messages?
  • E.g., are any political statements being made?

Semester Plan
• Unit 1: Theoretical
• Unit 2: Linguistic Structure
• Unit 3: Sentiment
• Unit 4: Identity and
• Unit 5: Social Positioning
• In each Unit:
  • Readings from Discourse Analysis and Sociolinguistics
  • Readings from Language Technologies
  • Hands-on assignment
  • Implementation and corpus based experiment
  • Competitive error analysis
  • Student Presentations

Grading
People who make a good faith effort always do well in my courses…
• 15% for each of 5 Unit assignments
  • First one is a discourse analysis
  • Others are corpus based experiments
    • We provide the corpus
    • You implement a feature extractor, test it, do an error analysis, and present your well motivated idea and evaluation in class
• 10% for class participation
  • Doing readings (will be posted to course Drupal)
  • Posting to Drupal discussion by 10pm the night before class
  • Actively contributing to class discussions
• 15% for final critique of a technical paper
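To make the "implement a feature extractor" step of the corpus-based assignments concrete, here is a minimal sketch in Python. It is not SIDE's plugin interface (the slides say those plugins are written in Java), and the function name `extract_features`, the word lists, and the feature names are hypothetical choices made for illustration. The features loosely follow the cues discussed earlier, such as how speakers refer to themselves and their audience and how formal they sound.

```python
import re
from collections import Counter

# Hypothetical word lists, chosen only for this illustration.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
SECOND_PERSON = {"you", "your", "yours"}


def extract_features(text):
    """Map one text to a dictionary of simple, interpretable features."""
    # Lowercase word tokens; a contraction such as "you're" stays one token.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    n = len(tokens) or 1  # avoid division by zero on empty input
    counts = Counter(tokens)
    return {
        "num_tokens": len(tokens),
        "first_person_rate": sum(counts[w] for w in FIRST_PERSON) / n,
        "second_person_rate": sum(counts[w] for w in SECOND_PERSON) / n,
        "contraction_rate": sum(1 for t in tokens if "'" in t) / n,
        "avg_word_length": sum(len(t) for t in tokens) / n,
    }


if __name__ == "__main__":
    for name, value in extract_features("I think you're right").items():
        print(name, value)
```

Each feature here is tied to a claim one could argue for from the readings (for example, that contractions signal informality), which is the sense in which a feature can be called theory motivated.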

Corpora for experimentation
• Unit 2: Maptask data (Negotiation coding)
  • Possibly other chat corpora with same coding as well
• Unit 3: Product Reviews (Sentiment)
• Unit 4: Blog corpus (Age and Gender)
• Unit 5: AMI meeting corpus (Dialogue Acts)
• Other corpora
  • Email discussion list (Social Support coding)

SIDE: Workbench for Experimentation
• http://www.cs.cmu.edu/~cprose/SIDE.html



Two Options
• Create your own feature extractor plugins
  • We will provide documented abstract classes that you create specializations of
  • Programmed in Java
  • Elijah is the developer and can answer your questions
• Use SIDE’s feature creation functionality to create novel functions
• Grades will be based on:
  • The extent to which your features are theory motivated or data motivated
  • The depth of your error analysis
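Since grades depend on the depth of the error analysis, a small sketch of a starting point may help. It assumes you already have gold labels and a classifier's predictions for each text. The function name `error_analysis` and the toy sentiment data are invented for illustration, and this is not a feature of SIDE.

```python
from collections import Counter, defaultdict


def error_analysis(texts, gold, predicted):
    """Return accuracy, confusion counts, and misclassified texts by error type."""
    if not (len(texts) == len(gold) == len(predicted)):
        raise ValueError("texts, gold, and predicted must have the same length")
    # confusion[(g, p)] counts items with gold label g that were predicted as p.
    confusion = Counter(zip(gold, predicted))
    # errors[(g, p)] lists the texts behind each off-diagonal confusion cell.
    errors = defaultdict(list)
    for text, g, p in zip(texts, gold, predicted):
        if g != p:
            errors[(g, p)].append(text)
    correct = sum(count for (g, p), count in confusion.items() if g == p)
    accuracy = correct / len(gold) if gold else 0.0
    return accuracy, confusion, dict(errors)


# Toy data (invented) in the style of a product-review sentiment task.
texts = ["great phone", "not good", "terrible", "love it"]
gold = ["pos", "neg", "neg", "pos"]
predicted = ["pos", "pos", "neg", "pos"]

accuracy, confusion, errors = error_analysis(texts, gold, predicted)
print("accuracy:", accuracy)
for (g, p), examples in errors.items():
    print("gold=%s predicted=%s:" % (g, p), examples)
```

Counting errors is only the first step. The deeper part of the analysis is reading the examples in each cell and asking what linguistic pattern they share, which then motivates the next feature to try.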




Setting up the course Drupal
• If you are not registered, please do so
• If you don’t have an Andrew account, make sure I have your email address
• We will manage the course through Drupal
• All materials, including pdfs for required readings, will be posted to Drupal
• Slides for all lectures will be posted to Drupal after class
• Discussion threads in preparation for each lecture will be found on Drupal

Assignment 1 (not due until Jan 26)
• Transcribe a scene from a favorite movie, play, or TV show
• As a shortcut, you can find a script online
• Excerpt should be no more than one page of text
• Select one of the methodologies we are discussing in Unit 1 (e.g., from Gee, Martin & Rose, or Levinson)
• Do a qualitative analysis of the script and write it up
• Use readings from Unit 1 as a collection of models to choose from
• Due on Week 3 lecture 2
• Turn in transcript, raw analysis (can be annotations added to the transcript), and write up (your interpretation of the analysis)
• Prepare a PowerPoint presentation for class (no more than 5 minutes of material)

For next time….
• You will receive login information for Drupal
• http://kanagawa.lti.cs.cmu.edu/11719/
• Read excerpts from James Gee’s book (linked to syllabus entry for Wednesday’s lecture)
• Post to Drupal (in response to discussion question posted for Week 1 Lecture 2)

Carolyn Penstein Rosé
http://www.cs.cmu.edu/~cprose
Gates-Hillman Center 5415
