Soup is a parser of context-free grammars (CFGs) developed at Carnegie Mellon's Interactive Systems Labs by Marsal Gavaldà.

1. Soup and G·Soup

Soup is a stochastic, chart-based, top-down parser written in C++. It is specially designed for real-time parsing of spoken language utterances with very large, multi-domain semantic grammars. G·Soup is a graphical user interface (GUI) to Soup, written in Java, with a double purpose: (i) enabling more efficient grammar development and (ii) tracking the behavior of human grammar writers.

Soup and G·Soup are currently available only at Carnegie Mellon and the Universität Karlsruhe, but feel free to contact Marsal Gavaldà if you are interested in using them.



2. The Soup Grammar Formalism

Soup handles context-free grammars: each rule consists of a left-hand side (LHS) (also called rule head, or nonterminal) and a right-hand side (RHS) (also called rule body, or rule rewrite).

2.1. Left-hand sides

In the context of a semantic grammar (i.e., a grammar in which the nonterminals correspond to concepts in the domain as opposed to syntactic categories), there are two kinds of LHSs:
  • principal: principal LHSs (also called concepts) are used to denote the relevant concepts or entities in the domain, e.g. temporal expressions in a scheduling task.
    Principal LHSs start with the square bracket [, e.g. [greeting].

  • auxiliary: auxiliary LHSs are used to group syntactically equivalent classes, and are usually removed from the final interpretation (see -print_aux_NT).
    Auxiliary LHSs start with a capital letter or with [_NT_, e.g. [_NT_free].

Besides being principal or auxiliary, there are a few other boolean features that characterize each LHS, namely:

  • Top-level: top-level LHSs are able to stand at the root of a parse tree, i.e. they constitute the starting symbols of the grammar.
    Top-level LHSs are marked with an s.

  • Speaker side: concepts can be marked as belonging to a specific set, e.g. to a certain speaker side. Then, at run-time and for each utterance, concepts not belonging to the current set can be deactivated. Non-marked concepts belong to all sets.
    Speaker side is marked with a digit 1..9 preceding the concept definition.
    Example:
    s1[agent_greeting]
          (hello this is [travel_agency])
    This indicates that the concept [agent_greeting] is top-level (s) and belongs exclusively to the agent speaker side (1).

  • Lookup: lookup LHSs are those LHSs whose values are to be distinguished.
    Lookup LHSs are marked with a v.
    Example:

    v[day_of_week]
            (monday)
            (tuesday)
            ; ...

    It does matter whether [day_of_week] is realized by monday or tuesday, i.e. the value of the LHS [day_of_week] is important.


    [available]
            (free)
            (available)
            (not [_NT_busy])
            ; ...


    It does not matter, at least in the context of a simple semantic grammar, whether availability is expressed with the word free or with the word available, i.e. the value of the LHS [available] is not important.

  • Character-level: character-level LHSs operate at the character level (as opposed to the usual word-level). They are very useful for languages with a rich morphology, e.g. with several word endings for verbs and adjectives.
    Character-level LHSs are marked with a c.
    Character-level LHSs cannot call word-level LHSs, i.e. they can only call single characters or character-level LHSs.
    Example. (Don't worry if you don't understand this example. You may want to read section 2.2 below on right-hand sides first.) Given the grammar:

    s[request_reservation+room]
            (ich möchte *EIN *+ADJ zimmer)

    cEIN
            (e i n *ENDING)

    cADJ
            (ADJ_BODY *ENDING)

    cADJ_BODY
            (g e l b)
            (k l e i n)
            ; ...

    cENDING
            (e r)
            (e)
            (e s)
            (e m)
            (e n)

    and the input ich möchte ein kleines gelbes zimmer, Soup/G·Soup outputs the following parse tree:

    [Figure: parse tree for the input above]

  • (Open-class proper-names: open-class proper-name LHSs have been developed for the C-Star multi-lingual translation effort. They suggest which LHSs can accept new proper names at run-time.
    Open-class proper-name LHSs are marked with an o.)
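
The markers above can be read off an LHS token mechanically. The following Python sketch is only an illustration and not part of Soup: the names LhsInfo and parse_lhs are hypothetical, and it assumes that the flags form a single run of characters in front of the name and may appear in any order (the examples above only show s, s1, v and c).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: the flags form one run of characters in front of the name.
# The text only shows "s[...]", "s1[...]", "v[...]" and "cEIN", so the
# free ordering accepted here is a guess.
LHS_PATTERN = re.compile(r"^([svco1-9]*)(\[[^\]\s]+\]|[A-Z]\S*)$")


@dataclass
class LhsInfo:
    name: str
    principal: bool              # False means auxiliary
    top_level: bool              # marked with s
    lookup: bool                 # marked with v
    char_level: bool             # marked with c
    open_class: bool             # marked with o
    speaker_side: Optional[int]  # digit 1..9, None when unmarked


def parse_lhs(token: str) -> LhsInfo:
    """Split an LHS token into its flags and its name (hypothetical helper)."""
    match = LHS_PATTERN.match(token)
    if match is None:
        raise ValueError(f"not a recognizable LHS: {token!r}")
    flags, name = match.groups()
    digits = [ch for ch in flags if ch.isdigit()]
    return LhsInfo(
        name=name,
        # Principal LHSs start with "[" but not with "[_NT_".
        principal=name.startswith("[") and not name.startswith("[_NT_"),
        top_level="s" in flags,
        lookup="v" in flags,
        char_level="c" in flags,
        open_class="o" in flags,
        speaker_side=int(digits[0]) if digits else None,
    )


print(parse_lhs("s1[agent_greeting]"))
print(parse_lhs("cEIN"))
```

Under these assumptions, s1[agent_greeting] comes out as a principal, top-level LHS on speaker side 1, and cEIN as an auxiliary, character-level LHS.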

2.2. Right-hand sides

Right-hand sides or rule bodies tell how their left-hand side or rule head can be realized.
Example:

[farewell]
        (bye)

indicates that the concept [farewell] can be expressed by the word bye.

To note:

  • Words (terminals) and LHSs (nonterminals) can be freely mixed.
    Example:
    [suggest_time]
            (how does [temporal] sound)
    uses the words how, does, sound and the LHS [temporal] in the RHS for [suggest_time].

  • A + preceding (with no space in between) a token (either a terminal or a nonterminal) indicates repeatability of the token.
    Example:
    		( +hi )
    		
    accepts hi, hi hi, hi hi hi, et cætera ad infinitum.

  • A * preceding (with no space in between) a token (either a terminal or a nonterminal) indicates optionality of the token.
    Example:
    		( hi *there )
    		
    accepts hi and hi there.

  • A *+ preceding (with no space in between) a token (either a terminal or a nonterminal) indicates optionality and repeatability of the token.
    Example:
    		( hi *+there )
    		
    accepts hi, hi there, hi there there, hi there there there, et cætera ad infinitum.

  • The special terminal _$any$_ is a wildcard that matches any out-of-vocabulary word, or any word present in the file given to the command-line argument -allow_as_ANY.

  • All RHSs for a given LHS must be written consecutively.

  • Comments are introduced by ;, #, or %.
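
The three prefixes behave like the familiar regular-expression quantifiers: + is "one or more", * is "zero or one", and *+ is "zero or more". The Python sketch below makes that correspondence concrete for RHSs that contain only words. It is an illustration, not how Soup works internally; the functions rhs_to_regex and accepts are hypothetical names, and nonterminals and the _$any$_ wildcard are deliberately left out.

```python
import re


def rhs_to_regex(rhs: str) -> re.Pattern:
    """Translate a terminals-only RHS such as "( hi *+there )" into a regex."""
    parts = []
    for token in rhs.strip().strip("()").split():
        if token.startswith("*+"):      # optional and repeatable
            word, quantifier = token[2:], "*"
        elif token.startswith("+"):     # repeatable
            word, quantifier = token[1:], "+"
        elif token.startswith("*"):     # optional
            word, quantifier = token[1:], "?"
        else:                           # exactly once
            word, quantifier = token, ""
        # Each word is matched together with one trailing space.
        parts.append(f"(?:{re.escape(word)} ){quantifier}")
    return re.compile("".join(parts))


def accepts(rhs: str, utterance: str) -> bool:
    """True if the whole utterance is covered by the RHS."""
    text = "".join(word + " " for word in utterance.split())
    return rhs_to_regex(rhs).fullmatch(text) is not None


print(accepts("( +hi )", "hi hi hi"))
print(accepts("( hi *there )", "hi there"))
print(accepts("( hi *there )", "hi there there"))
print(accepts("( hi *+there )", "hi there there"))
```

With these definitions, ( hi *there ) rejects hi there there, while ( hi *+there ) accepts it.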

2.3. Example

Given the grammar:
; top-level concept
s[nicety]
        ([greeting])
        ([farewell])

; optional [name]
[greeting]
        (hello *[name])
        (how are you)

[name]
        (_$any$_)       ; [name] is open-class

[farewell]
        (*good +bye)    ; this rule eats it all: "bye", "good bye", "bye bye", etc.

s[request]
        ([suggest_meeting])     ; another top-level concept

; no need to distinguish exact word used to convey being free
[suggest_meeting]
        (are you [_NT_free] [temporal])
        (how does [temporal] sound)

; auxiliary nonterminal: label will not appear in final parse tree
; but useful to group similar words
[_NT_free]
        (free)
        (available)

[temporal]
        (*on [d_o_w])   ; preposition not terribly crucial: let's make it optional

[d_o_w]
        (sunday)
        (monday)
        (tuesday)
        (wednesday)
        (thursday)
        (friday)
        (saturday)

and the input hello Tom, are you free on Tuesday, Soup/G·Soup outputs the following parse tree:

[Figure: parse tree for the input above]
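
To see how the rules of this example expand against an utterance, here is a toy recognizer in Python. It is a naive backtracking sketch and not Soup's stochastic, chart-based algorithm; it builds no parse tree and only reports whether a nonterminal covers a whole word sequence. Everything in it is an assumption made for illustration: the names GRAMMAR, ends and recognizes, the lowercasing and comma removal applied to the input, the flag-free nonterminal names, and the treatment of every word that does not occur in the grammar as out-of-vocabulary. Only part of the grammar is transcribed.

```python
ANY = "_$any$_"

# Part of the example grammar: nonterminal -> list of RHSs (token lists).
GRAMMAR = {
    "[greeting]": [["hello", "*[name]"], ["how", "are", "you"]],
    "[name]": [[ANY]],
    "[farewell]": [["*good", "+bye"]],
    "[suggest_meeting]": [
        ["are", "you", "[_NT_free]", "[temporal]"],
        ["how", "does", "[temporal]", "sound"],
    ],
    "[_NT_free]": [["free"], ["available"]],
    "[temporal]": [["*on", "[d_o_w]"]],
    "[d_o_w]": [["monday"], ["tuesday"]],
}


def split_prefix(token):
    """Return (symbol, optional, repeatable) for a possibly prefixed token."""
    optional = token.startswith("*")
    body = token[1:] if optional else token
    repeatable = body.startswith("+")
    return (body[1:] if repeatable else body), optional, repeatable


# Assumption: the vocabulary is the set of words used in the grammar.
VOCAB = {
    split_prefix(token)[0]
    for bodies in GRAMMAR.values() for body in bodies for token in body
} - set(GRAMMAR) - {ANY}


def ends(symbol, words, i):
    """Yield every position where one occurrence of symbol can end."""
    if symbol in GRAMMAR:
        for body in GRAMMAR[symbol]:
            yield from seq_ends(body, words, i)
    elif i < len(words):
        if symbol == ANY:
            if words[i] not in VOCAB:
                yield i + 1
        elif words[i] == symbol:
            yield i + 1


def occurrences(symbol, words, i, repeatable):
    """Yield end positions after one (or, if repeatable, more) occurrences."""
    for k in ends(symbol, words, i):
        yield k
        if repeatable and k > i:
            yield from occurrences(symbol, words, k, repeatable)


def seq_ends(tokens, words, i):
    """Yield every position where the token sequence can end."""
    if not tokens:
        yield i
        return
    symbol, optional, repeatable = split_prefix(tokens[0])
    if optional:
        yield from seq_ends(tokens[1:], words, i)
    for k in occurrences(symbol, words, i, repeatable):
        yield from seq_ends(tokens[1:], words, k)


def recognizes(symbol, utterance):
    words = utterance.lower().replace(",", "").split()
    return len(words) in set(ends(symbol, words, 0))


print(recognizes("[greeting]", "hello Tom,"))
print(recognizes("[suggest_meeting]", "are you free on Tuesday"))
```

In this sketch the word tom is accepted by [name] only because it does not occur anywhere in the grammar, which mirrors the out-of-vocabulary behavior of _$any$_ described in section 2.2.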


3. The G·Soup Handbook

[Note: Section in development.]

  • Visualizing the Domain Model

  • Visualizing top-level concepts

  • Assessing rule coverage

  • Detecting rule conflicts

  • Annotating rules

Site maintained by: Céline Morel