Developing The LogX Reader

Developing the logx-reader involves three pieces: rules (and their corresponding tests), the taxonomy, and actions. The REST API and its endpoints are already defined. For information on how annotations are generated for use in rules, or how to use alternate processors, see the Annotations section.

Rules

Before developing a new rule or set of related rules, it is best to first define tests that describe the expected behavior of the rule(s). To develop tests, follow the Testing section.

Rules in the logx-reader are defined using Odin. For an in-depth look at Odin and writing a grammar, please see the manual.

The logx-reader has two grammars: entities and events. The grammars can be found under reader/grammars/logx. An identical set of grammars can be found under reader/src/main/resources/org/parsertongue/reader/grammars/logx. When developing the grammars, however, you should only modify the top-level grammars. These will later be edited and copied to the src grammars via an action.

Rules can be developed with live reloading following the instructions in the Development/Install section.

Writing Rules

Odin rules are written in YAML and run over annotated text (the Odin manual includes a gentle introduction to YAML syntax). Annotated text is produced using an external NLP service (such as Stanford CoreNLP or spaCy). Whichever service is used, Penn Treebank tags and Universal Dependencies are always included in the annotations.

There are two types of rules: token and dependency. If the type is not specified, it defaults to type=dependency. Token rules are defined over the sequence of tokens and their attributes, while dependency rules are defined over the dependency graph.

Token rule

To label all tokens that carry the NER tag "LOCATION" as "Location", you can use the following rule:

- name: ner-loc
  label: Location
  priority: 1
  type: token
  pattern: |
    [entity=LOCATION]+

Note

Rules can be given a priority, which determines the iterations in which they apply. For example, a rule with priority: 1 runs only on the first iteration, whereas a rule with priority: "2+" runs on every iteration after the first.

For example, given the text "How many F16 engines are heading to The United Kingdom?" the sequence "The United Kingdom" would be labeled Location by this rule.
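The priority semantics described in the note above can be illustrated with a small Python sketch. This is not how Odin implements priorities; the function name rule_applies is hypothetical, and only the two forms mentioned in the note (a single number and "N+") are handled.

```python
def rule_applies(priority, iteration):
    """Return True if a rule with this priority runs on the given iteration.

    Illustrative only: handles a single number (e.g. 1) and the open-ended
    "N+" form (e.g. "2+"). Iterations are counted from 1.
    """
    text = str(priority).strip()
    if text.endswith("+"):
        # "2+" means iteration 2 and every later iteration.
        return iteration >= int(text[:-1])
    # A bare number means exactly that iteration.
    return iteration == int(text)


for iteration in (1, 2, 3):
    print(iteration, rule_applies(1, iteration), rule_applies("2+", iteration))
```

A common use of this split is to find entities with priority 1 rules and then let later rules build on the mentions those rules produced.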

Graph traversal rule

If you wanted to capture a "risk of" event, you could start with a simple rule defining a traversal over a syntactic dependency graph such as the following:

- name: risk-of
  label: RiskOf
  example: "What is the risk of spoilage for frozen fish heading to Dubai on August 24th 2020?"
  pattern: |
    trigger = [lemma=risk] of
    type:Entity = nmod_of

This rule finds every word whose lemma is "risk" and, if it is followed by "of", labels the sequence as the trigger of a RiskOf event. The event takes an argument named type, which must be an Entity mention reached from the trigger through the dependency relation "nmod_of". In the provided example, "spoilage" would be labeled as the type of the RiskOf event.
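To make the traversal concrete, here is a rough Python sketch of what the risk-of rule does on a hand-written annotation of "risk of spoilage". It is a simplification and not Odin's matching algorithm: the data layout and the function name find_risk_of are hypothetical, the sketch tries every trigger token as a starting point for the traversal (an assumption), and it skips the check that the target is an Entity mention.

```python
# Toy annotation for "risk of spoilage" (hand-written, not real parser output).
words = ["risk", "of", "spoilage"]
lemmas = ["risk", "of", "spoilage"]
# Outgoing dependency edges: source token index -> [(relation, target index)].
edges = {0: [("nmod_of", 2)], 2: [("case", 1)]}


def find_risk_of(words, lemmas, edges):
    """Find 'risk of' triggers and follow nmod_of edges to their targets."""
    events = []
    for i in range(len(words) - 1):
        # Trigger: a token with lemma "risk" immediately followed by "of".
        if lemmas[i] == "risk" and words[i + 1].lower() == "of":
            for source in (i, i + 1):
                # Argument: the token reached through an nmod_of edge.
                for relation, target in edges.get(source, []):
                    if relation == "nmod_of":
                        events.append({
                            "label": "RiskOf",
                            "trigger": " ".join(words[i:i + 2]),
                            "type": words[target],
                        })
    return events


print(find_risk_of(words, lemmas, edges))
# [{'label': 'RiskOf', 'trigger': 'risk of', 'type': 'spoilage'}]
```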

Note

When the rules are run on text, a JSON file of labeled mentions is generated. Mentions are the matches found by the rules within the text, and their labels belong to a hierarchy defined in the Taxonomy.

Taxonomy

The taxonomy is a set of hierarchical relationships between mention labels. Like the grammars, the taxonomy can be found in two places, but only the top-level taxonomy should be modified in development.

A sample of the logx-reader taxonomy can be seen here:

- Measurement:
  - Unit
  - NumericExpression:
    - Quantity
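To illustrate what the hierarchy implies, the following Python sketch walks the sample above and lists each label together with its ancestors. It assumes the usual Odin behavior, in which a mention labeled with a child (such as Quantity) also matches its parent labels. The nested-list representation and the function name label_paths are hypothetical, chosen only to mirror the YAML.

```python
# The sample taxonomy, written as the nested lists/dicts a YAML parser
# would produce for it.
TAXONOMY = [
    {"Measurement": ["Unit", {"NumericExpression": ["Quantity"]}]},
]


def label_paths(nodes, ancestors=()):
    """Map every label to a list of itself followed by its ancestors."""
    paths = {}
    for node in nodes:
        if isinstance(node, dict):
            # A dict entry is a label that has children.
            for label, children in node.items():
                path = (label,) + ancestors
                paths[label] = list(path)
                paths.update(label_paths(children, path))
        else:
            # A bare string is a leaf label.
            paths[node] = [node, *ancestors]
    return paths


print(label_paths(TAXONOMY)["Quantity"])
# ['Quantity', 'NumericExpression', 'Measurement']
```

Under this reading, a rule that refers to Measurement would also accept mentions labeled Unit, NumericExpression, or Quantity.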

Actions

Generally, when developing rules there should be no need to change the existing actions. However, if it is necessary, new actions or modifications to existing actions can be made in reader/src/main/scala/org/parsertongue/mr/logx/odin/LogxActions.scala. For more information about actions, see How it Works.