Basic Callbacks

Variable schema utils


object_arity

 object_arity (arity)

Returns a schema of objects with the given arity.


str_arity

 str_arity (arity)

Returns a schema of strings with the given arity.


span_arity

 span_arity (arity)

Returns a schema of Spans with the given arity.
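
The three helpers above can be pictured with a small sketch. It assumes a schema is a plain list of types, which is how the aggregation registrations and Table 1 on this page write them; `SpanStub` and the `*_sketch` names are hypothetical stand-ins, not spannerlib's own definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanStub:
    """Hypothetical stand-in for spannerlib's Span type."""
    doc: str
    start: int
    end: int


def object_arity_sketch(arity):
    # One `object` entry per position: any value is accepted.
    return [object] * arity


def str_arity_sketch(arity):
    # One `str` entry per position.
    return [str] * arity


def span_arity_sketch(arity):
    # One span entry per position (the stub type is an assumption).
    return [SpanStub] * arity


print(str_arity_sketch(3))
```

A variable-arity schema is useful when the width of a relation is only known at call time; for example, Table 1 lists `span_arity` as the output schema of `rgx`, whose tuple width depends on the number of capture groups.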

Debugging IEs


regex functions


rgx

 rgx (pattern:str, text:Union[str,spannerlib.span.Span])

An IE function that runs a regex using Python’s re module and yields tuples of spans, one span per capture group in the pattern. Capture groups are ordered by their starting position in the pattern. If the pattern has no capture groups, the function yields a single span covering the entire match.

Parameter  Type              Details
pattern    str               the regex pattern to be matched
text       Union[str, Span]  the text to be matched on; either a string or a Span
text = "aaaaa@bbbbbbaa@bb"
pattern = '(?P<c>(?P<a>a*)@(?P<b>b*))'
assert list(rgx(pattern,text)) == [
    ('aaaaa@bbbbbb', 'aaaaa', 'bbbbbb'),
    ('aa@bb', 'aa', 'bb')
]
text = "aaaaa@bbbbbbaa@bb"
# non-capturing groups (?:...) are skipped, so only the outer group, which spans the entire match, is returned
pattern = '((?:a*)@(?:b*))'
assert list(rgx(pattern,text)) == [
    ('aaaaa@bbbbbb',),
    ('aa@bb',)
]
list(rgx(pattern,text))
[([@a254e9,0,12) "aaaaa@bbbb...",), ([@a254e9,12,17) "aa@bb",)]
document = Span('dddaaaaa@bbbbbbaa@bb',name = 'doc1')
document
[@doc1,0,20) "dddaaaaa@b..."
list(rgx('(a*)@(b*)',document))
[([@doc1,3,8) "aaaaa", [@doc1,9,15) "bbbbbb"),
 ([@doc1,15,17) "aa", [@doc1,18,20) "bb")]
assert list(rgx('(a*)@(b*)',document)) == [
    (Span(document,3,8),Span(document,9,15)),
    (Span(document,15,17), Span(document,18,20))]
sub_doc = document.slice(3,None)
assert list(rgx('(a*)@(b*)',sub_doc)) == list(rgx('(a*)@(b*)',document))
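
The group-to-span behavior described for `rgx` can be approximated with the standard `re` module alone. The sketch below is not spannerlib's implementation: the hypothetical `rgx_offsets_sketch` returns plain `(start, end)` offset pairs instead of `Span` objects, and it works on strings only.

```python
import re


def rgx_offsets_sketch(pattern, text):
    """Yield one tuple per match, holding a (start, end) pair per capture group."""
    compiled = re.compile(pattern)
    for match in compiled.finditer(text):
        if compiled.groups == 0:
            # No capture groups: a single pair covering the entire match.
            yield (match.span(),)
        else:
            # re numbers groups by the position of their opening parenthesis.
            yield tuple(match.span(i) for i in range(1, compiled.groups + 1))


for row in rgx_offsets_sketch('(a*)@(b*)', 'dddaaaaa@bbbbbbaa@bb'):
    print(row)
```

The offsets printed here line up with the `[@doc1,3,8)` and `[@doc1,9,15)` spans shown in the example output above.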

rgx_split

 rgx_split (delim, text, initial_tag='Start Tag')

An IE function that, given a delimiter regex pattern and a text, yields tuples of spans of the form (delimiter_match, text_before_next_delimiter). Note that the regex pattern should not contain any groups.

Parameter    Type  Default    Details
delim                         the delimiter pattern to split on
text                          the text to be split; either a string or a Span
initial_tag  str   Start Tag  the tag to use in case the first split is not at the start of the text
assert list(rgx_split('a|x','bbbannnnxdddaca')) == [
    ('Start Tag', 'bbb'),
    ('a', 'nnnn'),
    ('x', 'ddd'),
    ('a', 'c'),
    ('a', '')]

assert list(rgx_split('a|x','abbbannnnxdddaca')) == [
    ('a', 'bbb'),
    ('a', 'nnnn'),
    ('x', 'ddd'),
    ('a', 'c'),
    ('a', '')]
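
The pairing of each delimiter with the text that follows it can be sketched with `re.finditer`. This is an illustration on plain strings, not the library's code; `rgx_split_sketch` is a hypothetical name, and yielding nothing when the delimiter never matches is an assumption, since that case is not described here.

```python
import re


def rgx_split_sketch(delim, text, initial_tag="Start Tag"):
    """Yield (delimiter, text up to the next delimiter) pairs."""
    matches = list(re.finditer(delim, text))
    if not matches:
        return  # assumption: no delimiter found, so nothing is yielded
    if matches[0].start() != 0:
        # Text before the first delimiter is paired with the initial tag.
        yield (initial_tag, text[:matches[0].start()])
    for current, following in zip(matches, matches[1:] + [None]):
        # The chunk runs to the next delimiter, or to the end of the text.
        end = following.start() if following is not None else len(text)
        yield (current.group(0), text[current.end():end])


print(list(rgx_split_sketch('a|x', 'bbbannnnxdddaca')))
```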

rgx_is_match

 rgx_is_match (delim, text)

An IE function that, given a regex pattern and a text, returns True if any match is found and False otherwise.

Parameter  Details
delim      the regex pattern to search for
text       the text to search; either a string or a Span
assert rgx_is_match('(a*)@(b*)',document) == [True]
assert rgx_is_match('(a*)@(e+)',document) == [False]
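
A minimal sketch of the same check on plain strings, assuming "any match" means a match anywhere in the text (`re.search`) rather than a match of the whole text; `rgx_is_match_sketch` is a hypothetical name.

```python
import re


def rgx_is_match_sketch(pattern, text):
    """Yield a single boolean: does the pattern match anywhere in the text?"""
    yield re.search(pattern, text) is not None


print(list(rgx_is_match_sketch('(a*)@(b*)', 'dddaaaaa@bbbbbbaa@bb')))
print(list(rgx_is_match_sketch('(a*)@(e+)', 'dddaaaaa@bbbbbbaa@bb')))
```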

Expression eval


expr_eval

 expr_eval (template, *inputs)

Evaluate an expression template with the given inputs. The template should contain numerical indices that correspond to the positions of the inputs.

Returns: The result of evaluating the expression template with the given inputs.

Raises: ValueError: If the expression template is invalid or the number of inputs does not match the number of indices in the template.

Parameter  Details
template   The expression template to be evaluated.
inputs     The values substituted for the indices in the template.
assert next(expr_eval('{0} + {1}',1,2)) == 3
a = Span('aaaa',1,3)
b = Span('bbbb',3,4)
assert next(expr_eval('{0}.end == {1}.start',a,b))
assert not next(expr_eval('{0}.doc == {1}.doc',a,b))
assert next(expr_eval('({0}.doc != {1}.doc) & ({0}.end == {1}.start)',a,b))
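
One way to picture index-based templates is to rewrite each `{i}` placeholder into a variable name and evaluate the result. The sketch below is an assumption about the mechanism, not the library's code: `expr_eval_sketch` and the `arg_i` names are hypothetical, and `SimpleNamespace` objects stand in for spans.

```python
import re
from types import SimpleNamespace


def expr_eval_sketch(template, *inputs):
    """Yield the value of `template` after binding {0}, {1}, ... to the inputs."""
    indices = {int(i) for i in re.findall(r"\{(\d+)\}", template)}
    if sorted(indices) != list(range(len(inputs))):
        raise ValueError("indices in the template do not match the inputs")
    # Rewrite each {i} into a variable name and bind it to the i-th input.
    expression = re.sub(r"\{(\d+)\}", lambda m: f"arg_{m.group(1)}", template)
    bindings = {f"arg_{i}": value for i, value in enumerate(inputs)}
    try:
        # eval is for illustration only; never use it on untrusted templates.
        yield eval(expression, {"__builtins__": {}}, bindings)
    except SyntaxError as error:
        raise ValueError(f"invalid expression template: {template!r}") from error


a = SimpleNamespace(doc='aaaa', start=1, end=3)
b = SimpleNamespace(doc='bbbb', start=3, end=4)
print(next(expr_eval_sketch('{0} + {1}', 1, 2)))
print(next(expr_eval_sketch('{0}.end == {1}.start', a, b)))
```

Because the sketch is a generator, the `ValueError` surfaces when the first value is requested, not when the function is called.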

not_ie

 not_ie (val)

An IE function which negates the input value.

assert not_ie(True) == [False]
assert not_ie(False) == [True]

Span operations


as_str

 as_str (obj)

Casts an object to a string.


span_contained

 span_contained (s1, s2)

Yields True if s1 is contained in s2; otherwise yields False.

# usage example
doc1 = Span('hello darkness my old friend',name='doc1')
doc2 = Span('I come to talk to you again',name='doc2')

span1 = Span(doc1,1, 10)
span2 = Span(doc1,0, 11)
span3 = Span(doc1,2, 12)
span4 = Span(doc2,3,5)



assert list(span_contained(span1,span2)) == [True]
assert list(span_contained(span2,span1)) == [False]
assert list(span_contained(span1,span3)) == [False]
assert list(span_contained(span1,span4)) == [False]
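
The containment test can be sketched as an interval check on offsets. `SpanStub` and `span_contained_sketch` are hypothetical stand-ins; requiring both spans to come from the same document is an assumption that is consistent with the assertions above but not confirmed by them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanStub:
    """Hypothetical stand-in for a span: a document id plus [start, end) offsets."""
    doc: str
    start: int
    end: int


def span_contained_sketch(s1, s2):
    """Yield True if s1 lies inside s2, comparing only spans of the same document."""
    same_doc = s1.doc == s2.doc  # assumption: different documents never contain each other
    yield same_doc and s2.start <= s1.start and s1.end <= s2.end


span1 = SpanStub('doc1', 1, 10)
span2 = SpanStub('doc1', 0, 11)
print(list(span_contained_sketch(span1, span2)))
```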

deconstruct_span

 deconstruct_span (span)

Yields the doc id, start, and end of the span.

doc = Span('hello darkness my old friend',name='doc1')
doc2 = Span('I come to talk to you again')

assert list(deconstruct_span(doc)) == [('doc1', 0, 28)]
assert list(deconstruct_span(doc2))== [('f8f5e8', 0, 27)]

read

 read (text_path)

Reads a file and returns its content as a string.

Parameter  Details
text_path  the path to the text file to read from

read_span

 read_span (text_path)

Reads a file and returns its content as a span, with the name of the file as the doc id.

Parameter  Details
text_path  the path to the text file to read from
from pathlib import Path

# write a temporary sample file to read back
path = Path('sample1.txt')
path.write_text('hello darkness my old friend')
text = list(read('sample1.txt'))[0]
text_span = list(read_span('sample1.txt'))[0]

path.unlink()

assert text == "hello darkness my old friend"
assert text_span == text
text_span
[@sample1.txt,0,28) "hello dark..."

Basic Aggs

Spannerlib also supports some pandas aggregation functions, registered as follows.

Exported source
DefaultAGGs().add('count','count',[object],[int])
DefaultAGGs().add('sum','sum',[Real],[Real])
DefaultAGGs().add('avg','avg',[Real],[Real])
DefaultAGGs().add('max','max',[Real],[Real])
DefaultAGGs().add('min','min',[Real],[Real])

Callback names and Schemas

Table 1: Registered Callbacks
name              function          input_schema              output_schema          type
print             print_ie          object_arity              ['object']             IE Function
rgx               rgx               ['str', ('str', 'Span')]  span_arity             IE Function
rgx_split         rgx_split         ['str', ('str', 'Span')]  ['Span', 'Span']       IE Function
rgx_is_match      rgx_is_match      ['str', ('str', 'Span')]  ['bool']               IE Function
expr_eval         expr_eval         object_arity              ['object']             IE Function
not               not_ie            ['bool']                  ['bool']               IE Function
as_str            as_str            ['object']                ['str']                IE Function
span_contained    span_contained    ['Span', 'Span']          ['bool']               IE Function
deconstruct_span  deconstruct_span  ['Span']                  ['str', 'int', 'int']  IE Function
read              read              ['str']                   ['str']                IE Function
read_span         read_span         ['str']                   ['Span']               IE Function
count             count             ['object']                ['int']                Aggregation Function
sum               sum               ['Real']                  ['Real']               Aggregation Function
avg               avg               ['Real']                  ['Real']               Aggregation Function
max               max               ['Real']                  ['Real']               Aggregation Function
min               min               ['Real']                  ['Real']               Aggregation Function