Basic Callbacks

Variable schema utils

object_arity

 object_arity (arity)

Returns a schema of objects with the given arity.

str_arity

 str_arity (arity)

Returns a schema of strings with the given arity.

span_arity

 span_arity (arity)

Returns a schema of Spans with the given arity.
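
The three helpers above can be pictured with a small sketch. It assumes a schema is a plain list of Python types, as in the aggregation registrations further down this page; the name arity_schema_sketch is hypothetical and is not part of spannerlib.

```python
def arity_schema_sketch(element_type, arity):
    """Build a schema: a list holding `element_type` repeated `arity` times.

    Assumption: a schema is a plain list of types. The real helpers may
    return a different structure.
    """
    return [element_type] * arity


# A schema for three arbitrary objects and one for two strings.
object_schema = arity_schema_sketch(object, 3)
str_schema = arity_schema_sketch(str, 2)
```

A variable-arity schema like this lets one callback, such as print or expr_eval in the callback table, accept a different number of arguments at each call site.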

Debugging IEs

print_ie

 print_ie (fstring, *objects)

Prints the objects to the console using the format string fstring. Intended for debugging.

	Details
fstring	the format string used to print the objects
objects

_ = print_ie("Hello, {}!", "world")
_ = print_ie("Hello, {!r}", {'complicated': 'object'})

Hello, world!
Hello, {'complicated': 'object'}
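
The output above is consistent with ordinary str.format placeholders, including conversion flags such as !r. The sketch below shows only that formatting step. The function name print_ie_sketch is hypothetical, and returning the message is an assumption made so the result can be checked; this page does not state what print_ie itself returns.

```python
def print_ie_sketch(fstring, *objects):
    """Format `objects` into `fstring` with str.format and print the result.

    The return value is an assumption for illustration only.
    """
    message = fstring.format(*objects)
    print(message)
    return message


plain = print_ie_sketch("Hello, {}!", "world")
with_repr = print_ie_sketch("Hello, {!r}", {"complicated": "object"})
```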

regex functions

rgx

 rgx (pattern:str, text:Union[str,spannerlib.span.Span])

An IE function that runs a regex using Python's re module and yields tuples of spans, one span per capture group in the pattern. Capture groups are ordered by their starting position in the pattern. If the pattern has no capture groups, the function yields a single span covering the entire match.

	Type	Details
pattern	str	the regex pattern to be matched
text	Union	the text to be matched on, can be either a string or a span.
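
As a rough illustration of the behavior described above, the following sketch uses only the standard re module and returns (start, end) offset pairs in place of Span objects. The name rgx_sketch is hypothetical; this is not the library's implementation.

```python
import re


def rgx_sketch(pattern, text):
    """Yield one tuple of (start, end) offsets per match.

    With capture groups: one offset pair per group, in group-number order
    (re numbers groups by the position of their opening parenthesis).
    Without capture groups: a single pair covering the whole match.
    """
    compiled = re.compile(pattern)
    for match in compiled.finditer(text):
        if compiled.groups == 0:
            yield (match.span(),)
        else:
            yield tuple(match.span(i) for i in range(1, compiled.groups + 1))


grouped = list(rgx_sketch("(a*)@(b*)", "dddaaaaa@bbbbbbaa@bb"))
ungrouped = list(rgx_sketch("a+@b+", "xa@b"))
```

In this sketch, a group that does not take part in a match is reported by re as (-1, -1). How rgx treats that case is not covered here.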

text = "aaaaa@bbbbbbaa@bb"
pattern = '(?P<c>(?P<a>a*)@(?P<b>b*))'
assert list(rgx(pattern,text)) == [
    ('aaaaa@bbbbbb', 'aaaaa', 'bbbbbb'),
    ('aa@bb', 'aa', 'bb')
]

text = "aaaaa@bbbbbbaa@bb"
# non-capturing groups (?:...) are skipped, so only the outer group, which spans the entire match, is returned
pattern = '((?:a*)@(?:b*))'
assert list(rgx(pattern,text)) == [
    ('aaaaa@bbbbbb',),
    ('aa@bb',)
]
list(rgx(pattern,text))

[([@a254e9,0,12) "aaaaa@bbbb...",), ([@a254e9,12,17) "aa@bb",)]

document = Span('dddaaaaa@bbbbbbaa@bb',name = 'doc1')
document

[@doc1,0,20) "dddaaaaa@b..."

list(rgx('(a*)@(b*)',document))

[([@doc1,3,8) "aaaaa", [@doc1,9,15) "bbbbbb"),
 ([@doc1,15,17) "aa", [@doc1,18,20) "bb")]

assert list(rgx('(a*)@(b*)',document)) == [
    (Span(document,3,8),Span(document,9,15)),
    (Span(document,15,17), Span(document,18,20))]
list(rgx('(a*)@(b*)',document))

[([@doc1,3,8) "aaaaa", [@doc1,9,15) "bbbbbb"),
 ([@doc1,15,17) "aa", [@doc1,18,20) "bb")]

sub_doc = document.slice(3,None)
assert list(rgx('(a*)@(b*)',sub_doc)) == list(rgx('(a*)@(b*)',document))
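
The assertion above holds because spans found in a slice are still expressed in the coordinates of the original document. A minimal sketch of that bookkeeping, using plain offsets and a hypothetical offset parameter that is not part of the rgx signature:

```python
import re


def group_offsets_sketch(pattern, text, offset=0):
    """Yield capture-group (start, end) pairs, shifted by `offset`.

    `offset` stands in for the start position of a slice within its
    parent document.
    """
    for match in re.finditer(pattern, text):
        yield tuple(
            (match.start(i) + offset, match.end(i) + offset)
            for i in range(1, match.re.groups + 1)
        )


full_text = "dddaaaaa@bbbbbbaa@bb"
on_document = list(group_offsets_sketch("(a*)@(b*)", full_text))
# Matching on the slice starting at 3 and shifting by 3 gives the same offsets.
on_slice = list(group_offsets_sketch("(a*)@(b*)", full_text[3:], offset=3))
```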

rgx_split

 rgx_split (delim, text, initial_tag='Start Tag')

An IE function which, given a delimiter regex pattern and a text, returns tuples of spans of the form (delimiter_match, text_before_next_delimiter). Note that the regex pattern should not contain any groups.

	Type	Default	Details
delim			the delimiter pattern to split on
text			the text to be split, can be either string or Span
initial_tag	str	Start Tag	the tag to use in case the first split is not at the start of the text

assert list(rgx_split('a|x','bbbannnnxdddaca')) == [
    ('Start Tag', 'bbb'),
    ('a', 'nnnn'),
    ('x', 'ddd'),
    ('a', 'c'),
    ('a', '')]

assert list(rgx_split('a|x','abbbannnnxdddaca')) == [
    ('a', 'bbb'),
    ('a', 'nnnn'),
    ('x', 'ddd'),
    ('a', 'c'),
    ('a', '')]
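
The two examples above can be reproduced with a short sketch built on re.finditer. It returns plain strings in place of spans, and the name rgx_split_sketch is hypothetical. What happens when the delimiter never matches is an assumption here: the whole text is paired with initial_tag.

```python
import re


def rgx_split_sketch(delim, text, initial_tag="Start Tag"):
    """Yield (delimiter_match, text_before_next_delimiter) pairs as strings."""
    matches = list(re.finditer(delim, text))
    # Text before the first delimiter is paired with the initial tag.
    first_start = matches[0].start() if matches else len(text)
    if first_start > 0:
        yield (initial_tag, text[:first_start])
    for i, match in enumerate(matches):
        # Each chunk runs up to the next delimiter, or to the end of the text.
        next_start = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        yield (match.group(), text[match.end():next_start])


without_leading_delim = list(rgx_split_sketch("a|x", "bbbannnnxdddaca"))
with_leading_delim = list(rgx_split_sketch("a|x", "abbbannnnxdddaca"))
```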

rgx_is_match

 rgx_is_match (delim, text)

An IE function which, given a regex pattern (delim) and a text, returns True if any match is found and False otherwise.

	Details
delim	the regex pattern to search for
text	the text to search, can be either a string or a Span

assert rgx_is_match('(a*)@(b*)',document) == [True]
assert rgx_is_match('(a*)@(e+)',document) == [False]
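
A minimal sketch of the same check using re.search. The single-element list mirrors the shape used in the assertions above; the name rgx_is_match_sketch is hypothetical and the sketch works on plain strings only.

```python
import re


def rgx_is_match_sketch(pattern, text):
    """Return [True] if `pattern` matches anywhere in `text`, else [False]."""
    return [re.search(pattern, text) is not None]


found = rgx_is_match_sketch("(a*)@(b*)", "dddaaaaa@bbbbbbaa@bb")
not_found = rgx_is_match_sketch("(a*)@(e+)", "dddaaaaa@bbbbbbaa@bb")
```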

Expression eval

expr_eval

 expr_eval (template, *inputs)

Evaluate an expression template with the given inputs. The template should contain numerical indices that correspond to the positions of the inputs.

Returns: The result of evaluating the expression template with the given inputs.

Raises: ValueError: If the expression template is invalid or the number of inputs does not match the number of indices in the template.

	Details
template	The expression template to be evaluated.
inputs

assert next(expr_eval('{0} + {1}',1,2)) == 3

a = Span('aaaa',1,3)
b = Span('bbbb',3,4)

assert next(expr_eval('{0}.end == {1}.start',a,b))
assert not next(expr_eval('{0}.doc == {1}.doc',a,b))
assert next(expr_eval('({0}.doc != {1}.doc) & ({0}.end == {1}.start)',a,b))
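
One way such a template could be evaluated is to rewrite each {i} placeholder into a variable name and evaluate the result with Python's eval. The sketch below takes that approach as an assumption; it is not necessarily how expr_eval is implemented. The names expr_eval_sketch and _arg0, _arg1 are hypothetical, and SimpleNamespace objects stand in for spans.

```python
import re
from types import SimpleNamespace


def expr_eval_sketch(template, *inputs):
    """Yield the value of `template` with {i} bound to inputs[i]."""
    indices = {int(i) for i in re.findall(r"\{(\d+)\}", template)}
    if indices != set(range(len(inputs))):
        raise ValueError("template indices do not match the inputs")
    # Rewrite "{0}.end" into "_arg0.end" so attribute access works on objects.
    expression = re.sub(r"\{(\d+)\}", lambda m: "_arg" + m.group(1), template)
    namespace = {"_arg%d" % i: value for i, value in enumerate(inputs)}
    try:
        # eval is not a sandbox; only use it on trusted templates.
        yield eval(expression, {"__builtins__": {}}, namespace)
    except SyntaxError as exc:
        raise ValueError("invalid expression template") from exc


a = SimpleNamespace(doc="aaaa", start=1, end=3)
b = SimpleNamespace(doc="bbbb", start=3, end=4)
total = next(expr_eval_sketch("{0} + {1}", 1, 2))
adjacent = next(expr_eval_sketch("({0}.doc != {1}.doc) & ({0}.end == {1}.start)", a, b))
```

Because the sketch is a generator, the ValueError is raised when the first value is requested, not when the function is called.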

not_ie

 not_ie (val)

An IE function which negates the input value.

assert not_ie(True) == [False]
assert not_ie(False) == [True]

Span operations

as_str

 as_str (obj)

Casts objects to strings.

span_contained

 span_contained (s1, s2)

Yields True if s1 is contained in s2, otherwise yields False.

# usage example
doc1 = Span('hello darkness my old friend',name='doc1')
doc2 = Span('I come to talk to you again',name='doc2')

span1 = Span(doc1,1, 10)
span2 = Span(doc1,0, 11)
span3 = Span(doc1,2, 12)
span4 = Span(doc2,3,5)



assert list(span_contained(span1,span2)) == [True]
assert list(span_contained(span2,span1)) == [False]
assert list(span_contained(span1,span3)) == [False]
assert list(span_contained(span1,span4)) == [False]
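
The four assertions above are consistent with a simple rule: both spans must belong to the same document, and the interval of s1 must lie inside the interval of s2. A sketch of that rule, using a hypothetical SpanSketch tuple in place of spannerlib's Span:

```python
from typing import NamedTuple


class SpanSketch(NamedTuple):
    """Hypothetical stand-in for a span: document id plus [start, end) offsets."""
    doc: str
    start: int
    end: int


def span_contained_sketch(s1, s2):
    """Yield True if s1 lies inside s2 within the same document."""
    yield s1.doc == s2.doc and s2.start <= s1.start and s1.end <= s2.end


span1 = SpanSketch("doc1", 1, 10)
span2 = SpanSketch("doc1", 0, 11)
span3 = SpanSketch("doc1", 2, 12)
span4 = SpanSketch("doc2", 3, 5)
results = [
    next(span_contained_sketch(span1, other))
    for other in (span2, span3, span4)
]
```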

deconstruct_span

 deconstruct_span (span)

Yields the doc id, start, and end of the span.

doc = Span('hello darkness my old friend',name='doc1')
doc2 = Span('I come to talk to you again')

assert list(deconstruct_span(doc)) == [('doc1', 0, 28)]
assert list(deconstruct_span(doc2))== [('f8f5e8', 0, 27)]

read

 read (text_path)

Reads a file and returns its content as a string.

	Details
text_path	the path to the text file to read from

read_span

 read_span (text_path)

Reads a file and returns its content as a span, with the name of the file as the doc id.

	Details
text_path	the path to the text file to read from

path = Path('sample1.txt')
path.write_text('hello darkness my old friend')
text = list(read('sample1.txt'))[0]
text_span = list(read_span('sample1.txt'))[0]

path.unlink()

assert text == "hello darkness my old friend"
assert text_span == text
text_span

[@sample1.txt,0,28) "hello dark..."
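
For readers without spannerlib at hand, the following sketch reproduces the same flow with pathlib only. Both function names are hypothetical. Using the file's base name as the doc id is an assumption: the output above shows sample1.txt, but the example path has no directory part, so it cannot distinguish a base name from a full path.

```python
import tempfile
from pathlib import Path


def read_sketch(text_path):
    """Yield the file content as a string."""
    yield Path(text_path).read_text()


def read_with_doc_id_sketch(text_path):
    """Yield (doc_id, content), using the file's base name as the doc id."""
    path = Path(text_path)
    yield (path.name, path.read_text())


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample1.txt"
    sample.write_text("hello darkness my old friend")
    text = next(read_sketch(sample))
    doc_id, content = next(read_with_doc_id_sketch(sample))
```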

Basic Aggs

Spannerlib also supports some pandas aggregation functions, registered by name together with their input and output schemas.

Exported source

DefaultAGGs().add('count','count',[object],[int])
DefaultAGGs().add('sum','sum',[Real],[Real])
DefaultAGGs().add('avg','avg',[Real],[Real])
DefaultAGGs().add('max','max',[Real],[Real])
DefaultAGGs().add('min','min',[Real],[Real])
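
To make the five registered names concrete, here is a plain-Python sketch of what each one computes over a group of values. It deliberately avoids pandas, so details such as missing-value handling are not represented, and the AGG_SKETCH mapping is hypothetical.

```python
from statistics import mean

# Hypothetical mapping from aggregation name to a plain-Python equivalent.
AGG_SKETCH = {
    "count": len,
    "sum": sum,
    "avg": mean,
    "max": max,
    "min": min,
}

values = [3, 1, 4, 1, 5]
results = {name: func(values) for name, func in AGG_SKETCH.items()}
```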

Callback names and Schemas

Table 1: Registered Callbacks

name	function	input_schema	output_schema	type
print	print_ie	object_arity	['object']	IE Function
rgx	rgx	['str', ('str', 'Span')]	span_arity	IE Function
rgx_split	rgx_split	['str', ('str', 'Span')]	['Span', 'Span']	IE Function
rgx_is_match	rgx_is_match	['str', ('str', 'Span')]	['bool']	IE Function
expr_eval	expr_eval	object_arity	['object']	IE Function
not	not_ie	['bool']	['bool']	IE Function
as_str	as_str	['object']	['str']	IE Function
span_contained	span_contained	['Span', 'Span']	['bool']	IE Function
deconstruct_span	deconstruct_span	['Span']	['str', 'int', 'int']	IE Function
read	read	['str']	['str']	IE Function
read_span	read_span	['str']	['Span']	IE Function
count	count	['object']	['int']	Aggregation Function
sum	sum	['Real']	['Real']	Aggregation Function
avg	avg	['Real']	['Real']	Aggregation Function
max	max	['Real']	['Real']	Aggregation Function
min	min	['Real']	['Real']	Aggregation Function