```python
_ = print_ie("Hello, {}!", "world")
_ = print_ie("Hello, {!r}", {'complicated': 'object'})
```
```
Hello, world!
Hello, {'complicated': 'object'}
```
object_arity (arity)
Returns a schema of objects with the given arity.
str_arity (arity)
Returns a schema of strings with the given arity.
span_arity (arity)
Returns a schema of Spans with the given arity.
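These helpers describe variable-length schemas for IE functions whose number of inputs or outputs depends on the call; for example, `rgx` in the table at the end of this section uses `span_arity` as its output schema because the number of output columns depends on the number of capture groups. A minimal sketch of the idea, assuming a schema is simply a list with one type per column. The `*_arity_sketch` names and the `SpanPlaceholder` class are hypothetical and are not spannerlib's code.

```python
class SpanPlaceholder:
    """Hypothetical stand-in for spannerlib's Span type."""


def object_arity_sketch(arity: int) -> list:
    # one `object` column per position
    return [object] * arity


def str_arity_sketch(arity: int) -> list:
    # one `str` column per position
    return [str] * arity


def span_arity_sketch(arity: int) -> list:
    # one Span column per position, e.g. one per regex capture group
    return [SpanPlaceholder] * arity
```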
print_ie (fstring, *objects)
Prints the objects to the console, formatted with the format string `fstring`; intended for debugging.
Parameter | Details |
---|---|
fstring | the format string used to print the objects |
objects | the objects to be formatted into `fstring` |
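The placeholders in the examples that open this section (`{}` and `{!r}`) are Python `str.format` syntax, which is why the dictionary is printed via its `repr`. A sketch assuming `print_ie` simply formats and prints; the name `print_ie_sketch` and the choice to return the formatted message are hypothetical.

```python
def print_ie_sketch(fstring: str, *objects) -> str:
    # str.format fills {} / {0} / {!r} fields from the positional objects
    message = fstring.format(*objects)
    print(message)
    # returning the message is an assumption made for illustration
    return message
```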
rgx (pattern:str, text:Union[str,spannerlib.span.Span])
An IE function which runs a regex using Python's `re` module and yields tuples of spans according to the number of capture groups in the pattern. Capture groups are ordered by their starting position in the pattern. If the pattern has no capture groups, the function yields a single span of the entire match.
Parameter | Type | Details |
---|---|---|
pattern | str | the regex pattern to be matched |
text | Union[str, Span] | the text to match on; can be either a string or a Span |
```python
text = "aaaaa@bbbbbbaa@bb"
# (?:...) groups are non-capturing, so the only capture group is the outer one,
# which spans the entire match
pattern = '((?:a*)@(?:b*))'
assert list(rgx(pattern,text)) == [
    ('aaaaa@bbbbbb',),
    ('aa@bb',)
]
list(rgx(pattern,text))
```
```
[([@a254e9,0,12) "aaaaa@bbbb...",), ([@a254e9,12,17) "aa@bb",)]
```
```python
assert list(rgx('(a*)@(b*)',document)) == [
    (Span(document,3,8),Span(document,9,15)),
    (Span(document,15,17), Span(document,18,20))]
list(rgx('(a*)@(b*)',document))
```
```
[([@doc1,3,8) "aaaaa", [@doc1,9,15) "bbbbbb"),
 ([@doc1,15,17) "aa", [@doc1,18,20) "bb")]
```
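The capture-group rule can be reproduced with Python's `re` module alone. This sketch yields `(start, end)` offset pairs instead of spannerlib `Span` objects, and the name `rgx_offsets` is hypothetical; it illustrates the documented behaviour rather than spannerlib's implementation.

```python
import re
from typing import Iterator, Tuple


def rgx_offsets(pattern: str, text: str) -> Iterator[Tuple[Tuple[int, int], ...]]:
    """Yield one tuple of (start, end) offsets per match (sketch of rgx's semantics)."""
    compiled = re.compile(pattern)
    for match in compiled.finditer(text):
        if compiled.groups == 0:
            # no capture groups: a single offset pair covering the whole match
            yield (match.span(),)
        else:
            # one offset pair per capture group; re numbers groups by the
            # position of their opening parenthesis in the pattern
            yield tuple(match.span(i) for i in range(1, compiled.groups + 1))
```

On `"aaaaa@bbbbbbaa@bb"` with the pattern `(a*)@(b*)`, it yields `((0, 5), (6, 12))` and then `((12, 14), (15, 17))`; with `((?:a*)@(?:b*))` it yields `((0, 12),)` and `((12, 17),)`, the same offsets as the string example above.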
rgx_split (delim, text, initial_tag='Start Tag')
An IE function which, given a delimiter regex pattern and a text, returns tuples of spans of the form (delimiter_match, text_before_next_delimiter). Note that the regex pattern should not contain any capture groups.
Parameter | Type | Default | Details |
---|---|---|---|
delim | | | the delimiter pattern to split on |
text | | | the text to be split; can be either a string or a Span |
initial_tag | str | Start Tag | the tag to use in case the first split is not at the start of the text |
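A sketch of the splitting rule using `re.finditer`, returning plain strings rather than Spans. It assumes each delimiter match is paired with the text up to the next delimiter match (or the end of the text), and that any text before the first match is paired with `initial_tag`. The name `rgx_split_sketch` is hypothetical.

```python
import re
from typing import List, Tuple


def rgx_split_sketch(delim: str, text: str, initial_tag: str = "Start Tag") -> List[Tuple[str, str]]:
    matches = list(re.finditer(delim, text))
    pairs = []
    # any text before the first delimiter is labelled with the initial tag
    first_start = matches[0].start() if matches else len(text)
    if first_start > 0:
        pairs.append((initial_tag, text[:first_start]))
    for i, m in enumerate(matches):
        # the chunk runs until the next delimiter match, or to the end of the text
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pairs.append((m.group(0), text[m.end():end]))
    return pairs
```

For example, `rgx_split_sketch(r"#\w+", "intro #a first #b second")` returns `[("Start Tag", "intro "), ("#a", " first "), ("#b", " second")]`.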
rgx_is_match (delim, text)
An IE function which, given a regex pattern and a text, returns True if the pattern matches anywhere in the text, and False otherwise.
Parameter | Details |
---|---|
delim | the regex pattern to search for |
text | the text to search in; can be either a string or a Span |
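The check presumably reduces to a single `re.search` call. A sketch under that assumption, yielding a bare boolean in the same way the `span_contained` assertions further down expect `[True]` or `[False]`; the name `rgx_is_match_sketch` is hypothetical.

```python
import re
from typing import Iterator


def rgx_is_match_sketch(pattern: str, text: str) -> Iterator[bool]:
    # True if the pattern matches anywhere in the text, not only at the start
    yield re.search(pattern, text) is not None
```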
expr_eval (template, *inputs)
*Evaluates an expression template with the given inputs. The template should contain numerical indices that correspond to the positions of the inputs.*
*Returns: the result of evaluating the expression template with the given inputs.*
*Raises: ValueError if the expression template is invalid or the number of inputs does not match the number of indices in the template.*
Parameter | Details |
---|---|
template | the expression template to be evaluated |
inputs | the inputs referenced by the template's numerical indices, in positional order |
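One way such a template could work, assuming the indices are written as `str.format` fields such as `{0}` and `{1}`. This is a hypothetical sketch (`expr_eval_sketch`), not spannerlib's implementation: it substitutes each input's `repr` and evaluates the result with Python's `eval`. An empty builtins namespace limits but does not make `eval` safe, so it should only ever see trusted templates.

```python
import re
from typing import Any


def expr_eval_sketch(template: str, *inputs: Any) -> Any:
    # collect the numerical indices referenced by the template, e.g. {0}, {1}
    indices = {int(i) for i in re.findall(r"\{(\d+)\}", template)}
    if indices != set(range(len(inputs))):
        raise ValueError(
            f"template references indices {sorted(indices)} but {len(inputs)} inputs were given"
        )
    # substitute each input's repr so strings stay quoted in the expression
    expression = template.format(*(repr(x) for x in inputs))
    try:
        return eval(expression, {"__builtins__": {}}, {})
    except SyntaxError as err:
        raise ValueError(f"invalid expression template: {template!r}") from err
```

With this sketch, `expr_eval_sketch("{0} + {1} * 2", 1, 3)` evaluates `1 + 3 * 2` and returns `7`.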
not_ie (val)
An IE function which negates the input value.
as_str (obj)
Casts objects to strings.
span_contained (s1, s2)
Yields True if s1 is contained in s2, otherwise yields False.
```python
# usage example
doc1 = Span('hello darkness my old friend',name='doc1')
doc2 = Span('I come to talk to you again',name='doc2')
span1 = Span(doc1,1, 10)
span2 = Span(doc1,0, 11)
span3 = Span(doc1,2, 12)
span4 = Span(doc2,3,5)
assert list(span_contained(span1,span2)) == [True]
assert list(span_contained(span2,span1)) == [False]
assert list(span_contained(span1,span3)) == [False]
assert list(span_contained(span1,span4)) == [False]
```
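The rule these assertions exercise can be written out on plain `(doc_id, start, end)` triples: one span is contained in another only if both come from the same document and its offsets fall within the other's. A sketch under that reading, treating `end` as exclusive as in the `[@doc1,3,8)` notation used for spans; the triple representation and the name `span_contained_sketch` are hypothetical.

```python
from typing import Iterator, Tuple

SpanTriple = Tuple[str, int, int]  # (doc_id, start, end), end exclusive


def span_contained_sketch(s1: SpanTriple, s2: SpanTriple) -> Iterator[bool]:
    doc1, start1, end1 = s1
    doc2, start2, end2 = s2
    # spans from different documents are never contained in one another
    yield doc1 == doc2 and start2 <= start1 and end1 <= end2
```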
deconstruct_span (span)
Yields the doc id, start and end of the span.
read (text_path)
Reads a file and returns its content as a string.
Parameter | Details |
---|---|
text_path | the path to the text file to read from |
read_span (text_path)
Reads a file and returns its content as a span, with the name of the file as the doc id.
Parameter | Details |
---|---|
text_path | the path to the text file to read from |
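Both functions amount to reading a text file. A minimal sketch with `pathlib`, where `read_span_sketch` returns a `(doc_id, text)` pair standing in for a span built like `Span(text, name=...)` in the example above. Whether the real doc id is the bare file name or the full path is not stated here, so using `path.name` is an assumption, as are both function names.

```python
from pathlib import Path
from typing import Tuple


def read_sketch(text_path: str) -> str:
    # the file's whole content as a single string
    return Path(text_path).read_text()


def read_span_sketch(text_path: str) -> Tuple[str, str]:
    path = Path(text_path)
    # hypothetical stand-in for a Span: (doc_id, text), doc_id = file name
    return path.name, path.read_text()
```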
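Spannerlib also supports some pandas aggregation functions (`count`, `sum`, `avg`, `max` and `min`). The table below lists the built-in IE and aggregation functions with their registered names, input schemas and output schemas.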
name | function | input_schema | output_schema | type |
---|---|---|---|---|
print_ie | object_arity | ['object'] | IE Function | |
rgx | rgx | ['str', ('str', 'Span')] | span_arity | IE Function |
rgx_split | rgx_split | ['str', ('str', 'Span')] | ['Span', 'Span'] | IE Function |
rgx_is_match | rgx_is_match | ['str', ('str', 'Span')] | ['bool'] | IE Function |
expr_eval | expr_eval | object_arity | ['object'] | IE Function |
not | not_ie | ['bool'] | ['bool'] | IE Function |
as_str | as_str | ['object'] | ['str'] | IE Function |
span_contained | span_contained | ['Span', 'Span'] | ['bool'] | IE Function |
deconstruct_span | deconstruct_span | ['Span'] | ['str', 'int', 'int'] | IE Function |
read | read | ['str'] | ['str'] | IE Function |
read_span | read_span | ['str'] | ['Span'] | IE Function |
count | count | ['object'] | ['int'] | Aggregation Function |
sum | sum | ['Real'] | ['Real'] | Aggregation Function |
avg | avg | ['Real'] | ['Real'] | Aggregation Function |
max | max | ['Real'] | ['Real'] | Aggregation Function |
min | min | ['Real'] | ['Real'] | Aggregation Function |