Spans

Span class and how to interface it with pandas

I followed this guide on how to make extension types for pandas.



small_hash

 small_hash (txt, length=6)

*Returns a small hash of a string.

Args:
    txt (str): The string to hash.
    length (int, optional): Length of the hash. Defaults to 6.

Returns:
    str: A hash of txt that is length characters long.*
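
The page does not show how small_hash is implemented, so the following is only a minimal sketch of one way to get a short, stable label from a string. Using `hashlib.sha1` and truncating the hex digest are assumptions made for illustration; the library may use a different digest, so the values produced here need not match the labels shown further down.

```python
import hashlib

def small_hash(txt, length=6):
    """Return the first `length` hex characters of a digest of `txt`.

    Assumption: SHA-1 over the UTF-8 bytes; the real function may differ.
    """
    digest = hashlib.sha1(txt.encode("utf-8")).hexdigest()
    return digest[:length]

# The same input always gives the same label, which makes it usable as a name.
label = small_hash("hi")
print(label, len(label))
```

A truncated digest is not collision-free, so a label like this is best treated as a convenient display name rather than a unique identifier.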



get_span_repr_format

 get_span_repr_format ()

*Returns the span representation format.

Returns: (the span representation format, the number of characters to display in the span text)*



set_span_repr_format

 set_span_repr_format (format=None, head:int=None)

*Sets the representation format for spans and the number of characters to display in the span text.

Parameters:
    format (str, optional): The representation format for spans. Defaults to None.
    head (int, optional): The number of characters to display in the span text. Defaults to None.*
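
To make the getter and setter pair concrete, here is a rough sketch of how such module-level display settings could work. Everything in it is an assumption for illustration: the `_repr_state` dictionary, the default format string, the default `head` of 10, the rule that a `None` argument leaves a setting unchanged, and the helper `render_span` are hypothetical and are not taken from the library.

```python
# Hypothetical module-level state; the real defaults are not shown on this page.
_repr_state = {"format": '[@{name},{start},{end}) "{text}"', "head": 10}

def set_span_repr_format(format=None, head=None):
    """Update only the settings that are passed explicitly (assumed behaviour)."""
    if format is not None:
        _repr_state["format"] = format
    if head is not None:
        _repr_state["head"] = head

def get_span_repr_format():
    """Return the current settings as a (format, head) tuple."""
    return _repr_state["format"], _repr_state["head"]

def render_span(name, start, end, text):
    """Hypothetical helper: render a span, showing at most `head` characters."""
    fmt, head = get_span_repr_format()
    shown = text if len(text) <= head else text[:head] + "..."
    return fmt.format(name=name, start=start, end=end, text=shown)

print(render_span("doc", 0, 14, "hello stranger"))  # [@doc,0,14) "hello stra..."

set_span_repr_format(head=5)  # the format is left untouched
print(render_span("doc", 0, 14, "hello stranger"))  # [@doc,0,14) "hello..."
```

Keeping the settings in one place means every span picks up a change immediately, at the cost of the output depending on global state.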



ie

 ie (s:__main__.Span)


Span

 Span (doc, start=None, end=None, name=None)

*All the operations on a read-only sequence.

Concrete subclasses must override `__new__` or `__init__`, `__getitem__`, and `__len__`.*
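
The docstring above is the one inherited from `collections.abc.Sequence`, which suggests that Span is a read-only sequence over a document string. The sketch below shows how such a class could be put together. `MiniSpan` is a hypothetical stand-in, not the library's Span: the half-open `[start, end)` offsets, the equality rules, and the hash are assumptions chosen to be consistent with the examples on this page.

```python
from collections.abc import Sequence

class MiniSpan(Sequence):
    """Hypothetical stand-in for Span: a half-open [start, end) view of a string."""

    def __init__(self, doc, start=None, end=None, name=None):
        self.doc = doc
        self.start = 0 if start is None else start
        self.end = len(doc) if end is None else end
        self.name = name

    def __len__(self):
        return self.end - self.start

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Resolve the slice relative to this span, then shift to document offsets.
            lo, hi, step = key.indices(len(self))
            if step != 1:
                raise ValueError("only contiguous slices are supported in this sketch")
            hi = max(hi, lo)  # an empty slice stays empty instead of going negative
            return MiniSpan(self.doc, self.start + lo, self.start + hi, self.name)
        # Integer index: return a single character (IndexError ends iteration).
        return self.as_str()[key]

    def as_str(self):
        """Return the covered text as a plain string."""
        return self.doc[self.start:self.end]

    def __eq__(self, other):
        if isinstance(other, MiniSpan):
            # Assumption: two spans are equal when document and offsets match.
            return (self.doc, self.start, self.end) == (other.doc, other.start, other.end)
        if isinstance(other, str):
            return self.as_str() == other
        return NotImplemented

    def __hash__(self):
        # Defining __eq__ disables the default hash, so provide one explicitly.
        return hash((self.doc, self.start, self.end))

s = MiniSpan("hello stranger", name="doc")
assert s == "hello stranger"
assert s[0:5] == "hello"
assert s[0:5][1:4] == "ell"   # nested slices compose into document offsets
assert not s == s[0:5]
assert list(s[0:2]) == ["h", "e"]
```

One caveat of this design is that a span can compare equal to a string while hashing differently from it, so spans and plain strings should not be mixed as keys in the same dictionary or set.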

import pandas as pd

# A Span compares equal to the text it covers.
assert Span("aa",0,2) == "aa"

doc = 'world'
df = pd.DataFrame([
    [Span('hello',0,5),1],
    [Span(doc,0,5),2],
    [Span(doc,0,5),3],
], columns=['span','num'])
df

              span  num
0  (h, e, l, l, o)    1
1  (w, o, r, l, d)    2
2  (w, o, r, l, d)    3

# The same table with plain strings instead of Spans, for comparison.
doc = 'world'
df = pd.DataFrame([
    ['hello',1],
    ['world',2],
    ['world',3],
], columns=['span','num'])
df

    span  num
0  hello    1
1  world    2
2  world    3

# TODO: from here on, we need union types and a prettier printed form for the Span class.
df.groupby('span').sum()

       num
span
hello    1
world    5

string = "hello stranger"
short_string = "hi"
s = Span(string,0,len(string),name='doc')
display(s)

[@doc,0,14) "hello stra..."
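
# Without help, pandas renders the Span like a generic sequence and lists its characters.
pd.DataFrame({'span':[s]})

                                        span
0  (h, e, l, l, o, , s, t, r, a, n, g, e, r)

# Mapping repr over the cells shows the compact Span representation instead.
# DataFrame.map needs pandas 2.1 or newer; older versions call it applymap.
df = pd.DataFrame({'span':[s]}).map(repr)
df

                          span
0  [@doc,0,14) "hello stra..."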

# With no name given, the Span is labelled with what looks like a short hash of its text.
s2 = Span(short_string)
display(s2)

[@c22b5f,0,2) "hi"
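
# A Span compares equal to the text it covers, but not to a sub-span of itself.
assert s == 'hello stranger'
assert s[0:5] == 'hello'
assert not s == s[0:5]
# as_str() gives the covered text as a plain string, for example inside an f-string.
assert f"{s[0:5].as_str()} darkness" == 'hello darkness'
# Slicing a slice is relative to the inner span.
assert s[0:5][1:4] == 'ell'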