assert Span("aa",0,2) == "aa"
Spans
I followed this guide on how to make extension types for pandas
small_hash
small_hash (txt, length=6)
*A function that returns a small hash of a string
Args: txt (str): string to hash. length (int, optional): length of the hash. Defaults to 6.
Returns: str: a hash of `txt` that is `length` characters long*
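The source does not show how `small_hash` is implemented. As a minimal sketch, assuming it truncates a hex digest, a function with the same signature could look like this. The name `small_hash_sketch` and the choice of SHA-1 are assumptions made for illustration only.

```python
import hashlib


def small_hash_sketch(txt: str, length: int = 6) -> str:
    """Return the first `length` hex characters of a SHA-1 digest of `txt`.

    Hypothetical stand-in for `small_hash`; the real function may use a
    different hash or encoding.
    """
    digest = hashlib.sha1(txt.encode("utf-8")).hexdigest()
    return digest[:length]


# The same input always yields the same short identifier.
short_id = small_hash_sketch("hi")
assert short_id == small_hash_sketch("hi")
assert len(short_id) == 6
```

A short, deterministic identifier like this would explain why an unnamed span is displayed with a six-character hex tag in place of a name.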
get_span_repr_format
get_span_repr_format ()
*Returns the span representation format.
Returns: (the span representation format, the number of characters to display in the span text)*
set_span_repr_format
set_span_repr_format (format=None, head:int=None)
*Sets the representation format for spans and the number of characters to display in the span text.
Parameters: format (str, optional): The representation format for spans. Defaults to None. head (int, optional): The number of characters to display in the span text. Defaults to None.*
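The getter and setter pair above suggests module-level display settings. The following is a rough sketch of that pattern, not the library's code: the default format string, the default `head` of 10, the "leave unchanged when `None`" behaviour, and every name ending in `_sketch` are assumptions chosen so that the result matches the span output shown later on this page.

```python
# Assumed defaults, picked to reproduce the display format seen in the examples.
_repr_format = '[@{name},{start},{end}) "{text}"'
_repr_head = 10


def get_span_repr_format_sketch():
    """Return (format string, number of characters of span text to show)."""
    return _repr_format, _repr_head


def set_span_repr_format_sketch(format=None, head=None):
    """Update the settings; an argument left as None keeps its current value."""
    global _repr_format, _repr_head
    if format is not None:
        _repr_format = format
    if head is not None:
        _repr_head = head


def render_span_sketch(text, start, end, name):
    """Format a span, truncating its text to `head` characters."""
    fmt, head = get_span_repr_format_sketch()
    shown = text if len(text) <= head else text[:head] + "..."
    return fmt.format(name=name, start=start, end=end, text=shown)


print(render_span_sketch("hello stranger", 0, 14, "doc"))
# [@doc,0,14) "hello stra..."
```

Calling `set_span_repr_format_sketch(head=5)` would shorten the displayed text to `"hello..."` while leaving the format string untouched.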
ie
ie (s:__main__.Span)
Span
Span (doc, start=None, end=None, name=None)
*All the operations on a read-only sequence.
Concrete subclasses must override `__new__` or `__init__`, `__getitem__`, and `__len__`.*
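To illustrate the contract in that docstring, here is a minimal sketch of a span-like class built on `collections.abc.Sequence`. It is not the `Span` implementation: `SpanSketch`, its attributes, and its equality rules are assumptions, and slicing is deliberately left out to keep the example short.

```python
from collections.abc import Sequence


class SpanSketch(Sequence):
    """Hypothetical read-only view of doc[start:end]."""

    def __init__(self, doc, start=None, end=None, name=None):
        self.doc = doc
        self.start = 0 if start is None else start
        self.end = len(doc) if end is None else end
        self.name = name

    def __len__(self):
        return self.end - self.start

    def __getitem__(self, key):
        if isinstance(key, slice):
            raise TypeError("slicing is left out of this sketch")
        if key < 0:
            key += len(self)
        if not 0 <= key < len(self):
            raise IndexError(key)
        return self.doc[self.start + key]

    def as_str(self):
        return self.doc[self.start:self.end]

    def __eq__(self, other):
        # Compare by text against plain strings, by position against spans.
        if isinstance(other, str):
            return self.as_str() == other
        if isinstance(other, SpanSketch):
            return (self.doc, self.start, self.end) == (other.doc, other.start, other.end)
        return NotImplemented

    def __hash__(self):
        return hash((self.doc, self.start, self.end))


sp = SpanSketch("hello world", 6, 11)
assert len(sp) == 5 and sp[0] == "w"
assert list(sp) == ["w", "o", "r", "l", "d"]  # iteration comes from Sequence
assert "o" in sp                               # so does membership testing
assert sp == "world"
```

Only `__init__`, `__getitem__`, and `__len__` are required; `Sequence` supplies iteration, `in`, `index`, `count`, and `reversed` on top of them.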
doc = 'world'
df = pd.DataFrame([
    [Span('hello',0,5),1],
    [Span(doc,0,5),2],
    [Span(doc,0,5),3],
], columns=['span','num'])
df
| | span | num |
|---|---|---|
| 0 | (h, e, l, l, o) | 1 |
| 1 | (w, o, r, l, d) | 2 |
| 2 | (w, o, r, l, d) | 3 |
doc = 'world'
df = pd.DataFrame([
    ['hello',1],
    ['world',2],
    ['world',3],
], columns=['span','num'])
df
| | span | num |
|---|---|---|
| 0 | hello | 1 |
| 1 | world | 2 |
| 2 | world | 3 |
#TODO: from here. We need union types, and the Span class should print prettily.
df.groupby('span').sum()
| span | num |
|---|---|
| hello | 1 |
| world | 5 |
string = "hello stranger"
short_string = "hi"
s = Span(string,0,len(string),name='doc')
display(s)
[@doc,0,14) "hello stra..."
pd.DataFrame({'span':[s]})
| | span |
|---|---|
| 0 | (h, e, l, l, o, , s, t, r, a, n, g, e, r) |
df = pd.DataFrame({'span':[s]}).map(repr)
df
| | span |
|---|---|
| 0 | [@doc,0,14) "hello stra..." |
s2 = Span(short_string)
display(s2)
[@c22b5f,0,2) "hi"
assert s == 'hello stranger'
assert s[0:5] == 'hello'
assert not s == s[0:5]
assert f"{s[0:5].as_str()} darkness" == 'hello darkness'
assert s[0:5][1:4] == 'ell'
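The last assertion slices a slice. One way this could work, sketched here under the assumption that a sub-span keeps offsets into the original document, is to translate each relative slice into absolute positions. The helper `sub_span_bounds` is hypothetical and not part of the library.

```python
def sub_span_bounds(start, end, key):
    """Map a slice relative to [start, end) onto absolute document offsets."""
    lo, hi, step = key.indices(end - start)  # clamps to the span's length
    if step != 1:
        raise ValueError("only contiguous slices are supported in this sketch")
    hi = max(hi, lo)  # an empty slice stays empty instead of going negative
    return start + lo, start + hi


doc = "hello stranger"
outer = sub_span_bounds(0, len(doc), slice(0, 5))  # like s[0:5]
inner = sub_span_bounds(*outer, slice(1, 4))       # like s[0:5][1:4]
assert doc[outer[0]:outer[1]] == "hello"
assert doc[inner[0]:inner[1]] == "ell"
```

Because each step only adds the parent's start offset, nested slices compose without copying the underlying text.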