assert Span("aa",0,2) == "aa"
Spans
I followed this guide on how to make extension types for pandas
small_hash
small_hash (txt, length=6)
*A function that returns a small hash of a string.
Args: txt (str): string to hash. length (int, optional): length of the hash. Defaults to 6.
Returns: str: the hash of `txt`, `length` characters long.*
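The body of `small_hash` is not shown here, so the following is only a minimal sketch of one way to get a short, deterministic hash of a string. It assumes a SHA-1 hex digest truncated to `length` characters; the name `small_hash_sketch` and the choice of SHA-1 are illustrative assumptions, not the library's confirmed implementation.

```python
import hashlib


def small_hash_sketch(txt, length=6):
    """Hypothetical stand-in for small_hash: a truncated hex digest of `txt`."""
    # Assumption: SHA-1 over the UTF-8 bytes; the real function may use another hash.
    digest = hashlib.sha1(txt.encode("utf-8")).hexdigest()
    # Keep only the first `length` hex characters.
    return digest[:length]


short = small_hash_sketch("hello stranger")
longer = small_hash_sketch("hello stranger", length=10)
```

Truncating a digest keeps the result stable across runs (unlike the built-in `hash`, which is salted per process for strings), at the cost of a higher collision chance for short lengths.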
get_span_repr_format
get_span_repr_format ()
*Returns the span representation format.
Returns: (the span representation format, the number of characters to display in the span text)*
set_span_repr_format
set_span_repr_format (format=None, head:int=None)
*Sets the representation format for spans and the number of characters to display in the span text.
Parameters: format (str, optional): The representation format for spans. Defaults to None. head (int, optional): The number of characters to display in the span text. Defaults to None.*
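A getter/setter pair like this is commonly backed by module-level state. The sketch below shows that pattern under stated assumptions: the names `get_format_sketch` and `set_format_sketch`, the default template, and the default head of 10 characters are all hypothetical and are not taken from the library.

```python
# Hypothetical module-level defaults; the library's real values are not shown in this page.
_repr_config = {
    "format": '[@{name},{start},{end}) "{text}"',
    "head": 10,
}


def get_format_sketch():
    """Return (format, head), mirroring the documented return shape."""
    return _repr_config["format"], _repr_config["head"]


def set_format_sketch(format=None, head=None):
    """Update only the settings that are passed; None leaves a setting unchanged."""
    if format is not None:
        _repr_config["format"] = format
    if head is not None:
        _repr_config["head"] = head


default_format, default_head = get_format_sketch()
set_format_sketch(head=5)          # change only the number of displayed characters
after_head = get_format_sketch()
set_format_sketch(format="{text}") # change only the template
after_format = get_format_sketch()
set_format_sketch()                # no arguments: nothing changes
after_noop = get_format_sketch()
```

Treating `None` as "leave unchanged" matches the documented defaults of `None` for both parameters and lets callers adjust one setting without restating the other.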
ie
ie (s:__main__.Span)
Span
Span (doc, start=None, end=None, name=None)
*All the operations on a read-only sequence.
Concrete subclasses must override `__new__` or `__init__`, `__getitem__`, and `__len__`.*
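To make the docstring concrete, here is a minimal sketch of a read-only sequence that views a half-open `[start, end)` range of a string. `MiniSpan` is a hypothetical stand-in, not the library's `Span`: the hash-based default name, the 10-character display limit, and the equality rules are assumptions inferred from the examples on this page.

```python
from collections.abc import Sequence
import hashlib


class MiniSpan(Sequence):
    """Hypothetical stand-in for Span: a view over doc[start:end]."""

    def __init__(self, doc, start=None, end=None, name=None):
        self.doc = doc
        self.start = 0 if start is None else start
        self.end = len(doc) if end is None else end
        # Assumption: unnamed documents get a short hash of their text as a name.
        self.name = name or hashlib.sha1(doc.encode("utf-8")).hexdigest()[:6]

    def __len__(self):
        return self.end - self.start

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            lo, hi, step = idx.indices(len(self))
            if step != 1:
                raise ValueError("only contiguous slices are supported")
            # A slice is a new span whose offsets stay relative to the document.
            return MiniSpan(self.doc, self.start + lo, self.start + max(lo, hi), self.name)
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError("MiniSpan index out of range")
        return self.doc[self.start + idx]

    def as_str(self):
        """Return the covered text as a plain str."""
        return self.doc[self.start:self.end]

    def __eq__(self, other):
        if isinstance(other, MiniSpan):
            # Two spans are equal only if they cover the same range of the same document.
            return (self.name, self.start, self.end) == (other.name, other.start, other.end)
        if isinstance(other, str):
            return self.as_str() == other
        return NotImplemented

    def __repr__(self):
        text = self.as_str()
        shown = text if len(text) <= 10 else text[:10] + "..."
        return f'[@{self.name},{self.start},{self.end}) "{shown}"'


m = MiniSpan("hello stranger", name="doc")
```

Because `Sequence` supplies `__iter__`, `__contains__`, `index`, and `count` on top of `__getitem__` and `__len__`, iterating over such an object yields single characters. Defining `__eq__` without `__hash__` leaves this sketch unhashable, which would need addressing before using it as a grouping key.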
doc = 'world'
df = pd.DataFrame([
[Span('hello',0,5),1],
[Span(doc,0,5),2],
[Span(doc,0,5),3],
], columns=['span','num'])
df
| | span | num |
|---|---|---|
| 0 | (h, e, l, l, o) | 1 |
| 1 | (w, o, r, l, d) | 2 |
| 2 | (w, o, r, l, d) | 3 |
doc = 'world'
df = pd.DataFrame([
['hello',1],
['world',2],
['world',3],
], columns=['span','num'])
df
| | span | num |
|---|---|---|
| 0 | hello | 1 |
| 1 | world | 2 |
| 2 | world | 3 |
# TODO: from here on, we need union types, and the Span class needs to print prettily
df.groupby('span').sum()
| | num |
|---|---|
| span | |
| hello | 1 |
| world | 5 |
string = "hello stranger"
short_string = "hi"
s = Span(string,0,len(string),name ='doc')
display(s)
[@doc,0,14) "hello stra..."
pd.DataFrame({'span':[s]})
| | span |
|---|---|
| 0 | (h, e, l, l, o, , s, t, r, a, n, g, e, r) |
df = pd.DataFrame({'span':[s]}).map(repr)
df
| | span |
|---|---|
| 0 | [@doc,0,14) "hello stra..." |
s2 = Span(short_string)
display(s2)
[@c22b5f,0,2) "hi"
assert s == 'hello stranger'
assert s[0:5] == 'hello'
assert not s == s[0:5]
assert f"{s[0:5].as_str()} darkness" == 'hello darkness'
assert s[0:5][1:4] == 'ell'