GenAI – What Is ‘Trie’ Prefix Tree ?
Table Of Content:
- What Is Trie ?
- Key Properties Of Trie.
- Example Of Trie.
- What Can We Do With Trie?
- Why Not Just Use A List ?
- What Is Prefix Lookup?
- How ‘Trie’ Helps Text Tokenization ?
(1) What Is ‘Trie’ ?



(2) Key Properties Of ‘Trie’ .

(3) How ‘Trie’ Works Internally ?

(4) How “New York City” Will Get Inserted ?
['Ney York City', 'India', 'South Africa']



(5) What Happens If The Word Does Not Exist ?



(6) How Trie Helps In Text Tokenization ?





(7) How Trie will give “New York City” as a single token , when New → match , it will give New, then it will match ‘York’ it matches it will give ‘York’, why again it will match ‘New York’ which is a combination of New and York

(8) How Trie Helps In Tokenization Process ?
def loadDict_(self, fnm):
logging.info(f"[HUQIE]:Build trie from {fnm}")
try:
of = open(fnm, "r", encoding='utf-8')
while True:
line = of.readline()
if not line:
break
line = re.sub(r"[\r\n]+", "", line)
line = re.split(r"[ \t]", line)
k = self.key_(line[0])
F = int(math.log(float(line[1]) / self.DENOMINATOR) + .5)
if k not in self.trie_ or self.trie_[k][0] < F:
self.trie_[self.key_(line[0])] = (F, line[2])
self.trie_[self.rkey_(line[0])] = 1
dict_file_cache = fnm + ".trie"
logging.info(f"[HUQIE]:Build trie cache to {dict_file_cache}")
self.trie_.save(dict_file_cache)
of.close()
except Exception:
logging.exception(f"[HUQIE]:Build trie {fnm} failed")


