GenAI – What Is ‘Trie’ Prefix Tree ?


GenAI – What Is ‘Trie’ Prefix Tree ?

Table Of Content:

  1. What Is Trie ?
  2. Key Properties Of Trie.
  3. Example Of Trie.
  4. What Can We Do With Trie?
  5. Why Not Just Use A List ?
  6. What Is Prefix Lookup?
  7. How ‘Trie’ Helps Text Tokenization ? 

(1) What Is ‘Trie’ ?

(2) Key Properties Of ‘Trie’ .

(3) How ‘Trie’ Works Internally ?

(4) How “New York City” Will Get Inserted ?

['Ney York City', 'India', 'South Africa']

(5) What Happens If The Word Does Not Exist ?

(6) How Trie Helps In Text Tokenization ?

(7) How Trie will give “New York City” as a single token , when New → match , it will give New, then it will match ‘York’ it matches it will give ‘York’, why again it will match ‘New York’ which is a combination of New and York

(8) How Trie Helps In Tokenization Process ?

def loadDict_(self, fnm):
        logging.info(f"[HUQIE]:Build trie from {fnm}")
        try:
            of = open(fnm, "r", encoding='utf-8')
            while True:
                line = of.readline()
                if not line:
                    break
                line = re.sub(r"[\r\n]+", "", line)
                line = re.split(r"[ \t]", line)
                k = self.key_(line[0])
                F = int(math.log(float(line[1]) / self.DENOMINATOR) + .5)
                if k not in self.trie_ or self.trie_[k][0] < F:
                    self.trie_[self.key_(line[0])] = (F, line[2])
                self.trie_[self.rkey_(line[0])] = 1

            dict_file_cache = fnm + ".trie"
            logging.info(f"[HUQIE]:Build trie cache to {dict_file_cache}")
            self.trie_.save(dict_file_cache)
            of.close()
        except Exception:
            logging.exception(f"[HUQIE]:Build trie {fnm} failed")

Leave a Reply

Your email address will not be published. Required fields are marked *