| by YoungTimes | No comments

面试刷题-Trie树-单词搜索

题目描述

给定一个 m x n 二维字符网格 board 和一个单词(字符串)列表 words,找出所有同时在二维网格和字典中出现的单词。

单词必须按照字母顺序,通过 相邻的单元格 内的字母构成,其中“相邻”单元格是那些水平相邻或垂直相邻的单元格。同一个单元格内的字母在一个单词中不允许被重复使用。

(从当前位置开始,向前、向后、向上、向下搜索,查看搜索路径组成的单词是否在words中出现过。)

示例 1:

输入:board = [[“o”,”a”,”a”,”n”],[“e”,”t”,”a”,”e”],[“i”,”h”,”k”,”r”],[“i”,”f”,”l”,”v”]], words = [“oath”,”pea”,”eat”,”rain”]
输出:[“eat”,”oath”]

基于前缀树(Trie树)的回溯

解决这个问题的关键在于我们如何从字典中找到单词的匹配项,最直观的思路就是将字典中的所有单词都放入到Map或者Dict中,然后一边遍历网格,一边查询搜索过的字符组成的单词是否在Map/Dict中。显然这种效率比较低。

改进的思路:1、采用回溯的算法,避免每次都从头开始搜索;2、如果我们知道给定前缀在字典中不存在任何单词匹配,那么就不需要在这个方向上继续搜索,Trie树可以很方便的实现前缀的查询。

来源:https://leetcode-cn.com/problems/word-search-ii/solution/dan-ci-sou-suo-ii-by-leetcode/

代码实现如下:

1、TrieNode结构

我们在TrieNode中记录了以该Node结尾的Word,这样就避免了在网格遍历过程中还要记录走过的路径。只要搜索到该TrieNode就可以直接取到搜索路径的所有字符组成的Word。

class TrieNode {
public:
    bool is_end{false};
    std::unordered_map<char, std::shared_ptr<TrieNode>> childs;
    std::string word;
};

2、构建Trie树

将所有的words构建Trie树,后面用这个Trie树查询前缀组成的字符串是否在字典中。

auto root = std::make_shared<TrieNode>();
for (const auto& word : words) {
    auto node = root;
    for (size_t i = 0; i < word.size(); i++) {
        const auto& ch = word.at(i);

        if (node->childs.count(ch) <= 0) {
            node->childs[ch] = std::make_shared<TrieNode>();
        }

        node = node->childs.at(ch);
    }

    node->word = word;
    node->is_end = true;
}

3、回溯遍历搜索

代码中有几个技巧:

1)当首次查询到word之后,清空word内容。这样就解决了结果集中出现重复项的问题,避免了要先搜索匹配结果再去重的过程。

if (node->childs.at(ch)->is_end == true) {
    if (!node->childs[ch]->word.empty()) {
        rets.emplace_back(node->childs[ch]->word);
        node->childs[ch]->word.clear();
    }
}

2)在搜索过程中引入回溯机制

在搜索过程中,Trie树遍历与回溯过程一起进行,提高了搜索效率。

3)回溯过程中对Trie树进行剪枝

来源:https://leetcode-cn.com/problems/word-search-ii/solution/dan-ci-sou-suo-ii-by-leetcode/

对于 Trie 中的叶节点,一旦遍历它(即找到匹配的单词),就不需要再遍历它了,因此我们可以把它从树上剪下来。逐渐地,这些非叶节点可以成为叶节点以后。在极端情况下,一旦我们找到字典中所有单词的匹配项,Trie 就会变成空的,极大的提升在线检索的效率。

剪枝的代码如下:

if (node->childs[ch]->childs.size() <= 0) {
    node->childs.erase(ch);
}

剪枝优化最典型的是下面这个测试用例,剪枝之前耗时1516ms
,剪枝之后耗时4ms

[["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"],["a","a","a","a","a","a","a","a","a","a","a","a"]]
["lllllll","fffffff","ssss","s","rr","xxxx","ttt","eee","ppppppp","iiiiiiiii","xxxxxxxxxx","pppppp","xxxxxx","yy","jj","ccc","zzz","ffffffff","r","mmmmmmmmm","tttttttt","mm","ttttt","qqqqqqqqqq","z","aaaaaaaa","nnnnnnnnn","v","g","ddddddd","eeeeeeeee","aaaaaaa","ee","n","kkkkkkkkk","ff","qq","vvvvv","kkkk","e","nnn","ooo","kkkkk","o","ooooooo","jjj","lll","ssssssss","mmmm","qqqqq","gggggg","rrrrrrrrrr","iiii","bbbbbbbbb","aaaaaa","hhhh","qqq","zzzzzzzzz","xxxxxxxxx","ww","iiiiiii","pp","vvvvvvvvvv","eeeee","nnnnnnn","nnnnnn","nn","nnnnnnnn","wwwwwwww","vvvvvvvv","fffffffff","aaa","p","ddd","ppppppppp","fffff","aaaaaaaaa","oooooooo","jjjj","xxx","zz","hhhhh","uuuuu","f","ddddddddd","zzzzzz","cccccc","kkkkkk","bbbbbbbb","hhhhhhhhhh","uuuuuuu","cccccccccc","jjjjj","gg","ppp","ccccccccc","rrrrrr","c","cccccccc","yyyyy","uuuu","jjjjjjjj","bb","hhh","l","u","yyyyyy","vvv","mmm","ffffff","eeeeeee","qqqqqqq","zzzzzzzzzz","ggg","zzzzzzz","dddddddddd","jjjjjjj","bbbbb","ttttttt","dddddddd","wwwwwww","vvvvvv","iii","ttttttttt","ggggggg","xx","oooooo","cc","rrrr","qqqq","sssssss","oooo","lllllllll","ii","tttttttttt","uuuuuu","kkkkkkkk","wwwwwwwwww","pppppppppp","uuuuuuuu","yyyyyyy","cccc","ggggg","ddddd","llllllllll","tttt","pppppppp","rrrrrrr","nnnn","x","yyy","iiiiiiiiii","iiiiii","llll","nnnnnnnnnn","aaaaaaaaaa","eeeeeeeeee","m","uuu","rrrrrrrr","h","b","vvvvvvv","ll","vv","mmmmmmm","zzzzz","uu","ccccccc","xxxxxxx","ss","eeeeeeee","llllllll","eeee","y","ppppp","qqqqqq","mmmmmm","gggg","yyyyyyyyy","jjjjjj","rrrrr","a","bbbb","ssssss","sss","ooooo","ffffffffff","kkk","xxxxxxxx","wwwwwwwww","w","iiiiiiii","ffff","dddddd","bbbbbb","uuuuuuuuu","kkkkkkk","gggggggggg","qqqqqqqq","vvvvvvvvv","bbbbbbbbbb","nnnnn","tt","wwww","iiiii","hhhhhhh","zzzzzzzz","ssssssssss","j","fff","bbbbbbb","aaaa","mmmmmmmmmm","jjjjjjjjjj","sssss","yyyyyyyy","hh","q","rrrrrrrrr","mmmmmmmm","wwwww","www","rrr","lllll","uuuuuuuuuu","oo","jjjjjjjjj","dddd","pppp","hhhhhhhhh","kk","gggggggg","xxxxx","vvvv","d","qqqqqqqqq","dd","ggggggggg","t","yyyy","bbb","yyyyyyyyyy","tttttt","ccccc","aa","eeeeee","llllll","kkkkkkkkkk","sssssssss","i","hhhhhh","oooooooooo","wwwwww","ooooooooo","zzzz","k","hhhhhhhh","aaaaa","mmmmm"]

4)避免循环搜索

在搜索过程中,搜索过的节点不能重复搜索。为了解决这个问题,需要给搜索过的节点打个标记。

打标记的方法很多,这里只说两种,一种是定义一个与网格大小一样的布尔类型数组,访问过的节点标记为True,未访问过的节点标记为False;另一种是把节点的值修改为标志变量,搜索完成之后再还原回去。

board[i][j] = '.';
....
board[i][j] = ch;

完整的代码

class Solution {
private:
    class TrieNode {
    public:
        bool is_end{false};
        std::unordered_map<char, std::shared_ptr<TrieNode>> childs;
        std::string word;
    };

    void search(std::shared_ptr<TrieNode> node,
        std::vector<std::vector<char>>& board, 
        const size_t i, 
        const size_t j, 
        std::vector<std::string>& rets) {
        const auto ch = board[i][j];
        if (node->childs.count(ch) <= 0) {
            return;
        }

        // std::cout << "ch:" << ch << std::endl;
        if (node->childs.at(ch)->is_end == true) {
            // std::cout << "ch 1:" << ch << std::endl;
            if (!node->childs[ch]->word.empty()) {
                rets.emplace_back(node->childs[ch]->word);
                node->childs[ch]->word.clear();
            }
        }

        board[i][j] = '.';

        if (i > 0) {
            search(node->childs[ch], board, i - 1, j, rets);
        }

        if (j > 0) {
            search(node->childs[ch], board, i, j - 1, rets);
        }

        if (i + 1 < board.size()) {
            search(node->childs[ch], board, i + 1, j, rets);
        }

        if (j + 1 < board[0].size()) {
            search(node->childs[ch], board, i, j + 1, rets);
        }

        board[i][j] = ch;

        if (node->childs[ch]->childs.size() <= 0) {
            node->childs.erase(ch);
        }
    }

public:
    vector<string> findWords(vector<vector<char>>& board, vector<string>& words) {
        std::vector<std::string> rets;
        if (words.size() <= 0) {
            return rets;
        }

        if (board.size() <= 0 || board[0].size() <= 0) {
            return rets;
        }

        auto root = std::make_shared<TrieNode>();
        for (const auto& word : words) {
            auto node = root;
            for (size_t i = 0; i < word.size(); i++) {
                const auto& ch = word.at(i);

                if (node->childs.count(ch) <= 0) {
                    node->childs[ch] = std::make_shared<TrieNode>();
                }

                node = node->childs.at(ch);
            }

            node->word = word;
            node->is_end = true;
        }

        for (size_t i = 0; i < board.size(); i++) {
            for (size_t j = 0; j < board[0].size(); j++) {
                const auto& ch = board[i][j];
                if (root->childs.count(ch) > 0) {
                    search(root, board, i, j, rets);
                }
            }
        }

        return rets;
    }
};

参考材料

https://leetcode-cn.com/problems/word-search-ii/

除非注明,否则均为[半杯茶的小酒杯]原创文章,转载必须以链接形式标明本文链接

本文链接:http://www.banbeichadexiaojiubei.com/index.php/2021/01/01/%e9%9d%a2%e8%af%95%e5%88%b7%e9%a2%98-trie%e6%a0%91-%e5%8d%95%e8%af%8d%e6%90%9c%e7%b4%a2/