"Legal Detective" announces open-sourcing of Taiwan legal RAG vector database, enabling colloquial search across 30 years of judgments

Taiwan legal AI fan page "Legal Detective" has open-sourced "TW Legal RAG," a semantic search tool for Taiwan law built on approximately 22 million structured and vectorized court judgments. Users can search judgments in everyday colloquial language and package the results into a format that any AI (ChatGPT, Claude, Gemini, or a local model) can read directly. The tool also has built-in citation verification, which checks whether the judgment case numbers cited in an AI response actually appear in the search results.

Key Highlights

- "Legal Detective" open-sourced TW Legal RAG, covering approximately 22 million Taiwan court judgments. Running pip install twlegalrag is enough to start using it.
- The architecture follows a "Bring Your Own AI" model: the tool itself does not call any LLM. After semantic search, results are packaged into a Bundle that any AI can read.
- Built-in citation verification checks whether the judgment case numbers in an AI response exist in the search results, which helps catch fabricated citations.

"Legal Detective" announced on Facebook today (27th) that "TW Legal RAG" (Taiwan Legal RAG Semantic Search Tool), developed over six months, is officially live and open-sourced under the MIT license. The tool structures and vectorizes approximately 22 million Taiwan court judgments to build a semantic search system designed specifically for the legal field. Anyone can install it with pip install twlegalrag.

RAG (Retrieval-Augmented Generation) is currently a mainstream technique for reducing "hallucination" in Large Language Models (LLMs).
It means retrieving relevant facts from an external knowledge base before the model generates a response, so that the output is grounded in actual data. This is especially critical in the legal field, where LLMs fabricating non-existent precedents have repeatedly caused real disputes, including the well-known case of an American lawyer sanctioned by a court for citing fictitious precedents generated by ChatGPT. The developer states that he spent thousands of hours optimizing the retrieval pipeline, at a cost potentially approaching one million New Taiwan dollars, and is now releasing it for free under the MIT license.

22 Million Judgments, BYO-AI Architecture, Citation Verification

TW Legal RAG's technical architecture differs from most legal AI tools: it does not call any LLM itself and instead adopts a "Bring Your Own AI" (BYO-AI) design. Users send semantic search requests through the CLI tool to the backend (Legal Detective's TLR infrastructure, at the endpoint tlr.dr-lawbot.com). Once the system returns relevant judgments, the tool packages them into a structured Bundle that can be fed directly to ChatGPT, Claude, Gemini, or any local model. As a result, users do not need to deploy embedding models or vector indexes locally. The tool's dependencies are lightweight, requiring only three Python packages: httpx, typer, and rich.

The tool provides four core commands:

- search: Perform semantic search over 22 million judgments using natural language
- pack: Package search results into an AI-readable Bundle, including judgment excerpts and verification rules
- check: Bundle-level citation verification that confirms whether the judgment case numbers cited in an AI response exist in the search results
- health: Service status check

Citation verification is a major highlight of the tool. It relies purely on regular expression matching (no LLM involved) to check whether the judgment case numbers in AI-generated content actually exist in the Bundle.
However, the developer also states the limitations clearly: the verifier cannot determine whether the cited content is correct, cannot detect reasoning errors, and cannot identify semantically paraphrased hallucinations. It can only confirm that a case number exists. According to the "Legal Detective" post, the project was built single-handedly by one person over six months.

Why Choose Free Open Source?

In the post, "Legal Detective" said that many users had asked privately whether the tool would carry a fee. He acknowledged having invested substantial resources, but ultimately chose to release it for free under the MIT license. Part of the reason is that he sees the Taiwan government encouraging agencies to build knowledge-based LLM services, and he hopes to contribute to that direction through open source.

The open-sourcing of TW Legal RAG is significant for Taiwan's legal tech ecosystem. Taiwan's legal AI field already has several commercial products (such as Lawbot AI and LawPlayer), but an open-source tool that covers 22 million judgments, supports semantic search, and includes built-in citation verification remains scarce. Developers and startup teams can adopt the tool directly and integrate it into their own applications without building a legal knowledge base from scratch.

Note that TW Legal RAG records user queries on the server side for retrieval analysis, while declaring that they are not used for model training. How well the tool performs in real-world use still awaits feedback from legal professionals.

FAQ

How do I use TW Legal RAG?
After installing with pip install twlegalrag, use the CLI command search to search judgments in natural language, then pack to package them into an AI-readable format, and feed the result to any LLM such as ChatGPT or Claude.

Can TW Legal RAG's citation verification prevent AI hallucinations?
Citation verification can confirm whether the judgment case numbers in an AI response exist in the search results, but it cannot determine whether the cited content is correct or detect semantically paraphrased hallucinations. It only verifies existence at the case-number level.
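
To make the case-number existence check more concrete, here is a minimal Python sketch of regex-based citation verification. It is not the tool's actual code: the pattern, the function names (extract_case_numbers, check_citations), and the idea that a Bundle can be reduced to a plain set of case-number strings are all assumptions made for illustration, and the pattern only targets the common form 112年度台上字第1234號.

```python
import re

# Hypothetical pattern for Taiwan case numbers like "112年度台上字第1234號".
# An illustrative assumption, not the regex used by TW Legal RAG.
CASE_NO = re.compile(
    r"(?<!\d)(\d{2,3})\s*年度\s*"      # ROC year, e.g. 112
    r"([\u4e00-\u9fff]{1,8}?)\s*字\s*"  # case-type word, e.g. 台上
    r"第\s*(\d+)\s*號"                  # serial number, e.g. 1234
)


def extract_case_numbers(text: str) -> set[str]:
    """Return every case number found in text, in a normalized form."""
    return {
        f"{int(year)}年度{word}字第{int(serial)}號"
        for year, word, serial in CASE_NO.findall(text)
    }


def check_citations(answer: str, bundle_case_numbers: set[str]) -> dict[str, list[str]]:
    """Split the case numbers cited in an AI answer into verified/unverified.

    bundle_case_numbers stands in for the case numbers contained in a Bundle
    (an assumed simplification of whatever the real Bundle format holds).
    """
    known: set[str] = set()
    for entry in bundle_case_numbers:
        known |= extract_case_numbers(entry)  # normalize the Bundle side too
    cited = extract_case_numbers(answer)
    return {
        "verified": sorted(cited & known),
        "unverified": sorted(cited - known),
    }


if __name__ == "__main__":
    bundle = {"112年度台上字第1234號"}
    answer = "最高法院112年度台上字第1234號判決與111年度訴字第999號判決均採此見解。"
    print(check_citations(answer, bundle))
```

In this sketch, a citation lands in "verified" only because its case number appears in the Bundle. Nothing is checked about what the judgment actually says, which mirrors the limitation the developer describes.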