Having completed Phase I of Turo’s Chatbot, I noticed how many small developers, startups, and SMEs struggle to find effective and affordable ways to evaluate their Large Language Model applications. The process today is fragmented: developers jump between multiple tools, while PMs and data scientists are unable to simulate environments in which to test their applications. This makes it hard to ensure their LLMs are performing well and producing reliable, unbiased results before and after deployment.
Partnering with Bertrand, I’m currently building an MVP of the Evaluation Platform to monitor LLMs. Its first features are automated testing and real-time monitoring. It will also provide clear, actionable feedback and integrate with existing development tools to make the evaluation process as smooth as possible. I’m currently testing it with startups and solo developers.
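To give a flavour of what “automated testing” means here, below is a minimal sketch of a deterministic response check. Everything in it (the `evaluate_response` function, the specific checks, the banned-terms list) is an illustrative placeholder under assumed requirements, not the platform’s actual API.

```python
# A minimal sketch of an automated LLM response check.
# All names and checks here are illustrative placeholders, not the platform's API.

from dataclasses import dataclass

@dataclass
class EvalResult:
    passed: bool
    reason: str

def evaluate_response(prompt: str, response: str, banned_terms: list[str],
                      max_length: int = 500) -> EvalResult:
    """Run simple deterministic checks against a model response."""
    if not response.strip():
        return EvalResult(False, "empty response")
    if len(response) > max_length:
        return EvalResult(False, f"response exceeds {max_length} characters")
    lowered = response.lower()
    for term in banned_terms:
        if term.lower() in lowered:
            return EvalResult(False, f"contains banned term: {term}")
    return EvalResult(True, "all checks passed")

if __name__ == "__main__":
    # Example: evaluate a canned response as if it came from the model under test.
    result = evaluate_response(
        prompt="Summarise our refund policy.",
        response="Refunds are processed within 5 business days of approval.",
        banned_terms=["guarantee", "always"],
    )
    print(result)
```

In practice, checks like these would run automatically against each deployment and feed the real-time monitoring dashboard, rather than being invoked by hand.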
Our goal is to empower developers to create and deploy high-quality LLMs that are reliable, unbiased, and accurate.
Watch this space 👀