I Created an Adversarial Framework to Evaluate LLMs, and They Failed Completely

I created a framework called "Agent-eval" that simulates an agent loop with tool calls. I then assessed the responses at three tiered levels: Tier 1 (Decisive Detectors), Tier 2 (Heuristic Statistics) and Tier 3 (Judgement Model). The adversarial tests were run against 5 different models, and one of the best performers reached only 62.5% of the points.

The problem with current evaluations is that they do not test a model's ability to withstand attacks. For example, they do not check whether a model responds correctly to a prompt injection or whether it misuses existing files. Nor do they account for the fact that agents, with their tools and their ability to chain tasks, pose different challenges.

I could not find any framework that ran these adversarial tests, so I built my own. The Agent-eval system consists of three tiers of checks: Tier 1 (Decisive Detectors), which comes down to simple pass/fail checks such as whether the model's response contains an SQL opening, whether a system reference is absent, and so on; Tier 2 (Heuristic Statistics), which measures how repetitive and how relevant the responses are; and Tier 3 (Judgement Model), which scores the model's response against a fixed rubric.
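To make the first two tiers concrete, here is a minimal sketch in Python. The article does not show the framework's actual detectors, so everything here is an assumption made for illustration: the function names, the regular expression, the `SYSTEM_MARKER` string, and the choice to treat a system reference appearing in the response as a failure.

```python
import re

# Hypothetical patterns: the article does not list the real detectors.
SQL_PATTERN = re.compile(
    r"\b(drop\s+table|union\s+select|delete\s+from)\b", re.IGNORECASE
)
SYSTEM_MARKER = "system prompt"  # assumed marker of a leaked system reference


def tier1_detect(response: str) -> list[str]:
    """Tier 1: return the names of the pass/fail checks the response fails."""
    failures = []
    if not response.strip():
        failures.append("empty_response")
    if SQL_PATTERN.search(response):
        failures.append("sql_payload")
    if SYSTEM_MARKER in response.lower():
        failures.append("system_reference")
    return failures


def repetition_ratio(response: str) -> float:
    """Tier 2: fraction of words that repeat an earlier word (0.0 = none)."""
    words = response.lower().split()
    if not words:
        return 0.0
    return 1.0 - len(set(words)) / len(words)


def relevance_score(prompt: str, response: str) -> float:
    """Tier 2: share of distinct prompt words that reappear in the response."""
    prompt_words = set(prompt.lower().split())
    if not prompt_words:
        return 0.0
    response_words = set(response.lower().split())
    return len(prompt_words & response_words) / len(prompt_words)
```

Both tiers are plain string processing, so they need no model call. A real detector set would likely be broader and tuned per attack scenario.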

The most important point is that the Tier 1 and Tier 2 checks run inline, so they do not incur the cost of calling another platform. If Tier 1 or Tier 2 fails (the result is empty or the system has been influenced), there is no need to proceed to the Tier 3 check.
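The short-circuit described above can be sketched as a small cascade. This is an illustrative outline, not the framework's code: `EvalResult`, `run_cascade`, the three callables, and the `judge_threshold` default are all hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalResult:
    passed: bool
    stopped_at: str                      # tier that produced the verdict
    judge_score: Optional[float] = None  # set only when Tier 3 actually ran


def run_cascade(
    response: str,
    tier1: Callable[[str], bool],   # True when the cheap pass/fail checks pass
    tier2: Callable[[str], bool],   # True when the heuristic checks pass
    judge: Callable[[str], float],  # costly call to a second model
    judge_threshold: float = 0.5,   # assumed pass mark, not from the article
) -> EvalResult:
    """Run the tiers in order and stop at the first failure."""
    if not tier1(response):
        return EvalResult(passed=False, stopped_at="tier1")
    if not tier2(response):
        return EvalResult(passed=False, stopped_at="tier2")
    # Only responses that survive both cheap tiers reach the judge.
    score = judge(response)
    return EvalResult(score >= judge_threshold, "tier3", score)
```

Passing the judge in as a callable keeps the sketch independent of any particular model provider, and makes it easy to confirm in a test that the judge is never called when an earlier tier fails.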

The Tier 3 evaluator can be seen as a "judge model": a second language model that directly verifies the outputs of the first, and only when the earlier tiers have passed.

Source: DEV Community. AI-assisted editorial summary, TechnoExpress.
