Yinkozi
Contact
services / AI / LLM security

Security assessment of AI and LLM-based systems — by humans, for humans deploying them.

Manual, research-driven evaluation of LLM applications, RAG pipelines, agentic systems, and the foundation models behind them. Prompt injection, data exfiltration, training-data leakage, model integrity, tool-use abuse, identity confusion in agent chains. The methodology is human; we use local LLMs we run in-house to accelerate analysis — never the customer's data through a commercial SaaS model.

01 / why this is different

LLM systems break in shapes traditional pentest does not catch.

A web application has a defined input space, a defined output space, and a deterministic backend. An LLM application has a near-infinite input space, a non-deterministic output, and a model that can be persuaded to act outside its instructions by data it ingests at inference time.

Standard application-security testing finds standard application-security bugs in the surface around the model. The novel risks — prompt injection from retrieved documents, training-data leakage through prompt construction, agent-tool-use confusion, jailbreak chains that survive guardrails — require a different methodology.

We have built that methodology over real engagements with real production LLM systems.

AI / LLM application security surfaces — five layers with their respective attack vectors — application stack — attack vectors 01 User input · Prompt construction Where the user (or another system) talks to the application Direct prompt injection jailbreak chains, encoding bypass 02 RAG & retrieval pipeline Vector store, retrieval logic, citation handling Indirect injection · Context exfil embedding-collision, citation forgery 03 Tool use · Agent chains · MCP Function calling, multi-step agents, identity propagation Tool-call abuse · Scope confusion privilege escalation via composition 04 Foundation model · Inference The LLM itself — guardrails, output filters, safety classifiers Guardrail bypass · Output coercion multi-language, image-modality bypass 05 Weights · Fine-tune · Supply chain Hugging-face dependencies, adapter packs, training data Data poisoning · Malicious adapter embedding-store poisoning
02 / what we test

The risks specific to LLM-based systems.

Prompt injection
Direct and indirect

Direct prompt injection from user input, indirect via retrieved documents, tool outputs, and connected data sources. Cross-tenant injection in shared deployments. Persistence across conversation turns.

Data exfiltration
Through model and through pipeline

Training-data leakage, system-prompt extraction, RAG-context exfiltration via covert channels in markdown / images / tool responses, multi-turn extraction strategies.

Tool-use abuse
Agent chains and function calling

Tool-call injection, scope confusion across function-calling pipelines, MCP and connector abuse, identity drift in multi-step agent chains, privilege escalation via tool composition.

Model integrity
Supply chain and weights

Fine-tuning data poisoning, malicious adapter / LoRA, weights-tamper detection, hugging-face supply-chain risk, embedding-store poisoning.

RAG & retrieval
Pipeline-shaped attacks

Embedding-collision attacks, retrieval poisoning, vector-store boundary leakage, context-window pollution, citation forgery.

Guardrail bypass
Output filters and safety layers

Jailbreak chains, encoding bypasses, multi-language attacks, safety-classifier evasion, output-format coercion, image-modality bypass.

03 / our line

Manual methodology, accelerated by local tooling.

The market for AI-driven security testing is loud. Most of it pipes the customer's data through a commercial SaaS model and bills for the result. That is not what we do.

The novel risks in LLM applications are exactly the risks an LLM-based scanner is least equipped to find. Prompt-injection vectors depend on understanding the customer's specific architecture, the model's specific behaviour, and the context the model has at inference time. A general-purpose model scanning another model hallucinates findings, misses architectural context, and produces noise the customer cannot triage.

Our methodology is manual, expert-led, and research-driven. We read the customer's prompts, study the customer's tool definitions, analyse the customer's RAG corpus, and design attacks specific to that system. The deliverable is reproducible, has working proof-of-concept payloads, and identifies the architectural choice that made the attack possible.

We do use local LLMs we run in-house — to accelerate analysis, scaffold engagement-specific tooling, generate payload variants, and pattern-match across our private corpus of jailbreak primitives. The model never touches a commercial SaaS endpoint, and the human in the loop is never optional.

04 / engagement shape

Scoped to the architecture, not to a calendar.

Single-application AI assessment. One LLM-based product, full attack surface, including the surrounding application.

RAG / retrieval-pipeline assessment. Vector store, retrieval logic, prompt construction, citation handling, multi-tenant isolation.

Agentic-system assessment. Multi-agent chains, tool-call surfaces, identity propagation, privilege boundaries between steps.

Continuing engagement. Ongoing review as the customer's LLM stack evolves — every model version, prompt change, and tool addition shifts the attack surface.

Duration depends on the architecture and the depth of coverage. We scope against the actual system, not against a fixed table.

05 / start a conversation

Building or deploying an LLM system?

We work with teams shipping LLM products at tier-1 financial, government, and energy customers. Earlier is better — most of the load-bearing security choices happen at architecture time.

email