Controlled local AI runtime
A curated public view of a local AI backend built around FastAPI, contracts, provider adapters, optional document retrieval and JSON traces.
Problem
Many AI prototypes treat the model as the whole system. Prompt, model call, tool access, business rules and final response are often mixed together, which makes failures hard to reproduce and behavior hard to audit.
Architectural decision
Build a runtime around the model. In plain English, the runtime is the controlled execution layer that decides how requests are validated, how policies are applied, which provider is called and how the result is recorded.
The model does not own the system. It receives controlled context and returns text; the backend owns contracts, permissions, retrieval, tracing and error handling.
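As a minimal sketch of that division of responsibility, the function below validates a request, applies one policy check, calls a provider and returns a trace record. All names here (`run_request`, `MAX_PROMPT_CHARS`, `echo_provider`) and the trace fields are hypothetical illustrations, not the project's actual code.

```python
import json
import time
import uuid
from typing import Callable

# Hypothetical policy limit, chosen only for illustration.
MAX_PROMPT_CHARS = 2000


def run_request(request: dict, provider: Callable[[str], str]) -> dict:
    """Validate, apply policy, call the provider and return a trace record."""
    trace = {"trace_id": str(uuid.uuid4()), "started_at": time.time(), "steps": []}
    try:
        # 1. Validation: the backend, not the model, decides what is well-formed.
        prompt = request.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            raise ValueError("prompt must be a non-empty string")
        trace["steps"].append("validated")

        # 2. Policy: rules are checked before any model call happens.
        if len(prompt) > MAX_PROMPT_CHARS:
            raise PermissionError("prompt exceeds policy limit")
        trace["steps"].append("policy_ok")

        # 3. Provider call: the model only receives text and returns text.
        trace["output"] = provider(prompt)
        trace["steps"].append("provider_called")
        trace["status"] = "ok"
    except (ValueError, PermissionError) as exc:
        # 4. Failures are recorded in the same trace instead of being lost.
        trace["status"] = "rejected"
        trace["error"] = str(exc)
    return trace


def echo_provider(prompt: str) -> str:
    """Stand-in for a real model call (assumption: no network access)."""
    return f"echo: {prompt}"


if __name__ == "__main__":
    print(json.dumps(run_request({"prompt": "hello"}, echo_provider), indent=2))
```

Because every request yields one record, including rejected ones, a failure can be inspected from the trace alone without re-running the model.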
Key concepts
JSON contracts
JSON is structured text used for predictable requests, responses and execution records. Here it acts as a contract between components, not as loose notes.
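One way to picture a contract is a typed request object that rejects anything outside its declared shape. The project describes Pydantic-style contracts; the sketch below uses only standard-library dataclasses to stay dependency-free, and the `ChatRequest` name and its fields are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ChatRequest:
    """Hypothetical request contract; field names are illustrative."""

    user_id: str
    prompt: str
    use_retrieval: bool = False

    @classmethod
    def from_json(cls, raw: str) -> "ChatRequest":
        data = json.loads(raw)  # raises ValueError on malformed JSON
        if not isinstance(data, dict):
            raise ValueError("request must be a JSON object")
        # Reject unknown fields so the contract stays explicit.
        unknown = set(data) - {"user_id", "prompt", "use_retrieval"}
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        for name in ("user_id", "prompt"):
            if not isinstance(data.get(name), str) or not data[name]:
                raise ValueError(f"{name} must be a non-empty string")
        use_retrieval = data.get("use_retrieval", False)
        if not isinstance(use_retrieval, bool):
            raise ValueError("use_retrieval must be a boolean")
        return cls(data["user_id"], data["prompt"], use_retrieval)

    def to_json(self) -> str:
        # Sorted keys keep serialized records stable and easy to diff.
        return json.dumps(asdict(self), sort_keys=True)
```

For example, `ChatRequest.from_json('{"user_id": "u1", "prompt": "hi"}')` returns an object with `use_retrieval` defaulted to `False`, while a payload carrying an undeclared field raises `ValueError` before any model is involved.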
Provider adapter
A provider adapter is a small layer that lets the backend call different model providers, such as a local model server or a cloud API, without rewriting the application.
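The adapter idea can be sketched with a small interface and two interchangeable implementations. The `ModelProvider` protocol, the stub classes and the `answer` function are hypothetical; real adapters would wrap a local model server or a cloud API, which this sketch deliberately does not call.

```python
from typing import Dict, Protocol


class ModelProvider(Protocol):
    """Assumed interface: the only surface the application depends on."""

    name: str

    def generate(self, prompt: str) -> str: ...


class LocalStubProvider:
    """Stand-in for an adapter around a local model server."""

    name = "local-stub"

    def generate(self, prompt: str) -> str:
        return f"[local] {prompt}"


class CloudStubProvider:
    """Stand-in for an adapter around a cloud API."""

    name = "cloud-stub"

    def generate(self, prompt: str) -> str:
        return f"[cloud] {prompt}"


# Registry keyed by provider name; swapping providers is a lookup change only.
PROVIDERS: Dict[str, ModelProvider] = {
    p.name: p for p in (LocalStubProvider(), CloudStubProvider())
}


def answer(prompt: str, provider_name: str = "local-stub") -> str:
    """Application code: unaware of which backend produces the text."""
    try:
        provider = PROVIDERS[provider_name]
    except KeyError:
        raise ValueError(f"unknown provider: {provider_name}") from None
    return provider.generate(prompt)
```

Calling `answer("hi")` and `answer("hi", "cloud-stub")` exercises both backends through the same call site, which is the property the adapter layer is meant to preserve.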
Simplified flow
What it demonstrates
Backend judgment for turning an AI prototype into an inspectable local system with explicit contracts, replaceable model providers, controlled evidence and useful execution records.
Stack
Python, FastAPI, Pydantic-style contracts, JSON traces, Telegram integration, local model providers, optional document RAG and local evaluation scripts.
Current status
Advanced prototype. Useful as a curated technical demo and local experimental backend. It is not presented as production-ready or deployment-ready.
Limitations
- No validated high-concurrency behavior.
- Local JSON-based observability, not centralized monitoring.
- Private source corpora and real logs are not published.
- Runtime, labs and scripts must remain clearly separated in public presentation.