HolaNolis Safety
How it works, what it covers, what it doesn't
Preliminary legal note. This document describes the design and the safety mechanisms of HolaNolis. It is not a statement of legal compliance and is pending review by external counsel and, in due course, third-party audit. If you find a case where the system fails or produces an unexpected result, please report it to security@holanolis.com (see the final section, Report a flaw).
HolaNolis is an AI conversational companion designed for minors aged 10 to 20, supervised by a tutor (parent or legal guardian). This page explains, in plain language, how the safety system around that conversation works.
What HolaNolis guarantees, and what it does not
What HolaNolis does not guarantee
- It does not guarantee the absence of failures. As with any AI-based system, no filter is perfect. Detection and filtering rates are high but not 100 %, and we publish them below as honestly as we can.
- It does not guarantee a substitute for a healthcare professional, a psychologist, social services or an emergency response. HolaNolis is not a mental-health service.
- It does not guarantee that a minor with a determined intent to bypass the system cannot, in some case, succeed. We design to reduce that probability — not to reduce it to zero, because that would not be honest.
What HolaNolis does offer
- A multi-layer architecture, described below, designed to reduce the risk of exposure to harmful content for minors.
- A mechanism to alert the tutor when the system detects signals consistent with a crisis situation (suicidal ideation, self-harm, severe eating disorders, abuse, grooming, substances).
- Operational transparency: this document, the technical metrics, and an open channel for users to report flaws.
Safety pipeline — four layers
Every message a minor sends to HolaNolis passes through four verification layers. Each layer is designed to catch a different kind of risk; together they offer redundant protection (this is called defense in depth). If one layer fails, the next can still catch the issue.
Layer 1 — Input filter (pre-LLM)
What it does. Before the message reaches the AI model, the system scans it against:
- Keyword lists (suicide, self-harm, abuse, substances, sexual content, etc.).
- Evasive text patterns (digit-substitution writing, zero-width characters, leetspeak, mixed alphabets).
- Personally identifiable information (PII) detectors: phone numbers, addresses, ID numbers.
- Prompt-injection patterns (attempts by the minor or a third party to manipulate the system instructions).
What it does not do. It does not understand deep context. A phrase like "I'm dying laughing" may trigger a keyword that a later layer must filter. That is why this layer alone is not enough.
Typical latency. Under 10 milliseconds.
Layer 2 — Safety rules inside the AI model
What it does. The AI model receives, as part of its system instructions, a set of non-negotiable rules (forbidden categories, red lines, prescriptive behaviours to avoid). Those rules travel with every conversation and are designed so that the model will not "forget" them or be "talked out of" them through manipulation (jailbreak, role-play, emotional pressure).
What it does not do. An AI model can, in some cases, be persuaded to ignore instructions. That is why this layer never operates alone; layers 1 and 3 exist precisely because the model can fail.
Layer 3 — Output filter (post-LLM)
What it does. Before the AI response is shown to the minor, the system checks it against:
- Red lines (medical advice, diagnosis, prescription, explicit content, etc.).
- Age-appropriateness for the minor's declared age.
- Personal information leakage.
- Coherence with the companion's persona and tone.
If the response does not pass, the system regenerates it, replaces it with a safe canned response, or suspends the conversation and shows static crisis resources (depending on the case).
What it does not do. It cannot recover the compute cost of having generated a response that is then discarded. That is why it is not the first barrier but the last net.
Layer 4 — Crisis detection and tutor alert
What it does. In parallel with the three layers above, the system classifies messages against a set of crisis categories (suicidal ideation, self-harm, eating disorders, abuse/grooming, substances, violence toward others). It assigns a severity level (high, medium, low) and, when appropriate, fires an alert to the tutor via email and push notification.
What the minor sees. The minor does not receive a message of the form "your tutor has been notified". Detection is invisible to the minor by design — if minors learned which exact phrases trigger an alert, they would avoid saying them and the system would stop working.
What the tutor sees. The tutor receives an alert with the category, the severity, the timestamp, and a link to the dashboard where they can see the relevant context of the conversation. Crisis alerts are always free, even for users without a premium subscription.
Categories the system tracks
The system tracks, on a non-exhaustive basis, the following categories. The detailed taxonomy and the treatment of each one are kept as internal technical documentation; here we give an accessible summary.
| Category | What it covers (summary) | Typical action |
|---|---|---|
| Suicidal ideation / self-harm | Expressions of wanting to harm oneself or to stop living | Tutor alert, immediate help resources |
| Eating disorders | Extreme restriction, purging, body dysmorphia | Tutor alert, redirect to professional |
| Abuse, grooming, sextortion | Adults contacting the minor with sexual or coercive intent | Tutor alert, specialised resources |
| Substances | Use, exchange, or seeking of drugs or alcohol | Tutor alert, informational resources |
| Violence toward others | Threats, plans to harm a specific person | Tutor alert, resources as appropriate |
| Explicit sexual content | Material inappropriate for the minor's age | Filtered at input and/or output |
| Personally identifiable information | Phone numbers, addresses, sensitive data shared without protection | Filter or alert depending on context |
| Attempts to manipulate the system | Jailbreak, prompt injection, impersonation | Filtered, recorded in audit log |
This table is indicative. The full taxonomy and severity criteria are kept as internal technical documentation for two reasons: (a) to avoid publishing an evasion map, and (b) because the taxonomy evolves with the system.
Metrics (preliminary)
Honesty notice. Today we can publish the count of automated tests that protect each layer. Real detection rates, false-positive rates and false-negative rates require running the suite against a third-party labelled corpus and publishing them with statistical rigour — and we would rather not publish numbers we cannot defend. When that work is done, we will publish it here.
| Layer | Automated tests | Measured detection rate |
|---|---|---|
| L1 — Input filter | 139 | pending — see "Known limitations" |
| L2 — System-level rules (HARDSKILL) | 35 | pending — see "Known limitations" |
| L3 — Output filter | 16 | pending, with known bug — see "Known limitations" |
| L4 — Crisis detection | 98 | pending — see "Known limitations" |
| Other (helpers, integration) | 36 | n/a |
| Total | 324 |
Last metric extraction: 2026-04-29.
Document version: v0.1 (draft, pending legal review).
Known limitations (honest transparency)
Like any AI system, HolaNolis has flaws. Publishing them openly is part of our safety commitment — not pretending to be perfect.
L3 — Output filter under declarative diagnosis
Detected: 2026-04-29. Status: patch in preparation.
Our automated test suite found that the output filter (Layer 3) does not cover direct declarative patterns of the form "you have X" + imperative "you should Y". The traditional modal patterns ("might have", "could be") are filtered. The patch is identified and pending deployment.
Practical implication: in the period between detection and patch deployment, a Nolis response of the form "you have depression and you should start therapy" might not be intercepted by L3, even though Layers L1 (input) and L2 (Nolis system prompt) are designed to avoid generating it.
Coverage by age bracket
Today we cannot quantify test coverage by age bracket (10-12 / 13-15 / 16-18 / 19-20) because our tests are not tagged by age. This is on the improvement list.
Real detection rates
We do not yet publish false-positive rates (Nolis incorrectly interprets something safe as a risk) nor false-negative rates (Nolis fails to detect a real risk signal). To do so we need a corpus labelled by human specialists, currently in construction.
What the system does NOT do
To avoid confusion — and because we believe an AI companion for minors must be explicit about its limits:
It does not diagnose. HolaNolis does not say "you have anxiety", "you are depressed", or "you have ADHD". Only healthcare professionals can do that.
It does not prescribe. It does not recommend medication, diets, specific therapies or clinical routines.
It does not replace a professional. If your child needs professional help, HolaNolis is not that service. The alerts it sends to tutors are a signal, not a diagnosis.
It does not automatically alert law enforcement or social services. What to do with an alert is always the tutor's decision.
It does not record ambient audio, activate the camera without permission, or track the minor's location. The system only processes what the minor writes (or sends as an image) inside the chat.
It does not guarantee that a determined minor cannot evade the system. We design against that scenario, but we do not claim to have eliminated the risk.
Incident review process
When a user reports a safety flaw — a message that should have been blocked and was not, an alert that fired when it should not have, or any other unexpected behaviour — we follow this process:
- Acknowledgement within a reasonable window (target: 72 working hours).
- Reproduction of the case, to the extent the supplied information allows.
- Analysis and, if appropriate, fix in the system, accompanied by a test that detects the failure to prevent future regressions.
- Communication to the reporting user when the cycle is closed.
We do not yet publish a formal responsible disclosure policy with contractual deadlines. It is on the list of items to formalise in the next phase of the product.
Report a flaw
If you find a case where the system:
- Blocked something it should not have blocked (false positive).
- Did not block something it should have blocked (false negative).
- Fired an alert with no clear cause (crisis false positive).
- Did not fire an alert when it should have (crisis false negative).
- Produced medical advice, a diagnosis, or a prescription.
- Was manipulated by a minor or a third party to bypass the rules.
Please write to security@holanolis.com including:
- What you expected to happen.
- What actually happened.
- Steps to reproduce (if you have them).
- Screenshots, if it is safe to share them.
- Language and app version, if you know them.
Your report helps us make the system better. We take it seriously.
Document v0.1 — pending legal review and external audit before official publication. Test counts reflect the latest extraction on 2026-04-29. Real detection rates and false-positive / false-negative rates will be published when the labelled corpus is ready.