ToothFairyAI — Agentic Ingestion Guide (Python-backed)
This guide shows how to ingest user-uploaded PDFs into the Knowledge Hub via an agent using code execution.
The agent runs a Python hook that calls the public HTTP API endpoints directly, without using API functions.
We also include optional cURL commands for manual testing.
How it works (at a glance)
Chat upload → Agent → code_execution → Python script:
- Presign: GET /documents/requestPreSignedURL → temporary S3 uploadURL
- Upload: PUT bytes to uploadURL
- Create: POST /doc/create with external_path pointing to the uploaded file
- Agent returns the JSON (e.g., { success: true, document_id: "…" })
Pre-signed URLs expire quickly (within minutes), so run all steps in sequence without long pauses between them.
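To see how much time is left before an upload, the expiry can sometimes be read from the URL itself. The sketch below assumes the uploadURL is a query-signed S3 URL in the SigV4 style, which carries X-Amz-Date and X-Amz-Expires parameters. The guide does not state which signing style your tenant uses, so the function returns None when those parameters are absent. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit, parse_qs

def seconds_until_expiry(upload_url, now=None):
    """Return seconds left on a SigV4-style pre-signed URL, or None if unknown."""
    query = parse_qs(urlsplit(upload_url).query)
    try:
        # X-Amz-Date is the signing time (UTC); X-Amz-Expires is the lifetime in seconds.
        signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        lifetime = int(query["X-Amz-Expires"][0])
    except (KeyError, ValueError):
        return None  # not a SigV4 query-signed URL, or malformed parameters
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (signed_at + timedelta(seconds=lifetime) - now).total_seconds()
```

If the result is None or close to zero, re-running presign is safer than attempting the PUT.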
Prerequisites
- An agent with code execution enabled.
- The Python helper is available to the agent (e.g., a hook called upload_file_and_create_document that exposes run_step actions like end_to_end, check_env, sanitize, presign, upload, create).
- Secrets configured in the agent (no quotes/whitespace):
  - TF_API_KEY: API key
  - WORKSPACE_ID: UUID v4 of your workspace (e.g., 5ebf…316f5)
  - USER_ID: the user on whose behalf the doc is created
  - Optional: TF_TOPICS: comma-separated topic IDs
- File size ≤ your plan’s limit (e.g., 20 MB).
Filename rule: Many tenants require the presign filename to be scoped as <WORKSPACE_ID>/<filename.ext>. Always include the extension (e.g., .pdf).
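The scoping rule is easy to get wrong by hand, so here is a minimal sketch of a builder that enforces it. The function name scoped_filename is hypothetical and not part of the ToothFairyAI helper. It only checks the shape described above: workspace prefix, one slash, and a filename with an extension.

```python
import os

def scoped_filename(workspace_id, filename):
    """Build '<WORKSPACE_ID>/<filename.ext>' and reject names without an extension."""
    workspace_id = workspace_id.strip().strip("/")
    name = os.path.basename(filename.strip())  # drop any directory part
    stem, ext = os.path.splitext(name)
    if not workspace_id or not stem or not ext:
        raise ValueError(
            f"need a workspace id and a filename with an extension, got {filename!r}"
        )
    return f"{workspace_id}/{name}"
```

Calling scoped_filename("ws-123", "report.pdf") yields "ws-123/report.pdf", while a bare "report" raises ValueError before any request is made.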
Quick start (one-shot, recommended)
Use a natural prompt in chat (attach a PDF):
Upload the attached PDF to the Knowledge Hub using the default workspace and user. Title it “ingestion-test.pdf” and assign the topic id ("bc06be63...").
Behind the scenes the agent runs the Python helper (end_to_end): presign → upload → create, and returns JSON with document_id on success.
Step-by-step (debug mode)
If you need to isolate a failure, ask the agent to run these in order, returning only JSON each time.
1) Check env
Run code_execution tool upload_file_and_create_document with:
run_step(action="check_env")
Return only the JSON.
Expect: has_api_key: true, workspaceid, userid, and a version string.
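For readers reproducing this check outside the agent, the following is a rough sketch of what a check_env action might look like. It is an assumption about the helper's behavior, not its actual code: it reports whether the key is present without echoing it, and it leaves out the version string because the guide does not say where that comes from.

```python
import os

def check_env(environ=None):
    """Report which settings are present without echoing the API key itself."""
    env = os.environ if environ is None else environ

    def clean(key):
        # Secrets should carry no surrounding whitespace; strip it defensively.
        return (env.get(key) or "").strip()

    return {
        "has_api_key": bool(clean("TF_API_KEY")),
        "workspaceid": clean("WORKSPACE_ID") or None,
        "userid": clean("USER_ID") or None,
    }
```

Passing a plain dict instead of relying on os.environ makes the check easy to exercise in isolation.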
2) Sanitize → full_name
Run code_execution tool upload_file_and_create_document with:
run_step(action="sanitize", original_filename={{attachments.0.filename}}, desired_title="ingestion-test.pdf")
Return only the JSON.
Copy full_name (must look like <WORKSPACE_ID>/ingestion-test.pdf).
3) Presign
Run code_execution tool upload_file_and_create_document with:
run_step(action="presign", full_name="<paste full_name>", reveal_upload_url=false)
Return only the JSON.
Use the returned uploadURL (including its query string) for the next step. Keep file_path for external_path (or object_url if file_path is missing).
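The selection rule in this step can be sketched as a small function. The key names uploadURL, file_path, and object_url are taken from the step above. Whether your tenant's presign JSON uses exactly these names is an assumption, and the function name is hypothetical.

```python
def pick_upload_targets(presign_json):
    """Return (upload_url, external_path) from a presign result, or raise ValueError."""
    upload_url = presign_json.get("uploadURL")
    # A pre-signed URL without its query string has lost its signature.
    if not upload_url or "?" not in upload_url:
        raise ValueError("presign result has no uploadURL with a query string")
    # Prefer file_path; fall back to object_url when file_path is missing or empty.
    external_path = presign_json.get("file_path") or presign_json.get("object_url")
    if not external_path:
        raise ValueError("presign result has neither file_path nor object_url")
    return upload_url, external_path
```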
4) Upload
Run code_execution tool upload_file_and_create_document with:
run_step(action="upload", uploadURL="<paste uploadURL>", pdf_base64={{attachments.0.base64}})
Return only the JSON.
Expect status 200/204.
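Because this step receives the attachment as base64, a failed or corrupt upload is often a decoding problem rather than an S3 problem. The sketch below shows one way to decode and sanity-check the payload before the PUT. The function name is hypothetical, and the %PDF- header check is a heuristic rather than full PDF validation.

```python
import base64
import binascii

def decode_pdf(pdf_base64):
    """Decode a base64 string and check that the bytes look like a PDF."""
    compact = "".join(pdf_base64.split())  # tolerate line breaks in the payload
    try:
        data = base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("attachment is not valid base64") from exc
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded attachment does not start with the %PDF- header")
    return data
```

The returned bytes are what would be sent as the raw PUT body.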
5) Create
Run code_execution tool upload_file_and_create_document with:
run_step(action="create", desired_title="ingestion-test.pdf", external_path="<file_path or object_url>", topics=["bc06be63-a5ed-4caf-be86-645e07b3ab31"])
Return only the JSON.
Expect response JSON containing a document id (its location varies by tenant; it may be top-level or in data[0].id).
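Since the id's location varies, a tolerant lookup helps when scripting against the create response. This is a minimal sketch: data[0].id comes from the guide, while the top-level key names document_id and id are assumptions to adjust for your tenant.

```python
def extract_document_id(response_json):
    """Return the document id from a create response, or None if not found."""
    for key in ("document_id", "id"):  # assumed top-level key names
        if response_json.get(key):
            return response_json[key]
    data = response_json.get("data")
    if isinstance(data, list) and data and isinstance(data[0], dict):
        return data[0].get("id")
    return None
```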
What the Python helper does
- Sanitize filename → lowercase, spaces → _, keep [A-Za-z0-9_.-], ensure .pdf.
  Build presign name as <WORKSPACE_ID>/<safe>.pdf.
- Presign → GET /documents/requestPreSignedURL?filename=… (optionally &contentType=application/pdf).
  If you get 446 pattern errors, double-check the scoped name. If 403, verify x-api-key or retry without contentType.
- Upload → PUT raw bytes to uploadURL with Content-Type: application/pdf.
  Must keep the query string intact.
- Create → POST /doc/create with body:
{
  "workspaceid": "<WORKSPACE_ID>",
  "userid": "<USER_ID>",
  "data": [
    {
      "type": "readComprehensionPdf",
      "title": "<title>",
      "external_path": "<s3 filePath or object url>"
    }
  ]
}
If you use topics, add "topics": ["<topic-id>", …] to the document object inside data.
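The body above, together with the optional TF_TOPICS secret, can be assembled as in the following sketch. The function names are hypothetical. Placing topics inside the data item follows the guide's minimal Python, and the comma-splitting mirrors the "comma-separated topic IDs" description of TF_TOPICS.

```python
def parse_topics(raw):
    """Split a comma-separated TF_TOPICS value into a clean list of ids."""
    return [t.strip() for t in (raw or "").split(",") if t.strip()]

def build_create_body(workspace_id, user_id, title, external_path, topics=None):
    """Assemble the POST /doc/create body; topics is omitted when empty."""
    doc = {
        "type": "readComprehensionPdf",
        "title": title,
        "external_path": external_path,
    }
    if topics:
        doc["topics"] = list(topics)  # must be an array, never a bare string
    return {"workspaceid": workspace_id, "userid": user_id, "data": [doc]}
```

Leaving topics out entirely when the list is empty avoids the "Topics must be an array" class of errors.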
Minimal Python (for verification outside the agent)
import os
import re
import requests
from urllib.parse import quote

API = "https://api.toothfairyai.com/"
TF_API_KEY = os.getenv("TF_API_KEY")
WORKSPACE_ID = os.getenv("WORKSPACE_ID")
USER_ID = os.getenv("USER_ID")

def sanitize(name):
    # Lowercase, replace spaces and unsafe characters with "_", ensure a .pdf extension.
    base = (name or "uploaded-document.pdf").strip().lower().replace(" ", "_")
    base = re.sub(r"[^a-z0-9_.\-]", "_", base)
    return base if base.endswith(".pdf") else base + ".pdf"

def presign(scoped_name, content_type=None):
    # scoped_name must look like <WORKSPACE_ID>/<filename.ext>.
    url = f"{API}documents/requestPreSignedURL?filename={quote(scoped_name, safe='/._-')}"
    if content_type:  # optional; leave it out if presign returns 403
        url += f"&contentType={quote(content_type, safe='')}"
    r = requests.get(url, headers={"x-api-key": TF_API_KEY, "accept": "application/json"})
    r.raise_for_status()
    return r.json()

def upload(upload_url, path, content_type="application/pdf"):
    # PUT the raw bytes to the full pre-signed URL (query string included).
    with open(path, "rb") as f:
        r = requests.put(upload_url, data=f, headers={"Content-Type": content_type})
    r.raise_for_status()

def create_doc(title, external_path, topics=None):
    body = {"workspaceid": WORKSPACE_ID, "userid": USER_ID, "data": [{
        "type": "readComprehensionPdf", "title": title, "external_path": external_path
    }]}
    if topics is not None:
        body["data"][0]["topics"] = topics
    r = requests.post(f"{API}doc/create", headers={"x-api-key": TF_API_KEY, "Content-Type": "application/json"}, json=body)
    r.raise_for_status()
    return r.json()
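The three helpers above can be chained in the same presign → upload → create order the agent uses. The sketch below takes the step functions as arguments so the sequencing can be tried without network access. The ingest name is hypothetical, and the presign JSON keys uploadURL, file_path, and object_url are assumed to match those named in the step-by-step section.

```python
def ingest(path, title, scoped_name, presign, upload, create_doc, topics=None):
    """Run presign, upload, and create in order and return the create response."""
    signed = presign(scoped_name)
    # Use the full pre-signed URL, query string included.
    upload(signed["uploadURL"], path)
    # Assumed keys: prefer file_path, fall back to object_url.
    external_path = signed.get("file_path") or signed.get("object_url")
    return create_doc(title, external_path, topics)
```

With the functions from this section in scope, a call would look like ingest("./document.pdf", "ingestion-test.pdf", f"{WORKSPACE_ID}/ingestion-test.pdf", presign, upload, create_doc). Running the steps back to back in one call keeps the upload inside the pre-signed URL's lifetime.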
Optional: cURL test snippets
Presign
curl -G "https://api.toothfairyai.com/documents/requestPreSignedURL" --data-urlencode "filename=<WORKSPACE_ID>/ingestion-test.pdf" -H "x-api-key: YOUR_API_KEY" -H "accept: application/json"
Upload
curl -X PUT "<uploadURL from presign>" -H "Content-Type: application/pdf" --data-binary @./document.pdf
Create
curl -X POST "https://api.toothfairyai.com/doc/create" -H "x-api-key: YOUR_API_KEY" -H "Content-Type: application/json" -d @body.json
Common errors and fixes
Symptom | Likely cause | Fix |
---|---|---|
403 on presign | API key missing/invalid | Re-enter x-api-key (no quotes/whitespace) |
446 filename pattern | Missing scoped format | Use <WORKSPACE_ID>/<filename.ext> |
403 AccessDenied on upload | PUT to object URL or URL expired | PUT to full uploadURL (with query) or re-presign |
Topics must be an array | Field type mismatch | Send "topics": ["<topic-id>"] or omit |
500 No data body | Wrong Content-Type or empty JSON | Use Content-Type: application/json with valid body |
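When scripting the flow, the status-code rows of this table can be turned into a small lookup so failures come with a hint attached. This is only a sketch: the step labels and function name are hypothetical, and the mapping covers just the rows above that carry a status code.

```python
# (step, HTTP status) -> suggested fix, taken from the table above.
HINTS = {
    ("presign", 403): "Re-enter x-api-key (no quotes/whitespace).",
    ("presign", 446): "Use the scoped name <WORKSPACE_ID>/<filename.ext>.",
    ("upload", 403): "PUT to the full uploadURL (with query) or re-presign.",
    ("create", 500): "Send Content-Type: application/json with a valid body.",
}

def hint_for(step, status):
    """Return the suggested fix for a failed step, or a generic fallback."""
    return HINTS.get((step, status), "No known hint; inspect the response body.")
```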
Success checklist
- Got uploadURL from presign; upload returned 200/204.
- POST /doc/create returned JSON; you can locate the new document id (top-level or data[0].id).
- The Knowledge Hub shows the new document (initially draft in some tenants; embedding may occur asynchronously).