
ToothFairyAI — Agentic Ingestion Guide (Python-backed)

This guide shows how to ingest user-uploaded PDFs into the Knowledge Hub through an agent that uses code execution.
The agent runs a Python hook that calls the public HTTP API endpoints directly, without using API functions.
Optional cURL commands for manual testing are also included.


How it works (at a glance)

Chat upload → Agent → code_execution → Python script:

  1. Presign: GET /documents/requestPreSignedURL → temporary S3 uploadURL
  2. Upload: PUT bytes to uploadURL
  3. Create: POST /doc/create with external_path pointing to the uploaded file
  4. Agent returns the JSON (e.g., { success: true, document_id: "…" })
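The flow above can be sketched as a small orchestrator. This is a minimal sketch, not the helper's actual code. It assumes the presign response exposes uploadURL plus file_path or object_url as top-level keys. The three callables (presign, upload, create) are hypothetical stand-ins for the HTTP calls, injected so that the ordering can be exercised without network access.

```python
from typing import Any, Callable, Dict

# Hypothetical signatures for the three HTTP steps. In practice these would wrap
# GET /documents/requestPreSignedURL, the PUT to uploadURL, and POST /doc/create.
PresignFn = Callable[[str], Dict[str, Any]]
UploadFn = Callable[[str, bytes], int]
CreateFn = Callable[[str, str], Dict[str, Any]]


def end_to_end(scoped_name: str, title: str, data: bytes,
               presign: PresignFn, upload: UploadFn, create: CreateFn) -> Dict[str, Any]:
    """Run presign -> upload -> create back to back (presigned URLs expire quickly)."""
    signed = presign(scoped_name)
    upload_url = signed["uploadURL"]  # keep the full URL, query string included
    status = upload(upload_url, data)
    if status not in (200, 204):
        return {"success": False, "step": "upload", "status": status}
    # Prefer file_path as external_path and fall back to object_url (assumed key names).
    external_path = signed.get("file_path") or signed.get("object_url")
    if not external_path:
        return {"success": False, "step": "presign", "error": "no file_path/object_url"}
    return {"success": True, "response": create(title, external_path)}
```

Injecting the steps keeps the sequencing logic separate from transport details, so the same function can wrap `requests`, `urllib`, or test fakes.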
Note: Pre-signed URLs expire quickly (within minutes), so run all steps in sequence without delay.
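Because the validity window is short, it can help to check a URL before uploading. This is a minimal sketch that assumes uploadURL is an S3 SigV4 presigned URL carrying the standard X-Amz-Date and X-Amz-Expires query parameters. This guide does not confirm the signing scheme, so when those parameters are absent the helper conservatively reports the URL as unusable.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(upload_url: str) -> Optional[datetime]:
    """Return when a SigV4-style presigned URL expires, or None if it cannot tell."""
    query = parse_qs(urlsplit(upload_url).query)
    signed_at = query.get("X-Amz-Date", [None])[0]
    lifetime = query.get("X-Amz-Expires", [None])[0]
    if not signed_at or not lifetime:
        return None  # not a SigV4 query-string URL, or the query was stripped
    start = datetime.strptime(signed_at, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return start + timedelta(seconds=int(lifetime))


def is_still_valid(upload_url: str, now: Optional[datetime] = None, margin_s: int = 30) -> bool:
    """True if the URL should still be usable, keeping a small safety margin."""
    expiry = presigned_url_expiry(upload_url)
    if expiry is None:
        return False  # be conservative: re-presign when in doubt
    now = now or datetime.now(timezone.utc)
    return now + timedelta(seconds=margin_s) < expiry
```

If `is_still_valid` returns False, request a fresh URL rather than attempting the PUT.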


Prerequisites

  • An agent with code execution enabled.
  • The Python helper is available to the agent (e.g., a hook called upload_file_and_create_document that exposes run_step actions like end_to_end, check_env, sanitize, presign, upload, create).
  • Secrets configured in the agent (no quotes/whitespace):
    • TF_API_KEY — API key
    • WORKSPACE_ID — UUID v4 of your workspace (e.g., 5ebf…316f5)
    • USER_ID — the user on whose behalf the doc is created
    • Optional: TF_TOPICS — comma-separated topic IDs
  • File size ≤ your plan’s limit (e.g., 20 MB).
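A local pre-flight check of the secrets listed above might look like the following. This is a sketch: check_secrets and parse_topics are hypothetical helper names, and the UUID v4 regex checks only the canonical 8-4-4-4-12 form.

```python
import re
from typing import List, Mapping

# Canonical UUID version 4: version nibble "4", variant nibble 8, 9, a, or b.
UUID_V4 = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$", re.IGNORECASE
)


def check_secrets(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems with the agent secrets (an empty list means OK)."""
    problems = []
    for key in ("TF_API_KEY", "WORKSPACE_ID", "USER_ID"):
        value = env.get(key)
        if not value:
            problems.append(f"{key} is missing")
        elif value != value.strip() or value[0] in "\"'" or value[-1] in "\"'":
            problems.append(f"{key} has surrounding quotes or whitespace")
    workspace_id = env.get("WORKSPACE_ID", "")
    if workspace_id and not UUID_V4.match(workspace_id):
        problems.append("WORKSPACE_ID is not a UUID v4")
    return problems


def parse_topics(env: Mapping[str, str]) -> List[str]:
    """Split the optional comma-separated TF_TOPICS secret into topic IDs."""
    raw = env.get("TF_TOPICS", "")
    return [topic.strip() for topic in raw.split(",") if topic.strip()]
```

Passing `os.environ` (or any dict) as `env` keeps the check testable.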

Filename rule: Many tenants require the presign filename to be scoped:

<WORKSPACE_ID>/<filename.ext>

Always include the extension (e.g., .pdf).
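The rule can be enforced before calling presign. This is a minimal sketch. scoped_filename and looks_scoped are hypothetical names, and the pattern assumes a 36-character hyphenated workspace ID, which may be stricter or looser than what your tenant actually validates.

```python
import re

# <workspace id>/<name>.<ext>: one slash, a non-empty name, and an extension.
SCOPED_NAME = re.compile(r"^[0-9a-fA-F-]{36}/[^/]+\.[A-Za-z0-9]+$")


def scoped_filename(workspace_id: str, filename: str) -> str:
    """Build <WORKSPACE_ID>/<filename.ext>, refusing names without an extension."""
    name = filename.replace("\\", "/").rsplit("/", 1)[-1]  # drop client-side folders
    stem, dot, ext = name.rpartition(".")
    if not dot or not stem or not ext:
        raise ValueError(f"filename needs an extension, e.g. report.pdf (got {filename!r})")
    return f"{workspace_id}/{name}"


def looks_scoped(full_name: str) -> bool:
    """Cheap local check that a presign filename has the scoped shape."""
    return bool(SCOPED_NAME.match(full_name))
```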


For one-shot ingestion, attach a PDF in chat and use a natural prompt:

Upload the attached PDF to the Knowledge Hub using the default workspace and user. Title it “ingestion-test.pdf” and assign the topic id ("bc06be63...").

Behind the scenes, the agent runs the Python helper's end_to_end action (presign → upload → create) and returns JSON containing document_id on success.


Step-by-step (debug mode)

If you need to isolate a failure, ask the agent to run these steps in order, returning only JSON each time.

1) Check env

Run code_execution tool upload_file_and_create_document with:
run_step(action="check_env")
Return only the JSON.

Expect: has_api_key: true, workspaceid, userid, and a version string.
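When scripting around the agent, the expected check_env output can be verified mechanically. This sketch assumes the helper prints a flat JSON object with the four keys named above. check_env_problems is a hypothetical name.

```python
import json
from typing import List


def check_env_problems(raw: str) -> List[str]:
    """Inspect the JSON returned by run_step(action="check_env")."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["expected a JSON object"]
    problems = []
    if data.get("has_api_key") is not True:
        problems.append("has_api_key is not true: re-check the TF_API_KEY secret")
    for key in ("workspaceid", "userid"):
        if not data.get(key):
            problems.append(f"{key} is empty: re-check the matching secret")
    if not isinstance(data.get("version"), str):
        problems.append("no version string: is the right helper attached?")
    return problems
```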

2) Sanitize → full_name

Run code_execution tool upload_file_and_create_document with:
run_step(action="sanitize", original_filename={{attachments.0.filename}}, desired_title="ingestion-test.pdf")
Return only the JSON.

Copy full_name (must look like <WORKSPACE_ID>/ingestion-test.pdf).

3) Presign

Run code_execution tool upload_file_and_create_document with:
run_step(action="presign", full_name="<paste full_name>", reveal_upload_url=false)
Return only the JSON.

Use the returned uploadURL, including its query string, for the next step. Keep file_path to pass as external_path (or object_url if file_path is missing).
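The two values to carry forward can be extracted with a guard against a stripped query string. This is a sketch that assumes uploadURL, file_path, and object_url are top-level keys of the presign JSON. from_presign is a hypothetical name.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlsplit


def from_presign(resp: Dict[str, Any]) -> Tuple[str, str]:
    """Return (upload_url, external_path) from a presign response."""
    upload_url = resp.get("uploadURL") or ""
    # The signature lives in the query string; without it the PUT will be rejected.
    if not urlsplit(upload_url).query:
        raise ValueError("uploadURL is missing or has lost its query string; re-presign")
    # Prefer file_path, fall back to object_url.
    external_path = resp.get("file_path") or resp.get("object_url")
    if not external_path:
        raise ValueError("presign response has neither file_path nor object_url")
    return upload_url, external_path
```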

4) Upload

Run code_execution tool upload_file_and_create_document with:
run_step(action="upload", uploadURL="<paste uploadURL>", pdf_base64={{attachments.0.base64}})
Return only the JSON.

Expect HTTP status 200 or 204.
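For manual verification outside the agent, the upload step needs only the standard library. This is a minimal sketch using urllib.request; put_pdf is a hypothetical name. urlopen raises urllib.error.HTTPError on 4xx and 5xx responses, so an expired or truncated URL surfaces as an exception rather than a returned status code.

```python
import urllib.request


def put_pdf(upload_url: str, data: bytes, timeout: float = 60) -> int:
    """PUT raw PDF bytes to a presigned uploadURL and return the HTTP status."""
    request = urllib.request.Request(
        upload_url,  # pass the URL exactly as returned, query string included
        data=data,
        method="PUT",
        headers={"Content-Type": "application/pdf"},
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx (e.g. 403 AccessDenied)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        status = response.status
    if status not in (200, 204):
        raise RuntimeError(f"unexpected upload status {status}")
    return status
```

Usage: `put_pdf(upload_url, pathlib.Path("document.pdf").read_bytes())`.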

5) Create

Run code_execution tool upload_file_and_create_document with:
run_step(action="create", desired_title="ingestion-test.pdf", external_path="<file_path or object_url>", topics=["bc06be63-a5ed-4caf-be86-645e07b3ab31"])
Return only the JSON.

Expect a JSON response containing a document ID. Its location varies by tenant: it may be top-level or in data[0].id.
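Because the ID's location varies, a tolerant lookup helps. This is a sketch. The top-level key names document_id and id are assumptions, since the guide only says the ID may be top-level or in data[0].id. find_document_id is a hypothetical name.

```python
from typing import Any, Optional


def find_document_id(resp: Any) -> Optional[str]:
    """Look for the new document's ID at the top level, then in data[0].id."""
    if not isinstance(resp, dict):
        return None
    for key in ("document_id", "id"):  # assumed top-level key names
        if resp.get(key):
            return str(resp[key])
    data = resp.get("data")
    if isinstance(data, list) and data and isinstance(data[0], dict) and data[0].get("id"):
        return str(data[0]["id"])
    return None
```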


What the Python helper does

  • Sanitize filename → lowercase, spaces→_, keep [A-Za-z0-9_.-], ensure .pdf.
    Build presign name as <WORKSPACE_ID>/<safe>.pdf.
  • Presign: GET /documents/requestPreSignedURL?filename=… (optionally &contentType=application/pdf).
    If you get a 446 (filename pattern) error, double-check the scoped name. If you get a 403, verify x-api-key or retry without contentType.
  • Upload: PUT the raw bytes to uploadURL with Content-Type: application/pdf.
    The query string must stay intact.
  • Create: POST /doc/create with this body:
    {
      "workspaceid": "<WORKSPACE_ID>",
      "userid": "<USER_ID>",
      "data": [
        {
          "type": "readComprehensionPdf",
          "title": "<title>",
          "external_path": "<file_path or object_url from presign>"
        }
      ]
    }

If you use topics, add "topics": ["<topic-id>", …] to the item inside data.
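A body builder that also rejects the "Topics must be an array" mistake might look like the following. This is a sketch. build_create_body is a hypothetical helper, and it places topics on the item inside data, as the minimal Python in the next section does.

```python
from typing import Any, Dict, Optional, Sequence


def build_create_body(workspace_id: str, user_id: str, title: str, external_path: str,
                      topics: Optional[Sequence[str]] = None) -> Dict[str, Any]:
    """Assemble the POST /doc/create payload for a single PDF."""
    item: Dict[str, Any] = {
        "type": "readComprehensionPdf",
        "title": title,
        "external_path": external_path,
    }
    if topics is not None:
        # A bare string is a common mistake; the API expects an array of IDs.
        if isinstance(topics, str) or not isinstance(topics, (list, tuple)):
            raise TypeError('topics must be a list, e.g. ["<topic-id>"]')
        item["topics"] = list(topics)
    return {"workspaceid": workspace_id, "userid": user_id, "data": [item]}
```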


Minimal Python (for verification outside the agent)

import os
import re
from urllib.parse import quote

import requests

API = "https://api.toothfairyai.com/"
TF_API_KEY = os.getenv("TF_API_KEY")
WORKSPACE_ID = os.getenv("WORKSPACE_ID")
USER_ID = os.getenv("USER_ID")
TIMEOUT = 60  # seconds per request


def sanitize(name):
    # Lowercase, spaces -> "_", keep [a-z0-9_.-], force a .pdf extension
    base = (name or "uploaded-document.pdf").strip().lower().replace(" ", "_")
    base = re.sub(r"[^a-z0-9_.\-]", "_", base)
    return base if base.endswith(".pdf") else base + ".pdf"


def presign(scoped_name, content_type=None):
    # scoped_name must look like <WORKSPACE_ID>/<file>.pdf
    url = f"{API}documents/requestPreSignedURL?filename={quote(scoped_name, safe='/._-')}"
    if content_type:  # optional; retry without it if you get a 403
        url += f"&contentType={quote(content_type, safe='/')}"
    r = requests.get(url, headers={"x-api-key": TF_API_KEY, "accept": "application/json"}, timeout=TIMEOUT)
    r.raise_for_status()
    return r.json()  # contains uploadURL and file_path (or object_url)


def upload(upload_url, path, content_type="application/pdf"):
    # PUT the raw bytes to the full uploadURL, query string intact
    with open(path, "rb") as f:
        r = requests.put(upload_url, data=f, headers={"Content-Type": content_type}, timeout=TIMEOUT)
    r.raise_for_status()
    return r.status_code  # expect 200 or 204


def create_doc(title, external_path, topics=None):
    body = {"workspaceid": WORKSPACE_ID, "userid": USER_ID, "data": [{
        "type": "readComprehensionPdf", "title": title, "external_path": external_path,
    }]}
    if topics is not None:
        body["data"][0]["topics"] = list(topics)  # must be an array
    r = requests.post(f"{API}doc/create",
                      headers={"x-api-key": TF_API_KEY, "Content-Type": "application/json"},
                      json=body, timeout=TIMEOUT)
    r.raise_for_status()
    return r.json()

Optional: cURL test snippets

Presign

curl -G "https://api.toothfairyai.com/documents/requestPreSignedURL" \
  --data-urlencode "filename=<WORKSPACE_ID>/ingestion-test.pdf" \
  -H "x-api-key: YOUR_API_KEY" -H "accept: application/json"

Upload

curl -X PUT "<uploadURL from presign>" \
  -H "Content-Type: application/pdf" \
  --data-binary @./document.pdf

Create

curl -X POST "https://api.toothfairyai.com/doc/create" \
  -H "x-api-key: YOUR_API_KEY" -H "Content-Type: application/json" \
  -d @body.json

Common errors and fixes

Symptom | Likely cause | Fix
--- | --- | ---
403 on presign | API key missing or invalid | Re-enter x-api-key (no quotes/whitespace)
446 filename pattern | Filename not in scoped format | Use <WORKSPACE_ID>/<filename.ext>
403 AccessDenied on upload | PUT to the object URL, or URL expired | PUT to the full uploadURL (with query string) or re-presign
"Topics must be an array" | Field type mismatch | Send "topics": ["<topic-id>"] or omit the field
500 "No data body" | Wrong Content-Type or empty JSON | Use Content-Type: application/json with a valid body
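For scripted runs, the table can also be encoded as a lookup. This is a sketch with a hypothetical suggest_fix helper. The step names and the body-text matching are assumptions about how these errors surface in practice.

```python
from typing import Optional


def suggest_fix(step: str, status: int, body: str = "") -> Optional[str]:
    """Map a failed call (step, HTTP status, response text) to a fix from the table."""
    text = body.lower()
    if step == "presign" and status == 403:
        return "Re-enter the x-api-key secret (no quotes/whitespace)"
    if step == "presign" and status == 446:
        return "Use a scoped filename: <WORKSPACE_ID>/<filename.ext>"
    if step == "upload" and status == 403:
        return "PUT to the full uploadURL (with its query string) or re-presign"
    if step == "create" and "topics must be an array" in text:
        return 'Send "topics": ["<topic-id>"] or omit the field'
    if step == "create" and status == 500 and "no data body" in text:
        return "Send Content-Type: application/json with a valid JSON body"
    return None
```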

Success checklist

  • Got uploadURL from presign; upload returned 200/204.
  • POST /doc/create returned JSON; you can locate the new document id (top-level or data[0].id).
  • The Knowledge Hub shows the new document (in some tenants it starts as a draft; embedding may happen asynchronously).