Backend Engineer: AI Tooling
About Mentalyc:
We redesign therapy—and we don’t take no for an answer.
“Impossible” gets us excited. We chase hard problems, missing questions, and truths no one’s cracked. We think big, fight hard, and stare fear in the face.
Mentalyc started in 2021 when analyzing therapy sessions seemed untouchable. Investors passed. We proved them wrong. We grew on revenue, product-market fit, and relentless experimentation.
Today we lead the fast-growing market for AI scribes for therapists. We generate millions in revenue from thousands of clients. And we are just getting started.
We’re building a future where therapy is measurable, effective, and radically transformed. We won’t stop until we get there.
What We Offer:
- Opportunity to work at the intersection of psychotherapy, AI, and cutting-edge technology
- Collaborative work environment with a diverse and international team
- A remote-first, flexible environment where outcomes matter more than hours
- Competitive pay based on experience and qualifications
Responsibilities:
Prompt Infrastructure, Tooling, and Experimentation System
- Design, build, maintain, and expand infrastructure for prompt versioning, testing, evaluation, and deployment.
- Build scalable prompt experimentation infrastructure for clinical and product stakeholders.
- Develop internal tools (e.g. lightweight Streamlit apps, QA dashboards) for prompt testing, debugging, and analysis accessible to both technical and non-technical collaborators.
- Ensure all workflows are clean, scalable, traceable, and integrated smoothly with production APIs.
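To give a flavor of the prompt-versioning work described above, here is a minimal illustrative sketch of a versioned-prompt record. The class and field names are assumptions for illustration only, not Mentalyc's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from hashlib import sha256

# Hypothetical minimal record a prompt-versioning store might keep.
@dataclass(frozen=True)
class PromptVersion:
    name: str       # e.g. "progress_note_v2" (illustrative name)
    template: str   # the prompt text, with {placeholders} to fill at runtime
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def content_hash(self) -> str:
        """Stable short ID so an experiment can reference an exact prompt body."""
        return sha256(self.template.encode("utf-8")).hexdigest()[:12]
```

Hashing the template text gives experiments and QA dashboards a stable identifier that changes exactly when the prompt body changes, independent of the human-readable name.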
Product + Prompt Workflows
- Maintain the end-to-end flow of AI-generated notes: from user input → LLM output → post-processing → client-facing delivery.
- Write and maintain prompt post-processing code (Python, JS/TS) for output formatting, error handling, and clinical compliance.
- Implement structured logging and debugging to track and resolve QA issues.
- Collaborate with Product and Engineering stakeholders to ensure prompt logic integrates smoothly with product infrastructure.
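As a concrete (and purely illustrative) sketch of the post-processing step in the flow above, a function might extract a JSON note from raw LLM output, validate its sections, normalize formatting, and log failures. The function, section names, and regexes are assumptions, not Mentalyc's actual pipeline:

```python
import json
import logging
import re

logger = logging.getLogger("note_postprocess")

# Hypothetical required sections for a structured therapy note.
REQUIRED_KEYS = {"subjective", "objective", "assessment", "plan"}

def extract_note(raw_output: str) -> dict:
    """Pull a JSON note out of raw LLM output, tolerating markdown fences."""
    # Strip an optional ```json ... ``` fence around the payload.
    match = re.search(r"```(?:json)?\s*(\{.*\})\s*```", raw_output, re.DOTALL)
    payload = match.group(1) if match else raw_output.strip()
    try:
        note = json.loads(payload)
    except json.JSONDecodeError as exc:
        logger.error("unparseable LLM output: %s", exc)
        raise
    missing = REQUIRED_KEYS - note.keys()
    if missing:
        logger.warning("note missing sections: %s", sorted(missing))
    # Collapse runs of whitespace in every string field.
    return {
        k: re.sub(r"\s+", " ", v).strip() if isinstance(v, str) else v
        for k, v in note.items()
    }
```

This is exactly the regex, JSON-handling, and structured-logging territory the role centers on: the LLM output is untrusted input, so every parse step needs an explicit failure path.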
Prompt Development, Testing, and QA
- Own prompt pipeline debugging — with a heavy focus on regex, string manipulation, JSON handling, and fixing prompt logic issues.
- Monitor and address prompt behavior issues, edge-case errors, and output inconsistencies.
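To make the QA responsibilities concrete, here is a hypothetical edge-case checker that flags common LLM output defects before a note reaches a client. The heuristics and flag names are illustrative assumptions:

```python
import re

# Hypothetical template-artifact pattern, e.g. "[insert topic]" or "[client name]".
PLACEHOLDER_RE = re.compile(r"\[(?:insert|client name|TODO)[^\]]*\]", re.IGNORECASE)

def qa_flags(note_text: str) -> list[str]:
    """Return QA flags for a generated note section (illustrative heuristics)."""
    flags = []
    if PLACEHOLDER_RE.search(note_text):
        flags.append("placeholder_text")     # template artifact leaked through
    if note_text and not note_text.rstrip().endswith((".", "!", "?")):
        flags.append("possible_truncation")  # output may be cut off mid-sentence
    if re.search(r"(\b\w+\b)(?: \1\b){3,}", note_text):
        flags.append("repetition_loop")      # model stuck repeating a word
    return flags
```

Checks like these are cheap to run on every generation and turn vague "output looks off" reports into named, countable failure modes.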
Requirements:
- 4+ years of experience building backend systems, internal tooling, or infrastructure for experimentation or operational workflows.
- Proficiency in Python and JavaScript/TypeScript for data processing, post-processing logic, and internal tool development.
- Very strong command of string manipulation, JSON processing, regex, error handling, and prompt logic implementation.
- Experience designing complex distributed systems and database schemas.
- Familiarity with AI/ML observability tooling (MLflow, LangGraph, Langfuse).
- Comfortable with SQL for data extraction, transformation, and QA tasks.
- Familiarity with APIs, model endpoints, and REST-based data flows.
- Exposure to LLM toolchains (LangChain, Haystack, OpenAI API, prompt templating, token budgeting).
- Familiarity with AWS infrastructure.
- Strong debugging instincts and ability to handle ambiguous, evolving systems.
Personal Qualities:
- Strong Communication Skills: Ability to effectively articulate technical concepts and collaborate across diverse teams.
- High Adaptability: Comfortable working in a fast-paced and evolving environment and can pivot quickly to address changing priorities.
- Results-Driven: Focused on achieving measurable outcomes and continuously seeking to improve both team performance and product quality.
- Strategic Thinker: Able to align technical decisions with broader business goals and has a knack for seeing the bigger picture.