This section covers practical AI engineering work: building LLM-based workflows, reasoning about agent behavior, and making these systems measurable enough to improve them deliberately rather than by guesswork.

Start with Evaluation Harness and Platform, a dedicated page on evaluating LLM-based workflows and coding agents.