Case Study - Finance Automation

A morning task that eats 45 to 90 minutes daily, automated in under 60 seconds

Every finance team running on manual reconciliation has a version of this ritual: open the shipping file, open the invoice file, scan line by line, flag the mismatches, type a summary email to the logistics manager. Repeat tomorrow. The task is not complex. It is just relentless. This project automates the full cycle, from raw PDFs to a ready-to-send email, with an AI layer that filters noise so the manager sees only what actually needs attention.

45-90 min

Daily manual task replaced

2,155 -> handful

Raw flags filtered to actionable items

<60 sec

Full reconciliation cycle, end to end

The process being replaced

A finance clerk at Northwind Traders starts every morning comparing shipping orders against invoices, line by line, checking quantities, prices, and order completeness. When something does not match, they note it, write a summary, and email the logistics manager. The whole process takes between 45 and 90 minutes depending on volume.

The interesting part is that most of those mismatches are not actionable. Some are normal operational differences: different customer names between document types, orders that are legitimately on backorder. The real errors are buried in the noise. So the clerk's actual job is not comparison. It is judgment. Figuring out which flags are real.

That is the real problem with automating reconciliation: a naive comparison creates more noise than the manual process it is replacing. The system has to be smarter about what constitutes a real error.

The first automated run returned 2,155 flagged discrepancies. Almost all were false positives. The invoice and shipping datasets legitimately use different customer names for the same order. Shipping is not wrong. Finance is not wrong. The documents just reflect different parts of the business.

Getting from 2,155 flags to something useful

The first fix was definitional. Instead of comparing every field, the pipeline matches records on a compound key: OrderID + ProductName. That is the financial identity of a line item. Fields such as CustomerName, warehouse labels, and billing contacts differ by design, and removing them from the comparison eliminated most of the noise.

The remaining discrepancies were split into two categories with different business meanings.

Category 1: Missing rows

An order exists in one document but not the other. This is a structural issue: the records are not aligned. It could be a data integrity problem or a legitimate backorder.

Category 2: Field mismatches

The same order and product appear in both files, but quantities, prices, or dates differ. This is a potential financial error: the numbers do not agree.

After this refinement: 81 discrepancies. Still too many for a readable email, but now the right kind of problem, the kind where judgment matters, and where an LLM actually helps.
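The compound-key match and the two categories described above can be sketched in plain JavaScript. This is a minimal illustration, not the project's actual Code node: the column names `Quantity` and `UnitPrice`, the sample rows, and the function names are assumptions, and in n8n the two row arrays would come from the node's input items rather than literals.

```javascript
// Fields compared once two rows share a key. Hypothetical column names;
// a date column would be added to this list in the same way.
const COMPARED_FIELDS = ["Quantity", "UnitPrice"];

// Compound key: the financial identity of a line item.
function lineKey(row) {
  return `${row.OrderID}|${row.ProductName}`;
}

function indexByKey(rows) {
  const index = new Map();
  for (const row of rows) index.set(lineKey(row), row);
  return index;
}

function compare(invoiceRows, shippingRows) {
  const invoices = indexByKey(invoiceRows);
  const shipments = indexByKey(shippingRows);
  const found = [];

  for (const [key, inv] of invoices) {
    const ship = shipments.get(key);
    if (!ship) {
      // Category 1: the line exists on the invoice side only.
      found.push({ type: "missing_row", key, missingFrom: "shipping" });
      continue;
    }
    for (const field of COMPARED_FIELDS) {
      // Category 2: same line item, different value.
      // String comparison is a simplification ("10" and "10.0" would differ).
      if (String(inv[field]) !== String(ship[field])) {
        found.push({ type: "field_mismatch", key, field, invoice: inv[field], shipping: ship[field] });
      }
    }
  }
  for (const key of shipments.keys()) {
    if (!invoices.has(key)) {
      found.push({ type: "missing_row", key, missingFrom: "invoice" });
    }
  }
  return found;
}

// Illustrative rows only. CustomerName differs on purpose and is never compared.
const invoiceRows = [
  { OrderID: 10248, ProductName: "Chai", CustomerName: "Vins et alcools Chevalier", Quantity: 12, UnitPrice: 18 },
  { OrderID: 10248, ProductName: "Tofu", CustomerName: "Vins et alcools Chevalier", Quantity: 5, UnitPrice: 23.25 },
  { OrderID: 10249, ProductName: "Chang", CustomerName: "Toms Spezialitaten", Quantity: 9, UnitPrice: 19 },
];
const shippingRows = [
  { OrderID: 10248, ProductName: "Chai", CustomerName: "VINET", Quantity: 10, UnitPrice: 18 },
  { OrderID: 10249, ProductName: "Chang", CustomerName: "TOMSP", Quantity: 9, UnitPrice: 19 },
  { OrderID: 10250, ProductName: "Ikura", CustomerName: "HANAR", Quantity: 4, UnitPrice: 31 },
];

const discrepancies = compare(invoiceRows, shippingRows);
console.log(discrepancies);
```

On these sample rows the sketch reports one field mismatch (the Chai quantity) and two missing rows, while the differing customer names produce no flags at all.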

Where the AI step fits

An LLM is not useful for the comparison itself. That is deterministic logic, and JavaScript handles it cleanly. Where the model adds value is in reading each flagged discrepancy with business context and deciding whether a human needs to act on it.

Each of the 81 items goes through an AI triage node. The model receives context about the business and the specific discrepancy, then classifies it as include or suppress with a one-line reason. Items that look like operational noise get suppressed. Items that look like real financial errors get passed forward.
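One way to handle the include/suppress decision is to ask the model for a small JSON reply and validate it before anything moves forward. The sketch below assumes a reply shape of `{"decision": "...", "reason": "..."}`; the prompt wording, the function names, and the choice to keep an item when the reply cannot be parsed are all illustrative assumptions, not details taken from the workflow.

```javascript
// Hypothetical prompt builder: the wording and the context string are illustrative.
function buildTriagePrompt(businessContext, discrepancy) {
  return [
    businessContext,
    'Classify the discrepancy below as "include" or "suppress".',
    'Reply with JSON only: {"decision": "...", "reason": "..."}',
    JSON.stringify(discrepancy),
  ].join("\n");
}

// Turn the model's raw reply into a validated verdict.
// Assumed policy: if the reply is unusable, keep the item so a real error
// is never hidden by a formatting problem.
function parseVerdict(replyText) {
  try {
    const parsed = JSON.parse(replyText);
    if (parsed.decision === "include" || parsed.decision === "suppress") {
      return { decision: parsed.decision, reason: String(parsed.reason ?? "") };
    }
  } catch (err) {
    // Invalid JSON or a non-object reply: fall through to the default below.
  }
  return { decision: "include", reason: "Unparseable model reply; kept for human review" };
}

const prompt = buildTriagePrompt(
  "Customer names legitimately differ between invoice and shipping documents.",
  { type: "missing_row", key: "10248|Tofu", missingFrom: "shipping" }
);
const suppressed = parseVerdict('{"decision": "suppress", "reason": "Known backorder"}');
const fallback = parseVerdict("Sure! I think this one matters.");
console.log(prompt);
console.log(suppressed, fallback);
```

Defaulting to include on a bad reply trades a slightly longer email for the guarantee that a malformed response cannot silently drop a flag.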

The output of the triage step is a short, structured list of discrepancies that actually warrant attention. That list goes directly into the email draft.

The 81 items were batched into groups of 15 for classification, a practical constraint around context window limits. Each batch is classified independently, and only the include items reach the email step.
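The batching step can be sketched as follows. The batch size of 15 comes from the text; `classifyBatch` is a hypothetical stand-in for the LLM call, and the stub used here applies a fixed rule only so that the example runs without a model.

```javascript
const BATCH_SIZE = 15;

// Split a list into consecutive groups of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Classify each batch independently and keep only the "include" items.
// classifyBatch is assumed to return one verdict per item, in the same order.
async function triage(discrepancies, classifyBatch) {
  const kept = [];
  for (const batch of chunk(discrepancies, BATCH_SIZE)) {
    const verdicts = await classifyBatch(batch);
    batch.forEach((item, i) => {
      if (verdicts[i].decision === "include") {
        kept.push({ ...item, reason: verdicts[i].reason });
      }
    });
  }
  return kept;
}

// Stub classifier for demonstration only; a real run would call the model here.
const stubClassifier = async (batch) =>
  batch.map((d) =>
    d.type === "field_mismatch"
      ? { decision: "include", reason: "Stub: values disagree" }
      : { decision: "suppress", reason: "Stub: treated as backorder" }
  );

// 81 placeholder items, alternating between the two categories.
const sampleItems = Array.from({ length: 81 }, (_, i) => ({
  key: `item-${i}`,
  type: i % 2 === 0 ? "field_mismatch" : "missing_row",
}));

const batches = chunk(sampleItems, BATCH_SIZE);
console.log(batches.map((b) => b.length)); // [15, 15, 15, 15, 15, 6]

triage(sampleItems, stubClassifier).then((kept) => {
  console.log(`${kept.length} of ${sampleItems.length} items kept by the stub`);
});
```

With 81 items the split yields five full batches and a final batch of six, so a failure in one model call affects at most 15 items.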

The output is a drafted email

A dashboard would require someone to open it, log in, and check it each morning. An email arrives in the inbox the manager already uses. For a daily operational workflow, the delivery format matters as much as the content.

The final step drafts a professional email with the date, total discrepancies found, count flagged for action, critical items first, minor items below, and a clear ask. It is ready to send with one click, or to be reviewed and edited if the manager prefers.
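The structure of that email can be illustrated with a plain template. In the project the draft is written by the LLM, so the function below is a deterministic stand-in that only shows the ordering and the fields; the `severity` field, the subject line, the closing sentence, and the sample items are assumptions made for the example.

```javascript
// Deterministic stand-in for the drafting step: date, totals,
// critical items first, minor items below, then a clear ask.
function draftEmail({ date, totalFound, items }) {
  const critical = items.filter((d) => d.severity === "critical");
  const minor = items.filter((d) => d.severity !== "critical");
  const line = (d) => `- ${d.key}: ${d.reason}`;

  return [
    `Subject: Reconciliation summary for ${date}`,
    "",
    `Discrepancies found: ${totalFound}`,
    `Flagged for action: ${items.length}`,
    "",
    "Critical items:",
    ...(critical.length ? critical.map(line) : ["- none"]),
    "",
    "Minor items:",
    ...(minor.length ? minor.map(line) : ["- none"]),
    "",
    "Please confirm the correct figures for the critical items.",
  ].join("\n");
}

// Illustrative input: the two items are made up, and the minor one is listed
// first on purpose to show that critical items are moved to the top.
const email = draftEmail({
  date: "2024-01-15",
  totalFound: 81,
  items: [
    { key: "10250|Ikura", severity: "minor", reason: "Line missing from invoice, likely backorder" },
    { key: "10248|Chai", severity: "critical", reason: "Invoice quantity 12, shipped quantity 10" },
  ],
});
console.log(email);
```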

From raw PDFs to a ready-to-send email in under 60 seconds. The clerk who used to spend an hour on this can now spend that hour on something that actually requires their judgment.

Pipeline flow

Invoice PDFs -> Shipping PDFs -> CSV extraction -> Comparison logic -> AI triage (batched) -> Email draft

Fun fact

The source data was too clean for a demo, so I planted intentional discrepancies (quantity errors, price differences, and missing entries) to create verifiable ground truth.

Technical decisions and why

n8n over LangGraph	The core pipeline is deterministic: no dynamic routing, no agent loop. n8n is faster to build and easier to demo for this pattern. LangGraph is the right tool when the model needs to reason about what to do next. Here, the steps are fixed.
JavaScript in the Code node	n8n runs JS natively. Python would require an external task runner, which is unnecessary overhead for a few dozen lines of comparison logic.
Match on OrderID + ProductName	Natural compound key for a financial line item. Matching on more fields creates false positives from legitimate operational differences between document types.
AI triage as a middle step	The comparison generates flags. The LLM reviews flags with business context and classifies them. Keeps the deterministic logic clean and the AI step focused on judgment under ambiguity.
Email over dashboard	Meets the user where they already work. Reduces the friction between "automation ran" and "action taken".
Intentional discrepancies in dataset	The source data was too clean for a demo. Planted specific mismatches so the pipeline has verifiable ground truth to catch.

What this does not do, and the honest reason

The PDF extraction is a one-time pre-processing step, not part of the live workflow. In production, you would connect directly to an ERP or accounting system and skip the PDF layer entirely. The extraction script exists because the demo data came as PDFs. In a real deployment, structured exports would replace it.

Similarly, the exact-text search over the 81 items runs as in-memory filtering, which works fine at this scale. A high-volume operation processing thousands of discrepancies daily would want a proper indexed search instead. That is a known production step, not an architectural gap in the current design.

Stack

n8n (self-hosted, Docker)
JavaScript (Code node)
Python (pdfplumber)
LLM - classification & drafting
Northwind Traders dataset