
Extracting Data from Files with AI


Updated over 2 weeks ago

Use Trustana's AI to automatically extract product attributes from your uploaded PDF files. This guide shows you how to run extraction tasks and review the results.

Supported on: All plans
Required role: Admin or Content Manager


What is File Data Extraction?

File Data Extraction uses AI to read your uploaded PDFs and fill in product attributes automatically. Unlike web-based enrichment, it extracts data directly from your own documents, such as spec sheets, catalogs, and data sheets.

Key benefits:

  • Extract data from your trusted first-party sources

  • Process hundreds of products in a single task

  • AI automatically matches file content to your product attributes

Before You Start

Make sure you have completed these prerequisites:

Requirement | Why It's Needed
Files uploaded and processed | Files must be ingested before extraction (wait for email notification)
Products have SKU or Product Model | These identifiers match products to file pages
Products have at least one image | Required for enrichment tasks
Attributes created in your account | AI extracts data into your existing attributes

Important: Attribute names should closely match column headers or labels in the PDF for best results.
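
The per-product prerequisites above can be checked against an exported product list before you start a task. The following is a minimal sketch and not part of Trustana: the `Product` record and its field names (`sku`, `product_model`, `image_urls`) are hypothetical, and it covers only the identifier and image requirements, not file processing or attribute setup.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    # Hypothetical record; field names are assumptions, not Trustana's schema.
    name: str
    sku: str = ""
    product_model: str = ""
    image_urls: list[str] = field(default_factory=list)


def missing_prerequisites(product: Product) -> list[str]:
    """Return the per-product prerequisites this product fails."""
    problems = []
    # At least one identifier (SKU or Product Model) must be present.
    if not (product.sku.strip() or product.product_model.strip()):
        problems.append("no SKU or Product Model")
    # At least one image is required for enrichment tasks.
    if not product.image_urls:
        problems.append("no image")
    return problems


products = [
    Product("Drill A", sku="DR-100", image_urls=["drill-a.jpg"]),
    Product("Drill B", image_urls=["drill-b.jpg"]),
    Product("Drill C", product_model="C-9"),
]

for p in products:
    issues = missing_prerequisites(p)
    print(p.name, "->", "ready" if not issues else ", ".join(issues))
```

Products that report an issue here are the ones most likely to be left out of a task, so fix them before selecting products in Step 1.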

Step 1: Select Products to Enrich

  1. Go to All Products from the menu

  2. Use checkboxes to select one or more products

  3. Hover over Enrich Data in the action bar

  4. Click File Data Extraction

Step 2: Configure Extraction Options

A modal appears with extraction options:

Include Enriched Attributes

Option | What It Does
No (default) | Only extract into empty attributes
Yes | Re-extract all attributes, even if they already have values

Note: Extraction takes approximately 2-4 minutes per product.
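
As a rough planning aid, the per-product figure can be turned into a range for a whole task. This sketch assumes products are processed one after another, which this guide does not state; if products are processed in parallel, the actual time will be shorter. The function name is illustrative.

```python
def estimate_minutes(
    product_count: int, low: float = 2.0, high: float = 4.0
) -> tuple[float, float]:
    """Estimate total extraction time as (minimum, maximum) minutes.

    Assumes sequential processing at roughly 2-4 minutes per product.
    """
    if product_count < 0:
        raise ValueError("product_count must be non-negative")
    return product_count * low, product_count * high


shortest, longest = estimate_minutes(50)
print(f"50 products: about {shortest:.0f} to {longest:.0f} minutes")
```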

Step 3: Start the Extraction

  1. Review your options

  2. Click Start Extraction

  3. Your task is now queued!

Step 4: Monitor Your Task

  1. Go to Enrichment Task from the menu

  2. Find your task in the list

    • Source will show File-Based Extraction

  3. Click the Task ID to see individual products

Step 5: Review Extracted Data

Once extraction is complete:

  1. Click any Product ID to view its details

  2. Look for the PDF icon next to enriched fields

    • This indicates the value came from a file

  3. Click the source icon to see which file page was used

Viewing Sources

The source panel shows:

  • PDF sources — which file and page the data came from

  • Web sources — if you also ran web-based enrichment

  • Enrichment Summary — breakdown of all sources used

Step 6: Approve or Reject Values

For each extracted field:

  • Click Approve to lock the value

  • Click Reject to clear it and allow re-enrichment

Tip: Approved values won't be overwritten in future extractions.
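
Taken together, the Include Enriched Attributes option and the approval lock determine which fields a later extraction may write to. The sketch below models that rule as this guide describes it. The `Field` record and the function name are hypothetical, and it assumes the approval lock takes precedence even when Include Enriched Attributes is set to Yes; the actual behavior in Trustana may differ in edge cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    # Hypothetical model of one product attribute.
    name: str
    value: Optional[str] = None  # None means empty (including after Reject)
    approved: bool = False


def fields_to_extract(fields: list[Field], include_enriched: bool) -> list[str]:
    """Return the names of fields a new extraction would be allowed to fill."""
    targets = []
    for f in fields:
        if f.approved:
            continue  # Assumption: approved values are always locked.
        # Empty fields are always eligible; filled ones only with "Yes".
        if f.value is None or include_enriched:
            targets.append(f.name)
    return targets


fields = [
    Field("Weight"),                            # empty
    Field("Color", "Red"),                      # filled, not approved
    Field("Voltage", "230 V", approved=True),   # filled and approved
]

print(fields_to_extract(fields, include_enriched=False))  # ['Weight']
print(fields_to_extract(fields, include_enriched=True))   # ['Weight', 'Color']
```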

Tips for Best Extraction Results

  1. Match attribute names to file headers

    • If your PDF has a column called "Dimensions", name your attribute "Dimensions"

    • Close matches can also work (e.g., "Size" may match "Dimensions"), but exact names give the most reliable results

  2. Use both SKU and Product Model

    • Having both identifiers improves matching accuracy

    • At minimum, one must be present

  3. Check your identifier format

    • The SKU or Product Model in Trustana must match exactly how it appears in the PDF

    • Copying the product name into the Product Model field doesn't work
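
To check identifier formatting before running a task, you can compare an identifier against text copied from the PDF. This is a minimal local sanity check and does not reflect how Trustana performs matching. It assumes you already have the PDF text as a plain string (for example, by copying it from a viewer), and the function names are hypothetical.

```python
import re


def normalize(identifier: str) -> str:
    """Lowercase and drop spaces, hyphens, underscores, dots, and slashes."""
    return re.sub(r"[\s\-_./]", "", identifier).lower()


def check_identifier(identifier: str, pdf_text: str) -> str:
    """Classify how an identifier appears in text taken from a PDF."""
    if not identifier.strip():
        return "not found"
    # Exact match: the identifier appears verbatim and is not part of a
    # longer alphanumeric string (so "DR-10" does not match "DR-100").
    pattern = r"(?<![A-Za-z0-9])" + re.escape(identifier) + r"(?![A-Za-z0-9])"
    if re.search(pattern, pdf_text):
        return "exact match"
    # Otherwise compare normalized tokens to catch formatting differences
    # such as case, hyphens, or dots.
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-_./]*", pdf_text)
    if any(normalize(tok) == normalize(identifier) for tok in tokens):
        return "format mismatch"
    return "not found"


pdf_text = "Model DR-100 cordless drill, 18 V. Also available: DR-200."

for candidate in ["DR-100", "dr100", "DR-10", "DR-300"]:
    print(candidate, "->", check_identifier(candidate, pdf_text))
```

A "format mismatch" result suggests the identifier exists in the file but is written differently, so updating the SKU or Product Model in Trustana to the PDF's exact spelling is the likely fix.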

Troubleshooting

Issue | Solution
Product not included in task | Check that it has an image and SKU/Product Model
No data extracted | Verify the product identifier appears in the PDF
Wrong data extracted | Check if attribute names match PDF column headers
File not available | Ensure file processing is complete (wait for email)


Need Help?

Reach out to your Customer Success Representative if you have questions about file data extraction.
