🤖 Extracting Data from Files with AI

Use Trustana's AI to automatically extract product attributes from your uploaded PDF files. Learn how to run extraction tasks and review results.

Updated this week

🎯 What is File Data Extraction?

File Data Extraction uses AI to read your uploaded PDFs and fill in product attributes automatically. Unlike web-based enrichment, this extracts data directly from your own documents — spec sheets, catalogs, and data sheets.

Key benefits:

  • Extract data from your trusted first-party sources

  • Process hundreds of products in a single task

  • AI automatically matches file content to your product attributes

✅ Before You Start

Make sure you have completed these prerequisites:

| Requirement | Why It's Needed |
| --- | --- |
| Files uploaded and processed | Files must be ingested before extraction (check for email notification) |
| Products have SKU or Product Model | These identifiers match products to file pages |
| Attributes created in your account | AI extracts data into your existing attributes |
| Attribute names match file headers | Best results when your attribute names closely match column headers or labels in the PDF |

⚠️ Important: Products must have at least one image to be included in extraction tasks.
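
If you keep a product export outside the platform, the identifier and image requirements above can be checked before you select products. The sketch below is only an illustration: the `Product` record and its field names are hypothetical and do not reflect Trustana's actual schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    # Hypothetical record; field names are illustrative, not Trustana's schema.
    product_id: str
    sku: Optional[str] = None
    product_model: Optional[str] = None
    image_count: int = 0


def eligibility_problems(product: Product) -> list[str]:
    """Return the reasons a product would be left out of an extraction task."""
    problems = []
    # At least one identifier (SKU or Product Model) is needed for matching.
    if not (product.sku or product.product_model):
        problems.append("missing SKU and Product Model")
    # Products need at least one image to be included.
    if product.image_count < 1:
        problems.append("no image")
    return problems


products = [
    Product("p1", sku="AB-100", image_count=2),
    Product("p2", product_model="X200"),  # no image
    Product("p3", image_count=1),         # no identifier
]

for p in products:
    issues = eligibility_problems(p)
    status = "ready" if not issues else "skipped: " + ", ".join(issues)
    print(f"{p.product_id}: {status}")
```

File ingestion and attribute setup are not covered by this check; confirm those in your account.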

🔍 Step 1: Select Products to Enrich

  1. Go to "All Products" from the menu

  2. Use checkboxes to select one or more products

  3. Hover over "Enrich Data" in the action bar

  4. Click "File Data Extraction"

⚙️ Step 2: Configure Extraction Options

A modal appears with extraction options:

Include Enriched Attributes

| Option | What It Does |
| --- | --- |
| No (default) | Only extract into empty attributes |
| Yes | Re-extract all attributes, even if they already have values |
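
One way to picture the two settings is as a filter over a product's attributes. This is a minimal sketch, not the platform's implementation: the function name is hypothetical, and it assumes an attribute counts as "empty" when its value is missing or blank, which this guide does not define. Approval status is ignored here.

```python
from typing import Optional


def attributes_to_extract(
    values: dict[str, Optional[str]],
    include_enriched: bool = False,
) -> list[str]:
    """Pick the attributes a task would try to fill.

    Assumption: "empty" means None or a blank string.
    """
    if include_enriched:
        # "Yes": every attribute is targeted, filled or not.
        return list(values)
    # "No" (default): only attributes without a value are targeted.
    return [name for name, value in values.items() if not (value or "").strip()]


product_values = {"Dimensions": "10 x 20 cm", "Weight": None, "Material": ""}

print(attributes_to_extract(product_values))
print(attributes_to_extract(product_values, include_enriched=True))
```

With the default setting only `Weight` and `Material` are targeted; with `include_enriched=True` all three are.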

Preferred Files

Optionally select specific files to prioritize. This is useful when:

  • A product appears in multiple files

  • You want data from a specific catalog or spec sheet
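
Conceptually, prioritizing files can be thought of as moving your chosen files to the front of the list of candidates for a product. How the platform actually weighs preferred files is not described in this guide, so treat the following as an assumption-laden sketch with hypothetical names.

```python
def order_candidate_files(candidates: list[str], preferred: list[str]) -> list[str]:
    """Put preferred files first, keeping the original order within each group."""
    preferred_set = set(preferred)
    # sorted() is stable: False (preferred) sorts before True (not preferred),
    # and files within each group keep their incoming order.
    return sorted(candidates, key=lambda name: name not in preferred_set)


files_with_product = ["2022-catalog.pdf", "spec-sheet.pdf", "2024-catalog.pdf"]

print(order_candidate_files(files_with_product, preferred=["2024-catalog.pdf"]))
```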

▶️ Step 3: Start the Extraction

  1. Review your options

  2. Click "Start Extraction"

  3. Your task is now queued!

📦 Step 4: Monitor Your Task

  1. Go to "Enrichment Task" from the menu

  2. Find your task in the list — Source will show "File-Based Extraction"

  3. Click the Task ID to see individual products

🧠 Step 5: Review Extracted Data

Once extraction is complete:

  1. Click any Product ID to view its details

  2. Look for the PDF icon next to enriched fields — this indicates the value came from a file

  3. Click the source icon to see which file page was used

Viewing Sources

The source panel shows:

  • PDF sources — which file and page the data came from

  • Web sources — if you also ran web-based enrichment

  • Enrichment Summary — breakdown of all sources used

✔️ Step 6: Approve or Reject Values

For each extracted field:

  • Click Approve to lock the value

  • Click Reject to clear it and allow re-enrichment

💡 Tip: Approved values won't be overwritten in future extractions.
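
The approve and reject behavior can be summarized as a small state model. The class below is hypothetical and only mirrors what this guide states: approving locks a value, rejecting clears it, and locked values are skipped by later extractions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldValue:
    # Hypothetical model of one extracted field; not a Trustana API object.
    value: Optional[str] = None
    approved: bool = False

    def approve(self) -> None:
        """Lock the current value."""
        self.approved = True

    def reject(self) -> None:
        """Clear the value so it can be enriched again."""
        self.value = None
        self.approved = False

    def apply_extraction(self, new_value: str) -> bool:
        """Write a newly extracted value unless locked. Returns True if written."""
        if self.approved:
            return False
        self.value = new_value
        return True


weight = FieldValue("1.2 kg")
weight.approve()
print(weight.apply_extraction("1.5 kg"), weight.value)

color = FieldValue("Blue")
color.reject()
print(color.apply_extraction("Navy"), color.value)
```

The approved `weight` keeps "1.2 kg", while the rejected `color` is cleared and then accepts the new value.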

💡 Tips for Best Results

1. Match attribute names to file headers

  • If your PDF has a column called "Dimensions", name your attribute "Dimensions"

  • Close matches also work (e.g., "Size" may match "Dimensions")

2. Use both SKU and Product Model

  • Having both identifiers improves matching accuracy

  • At minimum, one must be present

3. Clean tables extract better

  • Simple table structures give best results

  • Avoid files with complex nested layouts

4. One-page spec sheets are ideal

  • One product per page = highest accuracy
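
Tip 1 can be pre-checked offline if you know your PDF's column headers. The sketch below uses Python's standard `difflib` module to compare spellings. It is not how Trustana's AI matches attributes: string similarity will not catch meaning-based matches such as "Size" and "Dimensions", so a missing match here only flags a name worth reviewing. The function name and sample data are hypothetical.

```python
from difflib import get_close_matches
from typing import Optional


def match_headers(
    attribute_names: list[str],
    pdf_headers: list[str],
    cutoff: float = 0.6,
) -> dict[str, Optional[str]]:
    """Map each attribute name to its most similar PDF header, or None."""
    # Compare case-insensitively, but report headers in their original casing.
    lowered = {header.lower(): header for header in pdf_headers}
    result: dict[str, Optional[str]] = {}
    for name in attribute_names:
        hits = get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
        result[name] = lowered[hits[0]] if hits else None
    return result


attributes = ["Dimensions", "Weight (kg)", "Size"]
headers = ["Dimensions", "Weight", "Material"]

for attribute, header in match_headers(attributes, headers).items():
    print(f"{attribute!r} -> {header!r}")
```

Attributes that map to `None` are candidates for renaming so that they line up with a header in the file.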

❓ Troubleshooting

| Issue | Solution |
| --- | --- |
| Product not included in task | Check that it has at least one image and a SKU or Product Model |
| No data extracted | Verify the product identifier appears in the PDF |
| Wrong data extracted | Check that attribute names match the PDF column headers |
| File not available | Ensure file processing is complete (wait for the email notification) |
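
For the "No data extracted" case, you can confirm that an identifier really appears in the file. The sketch below assumes you have already copied the text of each PDF page into a list of strings (extracting that text is outside its scope). The helper names are hypothetical, and the loose matching, which ignores case, spaces, and punctuation, is an assumption rather than Trustana's matching logic.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def pages_mentioning(identifier: str, page_texts: list[str]) -> list[int]:
    """Return 1-based page numbers whose text contains the identifier."""
    needle = normalize(identifier)
    if not needle:
        return []
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needle in normalize(text)
    ]


pages = [
    "Model X-200  Dimensions 10 x 20 cm",
    "Model X-300  Weight 2 kg",
]

print(pages_mentioning("x200", pages))
print(pages_mentioning("X-400", pages))
```

Because normalization removes separators, very short identifiers can produce false positives, so treat an empty result as the stronger signal.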
