Use Trustana's AI to automatically extract product attributes from your uploaded PDF files. This guide shows you how to run extraction tasks and review the results.
🎯 What is File Data Extraction?
File Data Extraction uses AI to read your uploaded PDFs and fill in product attributes automatically. Unlike web-based enrichment, it pulls data directly from your own documents, such as spec sheets, catalogs, and data sheets.
Key benefits:
- Extract data from your trusted first-party sources
- Process hundreds of products in a single task
- AI automatically matches file content to your product attributes
✅ Before You Start
Make sure you have completed these prerequisites:
| Requirement | Why It's Needed |
|---|---|
| Files uploaded and processed | Files must finish processing before extraction; you'll receive an email notification when they're ready |
| Products have a SKU or Product Model | These identifiers are used to match products to file pages |
| Attributes created in your account | The AI extracts data into your existing attributes |
| Attribute names match file headers | Results are best when your attribute names closely match the column headers or labels in the PDF |
⚠️ Important: Products must have at least one image to be included in extraction tasks.
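Taken together, the prerequisites reduce to a simple eligibility rule: at least one image, plus a SKU or a Product Model. The sketch below is a hypothetical illustration of that rule, not Trustana's code or API. The `Product` class and its fields (`sku`, `product_model`, `image_count`) are assumed names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    # Hypothetical fields for illustration; not Trustana's data model.
    product_id: str
    sku: Optional[str] = None
    product_model: Optional[str] = None
    image_count: int = 0


def is_eligible(product: Product) -> bool:
    """True if the product has at least one image and a SKU or Product Model."""
    has_identifier = bool(product.sku) or bool(product.product_model)
    return has_identifier and product.image_count >= 1


def eligibility_problems(product: Product) -> list[str]:
    """List which documented prerequisites the product is missing."""
    problems = []
    if product.image_count < 1:
        problems.append("missing image")
    if not (product.sku or product.product_model):
        problems.append("missing SKU or Product Model")
    return problems


print(is_eligible(Product("P1", sku="AB-100", image_count=2)))  # True
print(eligibility_problems(Product("P2", product_model="X9")))  # ['missing image']
```

A product with only a Product Model and no image is therefore skipped. A product with an image and either identifier qualifies.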
🔍 Step 1: Select Products to Enrich
1. Go to "All Products" from the menu.
2. Use the checkboxes to select one or more products.
3. Hover over "Enrich Data" in the action bar.
4. Click "File Data Extraction".
⚙️ Step 2: Configure Extraction Options
A modal appears with extraction options:
Include Enriched Attributes
| Option | What It Does |
|---|---|
| No (default) | Extracts only into empty attributes |
| Yes | Re-extracts all attributes, including those that already have values (approved values are never overwritten) |
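The sketch below shows how this setting could decide which attributes a run targets. It is a hypothetical model, not Trustana's API. It assumes a product's current values are a `dict` mapping each attribute name to a value (`None` or `""` meaning empty), plus a set of approved attribute names. Approved attributes are skipped in both modes, following the Step 6 tip that approved values won't be overwritten.

```python
from typing import Optional


def attributes_to_extract(
    values: dict[str, Optional[str]],
    approved: set[str],
    include_enriched: bool = False,
) -> list[str]:
    """Return the attribute names a run would try to fill (hypothetical model).

    include_enriched=False (default): only empty attributes are targeted.
    include_enriched=True: attributes that already have values are targeted too.
    Approved attributes are skipped either way.
    """
    targets = []
    for name, value in values.items():
        if name in approved:
            continue  # approved values are locked
        is_empty = value is None or value == ""
        if include_enriched or is_empty:
            targets.append(name)
    return targets


values = {"Dimensions": None, "Weight": "2 kg", "Material": "", "Color": "Red"}
approved = {"Color"}
print(attributes_to_extract(values, approved))  # ['Dimensions', 'Material']
print(attributes_to_extract(values, approved, include_enriched=True))
# ['Dimensions', 'Weight', 'Material']
```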
Preferred Files
Optionally select specific files to prioritize. This is useful when:
- A product appears in multiple files
- You want data from a specific catalog or spec sheet
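You can think of "prioritize" as an ordering over candidate pages. When a product's identifier appears in several files, pages from preferred files are considered first. This is a hypothetical model, not Trustana's matching logic. The `(file_name, page_number)` tuples, the file names, and the idea that the preferred files themselves have an order are all assumptions made for illustration.

```python
def order_candidate_pages(
    candidates: list[tuple[str, int]],
    preferred_files: list[str],
) -> list[tuple[str, int]]:
    """Sort (file_name, page_number) candidates so pages from preferred files
    come first, ranked by their position in preferred_files."""
    rank = {name: i for i, name in enumerate(preferred_files)}
    fallback = len(preferred_files)  # every non-preferred file shares the last rank
    # sorted() is stable, so pages with equal rank keep their original order.
    return sorted(candidates, key=lambda c: rank.get(c[0], fallback))


candidates = [("catalog_2023.pdf", 14), ("spec_AB-100.pdf", 1), ("catalog_2024.pdf", 9)]
print(order_candidate_pages(candidates, ["catalog_2024.pdf"]))
# [('catalog_2024.pdf', 9), ('catalog_2023.pdf', 14), ('spec_AB-100.pdf', 1)]
```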
▶️ Step 3: Start the Extraction
1. Review your options.
2. Click "Start Extraction".

Your task is now queued!
📦 Step 4: Monitor Your Task
1. Go to "Enrichment Task" from the menu.
2. Find your task in the list. Its Source shows "File-Based Extraction".
3. Click the Task ID to see the individual products.
🧠 Step 5: Review Extracted Data
Once extraction is complete:
1. Click any Product ID to view its details.
2. Look for the PDF icon next to enriched fields. It indicates that the value came from a file.
3. Click the source icon to see which file and page the value came from.
Viewing Sources
The source panel shows:
- PDF sources: which file and page the data came from
- Web sources: shown if you also ran web-based enrichment
- Enrichment Summary: a breakdown of all sources used
✔️ Step 6: Approve or Reject Values
For each extracted field:
- Click Approve to lock the value.
- Click Reject to clear it and allow re-enrichment.
💡 Tip: Approved values won't be overwritten in future extractions.
💡 Tips for Best Results
1. Match attribute names to file headers
   - If your PDF has a column called "Dimensions", name your attribute "Dimensions".
   - Close matches also work (e.g., "Size" may match "Dimensions").
2. Use both SKU and Product Model
   - Having both identifiers improves matching accuracy.
   - At minimum, one of them must be present.
3. Clean tables extract better
   - Simple table structures give the best results.
   - Avoid files with complex nested layouts.
4. One-page spec sheets are ideal
   - One product per page gives the highest accuracy.
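To check tip 1 before a run, you can compare your attribute names against the headers in a PDF. The hypothetical helper below uses Python's standard `difflib` module. It flags each attribute that has no exact (case-insensitive) header match and suggests similarly spelled headers. It only catches spelling-level similarity, so it will not pair synonyms such as "Size" and "Dimensions"; the AI's own matching may still handle those. Collecting the header list from the PDF is outside the sketch, so the `headers` list is an assumed input.

```python
import difflib


def audit_attribute_names(
    attributes: list[str],
    headers: list[str],
    cutoff: float = 0.6,
) -> dict[str, list[str]]:
    """Map each attribute without an exact case-insensitive header match
    to a list of similarly spelled headers (possibly empty)."""
    lowered = {h.lower(): h for h in headers}
    report = {}
    for attr in attributes:
        if attr.lower() in lowered:
            continue  # exact match (ignoring case): nothing to report
        close = difflib.get_close_matches(attr.lower(), list(lowered), n=3, cutoff=cutoff)
        report[attr] = [lowered[c] for c in close]
    return report


attributes = ["Dimensions", "Weigth", "Size"]
headers = ["Dimensions", "Weight (kg)", "Weight", "Material"]
print(audit_attribute_names(attributes, headers))
# {'Weigth': ['Weight'], 'Size': []}
```

An empty suggestion list, as for "Size" here, means you should check that attribute's name against the file by hand.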
❓ Troubleshooting
| Issue | Solution |
|---|---|
| Product not included in task | Check that it has at least one image and a SKU or Product Model |
| No data extracted | Verify that the product's SKU or Product Model appears in the PDF |
| Wrong data extracted | Check that your attribute names match the PDF's column headers or labels |
| File not available | Make sure file processing is complete (wait for the email notification) |
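For the "No data extracted" case, you can confirm whether a product's identifier actually appears in a file by searching its page text. The sketch below is a hypothetical diagnostic, not part of Trustana. It assumes you already have each page's text as a string, for example copied out of the PDF or produced by a text-extraction tool of your choice. Matching is case-insensitive and rejects hits inside longer codes, so "AB-100" does not match "AB-1000".

```python
import re
from typing import Optional


def pages_mentioning(
    page_texts: list[str],
    sku: Optional[str] = None,
    product_model: Optional[str] = None,
) -> dict[str, list[int]]:
    """Return the 1-based page numbers where each given identifier appears."""
    found = {}
    for label, ident in (("SKU", sku), ("Product Model", product_model)):
        if not ident:
            continue  # identifier not set on this product
        # The identifier must not be preceded or followed by a letter or digit.
        pattern = re.compile(
            r"(?<![A-Za-z0-9])" + re.escape(ident) + r"(?![A-Za-z0-9])",
            re.IGNORECASE,
        )
        found[label] = [
            i for i, text in enumerate(page_texts, start=1) if pattern.search(text)
        ]
    return found


pages = ["Model X9 spec sheet, SKU AB-100", "Accessories for AB-1000", "Index"]
print(pages_mentioning(pages, sku="AB-100", product_model="X9"))
# {'SKU': [1], 'Product Model': [1]}
```

Empty lists for every identifier suggest the file cannot be matched to that product.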
