Use Trustana's AI to automatically extract product attributes from your uploaded PDF files. This guide shows you how to run extraction tasks and review the results.
Supported on: All plans
Required role: Admin or Content Manager
What is File Data Extraction?
File Data Extraction uses AI to read your uploaded PDFs and fill in product attributes automatically. Unlike web-based enrichment, it extracts data directly from your own documents, such as spec sheets, catalogs, and data sheets.
Key benefits:
Extract data from your trusted first-party sources
Process hundreds of products in a single task
AI automatically matches file content to your product attributes
Before You Start
Make sure you have completed these prerequisites:
| Requirement | Why It's Needed |
| --- | --- |
| Files uploaded and processed | Files must be ingested before extraction (wait for the email notification) |
| Products have SKU or Product Model | These identifiers match products to file pages |
| Products have at least one image | Required for enrichment tasks |
| Attributes created in your account | AI extracts data into your existing attributes |
Important: Attribute names should closely match column headers or labels in the PDF for best results.
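If you keep an export of your product list, you can pre-check it against the product-level requirements above before selecting products. The sketch below is a hypothetical helper, not part of Trustana: the `Product` type and its `sku`, `product_model`, and `image_count` fields are assumptions about how your export is shaped. It applies only the two product-level rules from the table (at least one image, and a SKU or Product Model); file processing and attribute setup are account-level and are not checked here.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical fields; map them to the columns of your own export.
    product_id: str
    sku: str = ""
    product_model: str = ""
    image_count: int = 0


def missing_prerequisites(product: Product) -> list[str]:
    """Return the product-level prerequisites this product fails (empty list means OK)."""
    problems = []
    if product.image_count < 1:
        problems.append("no image")
    # At least one identifier is needed to match the product to file pages.
    if not product.sku.strip() and not product.product_model.strip():
        problems.append("no SKU or Product Model")
    return problems


products = [
    Product("P1", sku="AB-100", image_count=2),
    Product("P2", product_model="X200"),
    Product("P3", image_count=1),
]
for p in products:
    issues = missing_prerequisites(p)
    print(p.product_id, "OK" if not issues else ", ".join(issues))
```

Products that fail either check are the ones the Troubleshooting section describes as "not included in task".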
Step 1: Select Products to Enrich
Go to All Products from the menu
Use checkboxes to select one or more products
Hover over Enrich Data in the action bar
Click File Data Extraction
Step 2: Configure Extraction Options
A modal appears with extraction options:
Include Enriched Attributes
| Option | What It Does |
| --- | --- |
| No (default) | Only extract into empty attributes |
| Yes | Re-extract all attributes, even if they already have values |
Note: Extraction takes approximately 2-4 minutes per product.
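As a rough planning aid, you can turn the per-product figure into a range for a whole task. This is a minimal sketch that assumes products are processed one after another; the guide does not say whether products in a task run in parallel, so treat the result as a conservative planning estimate rather than a prediction. The function name is illustrative only.

```python
def estimate_minutes(product_count: int, low: float = 2.0, high: float = 4.0) -> tuple[float, float]:
    """Estimate total extraction time in minutes.

    The 2-4 minute defaults come from this guide; sequential processing is an assumption.
    """
    if product_count < 0:
        raise ValueError("product_count must be non-negative")
    return product_count * low, product_count * high


low, high = estimate_minutes(50)
print(f"50 products: roughly {low / 60:.1f} to {high / 60:.1f} hours")
# 50 products: roughly 1.7 to 3.3 hours
```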
Step 3: Start the Extraction
Review your options
Click Start Extraction
Your task is now queued!
Step 4: Monitor Your Task
Go to Enrichment Task from the menu
Find your task in the list
Source will show File-Based Extraction
Click the Task ID to see individual products
Step 5: Review Extracted Data
Once extraction is complete:
Click any Product ID to view its details
Look for the PDF icon next to enriched fields
This indicates the value came from a file
Click the source icon to see which file page was used
Viewing Sources
The source panel shows:
PDF sources — which file and page the data came from
Web sources — if you also ran web-based enrichment
Enrichment Summary — breakdown of all sources used
Step 6: Approve or Reject Values
For each extracted field:
Click Approve to lock the value
Click Reject to clear it and allow re-enrichment
Tip: Approved values won't be overwritten in future extractions.
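The Include Enriched Attributes option from Step 2 and the approval lock described here combine into one rule for which attributes a new extraction may write to. The sketch below models that rule as this guide describes it; it is an illustration, not Trustana's implementation, and the `AttributeState` type is hypothetical. It assumes a rejected value behaves like an empty one, since Reject clears it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttributeState:
    # Hypothetical model of one product attribute.
    name: str
    value: Optional[str] = None  # None means empty (or cleared by Reject)
    approved: bool = False


def extraction_targets(attributes: list[AttributeState], include_enriched: bool) -> list[str]:
    """Names of attributes a new extraction would write to, per the rules in this guide."""
    targets = []
    for attr in attributes:
        if attr.approved:
            continue  # Approved values are locked and never overwritten.
        if attr.value is None or include_enriched:
            # "No" (default): only empty attributes; "Yes": existing values too.
            targets.append(attr.name)
    return targets


attrs = [
    AttributeState("Dimensions"),
    AttributeState("Weight", value="2 kg"),
    AttributeState("Color", value="Black", approved=True),
]
print(extraction_targets(attrs, include_enriched=False))  # ['Dimensions']
print(extraction_targets(attrs, include_enriched=True))   # ['Dimensions', 'Weight']
```

In other words, choosing Yes widens the scope to unapproved values that already exist, but approving a value is what protects it from later extractions.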
Tips for Best Extraction Results
Match attribute names to file headers
If your PDF has a column called "Dimensions", name your attribute "Dimensions"
Close matches also work (e.g., "Size" may match "Dimensions")
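To spot attribute names worth reviewing before you run a task, you can compare them with the headers in a PDF. This hypothetical helper flags attributes that have no exact, case-insensitive match among the headers. It is deliberately stricter than the AI, which may still match close terms such as "Size" and "Dimensions", so a flagged name is a candidate to review rather than a guaranteed miss. The header list is assumed to be something you copy from the PDF yourself.

```python
def unmatched_attributes(attribute_names: list[str], pdf_headers: list[str]) -> list[str]:
    """Return attribute names with no exact, case-insensitive match among the PDF headers."""
    normalized_headers = {h.strip().casefold() for h in pdf_headers}
    return [a for a in attribute_names if a.strip().casefold() not in normalized_headers]


headers = ["Dimensions", "Weight (kg)", "Material"]
print(unmatched_attributes(["Dimensions", "material", "Size"], headers))  # ['Size']
```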
Use both SKU and Product Model
Having both identifiers improves matching accuracy
At minimum, one must be present
Check your identifier format
The SKU or Product Model in Trustana must match exactly how it appears in the PDF
Copying the product name into the Product Model field doesn't work
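Because identifiers must match the PDF exactly, one way to catch formatting differences is to search the PDF's text for each product's SKU and Product Model verbatim. This sketch assumes you have already extracted the PDF text into a string with a tool of your choice; the `products` mapping and the `identifiers_found` function are hypothetical. The guide does not say whether letter case matters, so the sketch conservatively treats it as significant. A plain substring search can also over-report short identifiers that occur inside longer ones.

```python
def identifiers_found(products: dict[str, tuple[str, str]], pdf_text: str) -> dict[str, list[str]]:
    """Map each product ID to the identifiers (SKU, Product Model) found verbatim in pdf_text.

    `products` maps product ID -> (sku, product_model); empty strings mean "not set".
    The search is exact and case-sensitive.
    """
    found = {}
    for product_id, (sku, model) in products.items():
        found[product_id] = [ident for ident in (sku, model) if ident and ident in pdf_text]
    return found


text = "Model X200 | SKU AB-100 | Dimensions 10 x 20 cm"
products = {
    "P1": ("AB-100", "X200"),  # both identifiers appear
    "P2": ("ab-100", ""),      # case differs from the PDF, so no match
    "P3": ("", "X 200"),       # extra space, so no match
}
print(identifiers_found(products, text))
# {'P1': ['AB-100', 'X200'], 'P2': [], 'P3': []}
```

A product with an empty list here is a likely cause of the "No data extracted" case in Troubleshooting.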
Troubleshooting
| Issue | Solution |
| --- | --- |
| Product not included in task | Check that it has an image and a SKU or Product Model |
| No data extracted | Verify the product identifier appears in the PDF |
| Wrong data extracted | Check if attribute names match PDF column headers |
| File not available | Ensure file processing is complete (wait for the email) |
What's Next
Uploading and Managing Files — Learn how to upload and prepare files
Understanding AI Enrichment Sources — Compare file-based and web-based enrichment
Need Help?
Reach out to your Customer Success Representative if you have questions about file data extraction.
