Learn how to upload PDF documents to your knowledge base and extract insights through intelligent search and summarization.
What You’ll Learn
This guide demonstrates how to:
- Upload PDF documents to your knowledge base
- Check processing status for document content
- Search within uploaded documents
- Generate summaries from PDF content
Prerequisites
- A Senso API key
- A PDF file to upload (max size: 20MB)
- Basic understanding of multipart form data uploads
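Before uploading, it can help to check the file locally against the prerequisites above. The following is a minimal sketch, not part of the Senso API: `validatePdf` and `MAX_PDF_BYTES` are hypothetical names, and it assumes the 20MB limit means 20 × 1024 × 1024 bytes. It relies on the fact that PDF files begin with the ASCII signature `%PDF-`.

```javascript
// Hypothetical pre-upload check. The 20MB limit comes from the prerequisites;
// treating it as 20 * 1024 * 1024 bytes is an assumption.
const MAX_PDF_BYTES = 20 * 1024 * 1024;

// `bytes` is a Uint8Array (or Node.js Buffer) holding the file contents.
function validatePdf(bytes) {
  if (bytes.length === 0) {
    return { ok: false, reason: 'File is empty' };
  }
  if (bytes.length > MAX_PDF_BYTES) {
    const sizeMB = (bytes.length / 1024 / 1024).toFixed(2);
    return { ok: false, reason: `File is ${sizeMB}MB, over the 20MB limit` };
  }
  // PDF files start with the ASCII signature "%PDF-".
  const signature = String.fromCharCode(...bytes.subarray(0, 5));
  if (signature !== '%PDF-') {
    return { ok: false, reason: 'File does not start with the %PDF- signature' };
  }
  return { ok: true, reason: null };
}
```

In a browser, the bytes could come from `new Uint8Array(await blob.arrayBuffer())`. Note that this check cannot tell a text-based PDF from a scanned one.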
Uploading a PDF Document
const API_URL = 'https://sdk.senso.ai/api/v1';
const API_KEY = 'YOUR_API_KEY';
async function uploadPDFDocument() {
  try {
    // Prepare form data. Fetching a relative path works in a browser;
    // in Node.js, read the file from disk instead.
    const formData = new FormData();
    const pdfFile = await fetch('./documents/product-guide.pdf');
    const pdfBlob = await pdfFile.blob();
    formData.append('file', pdfBlob, 'product-guide.pdf');
    formData.append('title', 'Product Guide 2024');
    formData.append('summary', 'Comprehensive guide to our product features and capabilities');

    // Upload the PDF document. Do not set Content-Type here:
    // fetch adds the multipart boundary automatically.
    const response = await fetch(`${API_URL}/content/file`, {
      method: 'POST',
      headers: {
        'X-API-Key': API_KEY
      },
      body: formData
    });
    if (!response.ok) {
      throw new Error(`Upload failed: HTTP ${response.status}`);
    }
    const content = await response.json();
    console.log('PDF uploaded successfully!');
    console.log('Content ID:', content.id);
    console.log('Processing status:', content.processing_status);

    // Poll every 3 seconds until processing completes, stopping if it fails
    let processedContent = content;
    while (processedContent.processing_status !== 'completed') {
      if (processedContent.processing_status === 'failed') {
        throw new Error('PDF processing failed');
      }
      await new Promise(resolve => setTimeout(resolve, 3000));
      const statusResponse = await fetch(`${API_URL}/content/${content.id}`, {
        headers: { 'X-API-Key': API_KEY }
      });
      processedContent = await statusResponse.json();
      console.log('Status:', processedContent.processing_status);
    }
    console.log('PDF processing completed!');

    // Search the knowledge base, which now includes the PDF
    const searchResponse = await fetch(`${API_URL}/search`, {
      method: 'POST',
      headers: {
        'X-API-Key': API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        query: 'What are the main product features?',
        max_results: 5
      })
    });
    const searchResult = await searchResponse.json();
    console.log('\nSearch Results:');
    console.log('Answer:', searchResult.answer);
    console.log('Number of sources:', searchResult.results.length);

    // Generate a summary
    const summaryResponse = await fetch(`${API_URL}/generate`, {
      method: 'POST',
      headers: {
        'X-API-Key': API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        content_type: 'product documentation PDF',
        instructions: 'Create a concise executive summary of the key points',
        max_results: 10
      })
    });
    const summary = await summaryResponse.json();
    console.log('\nGenerated Summary:');
    console.log(summary.generated_text);
  } catch (error) {
    console.error('Error processing PDF:', error);
  }
}

uploadPDFDocument();
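The polling loop in the example above has no upper bound on how long it waits. One way to bound it is a small helper with a timeout, sketched below. `waitForProcessing` is a hypothetical name, not a Senso SDK function, and the 10-minute default timeout is an arbitrary assumption. The status values `completed` and `failed` are the ones used elsewhere in this guide.

```javascript
// Sketch of a bounded polling helper (hypothetical, not part of any SDK).
// `getStatus` is any async function that resolves to an object with a
// `processing_status` field.
async function waitForProcessing(
  getStatus,
  { intervalMs = 3000, timeoutMs = 10 * 60 * 1000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const content = await getStatus();
    if (content.processing_status === 'completed') {
      return content;
    }
    if (content.processing_status === 'failed') {
      throw new Error('PDF processing failed');
    }
    // Give up if the next wait would run past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out after ${timeoutMs}ms waiting for processing`);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Example wiring with API_URL and API_KEY as defined earlier
// (run inside an async function, with `id` set to the content ID):
// const content = await waitForProcessing(async () => {
//   const res = await fetch(`${API_URL}/content/${id}`, {
//     headers: { 'X-API-Key': API_KEY }
//   });
//   if (!res.ok) throw new Error(`Status check failed: HTTP ${res.status}`);
//   return res.json();
// });
```

Passing the status fetcher in as a function keeps the helper independent of the HTTP client, which also makes it easy to exercise with a stub.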
Working with Large PDFs
For larger PDF documents, processing may take longer. Here’s how to handle it efficiently:
// Upload a large PDF, log its size, and poll with exponential backoff
async function uploadLargePDF(filePath) {
  const formData = new FormData();
  const fileResponse = await fetch(filePath);
  const fileBlob = await fileResponse.blob();
  const fileSize = fileBlob.size / 1024 / 1024;
  console.log(`Uploading ${fileSize.toFixed(2)}MB PDF...`);
  formData.append('file', fileBlob, 'large-document.pdf');
  formData.append('title', 'Large Document');

  const response = await fetch(`${API_URL}/content/file`, {
    method: 'POST',
    headers: { 'X-API-Key': API_KEY },
    body: formData
  });
  if (!response.ok) {
    throw new Error(`Upload failed: HTTP ${response.status}`);
  }
  const content = await response.json();

  // Poll for status with exponential backoff: 2s, 3s, 4.5s, ... up to 30s
  let delay = 2000;
  while (content.processing_status === 'processing') {
    await new Promise(resolve => setTimeout(resolve, delay));
    const statusResponse = await fetch(`${API_URL}/content/${content.id}`, {
      headers: { 'X-API-Key': API_KEY }
    });
    const status = await statusResponse.json();
    if (status.processing_status === 'failed') {
      throw new Error('PDF processing failed');
    }
    content.processing_status = status.processing_status;
    delay = Math.min(delay * 1.5, 30000); // Max 30 seconds
  }
  return content;
}
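To see what the backoff in `uploadLargePDF` does in practice, the delays can be computed without making any requests. This sketch uses the same starting delay (2000 ms), growth factor (1.5), and cap (30000 ms) as the function above; `backoffSchedule` is a hypothetical helper name.

```javascript
// Compute the sequence of waits produced by the backoff rule used above:
// start at initialMs, multiply by factor after each wait, cap at maxMs.
function backoffSchedule(attempts, { initialMs = 2000, factor = 1.5, maxMs = 30000 } = {}) {
  const delays = [];
  let delay = initialMs;
  for (let i = 0; i < attempts; i++) {
    delays.push(delay);
    delay = Math.min(delay * factor, maxMs);
  }
  return delays;
}

console.log(backoffSchedule(9));
// [2000, 3000, 4500, 6750, 10125, 15187.5, 22781.25, 30000, 30000]
```

The cap is reached on the eighth wait, after roughly 64 seconds of cumulative waiting. From then on the status is checked every 30 seconds.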
Understanding PDF Processing
When you upload a PDF, Senso:
- Extracts text content from all pages
- Preserves document structure (headings, lists, tables)
- Chunks the content for optimal search performance
- Creates embeddings for semantic search
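To give a rough feel for the chunking step, here is an illustrative sketch of fixed-size chunking with overlap, a common approach in search pipelines. It is not Senso's actual algorithm, which this guide does not describe. The function name, chunk size, and overlap are all assumptions chosen for illustration.

```javascript
// Illustrative only: split text into fixed-size chunks that overlap, so a
// sentence cut at a chunk boundary still appears whole in a neighboring chunk.
// This is NOT Senso's implementation.
function chunkText(text, chunkSize = 500, overlap = 50) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkText('abcdefghij', 4, 1));
// [ 'abcd', 'defg', 'ghij' ]
```

Each chunk would then be embedded separately, so a search query can match the most relevant passage instead of the whole document.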
The response includes:
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "document",
  "title": "Product Guide 2024",
  "file_name": "product-guide.pdf",
  "mime_type": "application/pdf",
  "processing_status": "processing",
  "created_at": "2024-01-15T10:30:00Z"
}
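A response like the one above can be checked and normalized before the rest of your code depends on it. The sketch below is hypothetical: `parseContentResponse` is not an SDK function, and treating every field in the sample as always present is an assumption, since this guide does not say which fields are optional.

```javascript
// Fields shown in the sample response. Assuming all are always present.
const EXPECTED_FIELDS = [
  'id', 'type', 'title', 'file_name', 'mime_type', 'processing_status', 'created_at'
];

// Validate the shape of an upload response and derive a few convenience values.
function parseContentResponse(content) {
  const missing = EXPECTED_FIELDS.filter(field => content[field] === undefined);
  if (missing.length > 0) {
    throw new Error(`Response is missing: ${missing.join(', ')}`);
  }
  return {
    id: content.id,
    title: content.title,
    isPdf: content.mime_type === 'application/pdf',
    isReady: content.processing_status === 'completed',
    createdAt: new Date(content.created_at) // ISO 8601 string to Date
  };
}
```

For the sample response, `isPdf` is `true` and `isReady` is `false`, because processing has not finished yet.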
Best Practices
- File size limits: Keep PDFs under the 20MB maximum upload size
- Text-based PDFs: Ensure PDFs contain selectable text (not scanned images)
- Descriptive titles: Use clear titles to help identify content later
- Processing time: Allow 1-2 minutes per MB for processing
- Error handling: Confirm that processing_status is completed before searching, and handle the failed status
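The processing-time guideline above (1-2 minutes per MB) can be turned into a rough estimate, for example to pick a polling timeout. This is a minimal sketch: `estimateProcessing` is a hypothetical name, and using the upper bound as the timeout is an assumption, not documented API behavior.

```javascript
// Rough estimate from the "1-2 minutes per MB" guideline.
function estimateProcessing(sizeBytes) {
  const sizeMB = sizeBytes / 1024 / 1024;
  const minMinutes = sizeMB * 1;
  const maxMinutes = sizeMB * 2;
  return {
    minMinutes,
    maxMinutes,
    // Use the upper bound as a polling timeout, in milliseconds.
    suggestedTimeoutMs: maxMinutes * 60 * 1000
  };
}

console.log(estimateProcessing(5 * 1024 * 1024));
// { minMinutes: 5, maxMinutes: 10, suggestedTimeoutMs: 600000 }
```

By the same guideline, a PDF at the 20MB limit could take 20 to 40 minutes, so long-running polling is expected for large files.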
Common Use Cases
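Search PDFs by Category
The snippets in this section use Python with the requests library.
import requests

API_URL = 'https://sdk.senso.ai/api/v1'
API_KEY = 'YOUR_API_KEY'

# Restrict the search to one category; replace the placeholder with a real category ID
result = requests.post(
    f'{API_URL}/search',
    headers={'X-API-Key': API_KEY},
    json={
        'query': 'What are the technical specifications?',
        'category_id': 'technical-docs-category'
    }
).json()
Generate FAQ from PDF
# Requires requests, API_URL, and API_KEY (same values as in the JavaScript examples)
faq = requests.post(
    f'{API_URL}/generate',
    headers={'X-API-Key': API_KEY},
    json={
        'content_type': 'technical documentation',
        'instructions': 'Generate 10 frequently asked questions with answers',
        'save': True
    }
).json()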
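For consistency with the earlier JavaScript examples, the same two requests can be sketched with `fetch`. The helper names `buildRequest` and `postJson` are hypothetical. The endpoints, header, and body fields are taken from the snippets in this guide.

```javascript
const API_URL = 'https://sdk.senso.ai/api/v1';

// Build the URL and fetch options for a JSON POST request.
function buildRequest(path, body, apiKey) {
  return {
    url: `${API_URL}${path}`,
    options: {
      method: 'POST',
      headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify(body)
    }
  };
}

// Send the request and fail loudly on a non-2xx response.
async function postJson(path, body, apiKey) {
  const { url, options } = buildRequest(path, body, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Request to ${path} failed: HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (inside an async function, with apiKey set to your Senso API key):
// const result = await postJson('/search', {
//   query: 'What are the technical specifications?',
//   category_id: 'technical-docs-category'
// }, apiKey);
// const faq = await postJson('/generate', {
//   content_type: 'technical documentation',
//   instructions: 'Generate 10 frequently asked questions with answers',
//   save: true
// }, apiKey);
```

Separating request construction from sending makes it possible to inspect the payload without calling the API.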
Next Steps