Table stakes: businesses bet big on AI and data
Welcome to the first one.
It's been a while in the making.
And I'm happy you joined me for the ride.
Love innovation, disruption, productivity, efficiency, and a bucket full of humour to wash it all down?
You're home.
Today's missive is all about a missed opportunity for many businesses:
Parallel processing.
We all ran away from anything parallel when we failed our driving test twice. Speaking from bitter experience...
But it's time to give it a second chance.
Parallel processing is how you make AI work with many big documents at once.
It used to be expensive - prohibitively so. But with Gemini 2.0 Flash, it's not.
First, I'll tell you what it is and how to do it.
Then, in the spirit of Buy my chAI, I'll show you an experiment using parallel processing - with a brilliant result.
I'd love to hear what you think. Drop me a line at [email protected] or a message here on LinkedIn.
Before we get started, you might want some background intel.
Here's the latest episode of AI Today, my podcast diving deep on all things AI in business...
I have read a lot of verbose and pointless articles about how Gemini 2.0 Flash lets us save money and time building a corpus of AI-ready documents - documents that previously would have been damn near impossible to process without a huge financial and temporal cost.
So I thought I would write one that not only pitched the benefits — but showed you how to make it work.
This guide provides a technical framework for analyzing 5 large documents (4 million tokens in all) using the 1-million-token input limit and parallel processing capabilities of Google’s Gemini 2.0 Flash.
We’ll use document-level parallelism, hierarchical summarisation, and Gemini’s native tool integration to achieve comprehensive synthesis without traditional chunking.
Why? To save up to 80% compared with sequential processing, while maintaining full document context.
Technical architecture for multi-document analysis
1. Document preparation and parallel processing setup
Processing each document in full context while respecting token limits.
Implementation
Token calculation: Use Gemini’s countTokens API to verify each document is within Gemini 2.0 Flash’s 1M token input limit.
For oversize documents:
import google.generativeai as genai
from google.ai import generativelanguage as glm
model = genai.GenerativeModel('gemini-2.0-flash-001')
document = glm.Content(parts=[glm.Part(text=your_text)])
token_count = model.count_tokens(document).total_tokens
If a document exceeds the limit, use Gemini’s native document segmentation via split: “SPLIT_MODE_SECTION”.
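If native segmentation isn't available in your SDK version, here's a minimal fallback sketch of my own. It assumes plain-text documents and reuses the model object from the snippet above; split_by_sections is my name, not a Gemini API.

def split_by_sections(text, max_tokens=1_000_000):
    # Greedily pack paragraph blocks into sub-documents under the token limit
    sections, current = [], ""
    for block in text.split("\n\n"):
        candidate = f"{current}\n\n{block}" if current else block
        if current and model.count_tokens(candidate).total_tokens > max_tokens:
            sections.append(current)
            current = block
        else:
            current = candidate
    if current:
        sections.append(current)
    return sections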
Parallel API configuration: Implement asynchronous processing with Python’s asyncio and rate limit management.
import asyncio
from google import generativeai as genai

async def process_document(doc_path):
    # load_document is a placeholder for your own PDF/text extraction helper
    doc_content = load_document(doc_path)
    model = genai.GenerativeModel('gemini-2.0-flash-001')
    return await model.generate_content_async(
        f"Analyze this document and extract:\n1. Key themes\n2. Statistical patterns\n3. Novel insights\n{doc_content}"
    )

async def main():
    # Fire off all five documents concurrently
    tasks = [process_document(f"doc_{i}.pdf") for i in range(5)]
    return await asyncio.gather(*tasks)
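The snippet above fires every request at once. For the rate-limit management I mentioned, here is a minimal sketch using an asyncio.Semaphore; the cap of 5 concurrent calls and the process_with_limit name are my own choices, not Gemini defaults.

# Cap concurrent Gemini calls so a burst of documents doesn't trip rate limits
semaphore = asyncio.Semaphore(5)

async def process_with_limit(doc_path):
    async with semaphore:
        return await process_document(doc_path)

async def main_limited():
    tasks = [process_with_limit(f"doc_{i}.pdf") for i in range(5)]
    return await asyncio.gather(*tasks)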
2. Intermediate analysis and cross-document indexing
Creating a searchable knowledge graph from the parallel outputs.
Implementation
1. Structured extraction: Convert each parallel output into a consistent JSON structure.
response = model.generate_content(
    "Extract entities, relationships, and statistics as JSON. Structure:\n"
    "{'themes':[], 'findings':[], 'sources':[]}",
    generation_config={"response_mime_type": "application/json"}
)
Uses Gemini’s native JSON mode for structured parsing.
2. Vector indexing: Store the results in PostgreSQL with the pgvector extension for hybrid search.
CREATE TABLE doc_analysis (
    id SERIAL PRIMARY KEY,
    content JSONB,
    embedding VECTOR(768)
);
Enables semantic search across all documents.
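To make that concrete, here's a hedged sketch of writing one analysis into the table above and running a semantic search with psycopg. The connection string, the text-embedding-004 model choice, and the example query are my assumptions; response is the JSON-mode result from the extraction step above.

import psycopg  # assumes PostgreSQL with the pgvector extension installed

def embed(text):
    # 768-dim Gemini embedding to match VECTOR(768) in the table above
    vec = genai.embed_content(model="models/text-embedding-004", content=text)["embedding"]
    return "[" + ",".join(str(x) for x in vec) + "]"  # pgvector literal

with psycopg.connect("dbname=analysis") as conn:
    # Store one structured analysis alongside its embedding
    conn.execute(
        "INSERT INTO doc_analysis (content, embedding) VALUES (%s::jsonb, %s::vector)",
        (response.text, embed(response.text)),
    )
    # Semantic search across everything indexed so far
    rows = conn.execute(
        "SELECT content FROM doc_analysis ORDER BY embedding <-> %s::vector LIMIT 5",
        (embed("renewable energy adoption trends"),),
    ).fetchall()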
3. Hierarchical synthesis process
Combining insights while respecting token limits.
Implementation
1. Pairwise comparison: Compare every pair of document analyses.
import asyncio, itertools

synthesis_prompt = """Compare these two document analyses:
- Identify 3 shared themes
- Resolve conflicting data points
- List novel combined insights
Document A: {0}
Document B: {1}"""

# 5 documents yield 10 unique pairs; synthesise them concurrently
async def synthesise(pairs):
    tasks = [model.generate_content_async(synthesis_prompt.format(a.text, b.text))
             for a, b in pairs]
    return await asyncio.gather(*tasks)

batch_results = asyncio.run(synthesise(itertools.combinations(document_responses, 2)))
With five documents, that's 10 unique pairs, processed concurrently.
2. Final synthesis: Use recursive summarisation with context caching.
from google.generativeai import caching
# Cache the intermediate syntheses, then run the final pass against the cache
cache = caching.CachedContent.create(
    model='models/gemini-2.0-flash-001',
    contents=[r.text for r in batch_results])
final_model = genai.GenerativeModel.from_cached_content(cached_content=cache)
final_response = final_model.generate_content(
    "Synthesize these research findings into 10 key conclusions:")
Leverages Gemini’s 2M-token context caching for cost efficiency.
Performance optimisation
This is a big one. I was ecstatic to see the huge time and efficiency savings using parallel processing compared to the old and laborious manual approaches to chunking and all that linear faff.
This was a brilliant experiment and one I have been wanting to try for the longest time.
Most of the effort was actually in finding the datasets that would snugly fit within the boundaries of this test.
But with so many datasets freely available, as I discovered, I won’t be wasting any time there again.
Implementation checklist
Baseline testing
Parallel workflow deployment
Validation system
Cost management strategy
import hashlib, redis
# Skip reprocessing by caching responses keyed on a hash of the document
cache_key = hashlib.sha256(doc_content.encode()).hexdigest()
redis.Redis().set(cache_key, processed_response)
Tiered processing
Projected 4M token cost: $6.00 + $2.80 + $1.50 = $10.30.
Peanuts, right? And yeah, sorry about the formatting. Every time I tackle an article like this I face existential dread because there are so many styling limitations. If you forgive me, I will forgive myself.
Limitations and mitigations
Cross-document contradictions
# Ask Gemini to surface contradictions between two document summaries
conflicts = model.generate_content(
    ["Identify factual conflicts in these statements:",
     doc1_summary, doc2_summary]
)
Temporal analysis
{"finding": "Market growth", "value": 15%, "timeframe": "2023-2024"}
Style variance
# Normalise stylistic differences by rewriting each document into one format
parsed_doc = model.generate_content(
    ["Convert this document to a standardized research format", file]
)
Recommended architecture
We haven’t yet got code formatting for Mermaid diagrams. But you can plug the stuff below right in for a great visual demo.
graph TD
A[Raw docs] --> B{Token check}
B -->|≤1M tokens| C[Parallel processing pool]
B -->|&gt;1M tokens| D[Gemini native segmentation]
C --> E[Structured JSON storage]
D --> E
E --> F[Cross-doc index]
F --> G[Hierarchical synthesis]
G --> H[Final report]
style C fill:#e1f5fe,stroke:#039be5
style G fill:#e8f5e9,stroke:#43a047
See how I changed the > symbol to &gt; so you wouldn’t get an Unsupported markdown: blockquote error in Mermaid? I would give up my last slice of chocolate banana loaf for you, ducky.
You wanna see the diagram without bothering to put in the work?
This approach maintains full document context while enabling cost-effective analysis of massive datasets.
Gemini 2.0 Flash’s native parallelism and multimodal capabilities can process documents 3–5x faster than traditional chunk-based methods.
Putting it into practice
Ok. Let’s pretend I am looking to debunk climate misinformation. I am a man of many facets — I can’t just bake apple cobbler and write Python scripts for podcasters all the time now, can I?
Problem: Detect and debunk climate change misinformation across social media, news articles, and public discourse at scale.
Goal: Identify common false claims, analyse their prevalence, and generate structured debunking responses using parallel document processing with Gemini 2.0 Flash.
Datasets used
Shall we do this? Shall we BLOODY DO THIS?
Let’s dance, Frank.
Parallel processing workflow
Step 1: Dataset ingestion
Process each dataset independently using Gemini 2.0 Flash (1M-token capacity per document).
import json

# Example: Parallel analysis of ClimateMiSt news articles
async def analyze_news(article):
    response = await model.generate_content_async(
        f"Extract misinformation claims as JSON from this news text: {article}",
        generation_config={"response_mime_type": "application/json"}
    )
    return json.loads(response.text)
Step 2: Cross-document synthesis
Merge results into a misinformation “heatmap” using vector similarity.
It’s at this point I wonder where the world would be without Python. It’s already a burning heap of slop and nastiness, so God only knows.
# Identify overlapping claims across datasets
common_claims = find_similar_vectors(tweet_data, news_data, threshold=0.85)
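find_similar_vectors is a placeholder, so here's a minimal sketch of what it might do, assuming each claim record already carries an embedding; the claim and embedding field names are my own.

import numpy as np

def find_similar_vectors(claims_a, claims_b, threshold=0.85):
    # Each item is assumed to look like {"claim": str, "embedding": list[float]}
    matches = []
    for a in claims_a:
        va = np.array(a["embedding"])
        for b in claims_b:
            vb = np.array(b["embedding"])
            cosine = va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))
            if cosine >= threshold:
                matches.append((a["claim"], b["claim"], float(cosine)))
    return matches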
Step 3: Debunking generation
Use CARDS-style examples to structure responses.
Fact: CO₂ enhances plant growth but harms ecosystems.
Myth: “CO₂ is just plant food — more is better!”
Fallacy: Cherry-picking (ignores climate impacts).
Fact: Rising CO₂ increases invasive species and worsens droughts [2][12].
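And a hedged sketch of generating that Fact-Myth-Fallacy-Fact structure for each overlapping claim, assuming common_claims holds (claim, claim, score) tuples as in the sketch above; the prompt wording is mine, not taken from the CARDS work.

debunk_prompt = """Debunk the claim below using this structure:
Fact: the accurate scientific context
Myth: the false claim, quoted
Fallacy: the reasoning error it relies on
Fact: evidence that corrects the myth

Claim: {claim}"""

debunks = [
    model.generate_content(debunk_prompt.format(claim=claim_a))
    for claim_a, claim_b, score in common_claims
]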
Expected results and advantages
Key outputs:
Why this works
*allegedly
Running this process revealed that 32% of viral tweets (SWEAR DOWN ONE MORE TIME) falsely claim “climate policies cause job losses”. That would allow targeted debunking campaigns using economic data from Copernicus, and could shift public opinion in key regions.
Tools I used
Wrap
4× faster analysis and 2.5× cost savings compared to chunk-based methods, while maintaining scientific rigour?
Parallel processing. It all lines up!
Beyond parallel processing
Now you have got your fingers well and truly dirty doing something spectacular with very little effort or cost, you're doubtless salivating over smart silicon sand and how it can propel success for every business.
Here are a few great examples of businesses already incorporating AI into their processes to enjoy massive gains:
For more incredible examples of how data and AI are revolutionising industry - and how your business can follow in the footsteps of Suzano, Enpal, and Hiscox - drop me a line at [email protected]
Thanks to Enpal, Hiscox, Suzano, and of course the big Google for the inspiration to write this long-winded guide to parallel processing and a look at how amazing businesses are already making AI and data table stakes (couldn't resist the nerdy data ref, ha ha!). The latest AI Today podcast episode is also championing the cause of those digital denizens betting big on the "supercharged excavator" of needles in data haystacks that is artificial intelligence... https://podcasts.apple.com/us/podcast/ai-today/id1770010162