Summarizing text is one of the main use cases for large language models. Clients often want to summarize articles, financial documents, chat history, tables, pages, books, and more. We all expect an LLM to distill only the important pieces of information, especially from long texts. However, this isn't always possible with the expected level of quality. Even a larger token limit isn't a guaranteed solution. Fortunately, there are approaches that help summarize texts of different lengths - whether it's a couple of sentences, a few paragraphs, pages, an entire book, or an unknown amount of text.
Basic Prompts to Summarize a Couple of Sentences
This is the default behavior across almost any LLM, whether it's OpenAI, Anthropic, Mistral, Llama, or others.
In this case, we simply copy and paste some text from the source and put it inside a prompt, giving the LLM an instruction like: “Please provide a summary of the following passage”.
If the output is still a little too complicated, we can adjust the instructions to get a different type of summary, for example: “Please provide a summary of the following text. Your output should be in a manner that a five-year-old would understand”, to get a much more digestible result.
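As a minimal illustration, this is roughly what the basic-prompt approach looks like as a single API call. The OpenAI Python client, the model name, and the sample passage below are assumptions for the sketch - any chat-capable LLM works the same way:

```python
# Minimal sketch of the basic-prompt approach (OpenAI Python client assumed;
# the model name and passage are placeholders).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

passage = "..."  # a short text of roughly 150 words pasted from the source

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model could be used here
    messages=[
        {
            "role": "user",
            "content": f"Please provide a summary of the following passage:\n\n{passage}",
        },
    ],
)
print(response.choices[0].message.content)
```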
This approach works when there aren't too many tokens in the prompt (let's say 200 tokens, or about 150 words). But as the number of tokens increases, as with larger documents, summarization with basic prompts becomes inaccurate and many things get omitted - whether they were important to us or not.
Prompt Templates to Get the Summary in a Preferred Format
Prompt Templates help deal with the issue of inconsistent summary output across different texts - something that often happens when the input is long and you're using only a basic prompt like “Summarize this”.
For example, the prompt template may look like a rule: "Please write a one-sentence summary of the following text: {}". Notice that we ask for "one sentence" instead of a bare "summarize".
With Prompt Templates, we can influence the quality of summary output in specific directions: format and length (“1 sentence” or “3 bullet points”), tone (simple language, executive style), and focus (only risks, only outcomes, only decisions). Keeping the output structure uniform is what we need for automation.
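In code, a prompt template can be as simple as a string with a placeholder that gets filled the same way for every document. A minimal plain-Python sketch (the template wordings below are just examples to adapt):

```python
# Hypothetical prompt templates - one placeholder, filled identically for every input.
ONE_SENTENCE = "Please write a one-sentence summary of the following text: {}"
THREE_BULLETS = "Summarize the following text as exactly 3 bullet points, focusing only on decisions: {}"

def build_prompt(template: str, text: str) -> str:
    """Fill the placeholder so every document gets the same instruction and format."""
    return template.format(text)

prompt = build_prompt(ONE_SENTENCE, "Some long passage pasted from the source...")
```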
MapReduce Method to Summarize… Summaries
Most LLM users think of summarization as "throw the whole text at the model → get a summary". That's why the output is often not as expected.
But the MapReduce method changes that into "break the text into chunks → summarize each → summarize the summaries". You first generate individual summaries (map), then combine and condense them into one final summary (reduce). This reflects how we deal with long texts: we read in parts, take notes, then consolidate them to get the big picture.
So again, the main idea of the MapReduce method is to “chunk our document into pieces (that fit within the token limit), get a summary of each individual chunk, and then finally get a summary of the summaries”.
The MapReduce method is mostly used for creating customized apps or workflows that run on top of general-purpose LLMs (like GPT-4, Claude, LLaMA, etc.) using frameworks like LangChain or raw Python.
Here is the general workflow for summarization using the MapReduce technique (a code sketch follows the list):
- Load the input document into RAM (plain Python: open(file).read())
- Estimate whether the text exceeds the token limit (for example, 2,000 tokens) (LangChain equivalent: llm.get_num_tokens(text))
- Split the text into smaller chunks that fit within the LLM's context window (LangChain equivalent: RecursiveCharacterTextSplitter())
- Convert chunks into a structured format (for example, a list of texts or document objects) (LangChain equivalent: create_documents())
- Write a per-chunk prompt template to summarize each individual chunk (LangChain equivalent: map_prompt_template)
- Write a final combine prompt template that summarizes the chunk-level summaries into bullet points or another format (LangChain equivalent: combine_prompt_template)
- Run the Map phase - apply the per-chunk prompt to each chunk (LangChain: map_reduce)
- Run the Reduce phase - apply the final combine prompt to the intermediate summaries (LangChain: map_reduce)
- Output the final result - the combined summary (in list, bullet point, or paragraph form) (LangChain equivalent: print(output))
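Put together, a hedged LangChain sketch of this workflow might look like the following. It uses the classic LangChain API; exact module paths, chunk sizes, and prompt wording vary by version and use case:

```python
# MapReduce summarization sketch with classic LangChain (imports may differ
# between LangChain versions; chunk sizes and prompts are assumptions).
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

# 1) Load the document and estimate its size in tokens
text = open("report.txt").read()
print(llm.get_num_tokens(text))

# 2) Split into chunks that fit the context window
splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=200)
docs = splitter.create_documents([text])

# 3) Per-chunk (map) prompt and final (combine) prompt
map_prompt = PromptTemplate(
    template="Write a concise summary of the following text:\n\n{text}",
    input_variables=["text"],
)
combine_prompt = PromptTemplate(
    template="Combine these partial summaries into 5 bullet points:\n\n{text}",
    input_variables=["text"],
)

# 4) Map + Reduce in one chain, then print the combined summary
chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    map_prompt=map_prompt,
    combine_prompt=combine_prompt,
)
print(chain.run(docs))
```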
As you can see, if you try to build a real application that does reliable document summarization - say, for legal teams, financial analysts, internal knowledge search, or anything beyond casual reading - the tools available in the ChatGPT web interface or a raw API call aren't enough on their own.
Yes, you can upload a PDF into the web version of ChatGPT and ask it to apply the MapReduce method, and if the file is small and the content is simple, you'll get a decent summary. But you'll hit limitations: unpredictable behavior, such as content being skipped or compressed too aggressively. It's very hard to control how the content is split, loop over each section with a consistent prompt, or combine those outputs in a structured way.
Even if you use the OpenAI API, you still have to build everything else yourself: chunk the input, manage prompts for each part, send multiple API calls, and then combine the outputs. The API just gives you the LLM - it doesn't provide a system for managing workflows.
That’s where a middle layer comes in. You build a lightweight backend that handles the logic: read a long document, split it, summarize each piece with the same prompt, and combine the results at the end. This logic is what frameworks like LangChain or LlamaIndex help with, but you can also build it yourself in plain Python.
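For comparison, the same middle-layer logic without a framework can be sketched in plain Python. The OpenAI client, the model name, and the naive character-based splitting are assumptions - a production system would split on document structure instead:

```python
# Rough plain-Python middle layer: split, summarize each piece with the same
# prompt, then combine the results (model name and chunk size are assumptions).
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def summarize_document(text: str, chunk_chars: int = 8000) -> str:
    # Naive split by character count; real systems split on sections or paragraphs
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    # Map: summarize every chunk with the same prompt
    partials = [ask(f"Write a concise summary of this section:\n\n{c}") for c in chunks]
    # Reduce: combine the partial summaries into one final answer
    return ask("Combine these section summaries into 5 bullet points:\n\n" + "\n\n".join(partials))
```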
Embeddings and Clustering to Summarize… Books
Some PDFs may contain as much content as a full book, and sometimes we want to summarize that amount of text as well. What kind of size are we talking about? For example, 140,000 tokens - roughly 100,000+ words.
If you send a prompt with that much text to a commercial LLM, it'll cost you a significant amount, even if the model can process it in one go. Commercial LLM APIs charge you twice: once for the input tokens the model processes, and once for the output tokens it generates.
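A back-of-the-envelope calculation makes the point. The per-token prices below are purely illustrative assumptions - check your provider's current rates:

```python
# Illustrative cost of one full-book pass; prices are assumptions, not quotes.
input_tokens = 140_000
output_tokens = 1_000
price_in_per_1m = 3.00    # assumed USD per 1M input tokens
price_out_per_1m = 15.00  # assumed USD per 1M output tokens

cost = input_tokens / 1e6 * price_in_per_1m + output_tokens / 1e6 * price_out_per_1m
# Roughly $0.44 per pass with these assumed prices - multiply by every document,
# every rerun, and every experiment during development.
print(f"~${cost:.2f} per single pass")
```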
Moreover, there’s something important to understand that should make you think twice before applying the MapReduce method to such a large amount of text: semantic similarity. Experienced readers know that a book rarely contains completely unique content from beginning to end. The same ideas are often repeated - just phrased in different ways.
That’s yet another reason against blindly chunking a book and applying MapReduce - you’ll likely send multiple chunks with the same meaning to the LLM, get nearly identical summaries, and overpay for it. That’s not the kind of situation smart people who care about costs want to end up in.
Starting from the idea of semantic similarity, we may realize that all we need to do before sending a book’s text to an LLM is remove parts that are similar (in other words, not important for the summary because they repeat the same ideas). So, in general, our goal becomes compressing the meaning before submitting it for processing.
This is exactly the stage where we start thinking about using text preprocessing methods like embedding (converting each text chunk into a vector that captures its meaning, like [0.11, -0.44, 0.85, ...], so we can measure similarity between chunks) and clustering (grouping similar vectors together to avoid redundancy and pick one best passage from each group).
So again, we’re not going to feed the entire book to the model - only the important parts, let’s say the 10 best sections that represent most of the meaning. We ignore the rest because it adds no new angle or dimension to what we want to learn from the summary.
At this stage, what we really want is to scientifically select only those sections of the book that represent a holistic and diverse view - covering the most important, distinct parts that describe the book best. To do this, we need to form “meaning clusters,” and from each diverse cluster, we want to select just one best representative - the one that is closest to the “cluster centroid” (each cluster has its own centroid).
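A minimal sketch of that selection step, assuming OpenAI embeddings and scikit-learn's KMeans (the upstream chunking and the choice of 10 clusters are up to you):

```python
# Embed chunks, cluster them, and keep one representative per cluster -
# the chunk closest to each cluster centroid (library choices are assumptions).
import numpy as np
from openai import OpenAI
from sklearn.cluster import KMeans

client = OpenAI()

def embed(chunks: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    return np.array([item.embedding for item in resp.data])

def pick_representatives(chunks: list[str], n_clusters: int = 10) -> list[str]:
    vectors = embed(chunks)
    kmeans = KMeans(n_clusters=n_clusters, random_state=42).fit(vectors)
    picked = []
    for i in range(n_clusters):
        # Distance from every chunk to this cluster's centroid
        distances = np.linalg.norm(vectors - kmeans.cluster_centers_[i], axis=1)
        picked.append(int(np.argmin(distances)))
    # Keep the book's original order so the final summary reads chronologically
    return [chunks[i] for i in sorted(set(picked))]
```

Only these representative chunks then go to the LLM for the final summary, which is how the "compress the meaning first" idea translates into lower cost.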
Building AI Agents to Summarize Documents
Finally, what should we do if our workflow requires summarizing an unknown amount of text? This is where agents come in.
Such agents are able to handle complex tasks. For example, the question we want to ask the LLM requires several steps to answer: searching more than one source, summarizing, and combining the findings.
The agent should grab the first doc, pull out the key points, then do the same with the second, etc. After that, it should combine overlapping ideas and write the final answer. The agentic approach is designed to handle that chain of steps automatically.
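As a rough illustration only, here is what wiring such an agent could look like with the classic LangChain agent API. This API has since been deprecated in newer LangChain releases, and search_docs is a hypothetical stand-in for your own retrieval code:

```python
# Sketch of an agent that searches several sources, summarizes the hits,
# and combines the findings (classic LangChain agent API; search_docs is hypothetical).
from langchain.agents import AgentType, Tool, initialize_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

def search_docs(query: str) -> str:
    """Hypothetical lookup over your document store; returns matching passages."""
    return f"[passages matching {query!r} from your document store]"

tools = [
    Tool(
        name="search_documents",
        func=search_docs,
        description="Finds passages relevant to a query in the document store.",
    ),
]

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("Summarize what the 2023 and 2024 reports say about churn, and combine the findings.")
```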
How Belitsoft Can Help with Large-Scale LLM Summarization
Belitsoft builds custom software systems for clients who need to summarize large documents with LLMs - whether it’s legal contracts, medical records, financial disclosures, or internal knowledge bases.
1. Building the Backend Logic
Summarization workflows need orchestration: splitting, prompting, combining, looping. We can design and implement that backend layer to manage the full workflow. You get a consistent system that runs every time.
2. Integrating with Your Data
Have data in SharePoint, PDFs, CSVs, internal portals, or something else? We can pull from all of them. We connect documents, preprocess the text, and structure it into chunks or embeddings - ready for summarization or retrieval.
3. Adding Embeddings & Clustering
If you’re dealing with high-volume or repetitive content (like training manuals or clinical trial logs), we implement embeddings plus clustering to compress meaning before summarization. That saves cost and improves output diversity.
4. Deploying Agent-Based Workflows
We build multi-step agents using open-source stacks or your preferred LLM API - fully custom and scoped to your use case.
5. Packaging it All in a UI or API
Whether you need an internal tool with an intuitive interface or just an API - we wrap the logic in a usable form. You click, and get structured summaries back.