BloombergGPT Is Live: A Custom Large Language Model for Finance

Bloomberg is a financial technology and data company whose core business is a subscription-based software platform (the Terminal) used by finance professionals for real-time data, analytics, trading tools, and news; it’s what Wall Street traders, analysts, and portfolio managers rely on daily. Bloomberg has a strong AI and machine learning division, which developed BloombergGPT, the company’s own financial large language model trained on proprietary financial data. There is a lesson here for anyone who wants to build or modernize a financial product based on a custom LLM and do it smarter, faster, and cheaper.

Is BloombergGPT in Production or Not?

The last official updates about the custom financial LLM BloombergGPT came in 2023. All attempts to find out whether BloombergGPT is open source, to locate it on Hugging Face or GitHub, to download and try it, or just to understand how to access a BloombergGPT demo and what it costs end in silence. The internet is full of rumors. Some, frustrated by the lack of updates, say BloombergGPT is already obsolete. Others laugh at the money spent, saying GPT-4 does the same thing better, faster, and cheaper for the end user. They argue Bloomberg should’ve waited for stronger models.

The team keeps quiet. Maybe because money doesn’t like noise. What matters: the model is already in production, built into Bloomberg’s stack.

Doug Levin, a successful startup founder now at Harvard, wrote a review after testing BloombergGPT inside the Terminal. He called it a disruptive layer in Bloomberg’s legacy architecture. Not a research demo. A system already shaping workflows from the inside.

Use cases mentioned directly in Doug Levin’s article:

  1. Financial report generation
  2. Market summaries
  3. Trading ideas or analysis
  4. Financial data analysis
  5. Market trend predictions
  6. Sentiment analysis
  7. Automated report generation
  8. Language translation
  9. Financial document text generation
  10. Risk assessment
  11. Real-time market updates
  12. Support in client communication
  13. Support in regulatory compliance
  14. S-1 analysis and modeling
  15. Search functionality via Bloomberg Search (SEAR)

BloombergGPT-like LLM: Train from Scratch or Fine-Tune?

A lot has been written about BloombergGPT in broad terms: the impressive results achieved by the team behind this custom financial LLM and how it outperformed many other models. But very little has been said about what happened backstage: what it’s actually like to develop a specialized model for the financial industry, how resource-intensive that process is, and what kinds of challenges these teams run into. Time to change that.

Given that, we decided it was worth doing some reverse engineering: cutting through the PR noise in the available information to uncover insights that will likely stay relevant for a long time for any company that decides to build its own financial LLM.

David Rosenberg, who leads the BloombergGPT development team, still holds that position (according to LinkedIn). And judging by his social media, he maintains that the information about the model from mid-2023 is still relevant. In that context, what he shared on The TWIML AI Podcast with Sam Charrington genuinely deserves close attention.

“Using an API like OpenAI’s is not suitable for us: we have data we don’t want to send out. So for internal and sensitive use, in-house models are preferable.”
— David Rosenberg, The TWIML AI Podcast

Let’s take a closer look at what else they discussed.

Financial LLM use cases

What the BloombergGPT development team actually spent the most time on was thinking about financial LLM use cases from a variety of angles.

For example, could the BloombergGPT LLM help them solve problems they already had solutions for, but in a better way? Or with less investment in training data?

They explored use cases like natural language to BQL (Bloomberg Query Language, used inside the Bloomberg Terminal to pull structured financial data); the idea was to build a kind of financial code assistant that translates human language into Bloomberg-specific queries. They also wanted an internal code assistant that understood their libraries, and the ability to input a large document and interact with it: ask what information it contained, that sort of thing.

In that sense, they were exploring many directions to see where the model could have the most impact.
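
To make the natural-language-to-BQL idea concrete, here is a minimal sketch of how such a prompt-based translation layer could work. The BQL snippets and the `complete` helper are illustrative assumptions; Bloomberg’s actual pipeline is not public.

```python
# Hypothetical sketch of a natural-language-to-BQL assistant.
# The BQL snippets and the `complete` helper are illustrative
# assumptions; Bloomberg's actual pipeline is not public.

FEW_SHOT_PROMPT = """Translate the request into a BQL query.

Request: last closing price of IBM
BQL: get(px_last) for(['IBM US Equity'])

Request: 2023 annual revenue for Apple
BQL: get(sales_rev_turn(fpt='A', fpo='2023')) for(['AAPL US Equity'])

Request: {request}
BQL:"""


def nl_to_bql(request: str, complete) -> str:
    """Translate plain English into a BQL query via an LLM.

    `complete` is any text-completion callable, e.g. a thin wrapper
    around an in-house model endpoint.
    """
    return complete(FEW_SHOT_PROMPT.format(request=request)).strip()
```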

“As for production use, we need to be very cautious. No one has really solved the hallucination problem yet. These language models can say wrong things, do strange things, so there needs to be a process around making them safe to use, either internally or, eventually, with clients.”
— David Rosenberg, The TWIML AI Podcast

They started with internal use, and in that context it wasn’t so much about safety or reputation as about function: was it useful, did it do the job? That was their focus at the time.

They were also aware that if people started relying on an LLM for internal tasks, they would become less critical of its output.

So teams building custom LLMs needed a special system for checking the model’s work. But then came the obvious question: if someone always has to check the output anyway, should they just do the task themselves?

In short, they kept things internal and focused on finance, code completion, and basic summarization tasks.

Backstage of BloombergGPT Financial LLM Development

Decision Behind Creating a Custom Financial LLM

BloombergGPT is an example of a project where a team inside an enterprise trained and built a custom large language model specializing in financial language.

The enterprise made a strategic decision to invest money, time, and human resources into this machine learning effort when GPT-3 was released.

“The question was, is this a direction we pursue, we invest in? Because it was clearly a big investment. We didn’t know how much GPT-3 actually cost to make, but it was clear that it was a huge investment. We decided it was worth making the move. Maybe there’s some risk there, but it seemed like the possibilities were pretty great. That was kind of a decision made back in late 2020 — to start building towards this goal of our own GPT-3-style model. I’m not sure we knew exactly at that time what it would be used for. We’re still experimenting to figure out how best to use it.”
— David Rosenberg, The TWIML AI Podcast

Training Dataset for BloombergGPT

In some ways, it was a general-purpose model — but also purpose-built for financial applications.

The training dataset was a mix of standard general-purpose data used for GPT-style models and Bloomberg’s proprietary financial data. About half of the dataset came from Bloomberg’s curated collection called FinPile, built over many years starting in 2007.

It included financial reports, news articles, filings, press releases, earnings call transcripts, and other structured content.

Some documents included tables and charts. They didn’t do any special processing for this training run — but when that information had already been extracted, they used it.

Structured data wasn’t treated differently. It was tokenized like any other content.

However, one area of concern was numerical data.

Finance involves a heavy volume of numbers, and the team was concerned that the GPT-2 tokenizer didn’t treat numbers in any special way. A number like 5,234 could be split unpredictably, neither digit by digit nor as a single unit, making it harder for the model to reason about numeric values.

So they used character-level tokenization for numbers, allowing the model to learn digit structure and positional order. They followed an approach similar to Google’s PaLM model, where numbers were split into individual digits. This helped the model understand that the first digit carries the highest value, and so on — one way they adapted the model for financial data.
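
A minimal sketch of that kind of digit splitting as a pre-tokenization step (the exact rules Bloomberg used are not public; the regexes below are an assumption):

```python
import re

# PaLM-style digit splitting: before text reaches the subword tokenizer,
# every number is broken into single digits so that "5,234" can never be
# merged into one unpredictable token. Illustrative rules only.

def split_digits(text: str) -> str:
    # drop thousands separators so digits stay adjacent: 5,234 -> 5234
    text = re.sub(r"(?<=\d),(?=\d)", "", text)
    # surround every digit with spaces, then collapse repeated spaces
    text = re.sub(r"(\d)", r" \1 ", text)
    return re.sub(r" +", " ", text).strip()

print(split_digits("Revenue rose to 5,234 million."))
# -> "Revenue rose to 5 2 3 4 million."
```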

BloombergGPT Training Process

At the time, Meta’s OPT model had just been released, and they used its training logs as a roadmap. The BigScience BLOOM project (coordinated by Hugging Face) also published detailed logs, which helped guide their process.

To reduce risk, they made their architecture as close as possible to something that had already worked.

“We copied the BLOOM model architecture fairly closely, with some small tweaks. Tokenization and number handling were two key pieces. We called it v0.”
— David Rosenberg, The TWIML AI Podcast

One thing they tried was sorting FinPile data chronologically — thinking newer data might be more accurate — while the rest of the data was randomly shuffled. For validation, they used the month immediately following the training set.
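
A sketch of that kind of time-based split, with the document structure as a hypothetical assumption:

```python
from datetime import date

# Sketch of the split described above: train on documents up to a cutoff,
# validate on the month immediately after it. The `docs` list of dicts
# with a "published" date field is an illustrative assumption.

def chronological_split(docs: list[dict], cutoff: date):
    next_month = date(cutoff.year + cutoff.month // 12,
                      cutoff.month % 12 + 1, 1)
    train = sorted((d for d in docs if d["published"] < cutoff),
                   key=lambda d: d["published"])   # oldest -> newest
    valid = [d for d in docs if cutoff <= d["published"] < next_month]
    return train, valid
```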

They trained for 4–5 days and saw the loss curve level off. After 8–10 days, they stopped training. They suspected that curriculum learning (via time sorting) wasn’t helping. They restarted with fully shuffled data.

That became version one of training. It started off stronger.

Around day 8, the gradient norm spiked. Validation performance also dropped. They turned to the OPT paper’s troubleshooting guide, rolled back to a checkpoint, reshuffled data, and lowered the learning rate — but saw no major improvement.

They investigated further and noticed something strange: out of 70 layers, the first layer’s layer norm scale weights dropped, then suddenly increased. This pointed to a bug: they were applying weight decay to weights that should’ve remained centered around one.

They fixed this in version two, did a full code review, improved mixed-precision handling, and added an extra layer norm at the beginning.
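
The fix they describe corresponds to a standard pattern in modern training code: keep one-dimensional parameters (layer norm scales and biases) out of the weight decay group. A minimal PyTorch sketch of that pattern, not Bloomberg’s actual code:

```python
import torch

# Layer-norm scale weights should stay centered around 1.0 (and biases
# around 0.0); applying weight decay pulls them toward zero. The usual
# remedy is a separate parameter group with weight_decay=0.

def build_optimizer(model: torch.nn.Module, lr=1e-4, weight_decay=0.1):
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # norm scales and biases are 1-D tensors; keep them decay-free
        (no_decay if param.ndim < 2 else decay).append(param)
    return torch.optim.AdamW(
        [{"params": decay, "weight_decay": weight_decay},
         {"params": no_decay, "weight_decay": 0.0}],
        lr=lr,
    )
```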

Then they restarted training. This time, it worked.

They trained for 42 days, with a steady loss decrease. They hit some challenges around 75% of the way through the dataset and eventually stopped training; performance had already exceeded expectations.

Resources Used to Train BloombergGPT

The core team included about nine people:

  • Four focused on implementation
  • Three on ML and data
  • One on optimization and compute
  • The rest handled evaluation, literature review, and support

They trained on Amazon SageMaker using SMP (Sharded Model Parallelism) with 40GB A100 GPUs — 512 in total.

They pre-purchased around 1.3 million GPU hours at a negotiated rate.

Validation and Performance Evaluation

During training, they used the last month of training data for validation. Later, they added a random validation set.

They also monitored downstream tasks like MMLU and BBH (BIG-bench Hard).

After training, they did a full evaluation and compared BloombergGPT to OPT-66B, BLOOM, and GPT-NeoX. On general-purpose benchmarks, it was competitive. On financial tasks, it significantly outperformed open models.

For example, on ConvFinQA (a benchmark requiring numerical reasoning over financial documents), it performed extremely well.

They also had internal benchmarks:

  • Sentiment analysis on financial news and social media
  • Named entity disambiguation (linking "Apple" to its stock ticker, etc.)
  • Natural language to BQL, which is like SQL for the Bloomberg Terminal

Even though the model wasn’t trained on BQL directly, it performed well in few-shot scenarios.
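
As an illustration of what such few-shot evaluation can look like, here is a hedged sketch for the entity-disambiguation task; the prompt format and example tickers are assumptions, not Bloomberg’s internal benchmark:

```python
# Illustrative few-shot prompt for linking a company mention to its
# ticker. The format and tickers shown are assumptions for the sketch.

DISAMBIGUATION_PROMPT = """Map the company mention to its stock ticker.

Text: Apple unveiled new hardware today.
Ticker: AAPL US Equity

Text: Shares of Deutsche Bank slid in early trading.
Ticker: DBK GY Equity

Text: {text}
Ticker:"""


def link_entity(text: str, complete) -> str:
    """`complete` is any text-completion callable (model endpoint wrapper)."""
    return complete(DISAMBIGUATION_PROMPT.format(text=text)).strip()
```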

They also experimented with headline generation and other generative tasks.

What’s Next for the BloombergGPT Team?

They had skipped a lot of early experimentation due to time constraints. Now that they have a working model, they’re going back to small-scale experiments: testing tokenization strategies, data mixtures, and architecture choices in a more disciplined way.

They’re continuing instruction tuning using public data (like FLAN) and internal labeled datasets. They have rich internal data for tasks like entity recognition, which they’re formatting into query–response pairs for tuning.
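
A minimal sketch of turning entity annotations into query–response pairs for tuning; the record layout is a hypothetical example, since Bloomberg hasn’t published its internal schema:

```python
# Convert one annotated entity-recognition example into an
# instruction-tuning pair. The record layout is hypothetical.

def to_instruction_pair(record: dict) -> dict:
    entities = ", ".join(f"{e['text']} ({e['label']})"
                         for e in record["entities"])
    prompt = ("List the named entities in the following text "
              f"with their types.\n\nText: {record['text']}\n\nEntities:")
    return {"prompt": prompt, "response": entities}


example = {
    "text": "Apple shares rose 3% after the earnings call.",
    "entities": [{"text": "Apple", "label": "ORG"}],
}
print(to_instruction_pair(example))
```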

“We’re more interested now in smaller models. They’re easier to use: you can run inference on a single GPU. Our 50B model requires multi-GPU infrastructure. Inspired by the LLaMA paper, we’re exploring what we can achieve with smaller models, longer training, and careful design. We want both small and large models for practical use.” — David Rosenberg, The TWIML AI Podcast

Financial LLM: A Mirage?

  1. Financial LLMs are not a fantasy. There are already startups in this space that have gone to production, raised funding, and landed their first clients. The best known are those backed by Y Combinator, for example, Truewind. So the question is really simple: does the team have product thinking or not?
  2. LLMs have made serious progress over the past two years, and it’s entirely possible that Bloomberg’s team is gradually shifting to a new technological foundation, replacing what it had built before. Or maybe they’re just quietly continuing to improve their system, and we’ll hear updates soon enough.
  3. The key point: you can’t expect a financial LLM to do what it simply wasn’t built to do. LLMs have core limitations by design. This is clearly shown in this Y Combinator discussion and explored in detail in this 2025 review of financial LLM capabilities.

So, is a financial LLM a mirage? Yes, if used the wrong way. No, if used right.

How Belitsoft Can Help

Building a financial LLM is not a technical challenge. It’s a product challenge.

Belitsoft helps financial firms build custom language models that are production-ready from the start. A good financial LLM needs to fit the firm’s data, language, workflows, and risk boundaries.

This means:

  • Training smaller models on tightly scoped tasks
  • Fine-tuning existing open models with in-house financial content
  • Building the validation systems
  • Designing instruction datasets from internal annotations instead of starting from scratch
  • Embedding the model behind interfaces users already trust, like dashboards and query layers

The result is not just a model. It’s a usable system. A decision tool. An internal assistant. A document engine. Whatever your workflow needs.

Belitsoft turns that strategy into a service: all deployments stay in-house. No data leaves the firm. No prompts go to external APIs. The entire model lifecycle is private, owned, and secure.

Frequently Asked Questions

Is BloombergGPT Open Source?

BloombergGPT is not open source. That was never the plan.

There’s no GitHub repo, no Hugging Face model card, no downloadable checkpoint. Everything released so far is limited to the original research paper, the podcast, and scattered quotes from Bloomberg’s team. The architecture was based on BLOOM, but BloombergGPT itself remains proprietary.

It was never about open innovation. It was about internal experimentation, driven by control, compliance, and data privacy. So if you're looking to build on BloombergGPT, you can't. You have to build your own.

What Did BloombergGPT Really Cost?

The cost of BloombergGPT depends on what question you are actually asking. Is it the cost to train the model? The cost to develop it? Or the cost to use it?

Cost of Training

The hardware footprint is public. Bloomberg trained BloombergGPT on 512 A100 40GB GPUs over ~42 days, clocking in around 1.3 million GPU hours. They used Amazon SageMaker with Sharded Model Parallelism (SMP) for distributed training.

Even at the low end of cloud GPU pricing, $1.00 per hour for reserved capacity, that puts the compute cost at $1.3 million minimum.

They never published a dollar figure. But they gave enough numbers to back into one.
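
The back-of-the-envelope arithmetic, using only the publicly stated numbers (the $1.00 per GPU-hour rate is the assumption):

```python
# Back-of-the-envelope compute cost from the public numbers.
# The $1.00/GPU-hour reserved rate is the only assumption here.

gpus = 512
days = 42
final_run_hours = gpus * days * 24    # ~516k GPU hours for the final run
purchased_hours = 1_300_000           # pre-purchased, incl. earlier failed runs
rate_usd = 1.00                       # assumed reserved-capacity rate

print(f"final run: {final_run_hours:,} GPU hours")                  # 516,096
print(f"minimum compute bill: ${purchased_hours * rate_usd:,.0f}")  # $1,300,000
```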

Cost of Development

The team was quite big: nine people in total. Four focused on implementation, three on ML and data, one on compute optimization, and the rest covered evaluation and research.

Training took six weeks. But the real work started long before. Architecture decisions, tokenizer experiments, data prep, validation setup, and debugging stretched across six to nine months. You can trace that from the interviews.

Assume senior-level compensation and long-term staffing in Canada. The cost of development alone lands somewhere between $2 million and $3 million. That’s before you turn on a single GPU.

The takeaway: the full R&D cost of BloombergGPT lands somewhere between $3.5 million and $8 million, training included.

Cost of Use

Then there’s the question of what it costs to use BloombergGPT.

There is no public API. No license. No open weights. No product offering called “BloombergGPT.” Instead, the model is quietly embedded into the Bloomberg Terminal: Bloomberg’s decades-old enterprise product, used by almost every major player in finance.

The Terminal starts at $30,000 per user per year. It accounts for over 85% of Bloomberg’s total annual revenue. And now it includes BloombergGPT under the hood, powering SEAR (Bloomberg Search), report generation, narrative analysis, and financial modeling workflows.

That means the cost to “use” BloombergGPT is inseparable from the cost of using the Terminal itself. There is no separate price tag. The model is an embedded feature, bundled into an enterprise suite.

In effect, BloombergGPT becomes a retention moat, not a revenue line. It doesn’t monetize directly. It reinforces the value of something that already does.
