
Thompson Sampling via Fine-Tuning LLMs for Bayesian Optimization

Posted on October 17, 2025 by Agarapu Naveen

Thompson Sampling Via Fine-Tuning (ToSFiT) of LLMs Achieves Scalable Bayesian Optimization in Complex Discrete Spaces

Researchers from ETH Zürich and IBM Research, Zurich have introduced Thompson Sampling via Fine-Tuning (ToSFiT), an optimization method for huge and complex discrete search spaces where conventional gradient-based approaches usually fall short. By drawing on large language models (LLMs), ToSFiT offers a scalable approach to Bayesian optimization (BO) that avoids the computationally costly acquisition function maximization step.

The method retains strong theoretical performance guarantees while significantly improving efficiency in real-world applications by incrementally adapting LLMs to reflect the growing knowledge of the search space. The team behind this work consists of Nicolas Menet, Aleksandar Terzić, and Andreas Krause from ETH Zürich, together with Abbas Rahimi from IBM Research, Zurich.


Overcoming the Optimization Hurdle in Discrete Domains

Bayesian optimization is a key algorithmic framework for automated discovery and large-scale experimental design when reward function evaluations are expensive or time-consuming. BO maintains a posterior distribution over the unknown rewards and uses this statistical model to direct the search toward promising configurations. Traditionally, new candidates are chosen by optimizing an acquisition function that balances exploitation (improving on current solutions) against exploration (testing out new options).

Thompson sampling (TS) stands out among acquisition strategies because of its robust empirical performance and state-of-the-art convergence guarantees. TS draws a reward function realization from the posterior, treats that realization as the acquisition function, and chooses the point that maximizes it.

However, this maximization step poses a fundamental problem in huge unstructured discrete domains, such as the space of amino acid sequences or of valid code for quantum circuits, where the lack of gradients rules out efficient search. Exhaustive search is equally infeasible: a protein search space with 20 amino acids and a maximum sequence length of 100 already surpasses the number of atoms in the observable universe. In these combinatorial spaces, maximizing the acquisition function is intractable and frequently requires iterating over every point.
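
The sample-then-maximize loop of Thompson sampling can be sketched on a toy problem. This is a minimal illustration, not the researchers' implementation: it assumes three candidates with independent Gaussian posteriors and known observation noise, and the reward values and names (`true_reward`, `noise_std`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
true_reward = np.array([0.1, 0.5, 0.9])   # hypothetical rewards, unknown to the optimizer
noise_std = 0.3                           # assumed known observation noise

# Independent Gaussian prior N(0, 1) on each candidate's reward.
post_mean = np.zeros(3)
post_var = np.ones(3)
pulls = np.zeros(3, dtype=int)

for _ in range(200):
    # Draw one reward function realization from the posterior.
    sample = rng.normal(post_mean, np.sqrt(post_var))
    # Treat the realization as the acquisition function and maximize it.
    x = int(np.argmax(sample))
    # Simulate the expensive reward evaluation.
    y = true_reward[x] + noise_std * rng.normal()
    # Conjugate Gaussian update for the evaluated candidate.
    precision = 1.0 / post_var[x] + 1.0 / noise_std**2
    post_mean[x] = (post_mean[x] / post_var[x] + y / noise_std**2) / precision
    post_var[x] = 1.0 / precision
    pulls[x] += 1

print(pulls, post_mean.round(2))
```

With only three candidates the `argmax` is trivial. The difficulty described in this section arises when the candidate set is too large to enumerate.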
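
The size of that protein search space can be checked with exact integer arithmetic. The sketch below counts all sequences of length 1 to 100 over 20 amino acids. The figure of 10^80 atoms is a commonly quoted order-of-magnitude estimate, used here as an assumption.

```python
ALPHABET = 20                 # number of standard amino acids
MAX_LEN = 100                 # maximum sequence length from the text
ATOMS_IN_UNIVERSE = 10**80    # common order-of-magnitude estimate (assumption)

# Python integers are arbitrary precision, so this count is exact.
n_sequences = sum(ALPHABET**k for k in range(1, MAX_LEN + 1))

print("decimal digits:", len(str(n_sequences)))
print("exceeds atom estimate:", n_sequences > ATOMS_IN_UNIVERSE)
```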


ToSFiT: LLMs as Generative Optimizers

The researchers created ToSFiT to scale BO to these complicated, high-dimensional domains. Rather than maximizing an acquisition function, ToSFiT uses a generative LLM to directly parameterize the probability of maximality (PoM), the probability that a candidate solution is optimal. The model's proposals are then treated as Thompson samples, which avoids costly acquisition function maximization.

ToSFiT builds on the Variational Bayesian Optimistic Sampling (VBOS) framework. Importantly, it starts the optimization from a pre-trained, prompt-conditioned language model, which provides strong prior knowledge and speeds up learning. Through online fine-tuning with the VBOS objective, it then carefully adapts the model parameters toward the posterior PoM.
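
To make the PoM concrete: for a small candidate set with a Gaussian reward posterior, the probability that each candidate is optimal can be estimated by Monte Carlo, and a Thompson sample is simply a draw from that distribution. The sketch below is illustrative only. The posterior values are hypothetical, and ToSFiT itself represents this distribution with an LLM instead of enumerating candidates.

```python
import numpy as np

def estimate_pom(mean, cov, n_samples, rng):
    """Monte Carlo estimate of P(candidate i attains the maximum) under N(mean, cov)."""
    draws = rng.multivariate_normal(mean, cov, size=n_samples)  # (n_samples, n_candidates)
    winners = np.argmax(draws, axis=1)
    return np.bincount(winners, minlength=len(mean)) / n_samples

rng = np.random.default_rng(0)
mean = np.array([0.2, 0.0, -0.1, 0.1])   # hypothetical posterior means
cov = 0.05 * np.eye(4)                   # hypothetical posterior covariance
pom = estimate_pom(mean, cov, 20_000, rng)
print(pom.round(3))
```

The estimate is a probability vector over candidates. Candidates with a lower posterior mean still receive some mass when their uncertainty is large, which is what drives exploration.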

To compute the reward posterior in closed form and condition on observations efficiently, the researchers implemented scalable Gaussian process (GP) inference using linear kernels over learned features. As a result, memory and computational complexity scale as Θ(dim(H)²), independent of the number of previous observations.
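
A GP with a linear kernel over fixed features is equivalent to Bayesian linear regression on those features, so the posterior can be kept as a d × d matrix and a d-vector regardless of how many observations arrive. The sketch below shows that idea under stated assumptions: the class name, hyperparameters, and random features are hypothetical and do not come from the paper.

```python
import numpy as np

class LinearFeaturePosterior:
    """Posterior over weights w for rewards r(x) = phi(x) @ w + noise.

    Equivalent to a GP with kernel k(x, x') = prior_var * phi(x) @ phi(x'),
    but the state is a d x d precision matrix and a d-vector only.
    """

    def __init__(self, dim, noise_var=0.01, prior_var=1.0):
        self.noise_var = noise_var
        self.precision = np.eye(dim) / prior_var   # d x d
        self.b = np.zeros(dim)                     # running sum of phi * y / noise_var

    def update(self, phi, y):
        # Rank-one update; cost and memory do not depend on past observations.
        self.precision += np.outer(phi, phi) / self.noise_var
        self.b += phi * y / self.noise_var

    def predict(self, phi):
        cov = np.linalg.inv(self.precision)
        mean_w = cov @ self.b
        return phi @ mean_w, phi @ cov @ phi       # posterior mean and variance

rng = np.random.default_rng(0)
d = 4
w_true = rng.normal(size=d)                        # hypothetical ground truth
model = LinearFeaturePosterior(d)
for _ in range(500):
    phi = rng.normal(size=d)
    model.update(phi, phi @ w_true + 0.1 * rng.normal())

phi_test = rng.normal(size=d)
pred_mean, pred_var = model.predict(phi_test)
print(pred_mean, phi_test @ w_true, pred_var)
```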

Fine-tuning the LLM requires gradient estimates, which were stabilized using a reinforcement learning technique, the Reinforce Leave-One-Out (RLOO) baseline. Standardized RLOO is mathematically identical to the advantage function of Group Relative Policy Optimization (GRPO).
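
The equivalence between standardized RLOO and the GRPO advantage can be verified numerically. In the sketch below the reward values are hypothetical, and the small epsilon that GRPO implementations often add to the standard deviation is omitted so that the identity is exact.

```python
import numpy as np

def rloo_advantages(rewards):
    """Each reward minus the mean of the other rewards in the group."""
    n = len(rewards)
    leave_one_out_mean = (rewards.sum() - rewards) / (n - 1)
    return rewards - leave_one_out_mean

def standardize(x):
    """Shift to zero mean and scale to unit standard deviation."""
    return (x - x.mean()) / x.std()

def grpo_advantages(rewards):
    """Group-relative advantage: z-score of each reward within its group."""
    return (rewards - rewards.mean()) / rewards.std()

rewards = np.array([0.2, 1.5, -0.3, 0.9, 0.4])   # hypothetical group of rewards
a_rloo = standardize(rloo_advantages(rewards))
a_grpo = grpo_advantages(rewards)
print(np.allclose(a_rloo, a_grpo))
```

The RLOO advantage equals n/(n-1) times the mean-centered reward, so the constant factor cancels once the advantages are divided by their standard deviation.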


Theoretical Guarantees and Policy Initialization

The study offers substantial theoretical support for ToSFiT. The researchers derived a novel regret bound for a variational formulation of Thompson sampling, showing that the cumulative regret scales with the maximal information gain (γT) rather than the size of the search space (∣X∣). This significantly improves on the earlier bound for exact VBOS, which scaled as Õ(√(T∣X∣)) and is therefore vacuous in combinatorially huge domains. For a linear kernel in d dimensions, γT grows only as O(d log T).

This theoretical analysis emphasizes how important careful adaptation is. The approximation error between the exact VBOS maximizer (πt) and the sampling policy (π̃t) risks dominating the cumulative regret. To address this, it is crucial to initialize ToSFiT through pre-training and context, which ensures that the policy starts in the appropriate region of the probability simplex. Empirically, a strong initial policy yields significantly better performance, and careful adaptation (using low learning rates) is necessary to preserve this prior knowledge and prevent performance stagnation.
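
The logarithmic growth of the information gain for a linear kernel can be illustrated numerically. The sketch below computes the information gain of a set of random unit-norm feature vectors and compares it with the standard upper bound (d/2) log(1 + T/(dσ²)), which holds when feature norms are at most 1. It demonstrates the general O(d log T) behavior only and is not the paper's analysis. The dimensions, noise level, and random features are assumptions.

```python
import numpy as np

def information_gain(features, noise_var):
    """0.5 * log det(I + X^T X / noise_var) for a linear kernel on the rows of X."""
    d = features.shape[1]
    gram = features.T @ features / noise_var            # d x d
    _, logdet = np.linalg.slogdet(np.eye(d) + gram)
    return 0.5 * logdet

def linear_kernel_bound(T, d, noise_var):
    """Upper bound (d/2) * log(1 + T / (d * noise_var)), valid when ||x|| <= 1."""
    return 0.5 * d * np.log1p(T / (d * noise_var))

rng = np.random.default_rng(0)
d, noise_var = 8, 0.1
X = rng.normal(size=(4096, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)           # unit-norm features

for T in (64, 512, 4096):
    gain = information_gain(X[:T], noise_var)
    bound = linear_kernel_bound(T, d, noise_var)
    print(f"T={T:5d}  gain={gain:7.2f}  bound={bound:7.2f}")
```

Increasing T by a factor of 8 adds a roughly constant amount to the gain, whereas a bound that depends on ∣X∣ would be astronomically large for the search spaces discussed here.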

Validation Across Diverse Tasks

Empirical validation on three very different search tasks confirmed that ToSFiT improves sample efficiency with minimal impact on computational cost.

  1. FAQ Response Refinement: Using a Qwen3-1.7B model, this natural language task optimizes responses for semantic alignment with an unknown ground-truth response.
  2. Thermally Stable Protein Search: The challenge here is to design amino acid sequences that maximize thermal stability, a crucial property for drug development. Sequences were sampled using ProtGPT2, and the search space is exponentially large.
  3. Quantum Circuit Design: Using a Qwen2.5-Coder-1.5B model, this task navigates a large, discrete space of valid quantum programs to generate Qiskit circuits that prepare low-energy quantum states in unknown environments.

Because Unguided Generation does not use feedback, it quickly plateaus at an unsatisfactory reward level in all experimental settings. Post-Generation TS, a traditional BO technique over a predetermined subset of candidates, finds effective solutions quickly but is limited to its starting pool and saturates early. ToSFiT, on the other hand, performs BO over the whole solution space and keeps finding candidates with higher rewards. It also outperformed reinforcement learning baselines such as Actor Critic and Soft Actor Critic, exploring more efficiently through optimism in the face of uncertainty.

Thompson sampling is also well suited to batched optimization because it naturally produces a variety of candidates. ToSFiT demonstrates this: batching slightly reduces sample efficiency but greatly increases iteration efficiency, reaching target performance in fewer rounds. This is crucial when observations are time-consuming or delayed.

The results show that complex, discrete search problems can be solved by combining principled Bayesian optimization with strong foundation models. Future work may incorporate jointly learned task-adaptive embeddings, investigate more expressive reward models such as Bayesian neural networks, or restrict updates to a subset of the generative model to further lower computational cost.


