Agentic AI Can Supercharge Biomedical Research

But Only If You Choose the Right Agentic Tools for Your Team. Here Are Four Key Considerations.


A few years ago, during the GenAI boom that brought ChatGPT into the mainstream, biomedical researchers in the Life Sciences and Healthcare industries considered AI mainly in the form of chatbots that could aid their daily lab routines. The technology has moved quickly, to say the least, and today the AI paradigm not only includes chatbots aiding humans but also complex AI entities called agents interacting with each other, autonomously enhancing the work of human researchers—and sometimes even doing the work for them. This agentic technology has opened a whole new world: What kind of previously unconsidered medical possibilities can we now unlock? 

With more than half of our clients in Healthcare and Life Sciences, Loka’s engineers hear the same question over and over: We want to implement AI within our workflows, but which agentic framework should we use? Choosing the right framework is not a trivial decision. The wrong choice can compromise compliance and the overall viability of the AI system.

We’ve worked closely with teams across drug discovery, genomics, and clinical research to implement purpose-built commercial platforms, academic research systems, and general-purpose engineering frameworks. After engaging with hundreds of companies and advising them on best practices for integrating genAI, we’ve arrived at four essential considerations for researchers who aspire to implement AI into their work.

What Does Your Agent Need to Do?

Biomedical research spans a wide range of tasks, from literature review and hypothesis generation to experiment design, data analysis, and clinical research. Each of these tasks requires a different architectural framework. We've seen clients use simple architectures for complex workflows and complex architectures for simple workflows, mostly because they don’t have a clear view of the tasks they want to automate with their agentic system. Choosing the wrong architecture for the task is one of the most common reasons production deployments fail.

For knowledge-intensive, single-domain tasks like hypothesis generation, a simpler single-agent architecture (where the agent reasons, acts, and observes results in a loop) is usually enough. 

For multi-step, tool-heavy workflows like autonomous experiment design, you'll need a more complex multi-agent system, where specialized agents handle different parts of the pipeline. 
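The single-agent pattern above can be sketched as a minimal reason-act-observe loop. Everything in this sketch is a stand-in stub for illustration (the `fake_llm` policy and the `search_literature` tool are invented), not any particular framework's API:

```python
# Minimal sketch of a single-agent reason-act-observe loop.
# `fake_llm` and `search_literature` are stand-in stubs; in practice the
# LLM call and the tools come from your framework of choice.

def search_literature(query: str) -> str:
    """Stub tool: pretend to look up papers for a query."""
    return f"3 papers found for '{query}'"

TOOLS = {"search_literature": search_literature}

def fake_llm(history: list[str]) -> str:
    """Stub policy: issue one tool call, then finish."""
    if not any(h.startswith("OBSERVE") for h in history):
        return "ACT search_literature gene X in fibrosis"
    return "FINAL Hypothesis: gene X upregulation drives fibrosis"

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [f"TASK {task}"]
    for _ in range(max_steps):                    # reason -> act -> observe
        decision = fake_llm(history)              # reason
        if decision.startswith("FINAL"):
            return decision.removeprefix("FINAL ").strip()
        _, tool_name, arg = decision.split(" ", 2)
        observation = TOOLS[tool_name](arg)       # act
        history.append(f"OBSERVE {observation}")  # observe
    return "Step budget exhausted"

print(run_agent("Generate a hypothesis about gene X"))
```

The `max_steps` budget matters: without it, a looping agent burns tokens indefinitely, which is exactly the failure mode the memory-and-persistence question below addresses.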

To align your architecture with project requirements during the ideation phase, address the following parameters:

  1. Does the task require different sub-agents instantiated with distinct roles for specific tasks, such as planner, researcher or analyst? (This point is particularly crucial when you need to implement multi-step workflows.)
  2. How available and how versatile are your tools (e.g., online vs. offline, model context protocol (MCP) vs. command-line interface (CLI))?
  3. Which LLM will be appropriate for the given tasks (e.g., planning tasks with Opus-level, analysis with Sonnet-level or finding particular entries with Haiku-level models)?
  4. How will you deal with memory and persistence? E.g., how will results from various hypotheses be stored and compared so that the agents don’t fall into a reasoning loop? How will experiments be logged (e.g., AWS S3, a relational database, MLflow)?
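Point 4 in particular trips up many teams. As a rough sketch, a minimal experiment log might deduplicate hypotheses by a normalized hash so agents can compare stored results instead of re-deriving them. The in-memory dict here stands in for the S3, relational-database, or MLflow backend mentioned above:

```python
# Hedged sketch of an experiment log that keeps agents from re-testing the
# same hypothesis, a common cause of reasoning loops. In production the
# dict would be backed by S3, a relational database, or MLflow.
import hashlib

class ExperimentLog:
    def __init__(self):
        self._results: dict[str, dict] = {}

    @staticmethod
    def _key(hypothesis: str) -> str:
        # Normalize whitespace and case, then hash, so trivially
        # rephrased duplicates still match.
        norm = " ".join(hypothesis.lower().split())
        return hashlib.sha256(norm.encode()).hexdigest()

    def seen(self, hypothesis: str) -> bool:
        return self._key(hypothesis) in self._results

    def record(self, hypothesis: str, result: dict) -> None:
        self._results[self._key(hypothesis)] = result

    def best(self, metric: str) -> dict:
        # Rank stored runs so agents compare hypotheses, not revisit them.
        return max(self._results.values(), key=lambda r: r[metric])

log = ExperimentLog()
log.record("Gene X drives fibrosis", {"hypothesis": "gene X", "score": 0.72})
log.record("Gene Y drives fibrosis", {"hypothesis": "gene Y", "score": 0.64})
print(log.seen("gene x  drives fibrosis"))   # True: duplicate is caught
print(log.best("score")["hypothesis"])       # gene X
```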

What Are the Core Challenges for Production Deployment?

Deploying an agentic AI system is significantly more challenging than launching it in a demo. In our experience, successfully designing and deploying AI agents in the biomedical field requires addressing four core challenges.

  1. Define and trust the right benchmarks. 

Many of our clients look to use biomedical agents for incredibly specific research tasks, and thus are hesitant to trust more generic benchmarks, even those tailored toward expert-level evaluations (e.g., GPQA Diamond). Without the right evaluation, you can't know how reliable your agent truly is, so we work with our clients to find the best evaluation pipeline tailored to their specific use-case. 

Public benchmarks might not be ideal for these clients’ use cases, given that agent performance is highly sensitive to how tools are described, prompts are structured, and errors are handled. For that reason, we suggest that our clients go a step further and create their own evaluation framework. You want your agent to address the specifics of your business or use case, so why not devote some effort to building an internal benchmark? After creating and deploying the first version of the agent, you can assemble an internal team of testers to use the system and push its limits, catching unexpected behaviors and discovering edge cases that should be fixed. After a round of tests, iterate with your development team and repeat the cycle. This process yields an internal ground-truth dataset for validating your models.
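As a rough illustration of such an internal benchmark, the evaluation loop can be as simple as running the agent over a hand-curated ground-truth set and scoring accuracy per task type. The agent stub, questions, and answers below are invented placeholders:

```python
# Sketch of the internal-benchmark loop described above: run the agent
# over a hand-built ground-truth set and report accuracy per task type.
# `agent` is a stand-in; swap in a call to your deployed system.
from collections import defaultdict

def agent(question: str) -> str:
    """Stub agent; replace with a call to your real system."""
    answers = {
        "Which pathway does drug A inhibit?": "MAPK",
        "Is target B druggable?": "yes",
        "Which gene encodes protein C?": "TP53",
    }
    return answers.get(question, "unknown")

GROUND_TRUTH = [
    {"task": "pathway", "q": "Which pathway does drug A inhibit?", "a": "MAPK"},
    {"task": "triage",  "q": "Is target B druggable?",             "a": "yes"},
    {"task": "lookup",  "q": "Which gene encodes protein C?",      "a": "BRCA1"},
]

def evaluate(cases):
    per_task = defaultdict(lambda: [0, 0])  # task -> [correct, total]
    for case in cases:
        correct = agent(case["q"]).strip().lower() == case["a"].lower()
        per_task[case["task"]][0] += int(correct)
        per_task[case["task"]][1] += 1
    return {t: c / n for t, (c, n) in per_task.items()}

print(evaluate(GROUND_TRUTH))  # {'pathway': 1.0, 'triage': 1.0, 'lookup': 0.0}
```

The deliberately failing "lookup" case shows the point of the exercise: a per-task breakdown surfaces exactly where the agent falls short, which a single aggregate score on a public benchmark would hide.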

  2. Understand and adhere to relevant compliance requirements. 

Due to the rapidly evolving nature of AI, regulations across countries have not kept pace with technological advancements. Before deploying any agentic system, teams should perform a compliance assessment to understand the requirements specific to their use case(s), then implement the right logging, traceability, and human-in-the-loop mechanisms to ensure adherence. For instance, Loka’s dedicated compliance team works with our clients to help them understand their project requirements before moving into any implementation step.

  3. Integrate upstream and downstream. 

Even when the agent is trustworthy and compliant, implementing biomedical agents directly into real workflows often requires complex upstream and downstream integrations, just like any other IT project. From upstream integration with complex lab equipment to downstream data streams flowing to partners, regulatory agencies, and patients, agent developers should work closely with cloud and data engineers to ensure that biomedical agents work in your company’s specific business and technology context.

  4. Manage change. 

While many of our clients, especially researchers, like to push the boundaries of what’s possible with the latest tools, learning to trust systems they don’t fully understand or control can be challenging. We’ve found that iterative rollouts, done in close conjunction with key adopters within your organization, work best to help people adjust to a new, accelerated way of working. Additionally, emphasizing how agents can augment rather than replace researchers is critical to supporting adoption.

What Is Your Starting Point?

Before looking at the available frameworks, it's worth turning the lens inward. The right choice for your team depends less on the tool itself and more on where your organization stands today: who will use the agent, how ready your data is, how much engineering capacity you have, and how well defined your workflows are.

Readiness Tiers

Choosing the right agentic AI framework requires aligning it with your team's readiness and capabilities. Depending on your team’s technical expertise, the right solution might be ready for immediate deployment, or it might require customization to fit your needs. A four-tier readiness model helps match the complexity of the framework to your team's expertise:

  1. What You See Is What You Get: A fully-managed platform with a graphical user interface (GUI) that requires no coding, suitable for lab researchers or technicians
  2. Low-Code/API-accessible: A fully managed framework that can be integrated into in-house workflows, requiring low effort on setup, suitable for computational biologists comfortable with Python who want more flexibility
  3. Technical Setup Required: A framework that demands ML/DevOps expertise for production-ready systems
  4. Expert/Research Mode: A framework that requires significant domain adaptation and deep expertise for specialized capabilities

Remember: Choosing a framework that is mismatched to your team’s technical capabilities is a fast route to a stalled pilot.

Deployment Environment

Along with a poor choice of framework, another common failure we see in agentic AI adoption is deployment into an unprepared environment. Before a full implementation, you should conduct an in-house analysis focusing on two key aspects:

  1. Data readiness is critical, as agentic systems are only as good as the data they can access. Clean, versioned, and queryable data is a prerequisite.
  2. ML engineering capacity must be sufficient, as even mature frameworks require careful tool design, prompt engineering (which will require additional user training), and error handling. Treating framework adoption as configuration rather than software engineering is a common mistake.

Process Clarity

Finally, success depends on scientific process clarity, risk tolerance, and governance. Workflows that agents automate most successfully are already well-defined as discrete steps; if the task relies on expert judgment, a human-in-the-loop design is necessary. Furthermore, larger organizations must align deployments with quality management systems and data governance policies. Most biomedical teams are advised to begin with augmentation (agents assist researchers), progress to automation (agents execute defined workflows), and then, with strong ML teams, potentially move toward autonomy (agents adaptively pursue research goals).

What Frameworks Are Currently Available?

The technology behind agentic AI is continuously evolving, so keeping up with the state of the art is essential. Following best practices and developing applications with the most updated frameworks, which usually come with novel or enhanced features as well as recent security guardrails, ensures that we maintain safety and trust. Staying up to date is harder than it sounds, because the publication of novel platforms and frameworks has accelerated dramatically in the last few years. Below are just a few of the most relevant examples of available frameworks.

If you're looking for a ready-to-use tool with minimal setup that your team can open in a browser and start using today, Edison and Phylo are the obvious starting points. Both offer conversational interfaces with built-in tools for literature research, hypothesis generation, and protocol development, and both have free tiers that allow you to explore your specific use cases.

If you want to build a more custom solution with specific security requirements, you might need something with more control over where your data goes. In that case, a framework that runs within your own infrastructure (e.g., Biomni, ChemCrow, CRISPR-GPT, K-Dense BYOK or the AI Scientist) is worth the additional setup effort. For teams already on AWS, the natural path is to deploy within that ecosystem (e.g., Claude for Life Sciences enhanced by BioMCP). You can still query external APIs, but your proprietary data never leaves your environment.

Naturally, like in many other AI use cases, there is no “one size fits all” rule here. Depending on your in-house requirements, you might even need or want to go for a truly customized solution. In that case, the “divide and conquer” motto will be key: Approach the problem with a modular view and check which modules require more technical setup and which ones can fall back into straightforward API calls.

A Real-World Example: Meet ADDA, Loka’s Biomedical Agent

Everything we've discussed so far—task scoping, architecture matching, data compliance, readiness tiers—came into play when we built ADDA, or Advanced Drug Discovery Assistant, an LLM-based agent designed to accelerate therapeutic discovery. Built in LokaLabs, our internal R&D incubator, ADDA is based on lessons we learned working with dozens of biotech companies facing similar challenges (e.g., de novo generation of compounds, exploration of new molecular structures, or prediction of the functional traits of proteins or nucleic acids).

Rather than replacing researchers, ADDA augments them, retrieving and interacting with internal and external databases, executing predictive models, and generating research outputs, all triggered by natural language prompts. As shown in this demo, a researcher can ask ADDA a complex question about a target protein and receive a synthesis of relevant literature, model predictions, and suggested next steps in seconds rather than days. 

Designing ADDA required the same decisions we delineated in this article: choosing the right architecture for a multi-step, data-heavy workflow; keeping proprietary data within a compliant cloud environment; and designing human checkpoints into the pipeline from day one, not as an afterthought. To achieve all these requirements, we built ADDA using Strands Agents, AWS's open-source agentic framework, running on Amazon Bedrock. Under the hood, it connects to biological databases like GEO (Gene Expression Omnibus), KEGG PATHWAY, and UniProt for evidence lookup; runs protein folding via ESMFold, performs ligand docking via DiffDock, and generates novel molecules with MolMIM, all within a compliant AWS environment where proprietary data never leaves your infrastructure. ADDA is a practical example of what the right architecture looks like when task complexity, data compliance, and scientific rigor all have to coexist.
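For illustration only, the tool-dispatch pattern at the core of an agent like ADDA can be sketched as a registry mapping intents to tools. The tool names, stubs, and routing rule below are hypothetical stand-ins, not ADDA's actual implementation (which uses Strands Agents on Amazon Bedrock):

```python
# Toy tool registry in the spirit of ADDA's design: map intents to tools,
# then dispatch. All names and return values here are invented stubs.

def uniprot_lookup(query: str) -> str:
    """Stub for evidence lookup (GEO/KEGG/UniProt in ADDA)."""
    return f"UniProt entry for {query} (stubbed)"

def fold_protein(sequence: str) -> str:
    """Stub for structure prediction (ESMFold in ADDA)."""
    return f"Predicted structure for {len(sequence)}-residue sequence (stubbed)"

TOOL_REGISTRY = {
    "lookup": uniprot_lookup,
    "fold": fold_protein,
}

def dispatch(intent: str, payload: str) -> str:
    tool = TOOL_REGISTRY.get(intent)
    if tool is None:
        raise ValueError(f"No tool registered for intent '{intent}'")
    return tool(payload)

print(dispatch("lookup", "P53_HUMAN"))
print(dispatch("fold", "MKTAY"))
```

In a real deployment the LLM, not a keyword, selects the tool, and each stub is replaced by a database query or model invocation running inside the compliant environment.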

Whatever biomedical field you’re involved in, agentic AI can potentially enhance your workflows, allowing you to prototype faster and gain new insights with fewer errors and at lower cost. To make this work easier, we’ve proposed an agent assembly line that can accelerate the development of agentic AI use cases from months to weeks. 

The AI field is moving fast. Today’s research prototype often becomes production infrastructure within 12–18 months. The teams that benefit most from adopting AI are the ones that start with a well-scoped pilot, instrument it properly from day one, and build human oversight into the architecture rather than retrofitting it later. And the ones that choose the right AI implementation partner. With more than 250 AI projects put into production since 2023, Loka knows how to build, deploy, and iterate on these systems, helping biomedical businesses move faster, avoid costly missteps, and get to meaningful results without reinventing the wheel at every turn.
