GeneBench-Pro reveals the hard part of AI science

Plus: Anthropic’s Mythos 5 returns after security concerns

OpenAI, Anthropic, and Acti all point to the same shift in AI: the race is moving from raw intelligence to real-world usefulness. OpenAI’s GeneBench-Pro shows that models still struggle with messy scientific judgment. Anthropic’s Mythos news shows that powerful AI now comes with government scrutiny. And Acti’s keyboard shows where AI may actually live next: inside the tools people already use every day.

In today’s post:

  • AI still struggles with scientific judgment

  • Anthropic just got a second chance

  • AI is moving into the keyboard

What’s Trending Today

LAUNCH

OpenAI’s new biology benchmark reveals what AI still cannot fake

Image Credits: OpenAI

OpenAI just introduced GeneBench-Pro. It is a benchmark for testing AI agents in computational biology. But the real story is not biology. It is judgment. Because science rarely arrives as a clean prompt. The hard part is knowing what the data means. And knowing when it does not mean enough.

  • GeneBench-Pro tests whether AI can handle messy scientific work, not just recall facts or follow steps.

  • Each task gives the model unclear data, context, and a decision that depends on careful analysis.

  • The benchmark focuses on what OpenAI calls “research taste,” which means choosing the right question, method, and moment to revise.

  • This matters because real researchers constantly separate signal from noise before making conclusions.

  • The strongest model, GPT-5.6 Sol, passed 28.7% of tasks at its highest reasoning level.

  • With Pro mode, that result rose to 31.5%, which is impressive but still far from reliable.

  • Human experts estimated each problem could take 20 to 40 hours to solve, showing how difficult these tasks are.

This is a useful reminder. AI is getting better at doing the work. But the harder question is whether it understands the work. In science, a correct answer is not enough. The path matters. The doubt matters. The decision to stop, revise, or question the data matters. That is where expertise still lives. GeneBench-Pro shows progress. It also shows the gap. And that gap may define the next phase of AI.

BREAKTHROUGH

The US lifted Anthropic’s AI export ban, but the warning still matters

Image Credits: BBC

Anthropic is back in motion. The US government has lifted its export ban on Claude Fable 5 and Mythos 5. That sounds like a win for Anthropic. But it is also a signal. Advanced AI is no longer just a product race. It is becoming a trust race. And governments are watching closely.

  • The US had suspended access to Fable 5 and Mythos 5 on June 12 over national security concerns.

  • Officials worried the models could help hackers find and exploit weaknesses in computer systems.

  • Anthropic says it will now restore access after the Commerce Department lifted the restrictions.

  • The company agreed to proactively detect security risks tied to these models.

  • Anthropic will also work with the government on future releases and report malicious activity.

  • Fable 5 is built for consumers, while Mythos 5 is aimed at businesses and cybersecurity experts.

  • The real concern is not just what these models can do, but who can use them and how.

This is not a clean victory for Anthropic. It is a preview of how AI will be governed. The most powerful models will not be judged only by benchmarks. They will be judged by risk, control, and cooperation. That may slow some releases. It may also make serious AI companies stronger. Because in the next era, capability alone will not be enough. Trust will become part of the product.

STRATEGY

Acti wants your keyboard to become the next AI interface

Image Credits: Acti

Acti is betting on a simple idea. People do not want another app. They want help where they already are. So instead of building another chatbot, Acti built an AI keyboard. It works inside messages, email, social apps, and more. That makes the keyboard feel less like a typing tool. And more like an action layer.

  • Acti launched an AI-powered keyboard for iOS and Android that can take actions inside the apps people already use.

  • The company says this reduces the constant app-switching that happens when people need AI help mid-conversation.

  • If someone asks for a restaurant nearby, Acti can suggest one without forcing the user to open another app.

  • If a stock comes up in a chat, Acti can share the live price directly inside the conversation.

  • The product runs on Google’s Gemini models, chosen for speed, reliability, multilingual support, and cost.

  • Acti also has “Skills,” which let users trigger tasks like translation or meeting links by long-pressing a keyboard key.

  • The company says personal context stays on the device by default, unless a user invokes a feature needing external processing.

This is one of the more practical AI ideas. Not because it sounds futuristic. Because it removes friction. Most AI tools still ask users to leave their flow. Acti tries to meet them inside it. That matters. The next big AI interface may not look like a chatbot. It may look like something boring. The winners will not just make AI smarter. They will make it easier to use without thinking.

That's a wrap.
