AI agents are poised to make significant progress this year. They will improve tool usage, context understanding, coding assistance, and safety. Enterprises can also expect advancements in data integration, multi-agent orchestration, observability, and ROI tracking. These developments will expand AI’s impact across industries.

The AI Daily Brief helps you understand the most important news and discussions in AI.

Welcome back to the AI Daily Brief. Today we are doing something a little bit different that I'm quite excited about. It is no secret at this point that 2025 is definitely the year of agents, or at least the year of agent pilots. Basically since ChatGPT came out, or very soon thereafter, people were already racing ahead to the possibility of autonomous agents that were actually executing using AI tools on people's behalf, so that instead of just having a smart assistant, you could actually have employees, a team, an army in fact, working for you, allowing you to do more things. The first attempts at this, although very hyped and incredibly exciting to people, things like AutoGPT and BabyAGI from all the way back in April 2023, just were very, very limited in what they could do. In fact, agents have been limited in what they could do all the way through up until now.

Still, in the back half of last year we saw a lot of very specific agents start to come to market. Salesforce announced their Agentforce, and that's basically all CEO Marc Benioff would talk about for the last quarter of the year. Google announced Agentspace in December, which went beyond just a framework for building agents and actually started to provide out-of-the-box agent experiences. We had the nudges towards agentic from

companies like OpenAI, who were clearly seeing their o1 and o3 reasoning models as a step in that direction, and we had Anthropic showing off computer use, a way that agents could actually start to manipulate and interact with websites in the same way that humans would.

All of this has created a scenario where, for many, many big companies, 2025 will be the first year that they experiment and do proofs of concept in the agent space. Some of the areas that we anticipate to be the most common for that to happen will be customer service and coding, but there are many other examples as well. And yet we also anticipate that this will happen in fits and starts. Many of the things that big companies will hope they can do with agents just won't be quite ready yet. In fact, right now we're in the middle of deploying what we're calling our agent readiness audit. We started advertising for this at the very end of last year and have been just absolutely inundated with companies who want to figure out their agent strategy. One of the things that is clearest is that companies who have very clear and defined expectations are likely going to do better with these pilots, and have a better experience, than those who come in assuming that agents can do everything that they imagine they could right now.

But with all of that said, an

important thing to remember is that this is the worst that agents will ever be. Recently, in the MIT Technology Review, Anthropic's chief scientist Jared Kaplan gave four ways that he believes agents will be even better over the course of this year. What we're going to do today is go through his four ways agents will get better, and then we're going to add four of my own.

First up, Kaplan believes agents will get much better at using tools. He said: I think there are two axes for thinking about what AI is capable of. One is a question of how complex the task is that a system can do, and as AI systems get smarter, they're getting better in that direction. But another direction that's very relevant is what kind of environments or tools they can use. We were excited about computer use basically for that reason. Until recently, with LLMs, it's been necessary to give them a very specific prompt, give them very specific tools, and then they're restricted to a specific kind of environment. What I see is that computer use will probably improve quickly in terms of how well models can do different tasks and more complex tasks, and also to realize when they've made mistakes, or realize when there's a high-stakes question and it needs to ask the user for feedback. In short, tools are going to be a key way that agents

actually get more autonomous and generalizable.

Next up, Kaplan suggests that agents will better understand context. Anthropic recently introduced new features to train Claude to use a particular tone, a writing guide, making it all the more useful in business settings. Doing something similar with agents might mean being able to apply a set of business logic, industry context, regulatory environment, etc. to the agent. Kaplan said: I think we'll see improvements there, where Claude will be able to search through things like your documents, your Slack, etc., and really learn what's useful for you. That's underemphasized a bit with agents. It's necessary for systems to be not only useful but also safe, doing what you expected.

This is definitely what the big players are promising. A big part of the value proposition for Google Agentspace, as they frame it, is that these agents, which are way more out of the box than their previous frameworks, have access to all the information that makes your company run. They write: Google Agentspace offers pre-built connectors for the most commonly used applications in the enterprise, so you can save time when you need quick answers or actions by doing them right from your Agentspace experience. Kaplan also pointed out that recognizing context would mean cutting

down on resource use. He pointed out that reasoning models shouldn't need to think very hard to open a Word document, commenting: I think that a lot of what we'll see is not just more reasoning, but the application of reasoning when it's really useful and important, but also not wasting time when it's not necessary.

Kaplan's third prediction is a very specific use case: agents, he says, will make coding assistance better. Developer assistance is definitely a breakout use case not only of gen AI but of agents now as well. Kaplan said: My expectation is that we'll see further improvements to coding assistance. That's something that's been very exciting for developers. There's just a ton of interest in using Claude 3.5 for coding, where it's not just autocomplete like it was a couple of years ago. It's really understanding what's wrong with code, debugging it, running the code, seeing what happens, and fixing it.

And lastly, Kaplan points to something that he seems to think is a necessity, which is that agents will need to be made safe. He said: We founded Anthropic because we expected AI to progress very quickly and thought that inevitably safety concerns were going to be relevant. I think that's just going to become more and more visceral this year, because I think these agents are going to become more and more integrated into

the work we do. We need to be ready for challenges like prompt injection. Prompt injection refers to the ability to sneak prompts past guardrails. He continued: Prompt injection is probably one of the number one things we're thinking about in terms of broader usage of agents. I think it's especially important for computer use, and it's something we're working on very actively, because if computer use is deployed at large scale, there could be pernicious websites or something that try to convince Claude to do something that it shouldn't.

Now, one of the things that was really interesting when Anthropic announced computer use was that that's something that people have been historically concerned about. Still, Anthropic seems to be on a similar page, at least to OpenAI, in the idea that the best way to figure out how this is all going to play out is to release very incrementally and try to let people adapt and see how AI interacts in the real world.

So those are Kaplan's suggestions for how agents are going to get better this year. But as I said, I wanted to add a few of my own, and once again, these come out of the now dozens of agent readiness audits that we are currently engaged in. So one way in which agents will get better, which is really kind of an extension perhaps of understanding context, is better data.

Organizations are hyper-aware right now, and have a very strong belief, that a big determining factor in how well AI works for them is going to be how good their data is and how prepared it is to be used by that AI. In KPMG's Q4 pulse survey about AI, which surveyed about 100 executives from firms with a billion dollars or more in revenue, those people actually identified the quality of organizational data as the biggest challenge for their gen AI strategy in 2025. 85% said that they expected it to be a big challenge, compared to, for example, 71% who pointed to data privacy and cybersecurity. This is something that we're seeing as well. Organizations are very, very conscientious and aware of how they need to improve their data to make it more accessible for gen AI and specifically agents. Given how much of a focus that is, I think that the context Jared was talking about won't just be from these casual plugins to existing data sources, but will also interact with real, significant enterprise efforts to make data agent-ready.

Next up: orchestration and multi-agent systems. Right now, a lot of the use cases where people can reasonably do proofs of concept for agents are very, very specific, single-agent kinds of workflows. In fact, the agents that most people will be testing for at least the first half

of this year, and probably most of the year in total, are fairly close to what might have previously been called an automation. Still, everyone knows that this is just an incremental step towards where they're really trying to go, which is agents that are capable of taking on complex tasks from end to end without a human moving them from one step to the next. Those sorts of multi-agent systems require orchestration, and this is one of the most fertile categories for agent infrastructure development right now. Companies like Emergence are working furiously on platforms that allow agents to come together to do much more complex tasks than would have been possible in the past. And I think in addition to seeing those very specific and singular agent proofs of concept, we're also going to see enterprise leaders start to get a little bit more sophisticated and actually hack at these multi-agent systems. I anticipate that in 2025 it will only be the vanguard of enterprises, especially those who have a little bit more in the way of technical resources internally, but that won't be the case forever.

Next is sort of a catchall: observability, evaluations, and infrastructure. Basically, the tooling around agents is going to get a heck of a lot better this year as well. You are starting to see purpose-built platforms

for things like observability start to emerge. Observability in this case refers to the idea of having full visibility into what the agent is actually doing, so that you can see how it's working and, more particularly, where it's not working and what got it stuck. If you hang out on agent Twitter, one of the things you'll hear a lot is agent companies griping about the fact that enterprise customers who are trying agents don't want to think about evaluations. But once again, I anticipate that to be something the third-party platforms start to normalize and make much easier for them. And just in general, right now a huge amount of the development effort, and entrepreneurial effort frankly, around agents is going into developer tooling and, more simply put, just trying to make the agents actually work as well as all the business people think they should be working. However, one of the things that I anticipate in 2025 is a massive explosion in the infrastructure and deployment support specifically focused on business use of agents. That's obviously a place that Superintelligent is playing around, as an AI transformation and workforce management platform, and I think that we are going to be far from alone in those endeavors.

Lastly, let's talk about ROI. ROI has had a very interesting place

when it comes to AI for the last couple of years. ROI is never far away from the conversation when you talk to people who are in charge of AI transformation, and yet it has not been a barrier to adoption at this moment. What I mean by that is that companies have such a strong sense that these AI tools are so powerful that they will inevitably make their workforces work better, that the fact that gen AI tools maybe have a little bit more trouble exactly explaining their own ROI hasn't slowed down adoption. In fact, there has been such a push for adoption that ROI has been shunted backward as a thing we'll figure out later.

Think about it this way. For the last two years, if you were a CEO, what's more likely to get you fired: saying, we're not exactly sure what the ROI of gen AI is, so we're going to hang back and let people figure it out before we get into the game, or diving in headfirst and saying, we don't know how to measure it yet, but we're fairly convinced there's ROI there, and we want to be out ahead figuring out the use cases and how it actually benefits our business? Now it's not even a question. The idea of slowing down because ROI measures haven't been clear hasn't really been on the agenda, even a little bit. At the same time, it lurks around the corner, and part of, I believe, why agents

are so explosive right now in the marketplace is that they have an implicit ROI built into them. If an agent works, it does a certain task or set of tasks for much cheaper than the equivalent human labor, full stop. Now, it is an entirely separate question what organizations choose to do with those savings. This gets back to our frequent conversation about the efficiency era of AI, or doing the same with less, versus the opportunity era of AI, which is not about cost savings but about reinvesting those savings into building fundamentally different, more innovative, and better services and products. But still, the point remains that agents will do certain tasks and categories of tasks much faster, cheaper, and eventually better than their human equivalents, and if your robot does task X for a tenth of the cost of the equivalent human labor, there is ROI right there. This, I think, explains a huge amount of why agents are so attractive and so on the agenda for 2025.

And yet there is a big gap between the implicit knowledge that there is ROI if these things work and actually tracking and measuring it. I expect this to be a huge opportunity for companies and startups in the space to actually help enterprises with, and one that I anticipate many jumping into. As

we are helping companies with this agent readiness audit and then pilot support, which involves not only scoping and partner selection but also monitoring and evaluation, we are certainly thinking about how you measure, or at least estimate, ROI in real time. Once again, I do not think we are going to be alone in those endeavors.

So that's my complete list, combined: me and Anthropic's chief scientist, clearly two very equivalent perspectives when it comes to expertise in this area. For those who aren't clear, that is tongue in cheek. Once again: agents will get better at using tools, agents will understand context, agents will make coding assistance better, and agents will need to be made more safe. And then my additions: enterprise efforts for better data, orchestration and multi-agent systems, observability, evaluations, and infrastructure, and ROI tracking.

For those of you who are in the agent game, let me know what you think. If you want to talk about the readiness audit or pilot support, shoot me a note at nlw ATB super. And for all of you listeners who are just along for the ride, join the conversation. Spotify has now started allowing comments, and so far the discussion has been really cool. Anyways, guys, that is going to do it for today's episode. Appreciate you listening or watching, as always, and until next

time, peace.