The Evolution of OpenClaw: From Hype to Reality

OpenClaw has transitioned from a trend to a practical tool, reshaping AI interactions and workflows in the workplace.

The Evolution of OpenClaw

OpenClaw set off a wave of AI-assisted task execution, and that wave has since moved from excitement to pragmatism. Some users, initially drawn by the novelty, have returned to chat tools such as DeepSeek, ChatGPT, and Gemini after their expectations went unmet. Others have integrated lobsters (the nickname for OpenClaw-style agents) into their workflows to work more efficiently. Still others have moved on to Hermes Agent in search of more advanced capabilities, trading lobsters for horses.

This evolution reflects a changing perception of AI capabilities.

First, OpenClaw has shifted user expectations from mere conversation to actionable tasks. Before OpenClaw, AI was primarily a chatbot providing answers and suggestions, requiring human intervention for execution. Post-OpenClaw, users realized that AI could not only suggest actions but also execute them autonomously, leaving users to await results.

Second, OpenClaw has introduced new ways of collaborating. The essence of using lobsters is to cultivate an operating system for an open AI era. Users talk to their lobsters in natural language and assign tasks, which the lobsters carry out by calling applications and skills through the CLI. The result is a more flexible and efficient workflow, and a new way of working.
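The idea of a lobster executing a task through CLI calls to skills can be sketched roughly as follows. This is a minimal illustration, not OpenClaw's actual mechanism: the SKILLS table, the skill names, and run_skill are all hypothetical, and the Python interpreter stands in for a real application CLI. In practice a model would choose the skill and its argument from the user's natural-language request; here that choice is simply passed in.

```python
import subprocess
import sys

# Hypothetical skill table (an assumption for illustration): each skill name
# maps to the CLI command that implements it. The Python interpreter is used
# as a stand-in for a real application's command-line tool.
SKILLS = {
    "word_count": [sys.executable, "-c", "import sys; print(len(sys.argv[1].split()))"],
    "shout": [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"],
}


def run_skill(name: str, argument: str) -> str:
    """Run one skill as a subprocess and return its trimmed standard output."""
    if name not in SKILLS:
        raise KeyError(f"unknown skill: {name}")
    result = subprocess.run(
        SKILLS[name] + [argument],
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The (skill, argument) pair represents what an agent decided to do.
    print(run_skill("word_count", "book a meeting room"))
```

Because each skill is only a command line, adding a capability means adding one entry to the table, which is one way to read the claim that skills suit agents better than closed applications.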

Finally, OpenClaw has dismantled the invisible barriers between internet applications and opened a new round of competition over system-level barriers. OpenClaw serves as both a collaborative system and the foundation of an open ecosystem. Applications that became isolated from one another in the internet era no longer fit this open ecosystem, which suggests that skills may be a more suitable application model for native AI systems.

Although OpenClaw may not be the ultimate form of a native AI system, it offers a new starting point for building operating systems that meet the interaction needs of the AI era rather than the constraints of the internet age. Major Chinese internet companies, cloud providers, office software vendors, and hardware manufacturers are all helping to refine and adapt to this new operating system, each seeking its ecological niche in the AI era.

Our coverage of lobsters in recent months has tracked the rapid follow-up by major internet companies, the proactive adaptation by cloud providers, and the impact of lobsters on hardware. Over this holiday, we have compiled our lobster-related articles below:

Lobsters Reveal Tencent’s AI Strategy

As the lobster trend surged after Chinese New Year, Tencent was the most proactive of the major internet companies. In March, it quickly launched lobster-related products such as Workbuddy, QClaw, and Lighthouse, along with the SkillHub mirror site. Tencent’s CEO, Ma Huateng, has publicly supported these lobster products multiple times on WeChat. Tencent has also opened QQ and WeChat as entry points for lobster products.

By April, Tencent continued to push forward. They enhanced QClaw with a multi-agent mechanism and connectors for third-party applications, and introduced QBotClaw, which allows browsers to perform tasks. They also began supporting users in creating dedicated agents within their knowledge base product, ima. More of Tencent’s products are evolving from chat-based AI to operational AI.

On the event front, Tencent is actively promoting lobsters through nationwide tours and competitions. At the Beijing station of the Penguin Lobster Friend Competition, we observed various practical and interesting skills derived from lobsters. For instance, a participant distilled 18 years of experience in the construction industry into a “Landscape Chief Review Assistant Skill” to enhance review efficiency, while another created the “He Shen Skill” to quantify and analyze workplace nuances.

Tencent’s proactive approach aims to leverage the connectivity advantages of its IM products, optimizing user experience to transition into the agent era. It also lays the groundwork for a future version of WeChat that incorporates agent capabilities, potentially evolving its mini-program ecosystem into a new environment composed of agents and skills.

OpenClaw emerged at a critical juncture, just as large model capabilities broke through, and it presented a simpler, more executable form of personal agent product: IM, memory-capable lobsters, callable hardware, personal databases, and rich skills. Building on that form, Tencent has the potential to establish a new operating system rooted in various hardware by combining WeChat with personal agents.

Wukong: Alibaba’s First Step Towards Token Economy

On March 19, Alibaba CEO Wu Yongming announced that in the coming years the company aims to exceed $100 billion in external revenue from cloud and AI (including Tongyi Qianwen). He also set up the Alibaba Token Hub (ATH) business group, consolidating the Tongyi Lab, MaaS, Qianwen, Wukong, and AI Innovation departments into a unified force for the AI era.

The launch of “Wukong” marks the first major action of the ATH group. The DingTalk of the internet era has been restructured into a token-driven enterprise-level AI-native work platform, Wukong. DingTalk’s CEO Wu Zhao believes this transformation enables a product that allows AI to programmatically operate various capability modules to complete tasks.

Wukong can be seen as Alibaba’s enterprise-facing (B-end) lobster product. It integrates with DingTalk and converts DingTalk’s existing capabilities into skills that AI can understand and invoke directly through a command-line interface (CLI). To support this CLI transformation, Wukong has developed an AI-native file system called RealDoc, which retains the context of the AI’s reasoning, decision-making, and task execution so that every step remains traceable.
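To make the traceability idea concrete, here is a minimal sketch of an append-only task trace that keeps reasoning, decisions, and executed commands together. It is not RealDoc’s actual format, which the article does not describe: the TraceEntry and TaskTrace classes, their fields, and the example CLI command in the usage section are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TraceEntry:
    step: int      # 1-based position in the task's history
    kind: str      # e.g. "reasoning", "decision", or "execution"
    content: str   # what the agent thought, decided, or ran


@dataclass
class TaskTrace:
    """Hypothetical append-only record of one agent task."""
    task: str
    entries: list = field(default_factory=list)

    def record(self, kind: str, content: str) -> None:
        # Entries are only ever appended, so the history stays in order.
        self.entries.append(TraceEntry(len(self.entries) + 1, kind, content))

    def to_json(self) -> str:
        # asdict() recurses into the nested TraceEntry dataclasses.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


if __name__ == "__main__":
    trace = TaskTrace("approve leave request")
    trace.record("reasoning", "Request is within the annual leave balance.")
    trace.record("decision", "Approve without escalation.")
    trace.record("execution", "hypothetical-cli approval approve --id 42")
    print(trace.to_json())
```

Storing the trace next to the task output lets a reviewer see later not only what command ran but why the agent chose it.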

Wukong also aims to connect with more hardware and is building its skill marketplace. Alibaba can convert its B-end service capabilities from Taobao, Tmall, Alipay, and Alibaba Cloud into skills for Wukong, laying the foundation for a B-end skill market. Subsequently, they can attract enterprises to transform their internal workflows into skills for distribution in this market.

For Alibaba, Wukong is one piece of a token-centered strategy. The ATH group is laid out along the token chain: model research and development, model invocation, and deployment in real scenarios. Under the concept of agentic computing, the group can act as a cohesive entity, so that the capabilities of Tongyi Lab’s models smoothly serve the iteration of AI products like Qianwen and DingTalk, a complete chain of token usage is realized, and a carrier for the agentic internet takes shape.

ByteDance’s Volcano Engine: A Double Helix of Agility and Stability for Agents

ByteDance’s Volcano Engine promotes a double-helix structure of “agility” and “stability” for enterprises adopting lobsters, and it supports implementing this structure inside the enterprise with updates to ArkClaw and Hi Agent.

The agile agent focuses on exploration, aiming to solve individual productivity issues by quickly realizing ideas from employees’ minds using lobster-type products, creating an AI innovation lab within the enterprise. The stable agent emphasizes process management, addressing cost, efficiency, and risk to solve organizational productivity issues and achieve scalable use. Some agile agents validated as AI best practices can be transformed into stable agents.
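The path from agile agent to stable agent can be pictured as a simple promotion rule. The sketch below is purely illustrative: the AgentRecord fields and the thresholds (a minimum number of runs and a minimum success rate) are assumptions, not criteria published by Volcano Engine.

```python
from dataclasses import dataclass


@dataclass
class AgentRecord:
    name: str
    runs: int            # how many times the agent has been used
    successes: int       # how many of those runs produced an accepted result
    stage: str = "agile"


def promote_if_validated(agent: AgentRecord,
                         min_runs: int = 50,
                         min_success_rate: float = 0.95) -> AgentRecord:
    """Move an agile agent to the stable stage once it has enough evidence.

    Both thresholds are hypothetical; an organization would choose its own
    criteria for cost, efficiency, and risk.
    """
    if agent.stage == "agile" and agent.runs >= min_runs:
        if agent.successes / agent.runs >= min_success_rate:
            agent.stage = "stable"
    return agent


if __name__ == "__main__":
    proven = promote_if_validated(AgentRecord("invoice-checker", 100, 97))
    fresh = promote_if_validated(AgentRecord("idea-bot", 10, 10))
    print(proven.stage, fresh.stage)
```

An agent with too few runs stays agile however well it performs, which mirrors the split between exploration and managed, scalable use.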

Volcano Engine believes that using lobsters well is becoming another source of explosive demand, alongside AI video. The proliferation of lobster-type products has increased token consumption for personal productivity and complex task handling. Volcano Engine hopes that ArkClaw will be more than a tool and will become the core hub of the digital ecosystem for individuals and enterprises, helping users move from “using AI tools” to “owning their personal intelligent systems.”

In contrast to casual lobster users, manufacturing companies such as BAIC Foton, Dongfeng Yipai, and SKG are, as we observed at the “Feishu AI Pioneer Competition,” already using lobsters to optimize existing workflows, distilling job skills and improving the efficiency of information flow.

In the MaaS market, the visible competition has focused on stronger model capabilities, lower costs, and richer application ecosystems, but beneath it lies a competition to help enterprises complete their AI transformation. Cloud providers need to help enterprises find both a way to embrace change quickly and a methodology for carrying out that transformation.

Lobster Products Must Become Truly Deliverable Systems

OpenClaw represents a technological path that gives agents a clear profile distinct from chatbots: a personalized soul, always online, proactively executing tasks, seemingly capable of managing everything. Established powers aim to maintain their entry points, while new forces seek to leverage this opportunity.

Currently, this competition has not diminished with the waning popularity of OpenClaw; rather, it has evolved into a long-distance race to explore the optimal agent experience. Tech companies are actively constructing the infrastructure needed to support the stable and secure operation of lobster-type products while seeking specific scenarios where these products can enter more swiftly, equating the use of lobsters with productivity enhancement.

The consensus behind the lobster battle is that Coding Agents are becoming the foundational operating system for the new generation of agents. The focus of competition is the delivery completeness of Coding Agents—who can integrate LLMs, Coding Agents, and Harness Engineering into a truly deliverable system.

This competition revolves around two key actions. The first is the revival of the CLI, which connects the old and new worlds: the GUI serves interaction between humans and software, while the CLI serves interaction between software and agents. The second is collaboration between humans and agents, which could take either of two forms: a universal platform hosting many skills that cover numerous vertical scenarios, or multiple entry points and diverse vertical agents that form an ecosystem similar to the current app landscape.

This scenario resembles the long-standing route debate in the autonomous driving industry: the L4 faction advocates for achieving full automation in one step, while the L2 faction supports human-machine co-driving and gradual evolution. Ultimately, L4 defines the imaginative space for direction, while L2 wins the real market. The reason lies not in L2’s technological superiority but in its pragmatic handling of the trust relationship that requires time to build between humans and machines.

Lobsters Bring AI-Native Interactions to Hardware

The changes brought by lobsters extend beyond software; they are also influencing the AI transformation of hardware.

Wu Zhao has stated that DingTalk CLI will be embedded in every entity capable of execution, including smartphones, computers, smartwatches, and all IoT devices, so that all hardware can be controlled and managed through the CLI. This means that human-computer interaction can be mediated by AI and carried out in natural language.

For smartphone manufacturers, as users move away from relying on graphical interfaces and touch operations to complete tasks, and as the endpoint of interaction becomes a dialogue box, the smartphone—central to human digital life for nearly two decades—must also evolve. Smartphones need to adapt to new interaction methods, building an operating system more suited for the agent era, rather than using an “app-era mindset” to handle agent-era tasks.

Smartphone manufacturers’ explorations of AI can be categorized into three types: enhancing specific capabilities; building AI execution add-ons for Android’s graphical interface; and smartphone Claw. For smartphone manufacturers, constructing a new AI OS based on interaction experience presents an opportunity to break free from the ecological barriers imposed by internet giants, instead attracting them to participate in building a more open agent system by reconstructing interaction experiences and rules.

This is also the most direct manifestation of the system-level barriers mentioned at the beginning of the article. When interaction points are ubiquitous, and applications are segmented into more personalized skills, what can bind users is a systematic, smooth experience. In the era of smartphones, Apple established such a system-level barrier that spans software and hardware; in the agent era, who will seize the opportunity to establish such a systemic experience?
