January 01, 2025

Kannan Subbiah

FCA | CISA | CGEIT | CCISO | GRC Consulting | Independent Director | Enterprise & Solution Architecture | Former Sr. VP & CTO of MF Utilities | BU Soft Tech | itTrident

Published: January 1, 2025

The Architect’s Guide to Open Table Formats and Object Storage

Data lakehouse architectures are purposefully designed to leverage the scalability and cost-effectiveness of object storage systems, such as Amazon Web Services (AWS) S3, Google Cloud Storage and Azure Blob Storage. This integration enables the seamless management of diverse data types (structured, semi-structured and unstructured) within a unified platform. ... The open table formats also incorporate features designed to boost performance. These features must be configured properly and put to use for a fully optimized stack. One such feature is efficient metadata handling, where metadata is managed separately from the data, which enables faster query planning and execution. Data partitioning organizes data into subsets, improving query performance by reducing the amount of data scanned during operations. Support for schema evolution allows table formats to adapt to changes in data structure without extensive data rewrites, ensuring flexibility while minimizing processing overhead.
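As a rough illustration of how partitioning and separately managed metadata reduce the data scanned, here is a minimal Python sketch. It is a toy model, not the implementation of any real table format: the `DataFile` record, the `MANIFEST` list, the file paths and the row counts are all hypothetical. The planner consults only the metadata to decide which files to read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str        # hypothetical object-store key
    event_date: str  # partition value the file belongs to
    row_count: int   # per-file statistic stored in metadata


# Toy "manifest": metadata kept separately from the data files themselves.
# All entries are made up for illustration.
MANIFEST = [
    DataFile("events/event_date=2025-01-01/part-0.parquet", "2025-01-01", 1000),
    DataFile("events/event_date=2025-01-01/part-1.parquet", "2025-01-01", 800),
    DataFile("events/event_date=2025-01-02/part-0.parquet", "2025-01-02", 1200),
    DataFile("events/event_date=2025-01-03/part-0.parquet", "2025-01-03", 900),
]


def plan_scan(manifest, wanted_date):
    """Partition pruning: keep only files whose partition value matches the filter."""
    return [f for f in manifest if f.event_date == wanted_date]


def rows_to_scan(files):
    """Sum row counts from metadata alone, without opening any data file."""
    return sum(f.row_count for f in files)


if __name__ == "__main__":
    selected = plan_scan(MANIFEST, "2025-01-02")
    print(len(selected), "of", len(MANIFEST), "files selected")
    print(rows_to_scan(selected), "of", rows_to_scan(MANIFEST), "rows scanned")
```

In this sketch a filter on one date touches one file out of four. Real table formats track far richer metadata than this, but the planning idea is the same: decide what to skip before reading any data.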

The future of open source will be messy

First, it’s important to point out that open source software is both pervasive and foundational. Where would we be without Linux and the vast treasure trove of other open source projects on which the internet is built? However, the vast majority of software, written for use or sale, is not open source. This has always been true. Developers do care about open source, and for good reason, but it is not their top concern. As Redis CEO Rowan Trollope told me in a recent interview, “If you’re the average developer, what you really care about is capability: Does this [software] offer something unique and differentiated that’s awesome that I need in my application.” ... Meanwhile, Meta and the rest of the industry keep releasing new code, calling it open source or open weights (Sam Johnston offers a great analysis), without much concern for what the OSI or anyone else thinks. Johnston may be exaggerating when he says, “The more [the word] open appears in an artificial intelligence product’s branding, the less open it actually tends to be,” but it’s clear that the term open gets used a lot, starting with category leader OpenAI, which is not open in any discernible sense, without much concern for any traditional definitions.

What’s next for generative AI in 2025?

“Data is the lifeblood of any AI initiative, and the success of these projects hinges on the quality of the data that feeds the models,” said Andrew Joiner, CEO of Hyperscience, which develops AI-based office work automation tools. “Alarmingly, three out of five decision makers report their lack of understanding of their own data inhibits their ability to utilize genAI to its maximum potential. The true potential…lies in adopting tailored SLMs, which can transform document processing and enhance operational efficiency.” Gartner recommends that organizations customize SLMs to specific needs for better accuracy, robustness, and efficiency. “Task specialization improves alignment, while embedding static organizational knowledge reduces costs. Dynamic information can still be provided as needed, making this hybrid approach both effective and efficient,” the research firm said. ... While Agentic AI architectures are a top emerging technology, they’re still two years away from reaching the lofty automation expected of them, according to Forrester. While companies are eager to push genAI into complex tasks through AI agents, the technology remains challenging to develop because it mostly relies on synergies between multiple models, customization through retrieval-augmented generation (RAG), and specialized expertise.

The Perils of Security Debt: Serious Pitfalls to Avoid

Security debt is caused by a failure to “build security in” to software from design to deployment as part of the SDLC. Security debt accumulates when a development organization releases software with known issues, deferring the redressal of its weaknesses and vulnerabilities. Sometimes the organization skips certain test cases or scenarios in pursuit of faster deployment and, in the process, fails to test the software thoroughly. Sometimes the business decides that the pressure to finish a project is so great that it makes more sense to release now and fix issues later. Later is better than never, but when “later” never arrives, existing security debt becomes worse. ... Great leadership is the beacon that not only charts the course but also ensures your crew – your IT team, support staff, and engineers – are well-prepared to face the challenges ahead. It instills discipline, vigilance, and a culture of security that can withstand the fiercest digital storms. The Board and leadership must understand and champion the importance of security for the organization. By setting the tone at the top, they can drive the cultural and procedural changes needed to prevent the accumulation of security debt. Periodic review and monitoring of security metrics, and identifying and tracking security debt as a risk, can help keep the organization accountable and on track.

The long-term impacts of AI on networking

Every enterprise that self-hosted AI told me the mission demanded more bandwidth to support “horizontal” traffic than their normal applications, more than their current data center needed to support. Ten of the group said that this meant they’d need the “cluster” of AI servers to have faster Ethernet connections and higher-capacity switches. Everyone agreed that a real production deployment of on-premises AI would need new network devices, and fifteen said they bought new switches even for their large-scale trials. The biggest problem with the data center network I heard from those with experience is that they believed they built up more of an AI cluster than they needed. Running a popular LLM, they said, requires hundreds of GPUs and servers, but small language models can run on a single system, and a third of current self-hosting enterprises said they believed it is best to start small, with small models, and build up only when you had experience and could demonstrate a need. This same group also pointed out that control was needed to ensure only truly useful AI applications were run. “Applications otherwise build up, exceed, and then increase, the size of the AI cluster,” said users.

Bridging Skill Gaps in the Automotive Industry with AI-Led Immersive Simulations

This crisis of personnel shortfall is particularly acute in sectors like autonomous driving and AI-driven manufacturing, where the required skillset surpasses the capabilities of the current workforce. This alarming shortage of specialised expertise poses a serious threat to the industry’s progress. It could potentially lead to production halts at various facilities, delay the launch of next-generation vehicles, and hinder the transition to self-driving cars powered by sustainable energy. In order to address this issue, orthodox educational methods must be modernised to incorporate cutting-edge technologies like AI and robotics. ... Unlike traditional training, which often involves static lessons or expensive hands-on practice, immersive simulations allow workers to practice in environments that would be too risky or costly in real life. For example, with autonomous vehicles, workers can practice fixing and calibrating vehicle systems in a virtual world without the risk of damaging anything. These simulations can also create different road conditions for workers to experience, helping them build critical decision-making skills without real-world consequences.

Today's Tech Digest
