October 02, 2024
Kannan Subbiah
FCA | CISA | CGEIT | CCISO | GRC Consulting | Independent Director | Enterprise & Solution Architecture | Former Sr. VP & CTO of MF Utilities | BU Soft Tech | itTrident
One of the most significant bottlenecks in training specialized AI models is the scarcity of high-quality, domain-specific data. Building enterprise-grade AI requires increasing amounts of diverse, highly contextualized data, of which there are limited supplies. This scarcity, sometimes known as the “cold start” problem, is only growing as companies license their data and further segment the internet. For startups and leading AI teams building state-of-the-art generative AI products for specialized use cases, public data sets also offer capped value, due to their lack of specificity and timeliness. ... Synthesizing data not only increases the volume of training data but also enhances its diversity and relevance to specific problems. For instance, financial services companies are already using synthetic data to rapidly augment and diversify real-world training sets for more robust fraud detection — an effort that is supported by financial regulators like the UK’s Financial Conduct Authority. By using synthetic data, these companies can generate simulations of never-before-seen scenarios and gain safe access to proprietary data via digital sandboxes.
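The augmentation idea above can be sketched minimally. This is an illustrative toy, not how production synthetic-data platforms work (they use rich generative models): it fits a simple Gaussian to a handful of real fraud-transaction amounts, then samples synthetic examples to enlarge the scarce minority class. All names and numbers here are hypothetical.

```python
import random
import statistics

# Hypothetical scenario: only 4 labeled fraud transactions exist,
# far too few to train a robust fraud-detection model.
real_fraud_amounts = [920.0, 880.0, 1010.0, 950.0]

# Fit a simple per-feature Gaussian to the real examples
# (a stand-in for a proper generative model).
mu = statistics.mean(real_fraud_amounts)
sigma = statistics.stdev(real_fraud_amounts)

# Sample 100 synthetic fraud amounts from the fitted distribution.
random.seed(42)
synthetic = [random.gauss(mu, sigma) for _ in range(100)]

# The augmented training set now has 104 fraud examples instead of 4.
augmented = real_fraud_amounts + synthetic
print(len(augmented))  # 104
```

Even this crude approach shows the mechanism: synthetic samples increase volume and can cover "never-before-seen" regions of the feature space, while the real records never leave the sandbox.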
Event sourcing is an approach to persisting data within a service. Instead of writing the current state to the database, and updating that stored data when the state changes, you store an event for every state change. The state can then be restored by replaying the events. Event-driven architecture is about communication between services. A service publishes any changes in its subdomain it deems potentially interesting for others, and other services subscribe to these updates. These events are carriers of state and triggers of actions on the subscriber side. While these two patterns complement each other well, you can have either without the other. ... Just as you can use Kafka without being event-driven, you can build an event-driven architecture without Kafka. And I’m not only talking about “Kafka replacements”, i.e. other log-based message brokers. I don’t know why you’d want to, but you could use a store-and-forward message queue (like ActiveMQ or RabbitMQ) for your eventing. You could even do it without any messaging infrastructure at all, e.g. by implementing HTTP feeds. Just because you could, doesn’t mean you should! A log-based message broker is most likely the best approach for you, too, if you want an event-driven architecture.
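The event-sourcing half of the distinction above can be sketched in a few lines. This is a minimal, hypothetical example (the `Event` and `AccountEventStore` names are invented for illustration): instead of storing a current balance and updating it in place, we append one event per state change and rebuild the state by replaying the event history.

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str     # "deposited" or "withdrawn"
    amount: int

class AccountEventStore:
    """Persists every state change as an event; never stores current state."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def replay(self) -> int:
        """Restore the current balance by replaying all events in order."""
        balance = 0
        for e in self._events:
            if e.kind == "deposited":
                balance += e.amount
            elif e.kind == "withdrawn":
                balance -= e.amount
        return balance

store = AccountEventStore()
store.append(Event("deposited", 100))
store.append(Event("withdrawn", 30))
print(store.replay())  # 70
```

Note that nothing here involves messaging between services: this is purely a persistence pattern inside one service. Publishing those same events to other services over a broker (or an HTTP feed) would be the separate, complementary event-driven-architecture concern.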
Mostly AI provides enterprises with a platform to train their own AI generators that can produce synthetic data on the fly. The company started off by enabling the generation of structured tabular datasets, capturing the nuances of transaction records, patient journeys and customer relationship management (CRM) databases. Now, as the next step, it is expanding to text data. While proprietary text datasets – like emails, chatbot conversations and support transcriptions – are collected at large scale, they are difficult to use because they contain PII (like customer information), suffer from diversity gaps and are only partially structured. With the new synthetic text functionality on the Mostly AI platform, users can train an AI generator on any proprietary text they have and then deploy it to produce a cleansed synthetic version of the original data, free from PII or diversity gaps. ... The new feature, and its ability to unlock value from proprietary text without privacy concerns, makes it a compelling offering for enterprises looking to strengthen their AI training efforts. The company claims that training a text classifier on its platform’s synthetic text resulted in a 35% performance improvement compared to data generated by prompting GPT-4o-mini.
Enterprises are increasingly data-driven and rely heavily on the data they collect to make decisions, says Choudhary. A decade ago, a single application stored all its data in a relational database for weekly reporting. Today, data is scattered across various sources including relational databases, third-party data stores, cloud environments, on-premises systems, and hybrid models, says Choudhary. This shift has made data management much more complex, as all of these sources need to be harmonized in one place. Moreover, in the world of AI, both structured and unstructured data need to be of high quality. Choudhary states that failing to maintain data quality in the AI age would lead to “garbage in, disasters out.” Highlighting the relationship between AI and data observability in enterprise settings, he says that given the role of both structured and unstructured data in enterprises, data observability will become even more critical. ... However, AI also requires unstructured business context, such as documents from wikis, emails, design documents, and business requirement documents (BRDs). He stresses that this unstructured data adds context to the factual information on which business models are built.
Attackers are increasingly targeting supply chains, capitalizing on the trust between vendors and users to breach OT systems. This method offers a high return on investment, as compromising a single supplier can result in widespread breaches. The Dragonfly attacks, where attackers penetrated hundreds of OT systems by replacing legitimate software with Trojanized versions, exemplify this threat. ... Attack strategies are shifting from immediate exploitation to establishing persistent footholds within OT environments. Attackers now prefer to lie dormant, waiting for an opportune moment to strike, such as during economic instability or geopolitical events. This approach allows them to exploit unknown or unpatched vulnerabilities, as demonstrated by the Log4j and Pipedream attacks. ... Attackers are increasingly focused on collecting and storing encrypted data from OT environments for future exploitation, particularly in anticipation of practical quantum computing. This poses a significant risk to current encryption methods, potentially allowing attackers to decrypt previously secure data. Manufacturers must implement additional protective layers and consider future-proofing their encryption strategies — for instance by adopting post-quantum cryptography — to safeguard data against these emerging threats.
Unsurprisingly, open-source software's lineage is complex. Whereas commercial software is typically designed, built and supported by one corporate entity, open-source code could be written by a lone developer, a well-resourced open-source community or a teenage whiz kid. Libraries containing all of this open-source code, procedures and scripts are extensive. They can contain libraries within libraries, each with its own family tree. A single open-source project may have thousands of lines of code from hundreds of authors, which can make line-by-line code analysis impractical and may result in vulnerabilities slipping through the cracks. These challenges are further exacerbated by the fact that many libraries are stored on public repositories such as GitHub, which may be compromised by bad actors injecting malicious code into a component. Vulnerabilities can also be accidentally introduced by developers. Synopsys' OSSRA report found that 74% of the audited code bases had high-risk vulnerabilities. And don't forget patching, updates and security notifications — standard practices from commercial suppliers, but likely lacking (or far slower) in the world of open-source software.