Cleora.ai - a Swiss Army knife, an essential element of systems operating on data in the form of a network of connected nodes.
Jaroslaw Krolewski
synerise.com | basemodel.ai | cleora.ai | wislakrakow.com | agh.edu.pl
We created Cleora, one of the fastest graph-embedding algorithms in existence. How was it created and what is the purpose of this project? How can open-source projects grow with the help of the community? How can cleora.ai accelerate the implementation of AI in modern companies?
What's the story behind Cleora's origins? How many people worked on it and for how long?
Since the inception of the AI team at Synerise, our ambition has been to quickly and easily process giant, heterogeneous interaction data. Existing libraries, such as StarSpace, Node2Vec, DeepWalk, or various graph convolutional networks, did not meet our requirements.
Each of them had a drawback: very slow performance, impractical limits on maximum graph size, or unsatisfactory quality of results. We needed a solution that would let us quickly and accurately compute embeddings for graphs with millions of vertices and billions of edges in order to represent user behavior. Waiting several months for a result was unacceptable for us.
The first version of Cleora was created at the beginning of 2019 and was implemented in Scala. It quickly became apparent that the tool successfully replaced all existing graph embedding libraries.
In the next iteration, at the beginning of 2020, in addition to optimizing the algorithm, we decided to get rid of the JVM. The entire solution was rewritten from Scala to Rust, which gave us more control over memory and CPU consumption and more than doubled the speed.
Initially, a team of several people was involved in creating Cleora. Its development gave us additional opportunities to create a number of solutions based on it, including generation of recommendations, scoring, segmentation and various predictions.
The experience gathered by the entire AI team allowed us to make Cleora what it is today, a universal and reliable "Swiss Army knife" for computing graph embeddings.
What is the purpose of this tool and how can Cleora help entrepreneurs? Who should definitely be interested in using it?
Cleora is one of the fastest graph embedding algorithms in existence. It is an essential element for systems operating on data in the form of a network of connected nodes. These include recommendation systems, systems which predict connections between users in social media (e.g. like/follow), and even systems predicting the biological functions of protein networks, enabling the creation of new drugs.
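To illustrate how such systems use embeddings: once every node has a vector, link prediction (e.g. suggesting whom to follow) reduces to scoring vector similarity. The node names, vector values, and cosine scoring below are made-up assumptions for a minimal sketch, not part of Cleora itself:

```python
import numpy as np

# Toy embedding table: 4 nodes, 3-dimensional vectors (values are made up).
embeddings = {
    "alice": np.array([0.9, 0.1, 0.0]),
    "bob":   np.array([0.8, 0.2, 0.1]),
    "carol": np.array([0.0, 0.9, 0.4]),
    "dave":  np.array([0.1, 0.8, 0.5]),
}

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Score candidate "follow" edges for alice: a higher score means
# the nodes sit closer in embedding space, i.e. a more likely link.
scores = {name: cosine(embeddings["alice"], vec)
          for name, vec in embeddings.items() if name != "alice"}
best = max(scores, key=scores.get)  # "bob", whose vector is nearly parallel
```

The same similarity scoring underlies recommendations: candidate items are ranked by their distance to a user's vector.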
No wonder, then, that such algorithms are created by digital giants such as Facebook and Google, which release a number of new solutions each year. However, Cleora has a significant advantage over these algorithms.
First, it is much faster. Second, it does not require specialized hardware (e.g. GPUs to accelerate calculations), and it still produces high-quality embedding vectors. This means that systems (e.g. recommenders) using Cleora can run faster and with greater accuracy.
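Part of why no GPU is needed is the simplicity of the published idea behind Cleora: start from random vectors and repeatedly multiply them by the graph's random-walk transition matrix, normalizing after each step, which is just sparse matrix arithmetic on a CPU. Below is a toy NumPy sketch of that general idea (an illustrative assumption-laden simplification, not the actual Rust implementation):

```python
import numpy as np

def embed(edges, n_nodes, dim=8, iters=3, seed=0):
    """Toy sketch of iterative embedding propagation: start from random
    vectors, repeatedly average each node's neighbours via the random-walk
    transition matrix, and L2-normalize the rows after every step."""
    rng = np.random.default_rng(seed)
    # Build the row-normalized adjacency (random-walk transition) matrix.
    M = np.zeros((n_nodes, n_nodes))
    for u, v in edges:
        M[u, v] = M[v, u] = 1.0            # undirected edge
    M /= M.sum(axis=1, keepdims=True)      # assumes no isolated nodes
    T = rng.uniform(-1.0, 1.0, size=(n_nodes, dim))
    for _ in range(iters):
        T = M @ T                                       # average neighbours
        T /= np.linalg.norm(T, axis=1, keepdims=True)   # L2-normalize rows
    return T

# Tiny 4-node path graph: 0-1-2-3
emb = embed([(0, 1), (1, 2), (2, 3)], n_nodes=4)
```

In a production setting the transition matrix is sparse and the multiplications parallelize across CPU cores, which is what makes graphs with hundreds of millions of nodes tractable without accelerators.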
Cleora is capable of processing graphs of hundreds of millions of nodes. In social networks, one node usually corresponds to a single user, so Cleora can be used to process datasets on a global scale, at the level of the number of users of the largest social networking sites such as Twitter.
The release of the software under an open-source license means that from now on any company, individual, or research institution can use Cleora for any purpose. We recommend Cleora when working with large graphs, especially under limited computing power. The implementation is available on GitHub.
Cleora is reported to run 8 times faster than PyTorch-BigGraph, created by Facebook, and Synerise itself has been recognized by Microsoft. Has the time come for Polish companies to make their mark on the international arena?
In the scientific sphere, the Synerise team achieved significant success by winning the Rakuten Data Challenge competition at the SIGIR (Special Interest Group on Information Retrieval) conference. The subject of the competition was creating recommendations in e-commerce, and the organizers included Tracy H. King (Adobe), Shervin Malmasi (Amazon), Dietmar Jannach (University of Klagenfurt), Weihua Luo (Alibaba), Surya Kallumadi (The Home Depot).
A Synerise publication on methods for detecting the product features that most determine a user's interest also appeared in the proceedings of the ICML 2020 conference. A few months later, at ICONIP 2020, we presented an article describing a model that recommends similar clothes based on photo galleries from producers and users.
What is the plan for further Synerise activities?
Our research goal is to enable automatic and very efficient processing of various data sources that are owned by our clients, both in terms of the quality of the results and the calculation time.
Graph algorithms can process the interaction data typically found in banking, telecommunications, and e-commerce ecosystems. However, there are many other types of data, such as images, text, sounds, and structured data, and any company looking to improve its performance must be able to seamlessly synthesize all of its data into a form that allows easy and instant real-time predictions.
The business priority for Synerise is now international expansion to Western and Middle Eastern markets.
Why is Cleora open-source? What are the benefits of opening the project to the community?
Activities of this type bring companies a lot of publicity, especially if the published tool is of high quality, i.e. it is quick, easy to run and comprehensive, and when it is offered under an open license allowing for commercial use. This is the case with Cleora.
Synerise wants to stimulate knowledge sharing, following the example of digital giants like Google and Facebook who publish some of their solutions. At the same time, we do not see companies on the Polish technology market, even very large ones, that would share their knowledge in an equally open way.
Of course, Google or Facebook can afford to publish some of their property for free given their dominant market positions. However, Cleora is just one of many proprietary technologies being developed at Synerise, so we do not think its publication will have a negative impact on our company.
One should also remember that tools of this type are not easy to use (although we tried to make Cleora as easy as possible), and in practice they often require consulting assistance, so we also treat it as a potential source of new clients.
Cleora underpins some parts of our ecosystem and we continuously implement solutions based on it in many companies. Opening the source code has a positive effect on transparency and increases trust in AI solutions, which for many people are still something incomprehensible.
The transparent approach has many advantages. Scientists employed by our clients may use Cleora to carry out corporate or personal projects to better understand the principle of operation or to validate our performance claims.
The recruitment aspect should not be underestimated either. High-quality open-source often becomes an element that attracts the most ambitious candidates.
The advantage of innovation hubs such as Silicon Valley is largely due to the synergy effect: creating a friendly environment for sharing ideas and inspiring each other. We have taken a bold step in this direction. Opening the Cleora code is an important experiment for us, whose consequences will surely return to us through various endeavors; we will observe them with curiosity and draw conclusions. Within days of publication, the first volunteer contributors had already appeared with their improvements to Cleora, which makes us very happy.
How can open-source software contribute to the development of a project?
Publishing the source code allows an informal group of users to form around the tool, contributing their own ideas and improvements. In this way, Cleora has a chance to become a permanent fixture in the catalog of graph embedding solutions, as it is constantly developed and updated.
Of course, this is only possible if the tool is interesting for the community and provides a source of inspiration. We encourage all Rust and Python developers to contribute to our solution, which is and will remain open. Any suggestions for improvements to the project can be submitted via GitHub.
Are there any risks related to sharing your projects as free software? Why do so few companies decide to make such a move?
A very important goal of many companies is the protection of intellectual property, understood as a strategic resource. From a company's perspective, revealing the source code of a key tool often means losing a competitive edge.
However, the advantage of Synerise is not based on one unique technology, but on the synergy of many proprietary solutions, concentrated in one ecosystem. Therefore, we believe that the disclosure of a single or even several tools is not a threat to us. The most important reason for such a small number of similar initiatives may be the fact that most companies in our region use technologies created by someone else and profit from implementations.
What can we expect in connection with the development of artificial intelligence in business?
We can expect a gradual elimination of secondary, non-creative activities, as well as superhuman capabilities for synthesizing giant data sets. Companies will invest in more and more AI services to improve their operations, e.g. searching for target groups, serving well-matched advertisements, accelerating internal processes, or speeding up communication between technical support departments and customers.
The artificial intelligence industry is currently at a very early stage, but consolidation will come in a few years. Then only the players with the most universal and comprehensive offer will remain on the market.
Which processes in the future could be automated with the help of artificial intelligence? Can we talk about self-operating companies?
It depends on the industry, but automation and artificial intelligence allow you to achieve better and better results with less and less human effort.
Probably in a few years we will be able to expect the first, initially very simple "self-operating" companies.
Of course, just as on board a jet, a human pilot will oversee their operation for many years to come, but this will let him focus his attention on truly creative problems and innovation instead of unambitious, repetitive tasks. Fully automatic grocery stores are being tested even today.
How is artificial intelligence affecting our lives right now?
Machine learning has been around for decades in hedge funds, banks, and other parts of the financial sector. For over a dozen years it has been penetrating ever wider branches of the economy.
Recommendation systems are booming in e-commerce, where they are already responsible for over 30% of turnover. Without artificial intelligence, companies of this type have no chance of competing with market leaders today.
The increasing adoption of AI solutions above all makes it possible to automate the most tedious and labor-intensive activities: not only those performed by people so far, but also those that, due to the enormous amount of work involved, have until now been beyond the reach of humanity.
We are witnessing successive gigantic achievements, from impressive models generating natural text on any topic, such as GPT-3 (OpenAI), to models such as AlphaFold (DeepMind), which simulates protein folding with unprecedented accuracy. Despite their relative conceptual simplicity (these models do not yet have much in common with "human intelligence"), such advances will revolutionize entire fields of science, and soon industry as well.
And how does AI affect the everyday life of the average person?
AI is everywhere today. Our phones collect data about the pages we view and display personalized ads. GPS data analysis systems track the routes we travel. We have tools for recognizing human speech (e.g. Siri, Google Assistant), for machine translation, and for advanced image analysis (thanks to which we can unlock a phone with a scan of our own face).
What does the artificial intelligence market look like today? Is it easy to find specialists?
The artificial intelligence market is paradoxical today.
On the one hand, there is great interest in machine learning and AI among candidates; deep learning in particular, as a newly established area, attracts much attention. On the other hand, universities practically ignored this field until recently. For this reason, most specialists with more than 6-8 years of experience are self-taught; there are very few of them, and that is why they are in high demand.
However, not all work related to the field of artificial intelligence requires the participation of a specialist. The vast majority of companies use solutions created and implemented by others, for which even a general knowledge of programming and data analysis is sufficient.
Which skills would you single out as necessary for working on artificial intelligence?
There are many positions in AI work, which can be divided into two main groups: research and implementation.
Research positions require knowledge of mathematics, understanding of the internals of machine learning models, and staying up to date with the latest developments in this rapidly growing field by reading scientific publications, blogs, and reports. In addition, programming skills in languages particularly useful for building AI models are required; the most common choices are Python and R. Experience in publishing and familiarity with the scientific world is an additional advantage.
In implementation positions, skills related to big data processing are an important part of the job. Here we emphasize knowledge of databases and of technologies dedicated to big data or streaming data, such as Hadoop, Spark, Hive, Kafka, and ClickHouse. We attach great importance to the ability to write high-quality code in languages such as Scala, Java, C++, and Python.
Interview with Barbara Rychalska