The Art of Annotation: How Keymakr Addresses Complex Data Challenges
Keymakr Data Labeling
We create exceptional Training Datasets for Computer Vision AI and Machine Learning Models
Data annotation is a sophisticated, multi-faceted process that involves numerous intricate steps. As annotators aim for high-quality, precise datasets, they face many challenges. This work has become an art form. Success requires not only technical expertise and industry-specific knowledge but also creativity and an unconventional approach. Tetiana Verbytska, a technical solution architect at Keymakr, shares captivating case studies her team has worked on and reveals the backstage to solving non-standard tasks.
Key Point Marking Reduced Annotation Time by 90%
One of the most common tasks for the Keymakr team is to create high-quality data efficiently in terms of time and resources.
Recently, we received an unusual request: a client wanted to create a digital map in which each small section would represent a natural feature — a rare plant species or animal habitat. This project allowed people to support nature conservation by selecting a section on the map and making a donation to environmental protection.
The task was to annotate over a million sections, each requiring precise boundary segmentation. Traditional annotation would take 10–15 seconds per section, so completing the project that way would have cost the client thousands of hours of work and a substantial budget.
We proposed an alternative that drastically reduced the workload and budget. Instead of detailed boundary segmentation, we simply marked each section with a key point indicating its location. Key point marking took only one second per section instead of 15, and the project duration fell from over 6,000 hours to about 420.
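The arithmetic behind those figures can be sketched directly. The section count below is an assumption chosen to match the hours quoted above; the article itself says only "over a million" sections.

```python
# Illustrative effort comparison using the figures from this case study.
SECTIONS = 1_500_000          # assumed count ("over a million" sections)
SEGMENTATION_SEC = 15         # seconds per section, boundary segmentation
KEYPOINT_SEC = 1              # seconds per section, key point marking

def total_hours(n_sections: int, seconds_each: float) -> float:
    """Total annotation effort in hours."""
    return n_sections * seconds_each / 3600

full = total_hours(SECTIONS, SEGMENTATION_SEC)
kp = total_hours(SECTIONS, KEYPOINT_SEC)
print(f"segmentation: {full:,.0f} h, key points: {kp:,.0f} h")
# -> segmentation: 6,250 h, key points: 417 h
```

With these assumed inputs the totals land on the "over 6,000 hours" and "about 420 hours" cited above.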
This approach demonstrated how creative thinking and adapting annotation methods can reduce time and costs while maintaining high-quality results.
Simplifying the Process with Color and Style Categories
In complex projects with diverse data, one challenge is building a mechanism that accurately interprets varied descriptions of the same characteristic. Such a mechanism should simplify the data while keeping search processes consistent and flexible.
One notable example was a project where the goal was to create a system for finding furniture by photo. Users could upload an image from a showroom, a website, or even just a screenshot. The goal was to find similar furniture that matched in size and features, such as material, style, or unique design elements.
The client wanted maximum detail, distinguishing between a minimalist wooden table and an industrial-style table with metal accents. However, challenges arose with color shades. Different people and websites might describe the same color differently. For example, wood could be called "honey," "amber," or "golden oak," despite minimal differences.
Our solution was to suggest using color categories instead of precise shades. This way, if someone described a shade as "honey," the system could categorize it as "light wood," selecting the closest color category. This not only simplified the process but also made it more flexible. Each image's metadata would include a color category. This would ensure the system matched within the desired range, even if the shade varied.
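The normalization step can be sketched as a simple lookup from free-text shade descriptions to coarse categories. The shade names and category labels below are illustrative, not the client's actual taxonomy.

```python
# Map free-text shade descriptions to coarse color categories, so that
# "honey", "amber", and "golden oak" all match within the same range.
# (Illustrative mapping; the real taxonomy would be far larger.)
SHADE_CATEGORIES = {
    "honey": "light wood",
    "amber": "light wood",
    "golden oak": "light wood",
    "walnut": "dark wood",
    "espresso": "dark wood",
}

def color_category(shade: str, default: str = "unknown") -> str:
    """Normalize a shade description to its color category."""
    return SHADE_CATEGORIES.get(shade.strip().lower(), default)

print(color_category("Honey"))  # -> light wood
```

Storing the category, rather than the raw shade, in each image's metadata is what lets the search match within the desired range even when descriptions vary.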
This solution lets the client convey precise info to the user, despite color perception variations. The system could suggest furniture in the right color range, meeting expectations and simplifying the search task. This approach worked well, showing how to optimize processes in seemingly complex tasks.
Overcoming Object Tracking and Identification Challenges in Dynamic Videos
We always find sports annotation interesting due to the dynamics and speed of the videos involved.
In one case, the client needed to track the movement of each player in a soccer match using object detection and tracking, keeping stable bounding boxes around each player throughout the recording.
To solve this case, we used a pre-annotation approach for player tracking in each video. It reduced the time needed to manually annotate each object in each frame while providing a baseline accuracy level.
This method cut annotation costs and time, as pre-annotation and real ID use minimized the need for manual adjustments. The client got quality data for analysis. Each player was identified on-screen, not just as "a person" on the field.
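One common way to keep IDs stable across frames, in the spirit of the pre-annotation approach described above, is to match each new detection to the previous-frame box it overlaps most (intersection over union). The box format and threshold below are illustrative assumptions, not Keymakr's actual pipeline.

```python
# Propagate stable player IDs between frames via greedy IoU matching,
# so annotators only need to correct occasional mismatches.
# Boxes are (x1, y1, x2, y2); the threshold is an assumed value.

def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def assign_ids(prev, detections, threshold=0.3):
    """prev: {player_id: box}. Returns {player_id: box} for the new frame."""
    assigned, used = {}, set()
    for pid, pbox in prev.items():
        best_i, best_iou = None, threshold
        for i, dbox in enumerate(detections):
            if i in used:
                continue
            score = iou(pbox, dbox)
            if score > best_iou:
                best_i, best_iou = i, score
        if best_i is not None:       # inherit the ID from the prior frame
            assigned[pid] = detections[best_i]
            used.add(best_i)
    return assigned
```

A detection that matches no prior box falls out of `assigned` and would be flagged for manual review, which is where the human annotator's corrections come in.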
Data That Helps Models "Think" Like Humans
At the pre-POC stage, clients often come to us with specific pain points — challenges they seek our help to resolve. Among these are annotation inconsistencies, which can confuse and hinder model training. For example, one person might annotate a head and shoulders as visible, while another annotates only the head. Solving this requires high annotation consistency, and many companies come to us to check how we handle such tasks.
We also collaborate with Quality Match, whose software, HARI, simplifies the verification process and helps reach a consensus. The system asks users simple questions and collects statistics on subjective cases. For example, to the question "Is this person balding?" most users might answer "yes," while some disagree. These statistics help assess how important a subjective case is for a dataset and whether it should be included in the model being trained. This cuts verification costs, and it shows clients how conflicting data may affect outcomes and where to adjust their approach.
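The consensus idea can be sketched as a toy agreement statistic over annotator answers. This is only in the spirit of the process described above, not the actual HARI API; the review threshold is an assumed value.

```python
# Toy consensus statistic for a subjective yes/no question: compute the
# majority answer and agreement rate, and flag low-agreement cases for
# review or possible exclusion from the training set.
from collections import Counter

def consensus(answers, review_below=0.8):
    """Return (majority_answer, agreement_rate, needs_review)."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    rate = votes / len(answers)
    return answer, rate, rate < review_below

print(consensus(["yes", "yes", "yes", "no"]))  # -> ('yes', 0.75, True)
```

Cases that fall below the threshold are exactly the ambiguous ones worth a conversation with the client before they reach the model.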
To summarize, our most significant challenge is producing accurate and detailed annotations. At Keymakr, we create annotations that reflect human perception, because computer vision is trained on human vision. The world needs models that "think" like humans, which requires a human perspective. This is why verification and annotation are crucial for training high-quality models, and why we keep improving both, in-house and in partnership with others, to ensure maximum effectiveness.