Acheron Analytics

IT Services and Consulting

About us

Machine learning and data science consulting. We solve business problems using industrial-strength data science. Our mission is to enable our clients with the same big data and machine learning techniques that have made the world's largest enterprises successful. In addition to custom solutions, we can help assess and train your data practice and identify ways to improve the connections between your people, data, and business goals.

Website
https://www.acheronanalytics.com
Industry
IT Services and Consulting
Company size
2-10 employees
Type
Privately held
Specialties
machine learning, data science, analytics, consulting, data visualization, strategic planning, and auditing

Employees at Acheron Analytics

Updates

  • Acheron Analytics reposted this

    View Benjamin Rogojan's profile

    Fractional Head Of Data | Reach Out For Data Infra And Strategy Consults

    Are you looking to learn more about data engineering, data science or ML this weekend? Here are 7 great articles (ok, 6 articles + 1 video) that will help get you started in the right direction!

    1. Common pitfalls when building generative AI applications by Chip Huyen https://lnkd.in/gYnrfz6A
    2. The Ultimate Guide to Data Engineer Interviews by Xinran Waibel https://lnkd.in/gbm8kxzK
    3. Build vs. Buy Your A/B Testing Platform? by Olga Berezovsky https://lnkd.in/gvpwhmj5
    4. From data engineer to data scientist at Google - with Daliana Liu and Sundas Khalid https://lnkd.in/gNKNSftd
    5. Don't Worry About LLMs by Vicki Boykis https://lnkd.in/gT639Auq
    6. Managing Drift & Data Quality by Mikiko B. https://lnkd.in/gh5HJqG6
    7. Analytics Frameworks Every Data Scientist Should Know with Tessa Xie https://lnkd.in/gQ4hiTYp

    Any other favorites?

  • Acheron Analytics reposted this

    View Benjamin Rogojan's profile

    Fractional Head Of Data | Reach Out For Data Infra And Strategy Consults

    LLM terms you should know if you’re leading a data team.

    Embeddings - Dense, low-dimensional vectors (arrays of numbers) that encode the meaning of text. The goal of embeddings is to position similar pieces of text (in meaning or context) close to each other in the vector space. For instance, the words "cat" and "kitten" would have embeddings that are numerically closer than "cat" and "car."

    Tokenization - The process of breaking text into smaller units, called tokens, which can be words, subwords, or characters. LLMs use tokenization to convert text into a format they can process, often mapping tokens to unique numerical IDs. This is essential for handling language efficiently, especially in models like GPT, where subword tokenization ensures rare words are split into meaningful parts.

    Fine-tuning - To improve a pre-trained LLM on a specific task or dataset, you can continue its training on domain-specific data. This allows the model to learn nuances or specialized knowledge while retaining its general language understanding, improving performance on targeted tasks like sentiment analysis, customer support, or medical text generation.

    RAG - Retrieval-augmented generation has a model retrieve relevant documents or data from an external source to enhance its responses. Instead of relying solely on pre-trained knowledge, it queries a database or knowledge base during inference. This approach improves accuracy and relevance, especially for tasks requiring up-to-date or domain-specific information.

    Transformer - Now we are getting into neural networks. This is an architecture designed for processing sequential data, like text, by using attention mechanisms to focus on the most relevant parts of the input. It replaces traditional recurrence with parallel processing, making it highly efficient and scalable.

    Context Window - The amount of text the model can process at once, measured in tokens. Longer context windows allow the model to handle more extensive or complex conversations.

    Now don't get me wrong, it's great to understand these from a high level. But if you want to learn even more about how data teams and companies are using LLMs for real use cases, sign up for the free webinar I am hosting with Richard Meng from Roe AI. He has been sharing a lot of his experiences working with enterprises to make unstructured data useful. If you want to learn about what he actually sees working, then sign up below!

    Sign up here - https://lnkd.in/eNrxx-PV
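The "cat"/"kitten" embedding example above can be sketched numerically. A minimal pure-Python illustration, where the tiny 3-dimensional vectors are made up purely for demonstration (real embedding models produce vectors with hundreds or thousands of dimensions):

```python
import math

# Toy 3-d "embeddings"; the numbers are invented for illustration only.
embeddings = {
    "cat":    [0.90, 0.80, 0.10],
    "kitten": [0.85, 0.75, 0.20],
    "car":    [0.10, 0.20, 0.90],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# "cat" sits closer to "kitten" than to "car" in this vector space.
sim_kitten = cosine_similarity(embeddings["cat"], embeddings["kitten"])
sim_car = cosine_similarity(embeddings["cat"], embeddings["car"])
assert sim_kitten > sim_car
```

Semantic search over embeddings is essentially this comparison repeated against every stored vector (or an approximate index of them).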

  • Acheron Analytics reposted this

    View Seattle Data Guy's organization page

    54,069 followers

    If you haven't had to pull and report data from NetSuite, consider yourself lucky! There is a reason so many people I've talked to hire consultants both to pull the data and to make sense of it. That's why I was excited to see that Estuary and Fornax recently worked together to help deliver value for a client who had issues with their financial reporting. Their client wanted banner-level sales performance in real time in order to track efficiency and plan further store-level engagement accordingly, but they were struggling to do so. Additionally, prior attempts to customize NetSuite failed to address core data processing challenges, leaving the organization with limited visualization capabilities and insufficient tools for comprehensive analysis of financial performance across customers, products, and categories. That's when they turned to the teams of David Yaffe and Darshan Bhagat, who worked together to deliver massive impact fast! You can read more about it here - https://lnkd.in/gjqdCncf

  • Acheron Analytics reposted this

    View Benjamin Rogojan's profile

    Fractional Head Of Data | Reach Out For Data Infra And Strategy Consults

    Are you looking to improve your data team or data infrastructure in 2025? Here are 7 great articles and videos aimed at helping you improve as a data leader, reduce costs, and improve output.

    1. Thinking Like an Owner: Elevating Your Data Team's Impact In 2025 https://lnkd.in/gYAi_Eky
    2. Cutting Your Data Stack Costs: How To Approach It And Common Issues https://lnkd.in/gnWBC3t6
    3. From Analyst to Leading Data Teams at Google and Instagram: Essential Skills for Advancing Your Data Career with Kathleen Hayes https://lnkd.in/gcZ36jYq
    4. We Need To Simplify Your Data Infrastructure https://lnkd.in/gFYa3rDw
    5. Understanding The Business - How To Find High ROI Data Projects https://lnkd.in/gwSe859r
    6. Navigating The Data Leadership Landscape - From IC To Director with Celina Wong https://lnkd.in/g5rd2WPZ
    7. Developing Production Databricks Pipelines by Daniel Beach https://lnkd.in/gZ_j4HPv

    What are your favorite articles?

  • Acheron Analytics reposted this

    View Benjamin Rogojan's profile

    Fractional Head Of Data | Reach Out For Data Infra And Strategy Consults

    Please stop using MongoDB for analytics... Even in 2025, it's not uncommon to see companies using their transactional system (OLTP) to perform analytics. Like many things... this works until it doesn't. There is a reason OLAP systems became popular, and it's not because we like spending all day trying to duplicate data from one database to another. OLAP systems are optimized to run analytical queries and be a little friendlier to the end user. Here is a slightly deeper dive into the differences between OLTP and OLAP.

    Access Patterns - The access pattern of an OLTP system is characterized by a high volume of small, frequent transactions that require fast response times and concurrent access by multiple users. The access pattern of an OLAP system is characterized by fewer, larger, and more complex queries that require longer response times but provide greater analytical capabilities.

    Data Model - OLTP systems typically use a normalized data model, where data is organized into multiple tables and relationships. Normalization reduces redundancy and ensures data consistency. Of course, this is different when referring to MongoDB and other similar DBs. OLAP data models tend to be more denormalized. This should reduce the number of joins required and generally make it easier for an analyst to understand how to write their query.

    Size - OLTP systems tend to be smaller in terms of storage, since they might only hold the current data and not historical changes. OLAP systems will be larger, as they store historical data as well as data from multiple systems.

    Performance - OLTP systems need fast response times; otherwise, end users would be concerned that their tweet didn't go through. OLAP systems can get away with being a little slower. But if your dashboard is taking minutes, DM me.

    Truthfully, I am surprised I am still seeing so many companies fall into this trap. If you're looking to dive deeper into this topic, I wrote an article you can read here - https://lnkd.in/dKP_M4AB
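The normalized-vs-denormalized contrast above can be sketched in a few lines. This is a toy illustration only: in-memory sqlite3 stands in for both the OLTP and OLAP systems, and all table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# OLTP-style: normalized tables, built for small point reads and writes.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         amount REAL, order_date TEXT);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1, 120.0, "2025-01-02"),
    (2, 1, 80.0, "2025-01-03"),
])

# OLAP-style: a denormalized fact table lets analysts aggregate without
# reconstructing the join graph on every query.
conn.execute("""
    CREATE TABLE orders_fact AS
    SELECT o.id, c.name AS customer_name, o.amount, o.order_date
    FROM orders o JOIN customers c ON o.customer_id = c.id
""")
total = conn.execute(
    "SELECT customer_name, SUM(amount) FROM orders_fact GROUP BY customer_name"
).fetchone()
assert total == ("Acme", 200.0)
```

In a real setup the fact table would live in a separate analytical store (and carry history), not in the same database as the transactional tables.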

  • Acheron Analytics reposted this

    View Richard Meng's profile

    Co-founder @ Roe AI | Agentic Unstructured Data Workflows | ex-Snowflake

    We've spoken with 30 companies who developed RAG-based chatbots on PDF documents. Every single one has failed.

    Core issues:
    1) In vector space, "non-dairy products" is often closer to "milk" than "meat." This is a fundamental flaw of vector embedding search, because embeddings are very lossy.
    2) Splitting documents into smaller chunks disrupts coherence, breaking cross-references and context.
    3) Adopting new RAG architectures, re-embedding chunks with new models, and adding rerankers requires continuous, costly data (re)engineering efforts.
    4) No support for aggregations - vector search struggles with queries requiring aggregation (e.g., max, min, total), making it unreliable for analytical use cases.

    As a result, companies band-aid their chatbots by writing complex heuristics to patch these failures. Ironically, many end up going back to rule-based chatbots.

    Our advice is simple - do you even need RAG? LLM models are dirt cheap now and quite comparable in cost to embedding models. If your documents are small, just load them directly into the LLM context. If your documents are large, enrich them with rich metadata and query the right documents and pages based on that metadata. Chatting on documents must be redesigned.
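The metadata-first alternative described above can be sketched in plain Python. Everything here is invented for illustration (the documents, the metadata fields, the page budget); in practice the selected documents would be loaded whole into the LLM's context window rather than chunked and embedded.

```python
# Hypothetical document store with hand-curated metadata.
documents = [
    {"title": "2024 Annual Report", "topics": {"finance", "revenue"},
     "pages": 120, "text": "...full report text..."},
    {"title": "Employee Handbook", "topics": {"hr", "policy"},
     "pages": 40, "text": "...full handbook text..."},
]

def select_documents(query_topics, docs, max_pages=200):
    """Pick whole documents whose metadata matches the query topics."""
    matches = [d for d in docs if d["topics"] & query_topics]
    # Keep whole documents (no chunking) while respecting a context budget,
    # preferring shorter documents so more of them fit.
    selected, budget = [], max_pages
    for doc in sorted(matches, key=lambda d: d["pages"]):
        if doc["pages"] <= budget:
            selected.append(doc)
            budget -= doc["pages"]
    return selected

hits = select_documents({"finance"}, documents)
assert [d["title"] for d in hits] == ["2024 Annual Report"]
```

The trade-off versus vector search: retrieval quality now depends on how good the metadata is, but there is no embedding index to re-build when models change.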

  • Acheron Analytics reposted this

    View Benjamin Rogojan's profile

    Fractional Head Of Data | Reach Out For Data Infra And Strategy Consults

    Sometimes, no matter how hard you’re trying to get a job, the market just isn’t right. Then you add in ChatGPT and companies offering to automate the job application process... I can see it being even more frustrating, especially if you're trying to get your first job. I recall when I was trying to get my first job… it took nearly a year, and honestly, if it wasn’t for my dad literally pointing to a job on my screen and saying why don’t I apply there, it might have been even longer. You honestly might be doing everything you can to get a job right now and it may just not be working. But don't give up! You never know where a job is going to come from, whether it’s a conversation with a friend, applying for a job your dad points to on Indeed, or something else. You've got this! And if you're looking for more tips on landing a job, I just put out an article on the topic below!

  • Acheron Analytics reposted this

    View Benjamin Rogojan's profile

    Fractional Head Of Data | Reach Out For Data Infra And Strategy Consults

    I really enjoy working in data engineering, but here are some harsh truths no one will tell you.

    - Your pipelines will break, and it will always be at 2 a.m.
    - You're going to be on a migration project every 1-2 years or so
    - Backfilling will become one of your least favorite words
    - A lot of DE work is SQL, not big data frameworks (unless it's SQL abstracting away a big data framework)
    - We tend to be the middle child between SWEs and DSs
    - Vendors will say their products are "turn-key" or sell you stories that are too good to be true... but they won't be

    If you're looking to stay up on data engineering and infrastructure, you should join the over 100k subscribers to my newsletter here - https://lnkd.in/gNwgxpkk

  • Acheron Analytics reposted this

    View Seattle Data Guy's organization page

    54,069 followers

    Are you a data engineer looking to improve your skills and streamline your data infrastructure? Look no further. In this hands-on tutorial, we’ll explore Apache Iceberg, the revolutionary open table format transforming how data is managed in large-scale analytics environments. Whether you’re navigating schema evolution, optimizing partitioning strategies, or ensuring ACID compliance in your data lakes, this guide will equip you with practical insights and actionable steps to harness the full potential of Apache Iceberg. From understanding its core capabilities to implementing best practices, you’ll gain the knowledge needed to elevate your data engineering workflows and master the intricacies of modern data lake management. Let's take a look at what makes Iceberg a game-changer for data engineers and why it’s becoming a must-have tool in data analytics.

    by Daniel Palma https://lnkd.in/gyDRhFBN

  • Acheron Analytics reposted this

    View Seattle Data Guy's organization page

    54,069 followers

    As data analysts or scientists, we often find ourselves working downstream in the data lifecycle. Most of the time, our role involves transforming and analyzing data that has already been prepared and served to us by upstream processes. However, having a deeper understanding of the entire data pipeline—from ingestion to transformation and storage—can empower us to optimize workflows, ensure data quality, and unlock new insights. Another advantage of gaining understanding—and hands-on experience—in data engineering processes is the empathy we build with our data engineers. These are the colleagues we work closely with and rely on, making a strong, collaborative relationship essential. Hence, in this hands-on article, we will explore the Python ecosystem by examining tools such as Mage, Polars, and DuckDB. We’ll demonstrate how these tools can help us build efficient, lightweight data pipelines that take data from the source and store it in a format that is well-suited for high-performance analytics.

    by José Pablo Barrantes https://lnkd.in/g7CzRMR7
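The extract-transform-store pattern that article builds with Mage, Polars, and DuckDB can be sketched with nothing but the standard library. This is a stand-in only (csv + sqlite3 instead of Polars + DuckDB, invented data and column names) so the shape of a lightweight pipeline runs anywhere:

```python
import csv
import io
import sqlite3

# Extract: read rows from the "source" (a CSV file in the real pipeline).
raw_csv = io.StringIO("order_id,amount\n1,120.5\n2,80.0\n3,19.5\n")
rows = list(csv.DictReader(raw_csv))

# Transform: cast types and filter out small orders.
cleaned = [(int(r["order_id"]), float(r["amount"]))
           for r in rows if float(r["amount"]) >= 20.0]

# Store: land the result in an analytics-friendly table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", cleaned)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
assert total == 200.5
```

Swapping in Polars for the transform step and DuckDB for the store/query step keeps this same three-stage shape while adding columnar performance.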

Similar pages

View jobs