AI for the rest of us

You’ve probably heard that artificial intelligence (AI) and machine learning (ML) are the future. But what does that mean for your career in tech?

The ReadME Project senior editor Klint Finley asked three AI/ML experts to weigh in on some of the most pressing questions facing developers today:


Thomas J. Fan is a core maintainer of the Python machine learning library scikit-learn, a contributor to the scikit-learn compatible neural network library skorch, and a staff software engineer at Quansight Labs.

Ines Montani maintains the Python natural language processing library spaCy and develops Prodigy, a modern annotation tool for creating training data for machine learning models. She’s also the co-founder and CEO of Explosion.

Noah Gift has taught AI and ML at the Duke MIDS Graduate Data Science Program, the Graduate Data Science program at UC Berkeley, and the UC Davis Graduate School of Management MSBA program, among others. He is the author of Implementing MLOps in the Enterprise, Practical MLOps, Cloud Computing for Data Analysis, and many other books.


Klint: Who needs to “do” AI/ML? Do front-end developers? What about DevOps?


Thomas: I always preface this by saying that if you don’t need ML in your product, or you don’t have the data to build and run the models, you probably shouldn’t use ML. It will only complicate your system. The people who need to “do” AI/ML are those who have a use case and the data to support it. What you actually need to do and know will depend on your role, your organization, and your application. Front-end developers might only end up consuming an ML-based API, while the ops team might need to be versed in the whole AI lifecycle so they can maintain all the necessary systems.

Ines: Exactly, you don’t do machine learning for the sake of doing it. You do it for a reason connected to what you’re already doing. Once you have an application for it, you’ll find that AI/ML is an interdisciplinary field that needs skills from many different disciplines. An ML project may call on domain experts in relevant fields, both front-end and back-end developers, data scientists, data visualization experts, and communications professionals. Not everyone needs to be an expert in every aspect of ML; they just need to be able to bring their skill set to bear on the project.

Noah: That’s true, but it feels like we’re reaching the point that AI/ML tools, like GitHub Copilot or Hugging Face’s pair-programming module, are mature enough that AI/ML could be a part of every software engineering process, regardless of whether the software you’re writing has an AI/ML component. So at a certain point, all developers might be using AI/ML for some part of their work. It will help to understand a little about the different models these systems use, though you can take advantage of the tools without this deeper knowledge. GitHub Copilot uses the OpenAI Codex via the API, so you can look at the OpenAI documentation to get a better sense of what GitHub Copilot is doing.


Klint: How deep do developers need to go? Should everyone be brushing up on linear algebra so we can work directly with algorithms?


Thomas: Only a fraction of the people working with an AI/ML system need to know the mathematics involved—primarily research scientists or research engineers. In most organizations, the majority of AI/ML work involves data analysis and software engineering: building software systems to support the flow of data or making business decisions based on the outputs of models.

Ines: Algorithms are what everyone talks about because it’s where most of the research happens. It’s certainly helpful to have a high-level understanding of the different major algorithms that you might use and their advantages and disadvantages, but you don’t need to go deep on the mathematics. One thing that I think is more important than what algorithms you use is the data that your model is trained on. You need to know its origin and its reliability. You should also know about evaluation methods and how to determine whether your system is actually doing something that’s useful—which is not always the same as a high accuracy score.
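Ines’s point about a high accuracy score not equaling usefulness can be shown with a minimal sketch in plain Python. The labels below are made up for illustration: on an imbalanced dataset, a "model" that never predicts the rare class still looks accurate.

```python
# Hypothetical imbalanced dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * 100

# Accuracy: fraction of predictions that match the ground truth.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: fraction of real positives the model actually found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"accuracy: {accuracy:.2f}")  # 0.95 — looks impressive
print(f"recall:   {recall:.2f}")    # 0.00 — the model finds nothing
```

The 95% accuracy score hides the fact that the system does nothing useful, which is exactly why evaluation has to be tied to what the system is for.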

Noah: I agree that the math is unnecessary for most people. It doesn’t hurt to know a little calculus or linear algebra, but you don’t need it to use AI/ML. I think learning by doing is a good practice. Try out different models and see what produces useful results. Hugging Face is a great place to get started because there are so many different libraries available.


Klint: How can developers who aren’t necessarily deep into the algorithmic side of AI/ML make sure they are creating ethical systems?


Thomas: Much of the responsibility for ethical AI falls on those building the algorithms and models, and selecting the training data. If you’re downstream of these concerns, then finding resources to learn about ethical systems is important.

One resource I recommend is Fairlearn, a Python package for assessing biases or unfairness in AI systems. It’s a good starting point for understanding what it means to be fair, how to measure it, and the upsides and downsides of different approaches.
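Setting Fairlearn’s own API aside, the core idea behind one of the disparity measures it supports—comparing a model’s selection rate across groups—can be sketched in plain Python. The predictions and group labels below are hypothetical:

```python
# Demographic parity gap: does the model say "yes" at very different
# rates for different groups? All data here is made up for illustration.
predictions = [1, 0, 1, 1, 0, 0, 1, 0]                     # model's yes/no decisions
groups      = ["a", "a", "a", "a", "b", "b", "b", "b"]     # sensitive attribute

def selection_rate(preds):
    """Fraction of examples the model selects (predicts 1 for)."""
    return sum(preds) / len(preds)

by_group = {
    g: selection_rate([p for p, gg in zip(predictions, groups) if gg == g])
    for g in set(groups)
}

# A large gap between groups is a signal worth investigating.
disparity = max(by_group.values()) - min(by_group.values())
print(by_group)                    # {'a': 0.75, 'b': 0.25}
print(f"disparity: {disparity:.2f}")  # 0.50
```

A gap like this doesn’t prove unfairness on its own, but it tells you where to start asking questions—which is what tools like Fairlearn help formalize.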

Ines: Exactly, and to add to this: In a real-world context, like within a company, you also can’t easily separate the algorithms and models from their application. The same model trained on the same data can be very useful and unproblematic in one context, and produce extremely harmful outcomes in another. You should assess this for every use case. This requires diverse teams and a lot of attention to the data.

Noah: It is up to all humans to look out for other humans. We shouldn’t just “follow orders,” but be aware of the impact our work has on the larger world. If you see something, you should say something. What you may find out is that the majority of humans want to live in a safe, unbiased world that’s free from algorithmic harm. In particular, recommendation engines and social media companies are great examples of products with good theoretical use cases that, once out in the wild, have produced great harm to the world. For employees of companies that create these products, if you see great harm being done, you can voice your concerns about it internally to your peers and others in the organization. Ethical leaders will listen to your concerns.

Similarly, for users of products that cause great harm, you can make your voice heard by disengaging with harmful companies that act unethically. One framework I would propose is a simple Positive, Neutral, and Negative score for products and companies. If a product or company is Neutral or Positive in the impact it has on the world, then keep using the product or service. If the product is Negative, or Neutral and trending negative, avoid using it unless there is no alternative. In some cases, even if a product has no alternative, you may want to consider avoiding it anyway. This framework is flexible because each person can decide how they rank products and services, and it’s an easy way to make the world a better place.
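Noah’s scoring framework is simple enough to sketch as a small decision helper. The function and its inputs are hypothetical, just one way of encoding the rules above:

```python
# Noah's Positive/Neutral/Negative framework as a tiny decision helper.
# score: +1 positive, 0 neutral, -1 negative.
# trend: captures the "Neutral, trending negative" case (-1 = trending down).
def should_keep_using(score, trend=0, has_alternative=True):
    if score > 0 or (score == 0 and trend >= 0):
        return True               # Positive, or stable Neutral: keep using
    # Negative, or Neutral trending negative:
    # avoid it unless there is no alternative.
    return not has_alternative

print(should_keep_using(1))                          # positive product -> True
print(should_keep_using(0, trend=-1))                # neutral, trending down -> False
print(should_keep_using(-1, has_alternative=False))  # negative, no alternative -> True
```

The last case reflects the framework’s escape hatch: even then, as Noah notes, you might choose to avoid the product anyway.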


Klint: What are some of the different areas where people can get involved with AI/ML, apart from studying algorithms or passively using an AI-based app? For example, in a previous Q&A we talked with Karthik Iyer, a software engineer who decided he didn’t want to get into the mathematics of deep learning, but contributed to TensorFlow’s data visualization library.


Ines: Most of the difficult decisions in machine learning are about asking the right questions. If you have domain knowledge in a particular field, like medicine or manufacturing, you can help AI experts build models and tools in an advisory role. Visualization tools are a great place to get involved. So is documentation. Almost every project can always use more help with documentation, tutorials, and things like that.

Thomas: Accessibility is another great place to pitch in. It’s extremely valuable work. If AI is going to be the future, we need more people to share their perspectives. Over the past few years, we’ve been working on making the graphs used in our scikit-learn examples accessible to colorblind users. We’ve had some people who just can’t see the graphs. Helping out with features like that is a fairly easy way to get involved.

Attending or organizing meetups is another way to find opportunities to get involved with AI/ML. You can organize internal meetups in your organization to find other people who are interested, find ways to apply your skills to existing projects, or come up with places to start using AI/ML if your organization doesn’t use it already.

Noah: You can always become a citizen data scientist, even if you’re not an expert in deep learning. You can seek out datasets on anything you’re interested in, say gun violence or inflation, and start putting together notebooks on Google Colaboratory so that other people can use them to explore those issues. It’s a way to learn more about machine learning tools while also potentially doing some social good.


Klint: What are some good open source projects that are welcoming to contributors who are new to AI/ML and looking to get their feet wet?


Noah: Hugging Face is really open to new contributors. A great way to contribute to the community is to build new libraries. Also, any of the Python ecosystem projects, such as Pandas, scikit-learn, and Seaborn, are good places for anyone to get started in contributing to open source, whether that’s through documentation or something else. Some of those projects might not be exclusively focused on AI/ML, but you’d be contributing to things that are foundational to the whole data science and AI/ML ecosystem, and they tend to be open to new contributors.

Thomas: Many of the major Python projects like NumPy and SciPy have community hours at least once a month—usually more frequently. They’re an opportunity for new and potential contributors to learn about the issues the maintainers are working on, meet other contributors, and basically just put a face to all the names you interact with on GitHub. It’s really inviting and a good way to get started in open source in general.

Ines: There are a lot of open source projects with communities and initiatives to onboard contributors. But like Noah said, the best way to get started in AI/ML is to build something. Building your own library and applying it to something is a great starting point towards contributing to the open source AI/ML ecosystem. You can say “I built this project” instead of just “I fixed this bug.” We designed spaCy to make it easy to build plugins, and we encourage people to contribute in that form. That gives you agency over your own project, and you don’t have to worry as much about the internals of the project.


Klint: What are some adjacent skills that developers working with AI/ML need to learn?


Thomas: Software engineers and researchers need to learn different things, but they both need to understand each other’s skill sets. The most important skill for most developers, regardless of their role, is communication. If you’re working on AI/ML, you’re going to need to communicate with a lot of different people in different ways. The way you talk with an engineering manager who’s well-versed in the technical aspects of AI/ML might be different from how you talk with a skip-level manager who isn’t as well-versed and just needs to know how things affect the business.

Noah: A lot of data scientists and AI experts who come out of academia don’t have software engineering best practices in mind. If you look at their code in Jupyter Notebooks it can be pretty rugged. A way to make a name for yourself would be applying best practices, like linting, formatting, continuous delivery, and those sorts of things. Almost no one is against these, but it’s lacking quite a bit in the data science world.

Ines: If you work in applied machine learning, I think it’s relevant to have product-focused thinking. Think about what your system does and how the work you’re doing connects to a larger organizational goal.

You also need to develop a good BS detector. A lot of things are presented as AI or as cutting-edge machine learning but simply can’t work as magically as advertised—or might not be cutting-edge at all. To distinguish fact from fiction, you need to learn to ask the right questions, like what data a model is trained on or where the system is most likely to encounter problems. If someone can’t answer those questions, your alarm bell should ring.


Do you have a burning question about Git, GitHub, or open source software development? Drop your question to us on your social channel of choice using #askRMP, and it may be answered by a panel of experts in an upcoming newsletter or a future episode of The ReadME Podcast!

Want to get The ReadME Project right in your inbox? Sign up for our newsletter to receive new stories, best practices and opinions developed for The ReadME Project, as well as great listens and reads from around the community.
