Batching in Clojure is boring

Sometimes you need to process data. And sometimes it's a lot of data so you want to do the processing in batches.

In a "normal" programming language you might do a loop, collecting N records first and then passing them to the processing machinery, taking care of combining the results. You'll need to count and to reset that counter and so on.
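For contrast, here is roughly what that manual bookkeeping might look like, even when written in Clojure. This is only a sketch: `process-in-batches` and its parameters are hypothetical names made up for illustration, and it assumes a positive batch size.

```clojure
;; Hypothetical helper, not part of any library: batching by hand.
;; Assumes batch-size is a positive integer.
(defn process-in-batches
  [batch-size process-batch items]
  (loop [remaining items
         batch     []
         results   []]
    (cond
      ;; The batch is full: process it, keep the results, start a new batch.
      (= (count batch) batch-size)
      (recur remaining [] (into results (process-batch batch)))

      ;; There are still items left: keep collecting.
      (seq remaining)
      (recur (rest remaining) (conj batch (first remaining)) results)

      ;; Nothing left to collect, but a partial batch is waiting: flush it.
      (seq batch)
      (into results (process-batch batch))

      ;; Nothing left at all.
      :else results)))

(process-in-batches 2 #(map inc %) [1 2 3 4 5])
;; => [2 3 4 5 6]
```

Collecting, counting, resetting, flushing the leftovers: all of it is your problem.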

In Clojure though... it's actually boring to do batching.

Demo time

Imagine you have a little AI and you want it to read some books to become smart just like its parents.

We have our books here:

  (def a-lot-of-books
    [{:title "Dune"          :read? false}
     {:title "Infinite Jest" :read? false}
     {:title "Naked Lunch"   :read? false}
     {:title "On the Road"   :read? false}
     {:title "Neuromancer"   :read? false}])        

Let's just pretend it's a lot of books, OK?

And you have the reading machinery. Very complex:

  (defn read-book
    [book]
    (assoc book :read? true))

  (defn read-books
    [books]
    (map read-book books))        

A function that goes through all the books and sets the :read? flag to true.

Now you give the AI the books to read like this:

(read-books a-lot-of-books)

;; Result:

({:title "Dune", :read? true}
 {:title "Infinite Jest", :read? true}
 {:title "Naked Lunch", :read? true}
 {:title "On the Road", :read? true}
 {:title "Neuromancer", :read? true})        

And it works. But maybe all the books at once is too much. Not enough minerals, sorry, memory.

What if the little AI can only handle two books at a time? Do we loop and count?

Of course not. This is Clojure. We just... read the documentation of the standard library until we find something that fits.

So we need to break a collection into sub-collections. We need to... partition it.

(partition-all 2 a-lot-of-books)

;; Result:

(({:title "Dune", :read? false}
  {:title "Infinite Jest", :read? false})

 ({:title "Naked Lunch", :read? false}
  {:title "On the Road", :read? false})

 ({:title "Neuromancer", :read? false}))        

Cool. We have 3 groups. And because we used partition-all, the last group is incomplete. If we use partition instead, we get 2 groups and the last element is lost because it doesn't fill a complete group. But this is not that kind of post.
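Still, for the curious, a quick sketch of the difference on a plain vector of numbers (the vector is just an illustration, not part of the book example):

```clojure
;; Five elements, groups of two: one element is left over.
(def numbers [1 2 3 4 5])

;; partition-all keeps the incomplete last group.
(partition-all 2 numbers)
;; => ((1 2) (3 4) (5))

;; partition drops whatever doesn't fill a complete group.
(partition 2 numbers)
;; => ((1 2) (3 4))
```

For batching you almost always want partition-all, otherwise the leftovers silently disappear.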

Because of the way we wrote our processing function, we can simply reuse it. Just need to make sure we're giving it a collection of books.

(->> a-lot-of-books
     (partition-all 2)
     (map read-books))

;; Result:

(({:title "Dune", :read? true}
  {:title "Infinite Jest", :read? true})
 ({:title "Naked Lunch", :read? true}
  {:title "On the Road", :read? true})
 ({:title "Neuromancer", :read? true}))        

Great! It's working. But the results are also returned in groups. The map function returns a collection so we get a collection of collections. But we just want a list, just like in the non-partition version.

You mean you want a flat list? flatten then!

(->> a-lot-of-books
     (partition-all 2)
     (map read-books)
     flatten)

;; Result:

({:title "Dune", :read? true}
 {:title "Infinite Jest", :read? true}
 {:title "Naked Lunch", :read? true}
 {:title "On the Road", :read? true}
 {:title "Neuromancer", :read? true})        

Done!

This post is brought to you by: me, typing stuff in NeoVim.


Howard Lewis Ship

Seasoned Clojure/Functional Developer

11 months

clojure.core/flatten can do more than you want; you should use mapcat instead of map when your steps generate a collection and you want the concatenated collections. "cat" is short for "concatenate".
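A minimal sketch of what this comment suggests, reusing the shape of the post's example with a shorter book list. The `some-books` and `pairs` names are made up for illustration, and the `pairs` data is a hypothetical case where each result is itself a vector.

```clojure
(def some-books
  [{:title "Dune"          :read? false}
   {:title "Infinite Jest" :read? false}
   {:title "Naked Lunch"   :read? false}])

(defn read-book
  [book]
  (assoc book :read? true))

(defn read-books
  [books]
  (map read-book books))

;; mapcat maps read-books over each batch and concatenates the results,
;; removing exactly one level of nesting.
(def read-in-batches
  (->> some-books
       (partition-all 2)
       (mapcat read-books)))
;; => ({:title "Dune", :read? true}
;;     {:title "Infinite Jest", :read? true}
;;     {:title "Naked Lunch", :read? true})

;; Hypothetical data: batches whose items are themselves vectors.
(def pairs [[["Dune" 1965]] [["Neuromancer" 1984]]])

;; flatten removes every level of sequential nesting, so the pairs fall apart.
(flatten pairs)
;; => ("Dune" 1965 "Neuromancer" 1984)

;; mapcat removes only the batch level and leaves each pair intact.
(mapcat identity pairs)
;; => (["Dune" 1965] ["Neuromancer" 1984])
```

With maps as items, both approaches give the same result, because flatten does not descend into maps. The difference shows up once the items themselves are vectors or lists.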
