Deployment
In the past two weeks, I’ve focused on streamlining the cell code so that it can be readily deployed in a concurrent execution environment. Additionally, I explored feasible technologies that can (1) run CMUs and cortices, (2) collect real-time data to generate knowledge, and (3) provide a periodic management mechanism that corrects stored knowledge.
Evaluating Technologies for CMUs and Cortices
Now that the algorithms for the two Beads components are developed, the evaluation below helps in choosing the right technologies for deployment:
| Design Requirement | Local python threads with in-memory data | Redis streams / Celery | Kafka with microservices |
|---|---|---|---|
| Integration with existing code | Trivial / excellent | Excellent (redis-py, celery) | Excellent (containers can run python code) |
| End-to-end latency | Excellent | Good (local workers have low latency, networked workers don’t) | Moderate (distributed computing adds overhead) |
| Throughput | Limited by a single host’s CPU / memory. Might cause lags for millions of cells | Good (celery can scale workers but the network can be a bottleneck) | Excellent |
| Resiliency | Poor by default; data is lost if system shuts down unexpectedly | Moderate (Redis persistence helps) | Excellent when deployed well |
| Operational complexity | Lowest | Moderate (simple monitoring; run Redis and workers) | High (needs Kafka broker configuration, monitoring, Kubernetes, etc) |
| Scaling | Poor | Moderate (simply add workers) | Excellent (scale producers / consumers independently) |
| Cost | Lowest | Moderate (needs cloud VMs) | High (multiple cloud nodes and ops effort) |
| Handling large binary streams | Best for passing in-memory arrays, but memory requirements add up quickly | Message serialization overhead | Can store pointers in Kafka and the objects themselves in an object store, but this impacts latency |
| Suitability for real-time closed loop experiments | Best (as processes can be restarted quickly) | Reasonable if workers are colocated; still extra latency vs in-process but workable for many closed-loop timings | Network hops and broker overhead can severely impact latency |
| Integration with GPU | Good | Good (depends on worker nodes) | Good (depends on worker nodes) |
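To make the first column concrete, here is a minimal sketch of the local-threads option: cells process in-memory arrays on a thread pool, so no serialization or network hop is involved. The `activate` function is a hypothetical stand-in for a CMU’s actual per-cell computation, not the real Beads code.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def activate(stimulus: np.ndarray) -> np.ndarray:
    """Toy per-cell response: a saturating nonlinearity.

    Hypothetical placeholder for a CMU's real activation logic.
    """
    return stimulus / (1.0 + np.abs(stimulus))


def run_cells(stimuli: list[np.ndarray], workers: int = 4) -> list[np.ndarray]:
    """Run each cell's computation concurrently on local threads.

    Arrays stay in process memory, which is the main advantage of the
    in-memory column above: zero serialization overhead, at the cost of
    being limited to a single host.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(activate, stimuli))
```

Swapping this for the Celery column would mean turning `activate` into a task and paying per-message serialization; the table’s latency and throughput trade-offs follow directly from that difference.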
Based on the above, I think it is prudent to host cells that need extreme concurrency (such as photoreceptors) on machines with GPU support. However, to ensure the end-to-end pipeline works, the ability to run real-time closed-loop experiments is more important.
Algorithms / Development
CMU Development
Minor tweaks across the retina to streamline cell methods into distinct cell creation, organization, and functioning stages. Please find the changes in the commits (linked below for reference).
Development Activity - https://github.com/akhil-reddy/beads/graphs/commit-activity
Please note that some code (class templates, function comments, etc.) is AI-generated, so that I can spend more of my productive time on thinking and designing. However, I cross-verify each block of generated code against its corresponding design choice before moving ahead.
Next Steps
Deployment
- Overlaying video frames onto the retina, including code optimization for channel processing
- Post processing in the visual cortex
- Overlaying audio clips onto the cochlea, including optimization for wave segment processing
- Post processing in the auditory cortex
- Parallelization / streaming of cellular events via Flink or equivalent
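As a rough illustration of the first deployment step, the sketch below maps incoming video frames onto a retinal grid by block-averaging each channel. All names and the sampling scheme are hypothetical; the actual retina code will use its own photoreceptor layout and channel processing.

```python
import numpy as np


def overlay_frames(frames, retina_width: int, retina_height: int):
    """Hypothetical sketch: downsample each video frame onto a retina-sized grid.

    Block averaging is a placeholder for proper photoreceptor sampling;
    per-channel processing happens implicitly because the channel axis
    is preserved through the reshape.
    """
    for frame in frames:
        h, w, c = frame.shape
        bh, bw = h // retina_height, w // retina_width
        # Trim so the frame divides evenly into blocks, then average each block.
        trimmed = frame[: bh * retina_height, : bw * retina_width]
        blocks = trimmed.reshape(retina_height, bh, retina_width, bw, c)
        yield blocks.mean(axis=(1, 3))
```

The same generator shape (stream in, per-cell stimulus out) would apply to overlaying audio clips onto the cochlea, with wave segments in place of frame blocks.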
Building the Environmental Response System (ERS)
- Building the ERUs
- Neurotransmitters - Fed by vision’s bipolar and amacrine cells, for example, to act on contrasting and/or temporal stimulus
- Focus - Building focus and its supporting mechanisms (of which acetylcholine is one)
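One way to picture the neurotransmitter ERU fed by bipolar and amacrine cells is as a temporal-contrast signal: release occurs only where the stimulus changed appreciably between frames. This is an illustrative sketch under that assumption; the function and threshold are hypothetical, not the planned ERS API.

```python
import numpy as np


def temporal_contrast_signal(prev: np.ndarray, curr: np.ndarray,
                             threshold: float = 0.1) -> np.ndarray:
    """Hypothetical amacrine-like temporal differencing for an ERU.

    Returns a per-cell 'release' signal wherever the stimulus changed
    by more than `threshold` between consecutive frames; unchanged
    regions produce no signal.
    """
    delta = np.abs(curr - prev)
    return np.where(delta > threshold, delta, 0.0)
```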