Arnesh Banerjee

I like to train models, build agents, and tinker with multi-agent systems for autonomy and safety.

I am an undergraduate researcher (B.Tech CSE, Data Science) at Heritage Institute of Technology, Kolkata. My interests are multi-agent reinforcement learning, computer vision, AI safety, LLMs, and applied ML. Past projects include applied ML for cancer prognosis, thermographic image segmentation for cancer detection using hybrid CNN–Transformer architectures, and a modified Safe RLHF pipeline for safety benchmarking and safer alignment of large language models. I have also worked on identifying failure modes of LLMs in mathematical reasoning.

My ongoing work focuses on RL environments and simulation for autonomous systems, in particular a MARL drone simulator for defence applications. I am currently interning at IIT Kharagpur, working on India's first genomic language model (IgLM).

Research Experience

IIT Kharagpur — May 2026 – present, on-site [offer, KRITI portal]

Advisor: Dr. Sourangshu Bhattacharya

Working on IgLM, India's first population-specific genomic foundation model (StripedHyena2 architecture), under the GRISHMA Summer Internship Program.
Reached a pooled out-of-fold ROC-AUC of 0.933 with XGBoost on single-sample oral-cavity cancer detection from RNA-seq (58,147 genes; 1,206 samples across 1,160 patients) by training an L1-embedded classifier family under patient-grouped, stratified 5-fold cross-validation.
Verified that the signal is biological rather than tissue-of-origin by designing a layered TCGA negative class (non-oral HNSC, solid-tissue normal, 8 unrelated cancers) and reporting per-stratum sub-AUCs of 0.800, 0.900, and 0.987; confirmed robustness to class skew with a 1.5:1 balanced re-run that preserved the headline result (XGBoost 0.928).
Ported the Cedars-Sinai Molecular-Twin (MTPilot) L1-embedded model family into a shared module reused across the IgLM detection and downstream survival-prediction pipelines.

Jadavpur University — Nov 2025 – May 2026, remote

Advisor: Prof. Debotosh Bhattacharjee

Designed a hybrid CNN–Transformer segmentation model (ResNet-34 encoder + ASPP + Transformer bottleneck + SE-gated skip connections) with a differentiable Chan–Vese level-set loss, reaching 0.9716 Dice and 0.9463 IoU on the DMR-IR dataset (357 thermograms, 119 patients) under patient-stratified 5-fold cross-validation.
Surfaced an annotation-quality ceiling in weakly supervised thermography by benchmarking against four SOTA baselines (Attention U-Net, UNet++, DeepLabV3+, TransUNet) on five metrics (Dice, IoU, HD95, ASSD, BF1) and showing that all models converge to statistically indistinguishable Dice (≈0.97; p > 0.05, paired Wilcoxon signed-rank test, with 1,000-resample bootstrap CIs).
Built a robustness battery (label-noise injection at 10–30%, augmentation regimes, 25–100% training subsets) and an explainability suite (Grad-CAM, attention maps, Monte-Carlo dropout uncertainty) for clinician-facing decision support.

New Jersey Institute of Technology — Jun – Nov 2025, on-site / virtual [certificate]

Advisor: Dr. Arnob Ghosh

Built a 2,500-pair safe/unsafe prompt–response dataset spanning jailbreak strategies, indirect requests, role-play, multi-step instructions, and ethical/unethical educational queries; assigned absolute binary harm labels (replacing the Bradley–Terry pairwise scheme) and fine-tuned the final six layers of LLaMA-2-7B-chat-hf with a dense classification head as the CS-RLHF cost model.
Validated the semantic grounding of the cost model on the held-out test split and the external XS-Test benchmark, reaching ≈92% alignment with human safety judgments and XS-Test scores of 0.91–0.96 (close to the human-verdict range of 0.89–0.92), versus 0.07–0.32 for the Safe-RLHF baseline cost model.
Demonstrated that the trained policy is 8× more efficient at flagging unsafe responses than Safe-RLHF and is preferred by humans in ≈60% of head-to-head comparisons (+70 Elo) over 1,000 sampled prompts; co-authored the resulting COLM 2026 submission (arXiv:2510.03520).

Heritage Institute of Technology — Oct 2024 – Mar 2025, on-site [AGC 2026 certificate]

Advisor: Ms. Arpita Talukdar

Improved accuracy to 93.67% on WPBC (SVM + RFE) and 97.77% on WDBC (LogReg + RFE) by adding RFE/SFS feature selection, SMOTE class balancing, and GridSearchCV hyperparameter tuning to a dual-stage diagnosis-and-recurrence ML framework that benchmarks five classifiers (RF, SVM, Logistic Regression, MLP, XGBoost) under stratified 10-fold cross-validation.
Identified clinically relevant nuclear features through a comparative analysis across model–feature-selection combinations, reported with bootstrap confidence intervals.

Publications

Recursive and Wrapper-Based Feature Selection for Breast Cancer Diagnosis and Prognosis [oral, certificate]
Ayushi Bhattacharjee, Arnesh Banerjee, Arpita Talukdar.
4th Analytics Global Conference (AGC 2026), March 2026.

Preprints

Certifiable Safe RLHF: Semantic Grounding and Fixed Penalty Constraint Optimization for Safer LLM Alignment [under review, COLM 2026]
Kartik Pandit, Sourav Ganguly, Arnesh Banerjee, Shaahin Angizi, Arnob Ghosh. 2025.
arXiv:2510.03520

An Intelligent Weakly Supervised Framework for Breast Thermography Segmentation Using Hybrid CNN–Transformer Networks [in prep]
Arnesh Banerjee, Debotosh Bhattacharjee.
In preparation for Expert Systems with Applications.

Ongoing Research

Co-evolutionary Multi-Agent RL for Autonomous Drones
Arnesh Banerjee.
With the AI for Defence Lab, ULiège, Belgium.

Understanding the Limitations of LLMs in Mathematical Reasoning
Arnesh Banerjee, Ayushi Bhattacharjee, Subhajit Datta.
Advisor: Prof. Subhajit Datta. B.Tech coursework.

Analyzing Historical Revisionism in LLMs in the Context of Indian History
Kartik Pandit, Sourav Ganguly, Arnesh Banerjee, Ayushi Bhattacharjee, Avirup Chakraborty, Arnob Ghosh.
Advisor: Dr. Arnob Ghosh.

Blogs

Coming soon — I plan to write about RL environments, MARL, interpretability, and notes from papers I find interesting.