I am an undergraduate researcher (B.Tech CSE, Data Science) at
Heritage Institute of Technology, Kolkata. I am interested in
multi-agent reinforcement learning, computer vision, AI safety,
LLMs, and applied ML. Past projects include applied ML for cancer
prognosis, thermographic image segmentation for cancer detection
using hybrid CNN-Transformer architectures, and a modified Safe RLHF
pipeline for safety benchmarking and safer alignment of large
language models. I have also worked on identifying failure modes of
LLMs in mathematical reasoning.
My ongoing work focuses on RL environments and simulation for
autonomous systems, specifically a MARL drone simulator for defence
applications. I am currently interning at IIT Kharagpur, working on
India's first genomic language model (IgLM).
I am interested in joining a PhD program after my bachelor's degree.
IIT Kharagpur
Working on IgLM, India's first population-specific genomic
foundation model (StripedHyena2 architecture), under the
GRISHMA Summer Internship Program. Reached pooled XGBoost
ROC-AUC 0.933 on single-sample oral-cavity cancer
detection from RNA-seq (58,147 genes, 1,206 samples across
1,160 patients) by training an L1-embedded classifier
family under patient-grouped 5-fold stratified
cross-validation with predictions pooled out-of-fold.
Verified the signal is biological — not tissue-of-origin —
by designing a layered TCGA negative class (non-oral HNSC,
solid-tissue normal, 8 unrelated cancers), reporting
per-stratum sub-AUCs of 0.800, 0.900, and 0.987; confirmed
robustness to class skew via a 1.5:1 balanced re-run
preserving the headline (XGBoost 0.928).
Ported the Cedars-Sinai Molecular-Twin (MTPilot) L1-embedded
model family into a shared module reused across the IgLM
detection and downstream survival-prediction pipelines.
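The patient-grouped, pooled out-of-fold evaluation described above can be sketched roughly as follows. This is a minimal standard-library illustration, not the IgLM pipeline: the function names, the synthetic data, and the toy midpoint scorer (standing in for the XGBoost and L1-embedded classifiers) are all assumptions made for the example.

```python
import random

def grouped_stratified_folds(patient_ids, labels, n_folds=5, seed=0):
    """Give every sample a fold index such that no patient spans two folds."""
    # Assumption: a patient counts as positive if any of its samples is positive.
    patient_label = {}
    for pid, y in zip(patient_ids, labels):
        patient_label[pid] = max(patient_label.get(pid, 0), y)
    rng = random.Random(seed)
    fold_of_patient = {}
    for cls in (0, 1):
        # Shuffle patients within each class, then deal them round-robin.
        members = sorted(p for p, y in patient_label.items() if y == cls)
        rng.shuffle(members)
        for i, pid in enumerate(members):
            fold_of_patient[pid] = i % n_folds
    return [fold_of_patient[pid] for pid in patient_ids]

def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_midpoint(xs, ys):
    """Toy scorer (hypothetical): distance above the midpoint of the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    midpoint = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x - midpoint

def pooled_oof_auc(features, labels, patient_ids, fit, n_folds=5, seed=0):
    """Score each sample with a model that never saw its patient, then pool."""
    folds = grouped_stratified_folds(patient_ids, labels, n_folds, seed)
    oof = [None] * len(labels)
    for k in range(n_folds):
        train = [i for i, f in enumerate(folds) if f != k]
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        for i, f in enumerate(folds):
            if f == k:
                oof[i] = predict(features[i])
    # One AUC over all out-of-fold scores, rather than a mean of per-fold AUCs.
    return roc_auc(labels, oof), folds

# Synthetic data: 20 patients, some contributing two samples.
rng = random.Random(1)
patient_ids, labels, features = [], [], []
for pid in range(20):
    y = pid % 2
    for _ in range(2 if pid % 4 == 0 else 1):
        patient_ids.append(pid)
        labels.append(y)
        features.append(2.0 * y + rng.uniform(-0.5, 0.5))

auc, folds = pooled_oof_auc(features, labels, patient_ids, fit_midpoint)
print(f"pooled out-of-fold ROC-AUC: {auc:.3f}")
```

Grouping by patient keeps repeated samples from one patient out of both the training and test sides of a fold, which would otherwise inflate the score.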
Jadavpur University
— Nov 2025 – May 2026, remote
Advisor: Prof. Debotosh Bhattacharjee
Designed a hybrid CNN–Transformer segmentation model
(ResNet-34 encoder + ASPP + Transformer bottleneck +
SE-gated skip connections) with a differentiable Chan–Vese
level-set loss, reaching 0.9716 Dice and
0.9463 IoU on the DMR-IR dataset (357 thermograms,
119 patients) under patient-stratified 5-fold
cross-validation.
Surfaced an annotation-quality ceiling in weakly supervised
thermography by benchmarking against four SOTA baselines
(Attention U-Net, UNet++, DeepLabV3+, TransUNet) on five
metrics (Dice, IoU, HD95, ASSD, BF1) and showing all
models converge to statistically indistinguishable Dice
(≈0.97, p > 0.05, paired Wilcoxon with 1000-resample
bootstrap CIs).
Built a robustness battery (label-noise injection at
10–30%, augmentation regimes, 25–100% training subsets)
and an explainability suite (Grad-CAM, attention maps,
Monte-Carlo dropout uncertainty) for clinician-facing
decision support.
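For reference, the Dice and IoU overlap metrics reported above can be computed on a pair of binary masks as in the sketch below. It is a minimal illustration on plain nested lists; the function name and the tiny example masks are hypothetical and are not taken from the DMR-IR experiments.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two equally sized 2-D binary masks (lists of 0/1)."""
    intersection = sum(
        p & t
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    pred_area = sum(map(sum, pred))
    truth_area = sum(map(sum, truth))
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks empty: treat as perfect agreement (a convention, assumed here).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou

# Hypothetical 3x3 masks: the prediction has one extra foreground pixel.
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
truth = [[0, 1, 1],
         [0, 1, 0],
         [0, 0, 0]]

dice, iou = dice_and_iou(pred, truth)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single mask pair the two metrics are tied by IoU = Dice / (2 - Dice); averages taken over many images or folds need not satisfy that identity exactly.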
New Jersey Institute of Technology
— Jun – Nov 2025, on-site / virtual[certificate]
Advisor: Dr. Arnob Ghosh
Built a 2,500-pair safe/unsafe prompt–response dataset
spanning jailbreak strategies, indirect requests,
role-play, multi-step instructions, and ethical/unethical
educational queries; assigned absolute binary harm labels
(replacing the Bradley–Terry pairwise scheme) and
fine-tuned the final six layers of LLaMA-2-7B-chat-hf with
a dense classification head as the CS-RLHF cost model.
Validated semantic grounding of the cost model on the
held-out test split and the external XS-Test benchmark,
reaching ≈92% alignment with human safety judgments and
XS-Test scores of 0.91–0.96 (matching human verdict
0.89–0.92), versus 0.07–0.32 for the Safe-RLHF baseline
cost model.
Demonstrated the trained policy is 8× more efficient
at flagging unsafe responses than Safe-RLHF and is
preferred by humans in ≈60% of head-to-head comparisons
(+70 Elo) over 1,000 sampled prompts; co-authored the
resulting COLM 2026 submission (arXiv:2510.03520).
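As a sanity check on how a roughly 60% head-to-head preference relates to a +70 Elo gap, the sketch below uses the standard logistic Elo formula with a 400-point scale. That formula is an assumption here; the exact Elo procedure used in the paper is not described on this page, and the function names are illustrative.

```python
import math

def win_probability(elo_gap, scale=400.0):
    """Expected win rate of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / scale))

def elo_gap(win_rate, scale=400.0):
    """Inverse mapping: the rating gap implied by a given win rate."""
    return scale * math.log10(win_rate / (1.0 - win_rate))

p70 = win_probability(70)
gap60 = elo_gap(0.60)
print(f"+70 Elo -> win rate {p70:.3f}")
print(f"60% win rate -> Elo gap {gap60:.1f}")
```

Under this model a 70-point gap corresponds to a win rate just under 0.60, so the two reported figures are mutually consistent.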
Heritage Institute of Technology
— Oct 2024 – Mar 2025, on-site[AGC 2026 certificate]
Advisor: Ms. Arpita Talukdar
Improved WPBC accuracy to 93.67% (SVM + RFE) and
WDBC accuracy to 97.77% (LogReg + RFE) by adding RFE/SFS
feature selection, SMOTE class balancing, and GridSearchCV
hyperparameter tuning to a dual-stage diagnosis-and-recurrence
ML framework that benchmarks five classifiers (RF,
SVM, Logistic Regression, MLP, XGBoost) under stratified
10-fold cross-validation.
Identified clinically relevant nuclear features through a
comparative analysis across model–feature-selection
combinations, reported with bootstrap confidence intervals.
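One common way to attach a bootstrap confidence interval to a metric such as accuracy is the percentile bootstrap, sketched below. The resample count, the 95% level, the function name, and the synthetic per-sample outcomes are all assumptions for illustration; the settings used in the study are not stated here.

```python
import random

def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(values[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical per-sample correctness flags: 90 correct out of 100 predictions.
correct = [1] * 90 + [0] * 10
accuracy = sum(correct) / len(correct)
ci_low, ci_high = bootstrap_ci(correct)
print(f"accuracy = {accuracy:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

Resampling per-sample outcomes with replacement approximates how much the metric would vary across comparable test sets of the same size.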
* * *
Publications
Recursive and Wrapper-Based Feature Selection for Breast Cancer
Diagnosis and Prognosis
[oral, certificate]
Ayushi Bhattacharjee, Arnesh Banerjee, Arpita Talukdar. 4th Analytics Global Conference (AGC 2026), March 2026.
An Intelligent Weakly Supervised Framework for Breast
Thermography Segmentation Using Hybrid CNN–Transformer
Networks
[in prep] Arnesh Banerjee, Debotosh Bhattacharjee. In preparation for Expert Systems with Applications.
* * *
Ongoing Research
Co-evolutionary Multi-Agent RL for Autonomous Drones
Arnesh Banerjee. With the AI for Defence Lab, ULiège, Belgium.
Understanding the Limitations of LLMs in Mathematical
Reasoning
Arnesh Banerjee, Ayushi Bhattacharjee, Subhajit Datta. Advisor: Prof. Subhajit Datta. B.Tech coursework.
Analyzing Historical Revisionism in LLMs in the Context of
Indian History
Kartik Pandit, Sourav Ganguly, Arnesh Banerjee, Ayushi
Bhattacharjee, Avirup Chakraborty, Arnob Ghosh. Advisor: Dr. Arnob Ghosh.
* * *
Blogs
Coming soon — I plan to write about RL environments, MARL,
interpretability, and notes from papers I find interesting.
* * *
Achievements
Selected for IIT Kharagpur, 3× IIT Patna, and IIT Dhanbad summer
research internships, and IIM Ahmedabad AI Venture Summer
Internship 2026. Offer emails: IIT KGP, IIT Patna 1, IIT Patna 2,
IIT Patna 3, IIM Ahmedabad.
Ranked third in the department in the 4th semester (SGPA 9.46), B.Tech CSE(DS),
Heritage Institute of Technology.
Selected as one of 10 Institution's Innovation Council (IIC)
members representing the CSE(DS) department, HIT Kolkata.