On Thu, Jun 1, 2023 at 5:54 AM Hoch,Jeffrey <hoch@uchc.edu> wrote:

Hi Jared –

I love your “few things”! Those are a mouthful 😉. I’ll introduce our team members, and it would be great if you could summarize the discussions that you had in your initial meeting. We can offer an NMR perspective on the topics. As this will be our first all-hands meeting, I might share a small slide deck that I used to make the case to the wwPDB PIs that there is a need and an opportunity to make structure validation “more Bayesian”. I think I shared some of those slides with you all when I visited, but the deck is short and would be a good way of getting us all on the same wavelength (although it’s pretty clear we’re very close, if not already there).

I’m attaching a manuscript that’s currently undergoing final revisions. Please share it among your group, but not outside. A bit of context – when I took over as head of BMRB, there was already a validation task force working on revamping the validation pipeline for NMR structures. Unfortunately, although the effort is/was very well-intentioned, it retains much ad hoc and archaic language; e.g., “violations” of NMR “restraints” are dealt with in a way that simply isn’t consistent with, or applicable in a broader sense to, any other type of empirical data. Rather than move the goalposts on the task force, I suggested they complete their work to achieve the original goal, and we would start a Bayesian initiative afresh. It serves to highlight some of the issues we will have to deal with – such as how to convert hard upper and lower distance bounds into something that can yield a realistic distribution of errors/structures. This will be necessary for retrospective analysis of NMR structures in the PDB because distance bounds are in most cases all that was supplied by depositors. Going forward, BMRB will need to require peak tables with intensities of NOESY cross-peaks, or perhaps even raw time-domain data for NOESY experiments.

I’m cc’ing the rest of our team so you can capture their email addresses if you haven’t already. They are

Kumaran Baskaran – BMRB liaison to wwPDB and BMRB representative of the NMR VTF

Michael Gryk – associate director of BMRB and our bona fide data scientist

Hamid Eghbalnia – lead of the analytics technology development component of the NMRbox P41 grant, and our bona fide statistician/Bayesian

Yulia Pustovalova and Sasha Pozhidaeva, NMR spectroscopists par excellence, who have been utilizing AlphaFold in their workflows and trying out ways to validate computed structures based on prior knowledge.

Joseph Courtney – NMR spectroscopist and developer of the COMPASS package from Chad Rienstra’s group. COMPASS used MODELLER, chemical shift prediction, and some other integrated software packages to determine protein structures from unassigned carbon-carbon correlation spectra (solid-state NMR). The use of forward-modeling of chemical shifts and unassigned peak lists to drive/constrain the structure determination was ahead of its time and very pertinent to Bayesian validation.

Looking forward to seeing you tomorrow.

Yours, Jeff

From: Jared Sagendorf <jared.sagendorf@rcsb.org>
Date: Wednesday, May 31, 2023 at 2:01 PM
To: "Hoch,Jeffrey" <hoch@uchc.edu>
Subject: Re: First meeting on Bayesian model validation (05/26)

*** Attention: This is an external email. Use caution responding, opening attachments or clicking on links. ***

Hi Jeff, just looping back to this! A few things that came up during the initial meeting with Andrej:

- Development of improved metrics for model quality

- Standardization of priors, forward functions, and likelihoods

- Establishment of a standardized vocabulary

- How to get different communities involved in all of the above decisions

In addition to any of the above, I'd be keen to learn more about what your group has been working on w.r.t. model validation, Bayesian or otherwise!

Let me know if you have any thoughts!

- Jared

On Thu, May 18, 2023 at 11:49 AM Hoch,Jeffrey <hoch@uchc.edu> wrote:

Hi Jared – thanks for touching base. Let me work on assembling some background information. First topic would be to introduce you all to the UConn team. More shortly... J

From: Jared Sagendorf <jared.sagendorf@rcsb.org>
Date: Thursday, May 18, 2023 at 2:34 PM
To: "Hoch,Jeffrey" <hoch@uchc.edu>
Subject: First meeting on Bayesian model validation (05/26)

Hi Jeff, this is Jared Sagendorf from Andrej's lab (sorry for the change in e-mail, still transitioning from my previous institution...). I'm looking forward to the discussion next Friday! I was wondering if you would like to set an agenda for the meeting, or keep the format relatively open?

I am also interested to know your feelings on what the scope of these discussions should be – a broader discussion on Bayesian methods for validation, or a narrower discussion on what is practical specifically in the context of the PDB/PDB-Dev.

Finally, if there are any papers, from your group or otherwise, that you'd like to frame the discussion around, feel free to let me know and I'll share them with the others on our end.

Thanks!

- Jared