Home

Tutorial: Primary Data analysis

Contributors: Andreas Haahr Larsen, Martin Cramer Pedersen, Jeppe Breum Jacobsen.

Cover image adapted from Balu et al. 2016, J. Phys. Chem. B 120: 6490-6503 with permission.

Before you start

No requirements, but for some of the tutorial parts you can opt for using:
(1) Python (including numpy, scipy and matplotlib) or similar for importing and plotting data (free and open source software, installation required), or
(2) ATSAS (free software for academic users, installation required).
(3) BayesApp (free and no installation required, but requires internet connection).

Learning outcomes

Be able to retrieve structural information from a SAXS or SANS dataset by primary data analysis, before doing detailed modelling, e.g. with form factors. This includes:

Learn how to do a Guinier analysis and retrieve the radius of gyration.
Use Kratky plot to assess whether a particle is compact/well-folded or disordered/unfolded.
Generate a $p(r)$ to retrieve structural parameters, $R_g$ and $d_\mathrm{max}$.
Estimate particle shapes, such as spheres and cylinders from looking at the $p(r)$.
Recognize the $p(r)$ from a sample of particles with positive and negative scattering internal contrasts.

Part I: Guinier plot and Guinier analysis for $R_g$ determination

The Guinier approximation is valid at low values of $q$: $$I(q) \approx I(0)\exp(-q^2 R_g^2/3)$$ From a Guinier analysis you can derive the (contrast-weighted) radius of gyration, $R_g$ of the particles, i.e., a Guinier analysis provides the overall size of the measured particles. Extended particles have larger $R_g$ than compact particles (with the same volume), so $R_g$ also contains shape information.
The Guinier approximation can be expressed as a linear relation: $$\ln(I) = \ln(I_0) - (R_g^2/3)q^2$$ Therefore, a plot of $\ln(I)$ vs $q^2$ (for small values of $q$) is denoted a Guinier plot, and $R_g$ may be determined from the slope of this plot. This relation is only true for small values of $q$, which is the denoted the Guinier region of data. As a rule of thumb, the Guinier approximation is valid up to $qR_g = 1.3$ for globular particles (i.e. sphere-like particles), and up to $qR_g = 1.0$ for elongated particles (e.g., cylinders).

Try determining the radius of gyration for the protein lysozyme from the small-angle scattering biological data bank (Download files > curve (dat)).
We here provide guides to do it (1) manually (python or similar required, good if you would like to train some data handling), (2) using BayesApp (no installation required, fast and easy), (3) using AutoRg (ATSAS software required, used a lot in the field of biological SAXS).
Your estimation of $R_g$ should be consistent with the reported value in the SASBDB entry (within +/- 1-2 Å). However, sometimes $R_g$ is larger than expected.

Large $R_g$ values may be a sign of:

Aggregation.
Oligomerization
Change of shape (more elongated particles have larger $R_g$ than compact particles).

lower

If a particle is too large, compared to the minimum $q$ value, there is no Guinier region (or rather, the Guinier region is at lower values of $q$ than those measured). Consequently, $R_g$ for that particle cannot be determined from Guinier analysis.

Part II: Kratky plot

The Kratky plot ($q^2I$ vs $q$) carries information about the structural flexibility of a particle. This is, e.g., used for studies of the foldedness of proteins:

This interpretation assumes that the particle is homogeneous, meaning the scattering length density (SLD) is approximately the same throughout the particle. Proteins are an example of approximately homogeneous particles. Detergent micelles or block copolymer micelles are examples of particles that are not homogeneous.
Make a Kratky plot for the SAXS lysozyme data. You may use BayesApp (enable Kratky plot by clicking the box) or this python script. Either way, you may need to adjust the background level. In BayesApp this is done by clicking Advanced options -> Subtract constant before...
Lysozyme measured under these conditions is (predominantly) folded - you can see that from the Kratky plot by comparing with the idealized Kratky plots.

Part III: Indirect Fourier Transformation

As the coherent scattering can be calculated by a Fourier transformation of the particle, then an inverse Fourier transformation of the data should give back the structure. Unfortunately this is not the case for small-angle scattering as structural information is lost.

There is a loss of structural information in small-angle scattering due to:

Random orientation of the sample (as opposed to aligned particles).
The phase problem. The wave amplitude is measured by the detector, but the phase is not.
Noise in data.
The data is measured in a finite range (between $q_\mathrm{min}$ and $q_\mathrm{max}$).
Smearing effects, from instrumental smearing or polydispersity.

However, by doing an indirect Fourier transformation, you can get a 1D representation of the particle. This representation is called the pair distance distribution and is written $p(r)$. It is informally called the "p-of-r-function". For a homogeneous particle, it is simply a histogram of how many pairs of scatterers that have a specific distance, $r$, between them. This pair distance distribution is characteristic for specific shapes:

Thus, without any modelling, you may get an idea of the structure of your particle of interest.
To see an example of this, go to Shape2SAS and press the green "Submit" button with the default values. Then you will see the $p(r)$ for a 50 Å sphere, along with the calculated scattering. A couple of observations

The pair distance distribution is symmetric, as a sphere is symmetric in all directions.
$p(r)$ starts in zero and goes to zero at some value of $r$. The latter is the maximum distance between any pair in the particle and is denoted $d_\mathrm{max}$. For a sphere, $d_\mathrm{max}$ equals the diameter.
The most common distance between pairs of scatterers in a sphere is equal to its radius.

If you Fourier transform $p(r)$, you get the calculated scattering. This is what the Shape2SAS program does.
To see and compare the $p(r)$ from various particles, go back to Shape2SAS and click the box "Calculate scattering for Model 2" to choose another shape (e.g. a cylinder) or a larger/smaller sphere as Model 2. Press "Submit"

$p(r)$ for non-homogeneous particles.
If the particle is not homogeneous (the contrast is not approximately the same in the whole particle), then the $p(r)$ is harder to interpret.
This is because $p(r)$ is a distance distribution weighted with the product of the excess scattering length (or contrast or $\Delta\mathrm{SLD}$), i.e. the difference between the scattering length of a given scatterer and the scattering length of the solvent. This product is negative if one has negative and one has positive contrast. This often leads to bumps or oscillations in the $p(r)$. In Shape2SAS, simulate a spherical core-shell particle with negative core contrast ($\Delta\mathrm{SLD}=-1$) and and positive shell contrast ($\Delta\mathrm{SLD}=1$)

In this case, you will get oscillations, and you may even get negative parts in the $p(r)$. The negative parts come if, at a given distance, there are most pairs having contrasts with opposing signs (i.e. one has positive and one has negative contrast.)

Generate a $p(r)$
Try to do an indirect Fourier transformation of the lysozome data. This can be done in BayesApp (no installation required) or in many software packages, including ATSAS (guide for generating $p(r)$ in ATSAS). You may give an estimate of the $d_\mathrm{max}$ (ca 50 Å, note that $d_\mathrm{max}$ is called "Rmax" in ATSAS). This may help the algorithms find a better solution faster.
By comparing with the $p(r)$ from a sphere, you can see that lysozyme is globular (i.e., roughly spherical), but not fully symmetric. More like an almost-spherical ellipsoid.

The radius of gyration can also be determined from $p(r)$, as $R_g$ is defined as:
$$R_g^2 = \frac{1}{2}\frac{\int_0^{dmax} r^2 p(r)}{\int_0^{dmax} p(r)}$$ A good quality check of data is to compare the $R_g$ obtained from $p(r)$ with that obtained by Guinier analysis.

For non-homogeneous particles, you may get negative contributions to $p(r)$. In that case, it is necessary to allow negative values in $p(r)$:

To see the effect of NOT doing this, try to run BayesApp on the simulated micelle data from Shape2SAS, which had negative contrast in the core (Shape2SAS>or download this simulated dataset). You may help the algorithm by providing an approximative value for the maximum distance, $d_\mathrm{max}$.
You will notice that the $p(r)$ does not resemble the one you generated in Shape2SAS.
Now, allow BayesApp to use negative values in $p(r)$ by setting Transformation/Regularization to "Negative", and rerun.

Challenges

Challenge 1: You have measured a sample of RNase A in D₂O with SANS (download SANS data). What is the overall radius of gyration of this protein?
Challenge 2: You have measured the microtubule-associated protein 2 (MAP2) in SAXS (download SAXS data). Is it folded or unfolded?
Challenge 3: A sample of discoidal particles (i.e. shaped like discs/short cylinders) was measured in SAXS at neutral pH (download neutral pH SAXS data) and pH 5.0 (download low pH SAXS data). What effect does the pH change have on the particles and their interaction?

Help and feedback

Help us improve the tutorials by

Reporting issues and bugs via our GitHub page. This could be typos, dead links etc., but also insufficient information or unclear instructions.
Suggesting new tutorials/additions/improvements in the SAStutorials forum.
Posting or answering questions in the SAStutorials forum.