{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "(longitudinal_models)=\n", "# Longitudinal Models of Change\n", "\n", ":::{post} April, 2023\n", ":tags: hierarchical, longitudinal, time series\n", ":category: advanced, reference\n", ":author: Nathaniel Forde\n", ":::" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "import arviz as az\n", "import bambi as bmb\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import preliz as pz\n", "import pymc as pm\n", "import statsmodels.api as sm\n", "import xarray as xr\n", "\n", "lowess = sm.nonparametric.lowess" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [], "source": [ "%config InlineBackend.figure_format = 'retina' # high resolution figures\n", "az.style.use(\"arviz-darkgrid\")\n", "rng = np.random.default_rng(42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The study of change involves simultaneously analysing the individual trajectories of change and abstracting over the set of individuals studied to extract broader insight about the nature of the change in question. As such it's easy to lose sight of the forest for the focus on the trees. In this example we'll demonstrate some of the subtleties of using hierarchical bayesian models to study the change within a population of individuals - moving from the *within individual* view to the *between/cross individuals* perspective. \n", "\n", "We'll follow the discussion and iterative approach to model building outlined in Singer and Willett's *Applied Longitudinal Data Analysis*. For more details {cite:t}`singer2003`). We'll differ from their presentation in that while they focus on maximum likelihood methods for fitting their data we'll use PyMC and Bayesian methods. In this approach we're following Solomon Kurz's work with brms in {cite:t}`kurzAppliedLongitudinalDataAnalysis2021`. We strongly recommend both books for more detailed presentation of the issues discussed here. \n", "\n", "### Structure of the Presentation\n", "\n", "- Analysis of the Change over time in Alcohol consumption among teens\n", "- Interlude: Alternative Model Specification with Bambi\n", "- Analysis of the Change over time in Externalising Behaviours among teens\n", "- Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploratory Data Analysis: The Chaos of Individual Differences\n", "\n", "For any longitudinal analysis we need three components: (1) multiple waves of data collection (2) a suitable definition of time and (3) an outcome of interest. Combining these we can assess how the individual changes over time with respect that outcome. In this first series of models we will look at how adolescent alcohol usage varies between children from the age of 14 onwards with data collected annually over three years." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
| \n", " | id | \n", "age | \n", "coa | \n", "male | \n", "age_14 | \n", "alcuse | \n", "peer | \n", "cpeer | \n", "ccoa | \n", "peer_hi_lo | \n", "
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | \n", "1 | \n", "14 | \n", "1 | \n", "0 | \n", "0 | \n", "1.732051 | \n", "1.264911 | \n", "0.246911 | \n", "0.549 | \n", "1 | \n", "
| 1 | \n", "1 | \n", "15 | \n", "1 | \n", "0 | \n", "1 | \n", "2.000000 | \n", "1.264911 | \n", "0.246911 | \n", "0.549 | \n", "1 | \n", "
| 2 | \n", "1 | \n", "16 | \n", "1 | \n", "0 | \n", "2 | \n", "2.000000 | \n", "1.264911 | \n", "0.246911 | \n", "0.549 | \n", "1 | \n", "
| 3 | \n", "2 | \n", "14 | \n", "1 | \n", "1 | \n", "0 | \n", "0.000000 | \n", "0.894427 | \n", "-0.123573 | \n", "0.549 | \n", "0 | \n", "
| 4 | \n", "2 | \n", "15 | \n", "1 | \n", "1 | \n", "1 | \n", "0.000000 | \n", "0.894427 | \n", "-0.123573 | \n", "0.549 | \n", "0 | \n", "
| ... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
| 241 | \n", "81 | \n", "15 | \n", "0 | \n", "1 | \n", "1 | \n", "0.000000 | \n", "1.549193 | \n", "0.531193 | \n", "-0.451 | \n", "1 | \n", "
| 242 | \n", "81 | \n", "16 | \n", "0 | \n", "1 | \n", "2 | \n", "0.000000 | \n", "1.549193 | \n", "0.531193 | \n", "-0.451 | \n", "1 | \n", "
| 243 | \n", "82 | \n", "14 | \n", "0 | \n", "0 | \n", "0 | \n", "0.000000 | \n", "2.190890 | \n", "1.172890 | \n", "-0.451 | \n", "1 | \n", "
| 244 | \n", "82 | \n", "15 | \n", "0 | \n", "0 | \n", "1 | \n", "1.414214 | \n", "2.190890 | \n", "1.172890 | \n", "-0.451 | \n", "1 | \n", "
| 245 | \n", "82 | \n", "16 | \n", "0 | \n", "0 | \n", "2 | \n", "1.000000 | \n", "2.190890 | \n", "1.172890 | \n", "-0.451 | \n", "1 | \n", "
246 rows × 10 columns
\n", "\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.\n",
"Sampling: [outcome]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5cbf2af8950a4e8c922f36b6eb11ea3e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"| \n", " | mean | \n", "sd | \n", "hdi_3% | \n", "hdi_97% | \n", "mcse_mean | \n", "mcse_sd | \n", "ess_bulk | \n", "ess_tail | \n", "r_hat | \n", "
|---|---|---|---|---|---|---|---|---|---|
| subject_intercept_sigma | \n", "0.766 | \n", "0.084 | \n", "0.604 | \n", "0.917 | \n", "0.002 | \n", "0.001 | \n", "2852.0 | \n", "2579.0 | \n", "1.0 | \n", "
| global_intercept | \n", "0.913 | \n", "0.100 | \n", "0.709 | \n", "1.088 | \n", "0.003 | \n", "0.002 | \n", "1032.0 | \n", "1798.0 | \n", "1.0 | \n", "
| global_sigma | \n", "0.757 | \n", "0.043 | \n", "0.676 | \n", "0.833 | \n", "0.001 | \n", "0.001 | \n", "3442.0 | \n", "2960.0 | \n", "1.0 | \n", "
\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 13 seconds.\n",
"Sampling: [outcome]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7b967bcf9c6d477b930e87711730f18e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"| \n", " | mean | \n", "sd | \n", "hdi_3% | \n", "hdi_97% | \n", "mcse_mean | \n", "mcse_sd | \n", "ess_bulk | \n", "ess_tail | \n", "r_hat | \n", "
|---|---|---|---|---|---|---|---|---|---|
| global_intercept | \n", "0.644 | \n", "0.103 | \n", "0.441 | \n", "0.830 | \n", "0.002 | \n", "0.002 | \n", "2100.0 | \n", "2584.0 | \n", "1.00 | \n", "
| global_sigma | \n", "0.612 | \n", "0.045 | \n", "0.526 | \n", "0.696 | \n", "0.001 | \n", "0.001 | \n", "1293.0 | \n", "2791.0 | \n", "1.00 | \n", "
| global_age_beta | \n", "0.271 | \n", "0.062 | \n", "0.158 | \n", "0.389 | \n", "0.001 | \n", "0.001 | \n", "4585.0 | \n", "3257.0 | \n", "1.00 | \n", "
| subject_intercept_sigma | \n", "0.754 | \n", "0.084 | \n", "0.597 | \n", "0.909 | \n", "0.001 | \n", "0.001 | \n", "3178.0 | \n", "3130.0 | \n", "1.00 | \n", "
| subject_age_sigma | \n", "0.347 | \n", "0.069 | \n", "0.217 | \n", "0.477 | \n", "0.003 | \n", "0.002 | \n", "694.0 | \n", "1058.0 | \n", "1.01 | \n", "
\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 16 seconds.\n",
"Sampling: [outcome]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "376a0aec01ee4fcdaf69f2287919e31c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"| \n", " | median | \n", "mad | \n", "eti_3% | \n", "eti_97% | \n", "mcse_median | \n", "ess_median | \n", "ess_tail | \n", "r_hat | \n", "
|---|---|---|---|---|---|---|---|---|
| global_intercept | \n", "0.320 | \n", "0.087 | \n", "0.088 | \n", "0.565 | \n", "0.004 | \n", "2284.281 | \n", "2552.0 | \n", "1.00 | \n", "
| global_sigma | \n", "0.613 | \n", "0.030 | \n", "0.537 | \n", "0.705 | \n", "0.002 | \n", "1577.315 | \n", "2186.0 | \n", "1.01 | \n", "
| global_age_beta | \n", "0.290 | \n", "0.057 | \n", "0.129 | \n", "0.449 | \n", "0.002 | \n", "2820.643 | \n", "2815.0 | \n", "1.00 | \n", "
| global_coa_age_beta | \n", "-0.046 | \n", "0.081 | \n", "-0.268 | \n", "0.182 | \n", "0.003 | \n", "2861.193 | \n", "2935.0 | \n", "1.00 | \n", "
| subject_intercept_sigma | \n", "0.663 | \n", "0.054 | \n", "0.524 | \n", "0.814 | \n", "0.002 | \n", "2662.958 | \n", "2366.0 | \n", "1.00 | \n", "
| subject_age_sigma | \n", "0.351 | \n", "0.045 | \n", "0.215 | \n", "0.469 | \n", "0.002 | \n", "1046.679 | \n", "752.0 | \n", "1.01 | \n", "
\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 19 seconds.\n",
"Sampling: [outcome]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "60b3741005a54325b7b221a04f65dbb6",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"| \n", " | mean | \n", "sd | \n", "hdi_3% | \n", "hdi_97% | \n", "mcse_mean | \n", "mcse_sd | \n", "ess_bulk | \n", "ess_tail | \n", "r_hat | \n", "
|---|---|---|---|---|---|---|---|---|---|
| global_intercept | \n", "0.395 | \n", "0.112 | \n", "0.177 | \n", "0.601 | \n", "0.002 | \n", "0.002 | \n", "2210.0 | \n", "2599.0 | \n", "1.0 | \n", "
| global_sigma | \n", "0.595 | \n", "0.042 | \n", "0.522 | \n", "0.676 | \n", "0.001 | \n", "0.001 | \n", "1589.0 | \n", "2884.0 | \n", "1.0 | \n", "
| global_age_beta | \n", "0.273 | \n", "0.085 | \n", "0.117 | \n", "0.434 | \n", "0.002 | \n", "0.001 | \n", "2382.0 | \n", "2805.0 | \n", "1.0 | \n", "
| global_coa_age_beta | \n", "-0.007 | \n", "0.128 | \n", "-0.245 | \n", "0.225 | \n", "0.003 | \n", "0.002 | \n", "2364.0 | \n", "2734.0 | \n", "1.0 | \n", "
| global_peer_beta | \n", "0.683 | \n", "0.114 | \n", "0.483 | \n", "0.904 | \n", "0.002 | \n", "0.001 | \n", "3072.0 | \n", "2872.0 | \n", "1.0 | \n", "
| global_peer_age_beta | \n", "-0.146 | \n", "0.090 | \n", "-0.313 | \n", "0.022 | \n", "0.002 | \n", "0.001 | \n", "3080.0 | \n", "2617.0 | \n", "1.0 | \n", "
| subject_intercept_sigma | \n", "0.500 | \n", "0.077 | \n", "0.356 | \n", "0.647 | \n", "0.002 | \n", "0.002 | \n", "1023.0 | \n", "1231.0 | \n", "1.0 | \n", "
| subject_age_sigma | \n", "0.384 | \n", "0.061 | \n", "0.271 | \n", "0.497 | \n", "0.002 | \n", "0.001 | \n", "1080.0 | \n", "1407.0 | \n", "1.0 | \n", "
| \n", " | rank | \n", "elpd_loo | \n", "p_loo | \n", "elpd_diff | \n", "weight | \n", "se | \n", "dse | \n", "warning | \n", "scale | \n", "
|---|---|---|---|---|---|---|---|---|---|
| COA_Peer_Model | \n", "0 | \n", "-280.547108 | \n", "88.317101 | \n", "0.000000 | \n", "0.919592 | \n", "12.286282 | \n", "0.000000 | \n", "True | \n", "log | \n", "
| COA growth Model | \n", "1 | \n", "-289.267096 | \n", "89.708607 | \n", "8.719988 | \n", "0.000000 | \n", "12.723690 | \n", "4.226971 | \n", "True | \n", "log | \n", "
| Unconditional Growth | \n", "2 | \n", "-289.776375 | \n", "91.650347 | \n", "9.229267 | \n", "0.080408 | \n", "12.808780 | \n", "4.784305 | \n", "True | \n", "log | \n", "
| Grand Mean | \n", "3 | \n", "-315.943445 | \n", "58.611975 | \n", "35.396337 | \n", "0.000000 | \n", "12.426100 | \n", "8.431326 | \n", "True | \n", "log | \n", "
\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 12 seconds.\n",
"/home/osvaldo/anaconda3/envs/pymc/lib/python3.11/site-packages/bambi/models.py:858: FutureWarning: 'pps' has been replaced by 'response' and is not going to work in the future\n",
" warnings.warn(\n"
]
},
{
"data": {
"text/html": [
"\n",
" <xarray.Dataset>\n",
"Dimensions: (chain: 4, draw: 1000, id__factor_dim: 82, __obs__: 246)\n",
"Coordinates:\n",
" * chain (chain) int64 0 1 2 3\n",
" * draw (draw) int64 0 1 2 3 4 5 6 ... 993 994 995 996 997 998 999\n",
" * id__factor_dim (id__factor_dim) <U2 '1' '2' '3' '4' ... '80' '81' '82'\n",
" * __obs__ (__obs__) int64 0 1 2 3 4 5 6 ... 240 241 242 243 244 245\n",
"Data variables:\n",
" 1|id (chain, draw, id__factor_dim) float64 0.5433 ... -0.2393\n",
" 1|id_sigma (chain, draw) float64 0.4957 0.577 0.5026 ... 0.4642 0.4398\n",
" Intercept (chain, draw) float64 0.4438 0.3648 ... 0.4879 0.3531\n",
" age_14 (chain, draw) float64 0.3028 0.3717 ... 0.2674 0.1804\n",
" age_14:coa (chain, draw) float64 0.0516 -0.03368 ... 0.04597 0.1363\n",
" age_14:cpeer (chain, draw) float64 -0.1768 -0.1898 ... -0.1679 -0.1466\n",
" age_14|id (chain, draw, id__factor_dim) float64 -0.05741 ... 0.04858\n",
" age_14|id_sigma (chain, draw) float64 0.4117 0.3575 ... 0.4522 0.3759\n",
" coa (chain, draw) float64 0.674 0.7293 0.6766 ... 0.5017 0.6076\n",
" cpeer (chain, draw) float64 0.8057 0.8209 ... 0.6051 0.6673\n",
" sigma (chain, draw) float64 0.5183 0.6357 ... 0.5916 0.6073\n",
" mu (chain, draw, __obs__) float64 1.86 2.113 ... 0.9535 1.011\n",
"Attributes:\n",
" created_at: 2024-08-17T17:08:19.195670+00:00\n",
" arviz_version: 0.20.0.dev0\n",
" inference_library: pymc\n",
" inference_library_version: 5.16.2+20.g747fda319\n",
" sampling_time: 11.552397727966309\n",
" tuning_steps: 1000\n",
" modeling_interface: bambi\n",
" modeling_interface_version: 0.14.0<xarray.Dataset>\n",
"Dimensions: (chain: 4, draw: 1000, __obs__: 246)\n",
"Coordinates:\n",
" * chain (chain) int64 0 1 2 3\n",
" * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999\n",
" * __obs__ (__obs__) int64 0 1 2 3 4 5 6 7 ... 238 239 240 241 242 243 244 245\n",
"Data variables:\n",
" alcuse (chain, draw, __obs__) float64 1.217 2.066 2.677 ... 1.751 0.3636\n",
"Attributes:\n",
" modeling_interface: bambi\n",
" modeling_interface_version: 0.14.0<xarray.Dataset>\n",
"Dimensions: (chain: 4, draw: 1000)\n",
"Coordinates:\n",
" * chain (chain) int64 0 1 2 3\n",
" * draw (draw) int64 0 1 2 3 4 5 ... 994 995 996 997 998 999\n",
"Data variables: (12/17)\n",
" acceptance_rate (chain, draw) float64 0.6227 0.6105 ... 0.9299 0.6517\n",
" diverging (chain, draw) bool False False False ... False False\n",
" energy (chain, draw) float64 538.9 541.1 ... 553.3 541.8\n",
" energy_error (chain, draw) float64 0.4803 0.632 ... -0.7672 0.3856\n",
" index_in_trajectory (chain, draw) int64 -15 6 -6 -8 -7 10 ... 8 7 6 -11 7\n",
" largest_eigval (chain, draw) float64 nan nan nan nan ... nan nan nan\n",
" ... ...\n",
" process_time_diff (chain, draw) float64 0.003959 0.004457 ... 0.003284\n",
" reached_max_treedepth (chain, draw) bool False False False ... False False\n",
" smallest_eigval (chain, draw) float64 nan nan nan nan ... nan nan nan\n",
" step_size (chain, draw) float64 0.3006 0.3006 ... 0.286 0.286\n",
" step_size_bar (chain, draw) float64 0.2747 0.2747 ... 0.2719 0.2719\n",
" tree_depth (chain, draw) int64 4 4 4 4 4 4 4 4 ... 4 4 4 4 4 4 4\n",
"Attributes:\n",
" created_at: 2024-08-17T17:08:19.205959+00:00\n",
" arviz_version: 0.20.0.dev0\n",
" inference_library: pymc\n",
" inference_library_version: 5.16.2+20.g747fda319\n",
" sampling_time: 11.552397727966309\n",
" tuning_steps: 1000\n",
" modeling_interface: bambi\n",
" modeling_interface_version: 0.14.0<xarray.Dataset>\n",
"Dimensions: (__obs__: 246)\n",
"Coordinates:\n",
" * __obs__ (__obs__) int64 0 1 2 3 4 5 6 7 ... 238 239 240 241 242 243 244 245\n",
"Data variables:\n",
" alcuse (__obs__) float64 1.732 2.0 2.0 0.0 0.0 ... 0.0 0.0 0.0 1.414 1.0\n",
"Attributes:\n",
" created_at: 2024-08-17T17:08:19.209059+00:00\n",
" arviz_version: 0.20.0.dev0\n",
" inference_library: pymc\n",
" inference_library_version: 5.16.2+20.g747fda319\n",
" modeling_interface: bambi\n",
" modeling_interface_version: 0.14.0| \n", " | mean | \n", "sd | \n", "hdi_3% | \n", "hdi_97% | \n", "mcse_mean | \n", "mcse_sd | \n", "ess_bulk | \n", "ess_tail | \n", "r_hat | \n", "
|---|---|---|---|---|---|---|---|---|---|
| Intercept | \n", "0.390 | \n", "0.107 | \n", "0.170 | \n", "0.579 | \n", "0.002 | \n", "0.001 | \n", "2855.0 | \n", "3083.0 | \n", "1.0 | \n", "
| age_14 | \n", "0.277 | \n", "0.084 | \n", "0.123 | \n", "0.438 | \n", "0.002 | \n", "0.001 | \n", "2283.0 | \n", "2994.0 | \n", "1.0 | \n", "
| coa | \n", "0.581 | \n", "0.162 | \n", "0.268 | \n", "0.873 | \n", "0.003 | \n", "0.002 | \n", "2784.0 | \n", "3038.0 | \n", "1.0 | \n", "
| cpeer | \n", "0.692 | \n", "0.116 | \n", "0.480 | \n", "0.914 | \n", "0.002 | \n", "0.002 | \n", "2959.0 | \n", "3040.0 | \n", "1.0 | \n", "
| age_14:coa | \n", "-0.013 | \n", "0.123 | \n", "-0.237 | \n", "0.219 | \n", "0.002 | \n", "0.002 | \n", "2441.0 | \n", "2928.0 | \n", "1.0 | \n", "
| age_14:cpeer | \n", "-0.150 | \n", "0.087 | \n", "-0.303 | \n", "0.020 | \n", "0.002 | \n", "0.001 | \n", "2890.0 | \n", "3103.0 | \n", "1.0 | \n", "
| 1|id_sigma | \n", "0.502 | \n", "0.078 | \n", "0.356 | \n", "0.645 | \n", "0.002 | \n", "0.002 | \n", "1315.0 | \n", "2130.0 | \n", "1.0 | \n", "
| age_14|id_sigma | \n", "0.379 | \n", "0.060 | \n", "0.262 | \n", "0.488 | \n", "0.002 | \n", "0.001 | \n", "1244.0 | \n", "1860.0 | \n", "1.0 | \n", "
| sigma | \n", "0.595 | \n", "0.043 | \n", "0.516 | \n", "0.674 | \n", "0.001 | \n", "0.001 | \n", "1124.0 | \n", "2355.0 | \n", "1.0 | \n", "
| \n", " | ID | \n", "EXTERNAL | \n", "FEMALE | \n", "TIME | \n", "GRADE | \n", "
|---|---|---|---|---|---|
| 0 | \n", "1 | \n", "50 | \n", "0 | \n", "0 | \n", "1 | \n", "
| 1 | \n", "1 | \n", "57 | \n", "0 | \n", "1 | \n", "2 | \n", "
| 2 | \n", "1 | \n", "51 | \n", "0 | \n", "2 | \n", "3 | \n", "
| 3 | \n", "1 | \n", "48 | \n", "0 | \n", "3 | \n", "4 | \n", "
| 4 | \n", "1 | \n", "43 | \n", "0 | \n", "4 | \n", "5 | \n", "
\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 11 seconds.\n",
"Sampling: [outcome]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9bd6e5259aaf4b899db2194aa5559cad",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"| \n", " | mean | \n", "sd | \n", "hdi_3% | \n", "hdi_97% | \n", "mcse_mean | \n", "mcse_sd | \n", "ess_bulk | \n", "ess_tail | \n", "r_hat | \n", "
|---|---|---|---|---|---|---|---|---|---|
| global_intercept | \n", "7.347 | \n", "0.754 | \n", "5.950 | \n", "8.769 | \n", "0.019 | \n", "0.013 | \n", "1612.0 | \n", "2336.0 | \n", "1.0 | \n", "
| global_sigma | \n", "6.810 | \n", "0.380 | \n", "6.118 | \n", "7.546 | \n", "0.006 | \n", "0.004 | \n", "4726.0 | \n", "2892.0 | \n", "1.0 | \n", "
| subject_intercept_sigma | \n", "6.802 | \n", "0.895 | \n", "5.158 | \n", "8.459 | \n", "0.016 | \n", "0.011 | \n", "3045.0 | \n", "2681.0 | \n", "1.0 | \n", "
\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 33 seconds.\n",
"There were 60 divergences after tuning. Increase `target_accept` or reparameterize.\n",
"The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details\n",
"The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details\n",
"Sampling: [outcome]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "56f10460020a44d290fc6759a4f8bd69",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"| \n", " | mean | \n", "sd | \n", "hdi_3% | \n", "hdi_97% | \n", "mcse_mean | \n", "mcse_sd | \n", "ess_bulk | \n", "ess_tail | \n", "r_hat | \n", "
|---|---|---|---|---|---|---|---|---|---|
| global_intercept | \n", "5.246 | \n", "0.746 | \n", "3.952 | \n", "6.829 | \n", "0.078 | \n", "0.061 | \n", "143.0 | \n", "2169.0 | \n", "1.09 | \n", "
| global_sigma | \n", "6.875 | \n", "0.279 | \n", "6.343 | \n", "7.456 | \n", "0.004 | \n", "0.003 | \n", "3986.0 | \n", "1885.0 | \n", "1.39 | \n", "
| global_beta_grade | \n", "-0.262 | \n", "0.242 | \n", "-0.673 | \n", "0.253 | \n", "0.028 | \n", "0.020 | \n", "99.0 | \n", "1903.0 | \n", "1.07 | \n", "
| subject_intercept_sigma | \n", "5.320 | \n", "0.887 | \n", "3.416 | \n", "6.789 | \n", "0.133 | \n", "0.095 | \n", "55.0 | \n", "976.0 | \n", "1.06 | \n", "
| subject_beta_grade_sigma | \n", "0.625 | \n", "0.444 | \n", "0.019 | \n", "1.266 | \n", "0.166 | \n", "0.123 | \n", "8.0 | \n", "5.0 | \n", "1.49 | \n", "
\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 79 seconds.\n",
"There were 849 divergences after tuning. Increase `target_accept` or reparameterize.\n",
"The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details\n",
"The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details\n",
"Sampling: [outcome]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5e7e397c5ca64ca8b5bd2bddfdf8c4b0",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"| \n", " | mean | \n", "sd | \n", "hdi_3% | \n", "hdi_97% | \n", "mcse_mean | \n", "mcse_sd | \n", "ess_bulk | \n", "ess_tail | \n", "r_hat | \n", "
|---|---|---|---|---|---|---|---|---|---|
| global_intercept | \n", "6.160 | \n", "1.136 | \n", "4.591 | \n", "8.464 | \n", "0.306 | \n", "0.221 | \n", "16.0 | \n", "1359.0 | \n", "1.17 | \n", "
| global_sigma | \n", "6.978 | \n", "0.333 | \n", "6.329 | \n", "7.650 | \n", "0.013 | \n", "0.010 | \n", "809.0 | \n", "1315.0 | \n", "1.53 | \n", "
| global_beta_grade | \n", "-0.117 | \n", "0.591 | \n", "-1.328 | \n", "0.958 | \n", "0.040 | \n", "0.065 | \n", "590.0 | \n", "1156.0 | \n", "1.53 | \n", "
| global_beta_grade2 | \n", "0.071 | \n", "0.093 | \n", "-0.118 | \n", "0.246 | \n", "0.006 | \n", "0.004 | \n", "445.0 | \n", "1403.0 | \n", "1.53 | \n", "
| subject_intercept_sigma | \n", "1.174 | \n", "1.173 | \n", "0.010 | \n", "3.273 | \n", "0.424 | \n", "0.312 | \n", "6.0 | \n", "4.0 | \n", "1.79 | \n", "
| subject_beta_grade_sigma | \n", "1.337 | \n", "0.293 | \n", "0.745 | \n", "1.731 | \n", "0.083 | \n", "0.063 | \n", "13.0 | \n", "393.0 | \n", "1.22 | \n", "
| subject_beta_grade2_sigma | \n", "0.081 | \n", "0.076 | \n", "0.001 | \n", "0.212 | \n", "0.027 | \n", "0.020 | \n", "6.0 | \n", "4.0 | \n", "1.67 | \n", "
\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"id_indx, unique_ids = pd.factorize(df_external[\"ID\"])\n",
"coords = {\"ids\": unique_ids, \"obs\": range(len(df_external[\"EXTERNAL\"]))}\n",
"with pm.Model(coords=coords) as model:\n",
" grade = pm.MutableData(\"grade_data\", df_external[\"GRADE\"].values)\n",
" grade2 = pm.MutableData(\"grade2_data\", df_external[\"GRADE\"].values ** 2)\n",
" grade3 = pm.MutableData(\"grade3_data\", df_external[\"GRADE\"].values ** 3)\n",
" external = pm.MutableData(\"external_data\", df_external[\"EXTERNAL\"].values + 1e-25)\n",
" female = pm.MutableData(\"female_data\", df_external[\"FEMALE\"].values)\n",
"\n",
" global_intercept = pm.Normal(\"global_intercept\", 6, 2)\n",
" global_sigma = pm.Normal(\"global_sigma\", 7, 1)\n",
" global_beta_grade = pm.Normal(\"global_beta_grade\", 0, 1)\n",
" global_beta_grade2 = pm.Normal(\"global_beta_grade2\", 0, 1)\n",
" global_beta_grade3 = pm.Normal(\"global_beta_grade3\", 0, 1)\n",
" global_beta_female = pm.Normal(\"global_beta_female\", 0, 1)\n",
" global_beta_female_grade = pm.Normal(\"global_beta_female_grade\", 0, 1)\n",
" global_beta_female_grade2 = pm.Normal(\"global_beta_female_grade2\", 0, 1)\n",
" global_beta_female_grade3 = pm.Normal(\"global_beta_female_grade3\", 0, 1)\n",
"\n",
" subject_intercept_sigma = pm.Exponential(\"subject_intercept_sigma\", 1)\n",
" subject_intercept = pm.Normal(\"subject_intercept\", 3, subject_intercept_sigma, dims=\"ids\")\n",
"\n",
" subject_beta_grade_sigma = pm.Exponential(\"subject_beta_grade_sigma\", 1)\n",
" subject_beta_grade = pm.Normal(\"subject_beta_grade\", 0, subject_beta_grade_sigma, dims=\"ids\")\n",
"\n",
" subject_beta_grade2_sigma = pm.Exponential(\"subject_beta_grade2_sigma\", 1)\n",
" subject_beta_grade2 = pm.Normal(\"subject_beta_grade2\", 0, subject_beta_grade2_sigma, dims=\"ids\")\n",
"\n",
" subject_beta_grade3_sigma = pm.Exponential(\"subject_beta_grade3_sigma\", 1)\n",
" subject_beta_grade3 = pm.Normal(\"subject_beta_grade3\", 0, subject_beta_grade3_sigma, dims=\"ids\")\n",
"\n",
" mu = pm.Deterministic(\n",
" \"growth_model\",\n",
" global_intercept\n",
" + subject_intercept[id_indx]\n",
" + global_beta_female * female\n",
" + global_beta_female_grade * (female * grade)\n",
" + global_beta_female_grade2 * (female * grade2)\n",
" + global_beta_female_grade3 * (female * grade3)\n",
" + (global_beta_grade + subject_beta_grade[id_indx]) * grade\n",
" + (global_beta_grade2 + subject_beta_grade2[id_indx]) * grade2\n",
" + (global_beta_grade3 + subject_beta_grade3[id_indx]) * grade3,\n",
" )\n",
"\n",
" outcome_latent = pm.Gumbel.dist(mu, global_sigma)\n",
" outcome = pm.Censored(\n",
" \"outcome\", outcome_latent, lower=0, upper=68, observed=external, dims=\"obs\"\n",
" )\n",
"\n",
" idata_m7 = pm.sample_prior_predictive()\n",
" idata_m7.extend(\n",
" pm.sample(\n",
" draws=5000,\n",
" random_seed=100,\n",
" target_accept=0.99,\n",
" nuts_sampler=\"numpyro\",\n",
" idata_kwargs={\"log_likelihood\": True},\n",
" )\n",
" )\n",
" idata_m7.extend(pm.sample_posterior_predictive(idata_m7))"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"| \n", " | mean | \n", "sd | \n", "hdi_3% | \n", "hdi_97% | \n", "mcse_mean | \n", "mcse_sd | \n", "ess_bulk | \n", "ess_tail | \n", "r_hat | \n", "
|---|---|---|---|---|---|---|---|---|---|
| global_intercept | \n", "6.557 | \n", "1.297 | \n", "4.027 | \n", "8.916 | \n", "0.016 | \n", "0.011 | \n", "6847.0 | \n", "10276.0 | \n", "1.00 | \n", "
| global_sigma | \n", "6.633 | \n", "0.408 | \n", "5.871 | \n", "7.393 | \n", "0.010 | \n", "0.007 | \n", "1777.0 | \n", "8061.0 | \n", "1.00 | \n", "
| global_beta_grade | \n", "0.009 | \n", "0.878 | \n", "-1.577 | \n", "1.735 | \n", "0.011 | \n", "0.008 | \n", "6155.0 | \n", "9098.0 | \n", "1.00 | \n", "
| global_beta_grade2 | \n", "-0.139 | \n", "0.390 | \n", "-0.840 | \n", "0.621 | \n", "0.006 | \n", "0.004 | \n", "4503.0 | \n", "7911.0 | \n", "1.00 | \n", "
| subject_intercept_sigma | \n", "5.786 | \n", "1.239 | \n", "3.392 | \n", "8.144 | \n", "0.054 | \n", "0.038 | \n", "573.0 | \n", "550.0 | \n", "1.01 | \n", "
| subject_beta_grade_sigma | \n", "0.552 | \n", "0.379 | \n", "0.003 | \n", "1.218 | \n", "0.047 | \n", "0.033 | \n", "50.0 | \n", "166.0 | \n", "1.07 | \n", "
| subject_beta_grade2_sigma | \n", "0.098 | \n", "0.072 | \n", "0.001 | \n", "0.220 | \n", "0.009 | \n", "0.006 | \n", "60.0 | \n", "171.0 | \n", "1.07 | \n", "
| subject_beta_grade3_sigma | \n", "0.019 | \n", "0.013 | \n", "0.000 | \n", "0.042 | \n", "0.001 | \n", "0.001 | \n", "131.0 | \n", "111.0 | \n", "1.05 | \n", "
| global_beta_female | \n", "-0.212 | \n", "0.940 | \n", "-1.968 | \n", "1.573 | \n", "0.009 | \n", "0.007 | \n", "9904.0 | \n", "10717.0 | \n", "1.00 | \n", "
| global_beta_female_grade | \n", "-0.132 | \n", "0.898 | \n", "-1.804 | \n", "1.565 | \n", "0.010 | \n", "0.007 | \n", "7308.0 | \n", "10362.0 | \n", "1.00 | \n", "
| global_beta_female_grade2 | \n", "0.020 | \n", "0.495 | \n", "-0.903 | \n", "0.958 | \n", "0.007 | \n", "0.005 | \n", "4831.0 | \n", "7979.0 | \n", "1.00 | \n", "
| global_beta_female_grade3 | \n", "-0.008 | \n", "0.071 | \n", "-0.141 | \n", "0.124 | \n", "0.001 | \n", "0.001 | \n", "4882.0 | \n", "8440.0 | \n", "1.00 | \n", "
| \n", " | rank | \n", "elpd_loo | \n", "p_loo | \n", "elpd_diff | \n", "weight | \n", "se | \n", "dse | \n", "warning | \n", "scale | \n", "
|---|---|---|---|---|---|---|---|---|---|
| Linear Model | \n", "0 | \n", "-913.492285 | \n", "32.270139 | \n", "0.000000 | \n", "8.146420e-01 | \n", "13.556027 | \n", "0.000000 | \n", "True | \n", "log | \n", "
| Polynomial + Gender | \n", "1 | \n", "-915.895242 | \n", "49.043791 | \n", "2.402957 | \n", "8.394507e-16 | \n", "13.449388 | \n", "2.597256 | \n", "True | \n", "log | \n", "
| Minimal Model | \n", "2 | \n", "-917.384833 | \n", "33.632784 | \n", "3.892548 | \n", "7.475052e-16 | \n", "13.823280 | \n", "2.375968 | \n", "True | \n", "log | \n", "
| Polynomial Model | \n", "3 | \n", "-921.859743 | \n", "27.035846 | \n", "8.367458 | \n", "1.853580e-01 | \n", "14.762473 | \n", "5.305436 | \n", "True | \n", "log | \n", "