#Blogpost: Intro to R Workshop.

On Thursday, October 11, 2024, at about 6 pm, I attended an insightful Introduction to R workshop organized by Zachary Lloyd and Chen Zou, fellows at the Graduate Center Digital Initiatives (GCDI). The session was aimed at beginners and focused on the essentials of using R, an open-source programming language widely used for statistical analysis, data transformation, survey analysis, machine learning, and more.

While the workshop was ongoing, I took notes on the session and jotted down highlights, which made me miss some of the initial concepts for setting up the software for practice. I chatted privately with Chen during the workshop, and she was truly patient in bringing me along despite my slow pace in learning the tool. We were asked to create a Posit account, which gave us access to R in the cloud for practice.

The session began by distinguishing between R and RStudio: R is the engine, the core programming language, while RStudio is the user-friendly interface where coding, data visualization, and manipulation take place, allowing users to interact with R seamlessly. The instructor also introduced foundational concepts like Boolean values (1/0, TRUE/FALSE, YES/NO), arithmetic operators (+, -, /, *), and vectors, which she described as lists of items or variables of the same type. We also learned basic R syntax, such as how to assign values to variables using the `<-` operator and how logical operators like `==` (equal to) and `!=` (not equal to) work in R.
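To make these basics concrete, here is a minimal sketch of the kind of thing we typed during the session. The variable names and values are my own, chosen for illustration, and are not taken from the workshop materials.

```r
# Assign values to variables with the <- operator
x <- 5
y <- 3

# Arithmetic operators
total <- x + y
difference <- x - y
quotient <- x / y
product <- x * y

# Logical comparisons return TRUE or FALSE
same <- x == y
different <- x != y

# A vector holds items of a single type (here, numbers)
scores <- c(90, 85, 77)
mean_score <- mean(scores)

print(total)
print(same)
print(mean_score)
```

Typing this out line by line in the RStudio console, rather than pasting it, is exactly the habit the instructor encouraged.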

One of the key takeaways was the instructor’s repeated emphasis on typing out code rather than giving in to the temptation to copy and paste, as typing helps build muscle memory. The workshop also emphasized the value of making mistakes, which is a major part of the learning process in programming. As she put it, “Every programming language takes time to understand. It just takes consistent practice to become the best at it.”

We explored libraries like Tidyverse, a collection of packages designed to simplify data wrangling and visualization in R. Functions such as `view()` and `head()` were demonstrated for examining data, with the former opening the dataset in a new tab in RStudio and the latter displaying, by default, the first six rows of a dataset in the console.
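As a rough illustration of inspecting data this way, the sketch below uses `mtcars`, a small dataset that ships with R. That dataset is my own choice for the example and may not be the one used in the workshop.

```r
# mtcars is a small example dataset bundled with R
first_rows <- head(mtcars)          # first six rows by default
first_three <- head(mtcars, n = 3)  # ask for a different number of rows

print(first_rows)

# In an interactive RStudio session, View(mtcars) opens the data in a
# viewer tab. It is commented out here because it needs an interactive session.
# View(mtcars)
```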

After being shown several examples, we did some practice exercises that gave us a glimpse of how R works. Most of my colleagues in the workshop seemed to be getting it, as there were plenty of exchanges in the Zoom chat, which also showed how hands-on the workshop was.

On a final note, the workshop was a good introduction to R, emphasizing the importance of continuous practice and exploration. Whether for data visualization, machine learning, or statistical analysis, R offers a flexible and powerful toolset for data professionals.

Interestingly, the workshop ended at 8:04 pm with a Google digital evaluation form, which made me reflect on the increasing importance of digital documentation and feedback within digital humanities. As a potential digital humanist, I came away thinking that any evaluation or rating not completed on a digital platform might as well be considered non-existent, which reinforces the role of digital tools in modern academic and research processes.

– Kelechi Iwuagwu (A Data Analysis & Viz CUNY GradCenter Candidate).

Intro to R Workshop

I attended the “Intro to R” workshop led by Chen Zhou, and as a complete beginner it was interesting to see how R is used. We discussed some of its top applications, which included statistical analysis, visualization, and data transformation. Our main program was the online version of RStudio on Posit Cloud. We also got to see how Chen is currently using R for a project she’s working on, which addresses the difference between how men and women pronounce vowels.

I did appreciate doing some basic exercises just to get familiar with the syntax and practice it. We looked at basic functions and the use cases they could be applied to. Some of the syntax reminded me of other coding languages, but other parts of it were completely new to me, for example using pull and pipe to get the maximum value for a particular column in a dataset. Truthfully, unless it ends up being required for work or something equally important, I don’t plan on practicing R on my own time. I was more interested in seeing how it could be used for data viz, but we didn’t get that far since the workshop was only two hours and the majority of it was spent doing the exercises and discussing the answers. I would’ve loved to see how it compares to other tools like d3.js or Seaborn in Python.
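For anyone curious, the pull-and-pipe exercise mentioned above might look something like the sketch below. I am assuming the `dplyr` package (part of Tidyverse) and the built-in `mtcars` dataset with its `mpg` column; the actual dataset and column from the workshop were different and are not reproduced here.

```r
library(dplyr)

# Pipe the dataset into pull() to extract one column as a vector,
# then pipe that vector into max() to get its largest value
max_mpg <- mtcars %>%
  pull(mpg) %>%
  max()

print(max_mpg)
```

The pipe reads left to right, which is the part that felt new to me compared with nesting the calls as `max(pull(mtcars, mpg))`.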

Overall, I liked the introduction to the coding language but wish we had more time to explore the data visualization side of it.

R workshop

Reflecting on “The Possibly Impossible Research Project”, while embracing Failure for intellectual progress.

In “The Possibly Impossible Research Project,” Rebekah Fitzsimmons and Suzan Alteri introduced first-year students at Georgia Tech to archival research focused on forgotten women authors of science books from the Romantic and Victorian (1770s – 1830s) periods. This collaboration with the Baldwin Library of Historical Children’s Literature aimed to expose students to the systemic marginalization of women in the sciences and challenge the idea that women were historically uninvolved in STEM. At an institute that admitted female students into regular classes only in 1952, this feminist-oriented assignment provided an eye-opening opportunity for students to research the overlooked contributions of women to scientific advancement.

Students encountered numerous frustrations as they struggled to locate biographical information for the assigned authors. The challenge reflected real-world research setbacks, pushing students to ask why so little documentation existed about these intellectual lionesses. Their frustrations often became evident through class discussions and social media exchanges. Alongside conventional research methods, students used digital tools like Twitter and WordPress to document their process, fostering collaboration and building professional research skills.

Through this process, students contributed to the recovery of forgotten histories, shedding light on how women’s writing in science education shaped scientific discourse. This initiative illustrates the value of archival research in undergraduate education, promoting both discovery and the importance of documenting the research journey, even when it’s incomplete.

Moreover, Fitzsimmons’ varying approach challenged the conventional classroom dynamic, where students expect professors to have all the answers. This new model allowed students to grapple with failure and setbacks, learning to persist through “brick walls” and to explore creative solutions—valuable lessons that they will carry into future research and professional endeavors.

Pan Dulce, Chicana, & Possibly Impossible: Guiding Science

The Guiding Science project and Chicana por mi Raza are two more examples of Data Feminism. It is crucial to use this lens in creating archives to break away from the masculine lens of data collection. The Guiding Science project was interesting in its practice of failing, running into problems, and documenting those issues for other researchers to see, interpret, and potentially solve. In this way, data collection becomes very collaborative, even beyond the scope of a project's initial staff. What draws me to the project is not just the objective of documenting female authors and their contributions to the scientific community, but also the effort to compile biographical information. This key point in the project’s research shows a clear connection to Chicana por mi Raza. Although the history and research methods differ, from books and biographical information found online in Guiding Science to the collection of oral history via interviews in Chicana por mi Raza, a stark similarity is the effort to recover the stories and lives of these two groups of women and reclaim their history.

The Guiding Science Project collects information on women posthumously. Since women of that time faced many barriers and restraints in publicizing their work, about 40% of the writers were difficult to track. Although Chicana por mi Raza does not face this issue, in Pan Dulce a question was posed about how they would proceed with their archive once the women in the project have passed. Again, the two projects are very different, but they share a similar problem in working with a person’s data or ‘digital life’ in the physical afterlife.

The Guiding Science Project is embracing the problem, allowing future researchers, scholars, and students to tackle the issue of missing data, documenting their process and preparing them to make mistakes. Dr. Maria Cotera states it is an issue that ‘keeps her up at night’ and is one of the limitations of an autonomous project. It will be interesting to see how Chicana por mi Raza navigates this issue and how they will continue their collection for a digital life of Chicana history in the physical afterlife.

History and the Archive Response

In this week’s readings about history and the archives, similar themes emerge that contribute to my ongoing understanding of the Digital Humanities. Like data and maps, history and archives are not neutral; they exist within systems of power and inequality. Nor should they be stagnant preservations of the past. Rather, they should aim to be built through communities, to be embodied, and to connect the past with the present.

In Pan Dulce: Breaking Bread with the Past, Maria Cotera’s mention of generations really stood out to me. Three representational generations were present during the ethnographic interviews (veteranas, ‘daughter’, and students). Through traditions and oral histories, this archive went from something stagnant to something living and transitional, moving from the past into the present. Johnson highlights uses of the Trans-Atlantic Slave Trade Database such as school lesson plans and genealogical pursuits. The data became an open access resource, allowing students to connect with the history and communities to connect with their own identity and experience of the world. Christen and Anderson discuss Mukurtu CMS, which aims to center Indigenous communities’ ways of producing knowledge and their values around access. They write about protocols, rules about access and use of knowledge, that can be written by the communities themselves. This is meant to contrast with the history of colonialist collection, in which Indigenous communities are marginalized, made to seem savage and on the verge of extinction, and treated as an object of the colonist’s expertise.

The Possibly Impossible research project was a very interesting example of constructionist teaching practices. The students involved in this process actively built their understandings through exploration, with their teacher as a collaborator rather than an authority. This sense of ‘play’ is another recurring theme in my understanding of Digital Humanities. Through social media and email correspondence with historians and others in the field, students in the project made online and in person connections. This brought the lives of those they researched into the present, through memes and community.

Depression and Gender Identity Data Visualization

Carol Harris

In the article “Against Cleaning,” Katie Rawson and Trevor Munoz make the point that what is seen as a substantial part of the process of acquiring knowledge through data, i.e. the “cleaning” of the data, has implications for how we present the data and what we are able to ascertain from it. This is particularly important when it comes to sensitive data that gives us insight into the psychological and mental states of individuals, particularly individuals who are struggling with affirming their identity in a society that is often hostile. It is against this backdrop that I decided to look into the mental health experiences and struggles of transgender individuals.

The data used in this analysis came from the Household Pulse Survey, which was conducted by the Census Bureau during COVID. The survey is conducted online with a random sample of households, and measures of anxiety, depression, and a combination of both are tabulated for the individuals completing the survey. The results are weighted in accordance with the demographic breakdown of the US population. For my analysis I focused on depressive symptoms and looked at three groups: individuals identified as male at birth, individuals identified as female at birth, and transgender individuals.

The first graph is the overall view, with all three groups shown according to the percentage of depressive symptoms they experienced during the time period. Males and females have similar rates of depressive symptoms, with females being slightly higher than males. The transgender group showed the highest rate of depressive symptoms for the ten-month period in 2023. It is important to point out that the data was collected in one-week waves and then averaged for each month. While the line graphs looked nearly identical for males and females, the transgender graph had more peaks and troughs. In other words, it is less stable and more erratic during this period.

It is also important to point out that people often occupy several different identities at once. So while this is a snapshot of gender identity, other identities such as race and ethnicity may also be contributing factors. This gets at why this type of data is so hard to separate and pull apart. Because of the way the data is collected, I basically had to strip away other identities and focus on a single identity, which is not a realistic reflection of how people interact in their daily lives.

Workshop: How Do You DH?

I attended a workshop called How Do You DH?, a kind of “lunch and learn” session put on by two digital fellows on 10/7/24. I attended this session because I thought it would be helpful as I think about our impending DH project proposals. I am most interested in text analysis, so thus far I have been thinking about how I can do a project using that DH methodology. But since I don’t know all that much about text analysis, it has been hard to come up with a project idea. The workshop helped me think about what research question(s) I might want to ask and then presented an array of potential DH tools to choose from in order to answer that question.

I appreciated the high-level overview of DH, which was laid out clearly and in a straightforward way. I am still struggling to conceptualize exactly what DH is, and I especially struggle when trying to explain it to the people in my life. The facilitators had us break into small groups to look at a DH project and assess what the format of the project was (Digital Methods Approach, Traditional Monograph + Digital Project Approach, or Wholly Digital Approach), what DH methodology was used, and what tools were used to create the project. Again, this helped me conceptualize what will go into my own DH project and better understand how DH is translated into the “real world”.

This was the first workshop I attended and I am eager to go to more!

Introduction to Digital Humanities Fall 2024

thinking, writing, and reading digitally