PopoDameron • Reinforcement learning from human feedback • en.wikipedia.org

Top edits to an article

All edits made to a page by one user, in chronological order.

Article	Reinforcement learning from human feedback (Log · Page History)
User	PopoDameron (Edit Counter · Top Edits)
Total edits	91
Minor edits	14 (15.4%)
(Semi-)automated edits	5 (5.5%)
Reverted edits	0 (0%)
atbe¹	4.3
Added (bytes)²	35,350
Deleted (bytes)	-2,613

Major edits	77 (84.6%)
Manual edits	86 (94.5%)
Unreverted edits	91 (100%)

¹ Average time between edits (days)

² Added text is any positive addition that wasn't reverted (approximate)
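
The summary figures can be cross-checked against the edit list with a short sketch. It assumes that atbe is the span from the first listed edit to the last listed edit divided by the total edit count; this is an assumption, not a documented formula, chosen because it reproduces the reported 4.3 (dividing by the 90 gaps between edits would give roughly 4.4 instead). The function names are illustrative only.

```python
from datetime import datetime

# Values taken from the summary table and the first/last rows of the edit list.
FIRST_EDIT = datetime(2023, 3, 4, 1, 18)   # "started article on RLHF"
LAST_EDIT = datetime(2024, 4, 1, 20, 21)   # "fixing minor things and adding clarifications"
TOTAL_EDITS = 91
MINOR_EDITS = 14
AUTOMATED_EDITS = 5


def average_days_between_edits(first, last, edit_count):
    """Span in days divided by the edit count (assumed definition of atbe)."""
    span_days = (last - first).total_seconds() / 86400
    return span_days / edit_count


def percentage(part, whole):
    """Share of `whole` as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


atbe = round(average_days_between_edits(FIRST_EDIT, LAST_EDIT, TOTAL_EDITS), 1)
minor_share = percentage(MINOR_EDITS, TOTAL_EDITS)
automated_share = percentage(AUTOMATED_EDITS, TOTAL_EDITS)

print(atbe, minor_share, automated_share)
```

Under that assumption the computed values match the atbe, minor-edit, and (semi-)automated percentages shown in the summary.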

Date	Links	Size change (bytes)	Edit summary
2024-04-01 20:21	Diff · History	1,055	fixing minor things and adding clarifications
2024-03-31 02:12	Diff · History	156	clarification
2024-03-31 02:06	Diff · History	-81	not the first time the term was used
2024-03-30 21:55	Diff · History	12	ziegler 19 also continues text
2024-03-30 21:53	Diff · History	36	first rlhf
2024-03-28 03:22	Diff · History	5	ce
2024-03-28 03:21	Diff · History	3	ce
2024-03-28 03:19	Diff · History	0	ce
2024-03-28 03:19	Diff · History	426	background
2024-03-28 02:16	Diff · History	10	ce
2024-03-28 02:15	Diff · History	344	simplification and delineation
2024-03-28 02:08	Diff · History	-249	ce
2024-03-28 02:07	Diff · History	1,080	added new source and discussion
2024-03-28 01:57	Diff · History	3	ce
2024-03-28 01:56	Diff · History	-13	added limitations
2024-03-28 01:26	Diff · History	136	added a bit of detail
2024-03-28 01:20	Diff · History	-148	remove weakly sourced statement and explain overfitting
2024-03-27 22:56	Diff · History	1,354	improved training section
2024-03-27 05:55	Diff · History	188	→Applications: stability clarification
2024-03-27 05:49	Diff · History	36	clarity
2024-03-27 05:46	Diff · History	842	more on video game bots
2024-03-27 05:18	Diff · History	642	about the amount of comparison data
2024-03-27 04:56	Diff · History	140	delineation
2024-03-27 04:45	Diff · History	-1	ce
2024-03-27 04:39	Diff · History	445	→Collecting human feedback: layperson explanation
2024-03-27 04:18	Diff · History	221	suggested clarifications
2024-03-27 03:47	Diff · History	316	clarity
2024-03-27 03:26	Diff · History	1,047	change to sourced example
2024-03-26 23:01	Diff · History	79	improved lede
2024-03-24 17:55	Diff · History	0	moved sources
2024-03-15 04:33	Diff · History	-11	ce
2024-03-15 04:25	Diff · History	3	ce
2024-03-15 04:18	Diff · History	0	ugly split infinitive
2024-03-15 04:15	Diff · History	97	ce, links, and switched online & offline because the latter is more important/common
2024-03-15 02:34	Diff · History	1	ce
2024-03-15 02:31	Diff · History	2	ce
2024-03-15 02:25	Diff · History	0	ce
2024-03-15 02:24	Diff · History	2	ce
2024-03-15 02:23	Diff · History	4	ce
2024-03-15 02:12	Diff · History	771	added limitation summary to the lede
2024-03-15 01:59	Diff · History	410	made lede more accessible and moved some things around
2024-03-15 00:10	Diff · History	0	change figure placement again. makes more sense here for phones
2024-03-14 04:27	Diff · History	30	added template
2024-03-14 04:24	Diff · History	6	wrong cite templates
2024-03-14 04:23	Diff · History	103	added overview diagram
2024-03-14 01:51	Diff · History	-19	remove template
2024-03-14 01:51	Diff · History	897	wrapping up
2024-03-14 01:33	Diff · History	1,711	started RL policy training. just need to add the second term and probably do some ce
2024-03-14 00:31	Diff · History	-16	in use
2024-03-12 02:56	Diff · History	16	to be continued later
2024-03-12 02:54	Diff · History	-30	→Training: ce
2024-03-12 02:53	Diff · History	-8	ce
2024-03-12 02:50	Diff · History	1,013	finished reward model training. next: training the policy using the RM
2024-03-12 01:57	Diff · History	-16	in use
2024-03-10 23:49	Diff · History	1,154	started training section. still incomplete and needs a lot more on the reward model, plus haven't started the actual policy training
2024-03-10 22:23	Diff · History	421	added another good source
2024-03-10 21:58	Diff · History	191	clarity and ce
2024-03-10 21:44	Diff · History	751	explained online vs offline distinction
2024-03-08 08:48	Diff · History	-422	Undid revision 1212481439 by Aldopacchiano (talk) citation is not relevant + WP:SELFCITE
2024-03-06 21:26	Diff · History	307	→Applications: +claude
2024-03-05 19:43	Diff · History	-29	→See also: already in the lede, +alphabetical
2024-03-01 00:31	Diff · History	2,169	added CV applications. will add more contextual technical detail soon
2024-02-29 22:47	Diff · History	333	added gemini to applications
2024-02-29 18:20	Diff · History	320	fixing up nlp applications
2024-02-28 18:04	Diff · History	-4	Undid revision 1210766352 by Ibnu Fulan (talk) I don't see enough evidence that this would be notable enough for an article
2024-02-28 18:04	Diff · History	771	mostly ce
2024-02-26 02:09	Diff · History	2,121	improved motivation
2024-02-21 23:46	Diff · History	122	improved alternatives section
2024-02-21 23:29	Diff · History	280	→Collecting human feedback: improving section based on paper
2024-02-21 22:59	Diff · History	79	ce & sections
2024-02-21 22:48	Diff · History	329	clarifying and improving lede
2024-02-21 21:08	Diff · History	21	ml template param
2024-02-21 21:08	Diff · History	21	add machine learning template
2024-01-11 16:00	Diff · History	-20	acronym is fine since it was already expanded above + capitalization
2023-07-17 03:29	Diff · History	-1,132	cleanup and removing some unsourced examples etc
2023-07-17 03:16	Diff · History	-324	This might be simple, but it is not technically accurate
2023-03-30 00:08	Diff · History	244	I believe that this should make things accessible enough to a more casual reader, but feel free to readd the tag upon disagreement
2023-03-29 22:58	Diff · History	-61	Oh, I see
2023-03-29 22:06	Diff · History	22	wikilink
2023-03-29 22:04	Diff · History	-29	I can't imagine how someone might misinterpret this sentence. How could it be any more direct? (assuming at least very basic RL knowledge, of course)
2023-03-06 19:53	Diff · History	9	clarified reward model
2023-03-06 07:19	Diff · History	56	other common name
2023-03-04 17:59	Diff · History	2	Cleaned up using AutoEd
2023-03-04 07:50	Diff · History	313	some elaboration on NLP difficulties
2023-03-04 07:37	Diff · History	166	minor improvements
2023-03-04 07:27	Diff · History	0
2023-03-04 01:24	Diff · History	55	added see also section
2023-03-04 01:22	Diff · History	104	+Category:Reinforcement learning; +Category:Language modeling; +Category:Artificial intelligence using HotCat
2023-03-04 01:20	Diff · History	31	added Category:Machine learning using HotCat
2023-03-04 01:18	Diff · History	49	Adding short description: "Machine learning technique"
2023-03-04 01:18	Diff · History	11,267	started article on RLHF. missing more technical details, which I plan to work on soon.

All times are in UTC.
