Show simple item record

dc.contributor.author: Mowbray, Max; orcid: 0000-0003-1398-0469; email: max.mowbray@manchester.ac.uk
dc.contributor.author: Smith, Robin; email: robin.smith@manchester.ac.uk
dc.contributor.author: Del Rio‐Chanona, Ehecatl A.; email: a.del-rio-chanona@imperial.ac.uk
dc.contributor.author: Zhang, Dongda; orcid: 0000-0001-5956-4618; email: dongda.zhang@manchester.ac.uk
dc.date.accessioned: 2021-05-20T16:38:38Z
dc.date.available: 2021-05-20T16:38:38Z
dc.date.issued: 2021-05-15
dc.date.submitted: 2020-10-04
dc.identifier: https://chesterrep.openrepository.com/bitstream/handle/10034/624580/aic.17306.xml?sequence=2
dc.identifier: https://chesterrep.openrepository.com/bitstream/handle/10034/624580/aic.17306.pdf?sequence=3
dc.identifier.citation: AIChE Journal, page e17306
dc.identifier.uri: http://hdl.handle.net/10034/624580
dc.description: From Wiley via Jisc Publications Router
dc.description: History: received 2020-10-04, revision received 2021-04-23, accepted 2021-05-03, published electronically 2021-05-15
dc.description: Article version: VoR
dc.description: Publication status: Published
dc.description.abstract: Reinforcement learning (RL) is a data‐driven approach to synthesizing an optimal control policy. Barriers to wide implementation of RL‐based controllers are their data‐hungry nature during online training and their inability to extract useful information from human operator and historical process operation data. Here, we present a two‐step framework to resolve this challenge. First, we employ apprenticeship learning via inverse RL offline, analyzing historical process data to simultaneously identify a reward function and a parameterization of the control policy. Second, this parameterization is efficiently improved online via RL on the ongoing process within only a few iterations. Significant advantages of the framework include allowing a hot start of RL algorithms for process optimal control and robust abstraction of existing controllers and control knowledge from data. The framework is demonstrated on three case studies, showing its potential for chemical process control.
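
The abstract describes the two-step framework only at a high level. The sketch below is purely illustrative and is not taken from the article: on a toy one-dimensional setpoint-tracking process, a heavily simplified feature-matching flavor of apprenticeship learning infers linear reward weights and an initial controller gain from "historical" roll-outs (step 1, offline), and a crude finite-difference policy-gradient update then fine-tunes that gain against the learned reward (step 2, online). The toy process, the feature choice, the candidate-gain search, and all hyperparameters are assumptions made only for illustration.

# Illustrative sketch only -- not the authors' implementation or tuning.
import numpy as np

rng = np.random.default_rng(0)

def features(x):
    # State features for a linear reward r(x) = w . phi(x): setpoint tracking error and state magnitude.
    return np.array([-abs(x - 1.0), -x ** 2])

def rollout(gain, steps=30, noise=0.05):
    # Toy first-order process x_{t+1} = x_t + u_t + noise with proportional control u_t = gain * (1 - x_t).
    x, traj = 0.0, []
    for _ in range(steps):
        x = x + gain * (1.0 - x) + noise * rng.standard_normal()
        traj.append(x)
    return np.array(traj)

def feature_expectations(traj):
    return np.mean([features(x) for x in traj], axis=0)

# "Historical" operator data: roll-outs from a gain that is unknown to the learner.
mu_expert = np.mean([feature_expectations(rollout(0.6)) for _ in range(20)], axis=0)

# Step 1 (offline): apprenticeship-style reward identification by feature matching,
# with policy improvement restricted to a small grid of candidate gains.
candidate_gains = np.linspace(0.0, 1.0, 21)
w = np.array([1.0, 0.0])
best_gain = 0.0
for _ in range(10):
    returns = [np.mean([features(x) @ w for x in rollout(g)]) for g in candidate_gains]
    best_gain = candidate_gains[int(np.argmax(returns))]
    mu_policy = feature_expectations(rollout(best_gain, steps=200))
    # Move the reward weights toward the direction separating expert and policy feature expectations.
    w = w + 0.5 * (mu_expert - mu_policy)
    w = w / (np.linalg.norm(w) + 1e-8)

# Step 2 (online): fine-tune the hot-started gain with a finite-difference
# policy-gradient step on the learned reward.
gain, eps, lr = best_gain, 0.05, 0.1
for _ in range(20):
    j_plus = np.mean([features(x) @ w for x in rollout(gain + eps)])
    j_minus = np.mean([features(x) @ w for x in rollout(gain - eps)])
    gain += lr * (j_plus - j_minus) / (2 * eps)

print("identified reward weights:", w, "hot-started and fine-tuned gain:", round(float(gain), 3))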
dc.language: en
dc.publisher: John Wiley & Sons, Inc.
dc.rights: Licence for VoR version of this article: http://creativecommons.org/licenses/by/4.0/
dc.source: issn: 0001-1541
dc.source: issn: 1547-5905
dc.subject: PROCESS SYSTEMS ENGINEERING
dc.subject: apprenticeship learning
dc.subject: inverse reinforcement learning
dc.subject: machine learning
dc.subject: optimal control
dc.subject: reinforcement learning
dc.title: Using process data to generate an optimal control policy via apprenticeship and reinforcement learning
dc.type: article
dc.date.updated: 2021-05-20T16:38:37Z
dc.date.accepted: 2021-05-03


Files in this item

Name: aic.17306.xml; Size: 7.245Kb; Format: XML
Name: aic.17306.pdf; Size: 3.595Mb; Format: PDF
